Scale R to Big Data with Hadoop & Spark

In this talk, we will show you Microsoft R Server running on a Hadoop or Spark cluster, where R is installed on every node along with distributed processing libraries that let it use all of the nodes in parallel. We’ll show you how to run your normal native R code via SSH and how to get RStudio Server up and running on the cluster.
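Using every node in parallel works because many statistics can be computed on each chunk of data independently and then merged. The sketch below shows that split-apply-combine pattern for a mean and variance in plain Python. It is not Microsoft R Server code: the thread pool standing in for cluster nodes, the chunking, and the function names (`summarize_chunk`, `merge`, `parallel_mean_var`) are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence, Tuple

# Partial summary of one chunk: (count, mean, sum of squared deviations M2).
Summary = Tuple[int, float, float]


def summarize_chunk(chunk: Sequence[float]) -> Summary:
    """'Apply' step (what each node would run): Welford's one-pass update."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in chunk:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


def merge(a: Summary, b: Summary) -> Summary:
    """'Combine' step: merge two partial summaries (Chan et al. pairwise update)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    if n_a == 0:
        return b
    if n_b == 0:
        return a
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def parallel_mean_var(chunks: List[Sequence[float]], workers: int = 4) -> Tuple[float, float]:
    """Mean and sample variance; threads are a stand-in for cluster nodes (assumption)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize_chunk, chunks))
    n, mean, m2 = reduce(merge, partials, (0, 0.0, 0.0))
    if n < 2:
        raise ValueError("need at least two observations")
    return mean, m2 / (n - 1)
```

Merging (count, mean, M2) triples instead of raw sums of squares keeps the variance numerically stable as partial results from different chunks are combined.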

R is currently one of the most popular data science languages in the world. However, it has always been constrained when it comes to scaling out to big data. What happens when your data grows beyond a couple of gigabytes? Traditionally, you packed up your data and switched to something else: Python, Java, or Mahout, to name a few. Now it’s possible to stick with R from analysis all the way to production deployment, regardless of data size.

Organizations such as Apache, Revolution Analytics, Microsoft, and H2O showed us this year that distributed computing in R is possible. Today we’ll look at what the Microsoft stack is doing to scale R up to big data.

We’ll show you how to wrangle data out of HDFS and build machine learning models from your large dataset. Then we’ll show you how to package that model and deploy it to an elastically scaled web service so that anyone can call it for predictions and insights.
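Model training can often be decomposed the same way. As a hedged illustration, and not how R Server implements its learners, ordinary least squares only needs the sums X'X and X'y, which each node can accumulate over its own chunk (for example, one HDFS block) before a single small solve on the driver. The helpers `accumulate`, `solve`, and `fit_ols` are hypothetical names, and the pure-Python solver is there only to keep the sketch dependency-free.

```python
from typing import Iterable, List, Sequence, Tuple

# One observation: (feature vector, target value).
Row = Tuple[Sequence[float], float]


def accumulate(chunk: Iterable[Row], p: int) -> Tuple[List[List[float]], List[float]]:
    """Per-chunk pass (what each node would run): X'X and X'y, intercept included."""
    k = p + 1
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for features, y in chunk:
        x = [1.0, *features]  # prepend the intercept column
        for i in range(k):
            xty[i] += x[i] * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(chunks: Iterable[Iterable[Row]], p: int) -> List[float]:
    """Combine per-chunk sums, then solve once: returns [intercept, b1, ..., bp]."""
    k = p + 1
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for chunk in chunks:
        c_xtx, c_xty = accumulate(chunk, p)
        for i in range(k):
            xty[i] += c_xty[i]
            for j in range(k):
                xtx[i][j] += c_xtx[i][j]
    return solve(xtx, xty)
```

Solving the normal equations directly is the simplest choice; with highly collinear features it loses precision, and a QR-based solver is usually preferred.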

Outline:
· Set up a Spark cluster with R installed (R Server)
· Wrangle data stored in HDFS using R
· Build and deploy a machine learning model using R
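For the deployment step, the web service essentially turns a request into a prediction from stored model parameters. Below is a minimal sketch using Python's standard library, assuming a linear model with hypothetical coefficients and a made-up JSON schema (`{"rows": [[x1, x2], ...]}` in, `{"predictions": [...]}` out); it is not the deployment API used in the talk.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import List, Sequence

# Hypothetical coefficients [intercept, b1, b2] produced by a training step.
COEFFICIENTS: List[float] = [2.0, 3.0, -1.0]


def predict(features: Sequence[float], coefficients: Sequence[float] = COEFFICIENTS) -> float:
    """Linear model score: intercept + sum(b_i * x_i)."""
    if len(features) != len(coefficients) - 1:
        raise ValueError(f"expected {len(coefficients) - 1} features, got {len(features)}")
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], features))


def score_request(body: bytes) -> bytes:
    """Map a JSON body {"rows": [...]} to {"predictions": [...]} (assumed schema)."""
    payload = json.loads(body)
    predictions = [predict(row) for row in payload["rows"]]
    return json.dumps({"predictions": predictions}).encode("utf-8")


class ScoringHandler(BaseHTTPRequestHandler):
    """Minimal POST endpoint; a production service would add auth, logging, etc."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            response, status = score_request(self.rfile.read(length)), 200
        except (ValueError, KeyError, TypeError) as exc:
            # json.JSONDecodeError is a ValueError, so malformed JSON lands here too.
            response = json.dumps({"error": str(exc)}).encode("utf-8")
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)
```

To try it locally, `HTTPServer(("127.0.0.1", 8000), ScoringHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) starts a single-process server. Elastic scaling would come from running many such instances behind a load balancer, which this sketch does not cover.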

Code and Prep Work (if you want to follow along):

Table of Contents:
0:00 Overview
1:20 Machine learning scaling
4:13 Popularity in data science
4:47 R as a movement
8:38 R limits
19:55 Spark
21:40 R Server
26:56 R Server on HDInsight
45:52 IDE
50:43 RStudio
1:02:44 Processing times

#hadoop #spark #rprogramming #bigdata

Comments

Now I understand the basics. Thank you very much.

jensharbers

Is ScaleR still available? Or are there newer solutions to deal with the memory problem? Can the package be used if you are not using a server? Very nice talk by the way :)

suzannevangestel

Do we have a function in SparkR or sparklyr to read NetCDF files?

rajanikumar