Spark Tutorial: Different ways to create an RDD, with examples

There are three ways to create an RDD.

The first way to create an RDD is to parallelize an object collection, meaning converting it to a distributed dataset that can be operated on in parallel. This is simple and doesn't require any data files, so it is often used to quickly try out a feature or do some experimenting in Spark.

To parallelize an object collection, call the parallelize method of the SparkContext class.

First way to create an RDD:

-------------------------------

val sc = new SparkContext("local[*]", "union")

val stringList = Array("Welcome to spark tutorials", "Spark examples")

val stringRDD = sc.parallelize(stringList)

Second way to create an RDD:

-------------------------------

The second way to create an RDD is to read a dataset from a storage system, which can be a local file system, HDFS, Cassandra, Amazon S3, and so on.

The first argument of the textFile method is a URI that points to a path or a file on the local machine or on a remote storage system. When it starts with an hdfs:// prefix, it points to a path or a file that resides on HDFS, and when it starts with an s3n:// prefix, it points to a path or a file that resides on AWS S3.

If a URI points to a directory, then the textFile method will read all the files in that directory.

The textFile method assumes each file is a text file and that each line is delimited by a newline. It returns an RDD that represents all the lines in all the files.
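As a minimal sketch of this second way, reading text files might look like the following (the paths, the HDFS host, and the app name here are illustrative assumptions, not from the original):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumed setup: a local Spark context (app name is arbitrary)
val conf = new SparkConf().setMaster("local[*]").setAppName("textFile-example")
val sc = new SparkContext(conf)

// Each element of linesRDD is one line from the file(s) at the given URI
val linesRDD = sc.textFile("sample.txt")                        // local file (assumed path)
// val hdfsRDD = sc.textFile("hdfs://namenode:9000/data/logs/") // HDFS directory: reads all files in it
// val s3RDD   = sc.textFile("s3n://my-bucket/data.txt")        // AWS S3 object

println(linesRDD.count()) // total number of lines across all matched files
```

Note that if the URI is a directory, every file inside it is read, and the resulting RDD holds the union of their lines.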

Third way to create an RDD:

-------------------------------

The third way to create an RDD is by invoking one of the transformation operations on an existing RDD.