18 - Apache Spark First Java Program - Create Spark RDD

--------------------------------------------------------------------------------
Chapter 03 - Apache Spark First Java Program - Create Spark RDD
--------------------------------------------------------------------------------
Create an RDD from the SparkContext object. An RDD (Resilient Distributed Dataset) is a fault-tolerant collection of elements that can be operated on in parallel.
There are two ways to create RDDs:
1) parallelizing an existing collection in the driver program (typically used only for POCs or prototyping)
2) referencing a dataset in an external storage system, such as a shared filesystem, HDFS, HBase, or any data source offering a Hadoop InputFormat
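A minimal Java sketch of both approaches, assuming Spark's Java API (JavaSparkContext); the class name and the input path "data/input.txt" are placeholders, not from the video:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;
import java.util.List;

public class CreateRddExample {
    public static void main(String[] args) {
        // Local Spark configuration for prototyping; "local[*]" uses all available cores.
        SparkConf conf = new SparkConf().setAppName("CreateRddExample").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // 1) Parallelize an existing collection in the driver program.
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        JavaRDD<Integer> numbersRdd = sc.parallelize(numbers);

        // 2) Reference a dataset in an external storage system (placeholder path).
        JavaRDD<String> linesRdd = sc.textFile("data/input.txt");

        System.out.println("Elements in parallelized RDD: " + numbersRdd.count());
        sc.close();
    }
}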
Various transformations (map, filter, etc.) and actions (count, collect, etc.) can be called on an RDD.
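A short, self-contained sketch of transformations and actions; the class name, sample values, and expected results in the comments are illustrative assumptions:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;
import java.util.List;

public class TransformationsAndActions {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("TransformationsAndActions").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

            // Transformations are lazy: map and filter only describe the computation.
            JavaRDD<Integer> doubled = numbers.map(n -> n * 2);      // 2, 4, 6, 8, 10
            JavaRDD<Integer> large = doubled.filter(n -> n > 4);     // 6, 8, 10

            // Actions trigger execution and return results to the driver.
            long howMany = large.count();                            // 3
            List<Integer> values = large.collect();                  // [6, 8, 10]

            System.out.println("count = " + howMany + ", values = " + values);
        }
    }
}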
#java #javadevelopers #javaprogramming #apachespark #spark