Session 1: What is Apache Spark and its benefits? | Mastering Apache Spark in an efficient way

Mastering Apache Spark in an efficient way!
******************************************************************************************************
1. What is Apache Spark?
2. Benefits of Apache Spark?

1. What is Apache Spark?
*******************************
General-purpose
In-memory
Lightning-fast
Unified compute engine

Spark can process petabyte-scale workloads and can handle data engineering, data science,
and machine learning applications on a single node or a multi-node cluster.

2. Benefits of Apache Spark?

Speed: - Up to about 100x faster than Hadoop MapReduce when data is in memory and about 10x faster on disk
----------------

Multilingual: - Provides APIs in Python, R, Java, and Scala
-------------------------

In-Memory Computation in Spark: - Keeping intermediate data in memory, instead of writing it to disk between steps, increases processing speed.
-------------------------------

Fault Tolerance in Spark: - Apache Spark provides fault tolerance through its core abstraction, the RDD (Resilient Distributed Dataset).
-----------------------------------
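RDDs

RDD1 ---- RDD2 ---- RDD3 ---- RDD4

DAG (directed acyclic graph), lineage graph: Spark records how each RDD was derived from its parent, so lost data can be recomputed from the lineage.

RDDs are immutable in nature; a transformation always produces a new RDD.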
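
The lineage idea can be sketched in plain Python, without Spark itself. The LineageNode class below is a hypothetical toy, not a Spark class: each node only remembers its parent and the function used to derive it, and lost data is rebuilt by replaying that chain. Real Spark does this per partition across a cluster; this sketch works on one small list.

```python
# Toy model of lineage-based recovery.
# LineageNode is a hypothetical illustration, NOT part of the Spark API.
class LineageNode:
    def __init__(self, name, parent=None, fn=None, data=None):
        self.name = name
        self.parent = parent   # the node this one was derived from
        self.fn = fn           # how each element is derived from the parent
        self.cache = data      # materialized data; may be lost

    def derive(self, name, fn):
        # Immutability: never modify self, always build a new child node.
        return LineageNode(name, parent=self, fn=fn)

    def lineage(self):
        # Walk back through the parents to describe how this node was built.
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " <- ".join(names)

    def compute(self):
        if self.cache is None:
            # Data is missing: rebuild it from the parent's data.
            self.cache = [self.fn(x) for x in self.parent.compute()]
        return self.cache


rdd1 = LineageNode("RDD1", data=[1, 2, 3])
rdd2 = rdd1.derive("RDD2", lambda x: x + 1)
rdd3 = rdd2.derive("RDD3", lambda x: x * 10)
rdd4 = rdd3.derive("RDD4", lambda x: x - 5)

first = rdd4.compute()

# Simulate losing the intermediate and final data.
rdd3.cache = None
rdd4.cache = None
recovered = rdd4.compute()   # rebuilt by replaying the lineage
```

Because the source data in RDD1 is never changed, replaying the same functions gives the same result, which is why immutability and lineage go together.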

Lazy Evaluation in Apache Spark: - All transformations applied to a Spark RDD are lazy in nature: they are not executed until an action is called.
-------------------------------------
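2 types of operations

1. Action (triggers execution, e.g. collect, count) 2. Transformation (defines a new RDD lazily, e.g. map, filter)

RDD1 ---- RDD2 ---- RDD3 ---- RDD4
Nothing is computed until an action is applied to RDD4; the action triggers the whole chain.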
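
Lazy evaluation can also be sketched in plain Python. The ToyRDD class below is a hypothetical stand-in, not Spark's RDD class: map and filter only record a step, and collect is the action that actually runs the recorded chain. The calls list is there only to show when work really happens.

```python
# Toy model of lazy evaluation.
# ToyRDD is a hypothetical illustration, NOT Spark's RDD class.
class ToyRDD:
    def __init__(self, source, steps=()):
        self._source = source   # original data, never modified
        self._steps = steps     # recorded transformations, not yet run

    def map(self, fn):
        # Transformation: return a NEW ToyRDD with one more recorded step.
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded, not executed.
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now is the recorded chain executed, step by step.
        data = list(self._source)
        for kind, fn in self._steps:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data


calls = []  # records every time the mapped function really runs

def double(x):
    calls.append(x)
    return x * 2

rdd1 = ToyRDD([1, 2, 3, 4])
rdd2 = rdd1.map(double)                # nothing runs yet
rdd3 = rdd2.filter(lambda x: x > 4)    # still nothing runs
work_before_action = len(calls)        # no element has been processed so far
result = rdd3.collect()                # the action triggers the whole chain
```

In real Spark the same pattern lets the engine see the full chain before running it, so it can plan the work as one job instead of step by step.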

Real-Time Stream Processing: - Spark has a provision for real-time stream processing.
---------------------------
Real-time data processing
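
Spark's streaming APIs process incoming data as a series of small batches by default. A minimal sketch of that micro-batch idea in plain Python follows; micro_batches and running_word_counts are hypothetical helper names for illustration, not Spark functions, and a fixed list stands in for a live stream.

```python
from itertools import islice

# Toy model of micro-batch stream processing.
# These helpers are hypothetical illustrations, NOT Spark APIs.
def micro_batches(events, batch_size):
    # Group an incoming iterator of events into small lists (batches).
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def running_word_counts(lines, batch_size=2):
    # Keep state across batches and record a snapshot after each batch.
    counts = {}
    snapshots = []
    for batch in micro_batches(lines, batch_size):
        for line in batch:
            for word in line.split():
                counts[word] = counts.get(word, 0) + 1
        snapshots.append(dict(counts))
    return snapshots

# A fixed list standing in for lines arriving over time.
stream = ["spark is fast", "spark streams", "fast fast"]
snapshots = running_word_counts(stream, batch_size=2)
```

Each snapshot is the result that would be available after one batch, so the output is refreshed as new data arrives rather than only at the end.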

Open-Source Community: - One of the best things about Apache Spark is the massive open-source community behind it.
----------------------

General-Purpose Compute Engine: - One way of writing code covers most kinds of operations.
-----------------------------

Before Spark, each kind of task typically needed its own tool:

Data cleaning --- Apache Pig
Query --- Apache Hive
ML --- Apache Mahout
Streaming data analytics --- Apache Storm

Spark says: just come to me, learn one way of coding, and you will be able to do all of these
particular tasks.