Apache Spark Architecture: Runtime Architecture of a Spark Application
In this video, you will get to know the nuts and bolts of the Apache Spark architecture. You will also understand how a Spark application runs when a Spark job is submitted.
Runtime Architecture of a Spark Application
-------------------------------------------
Apache Spark uses a master/slave architecture. The client submits the Spark user application code.
When the application code is submitted, the driver implicitly converts the user code, which contains transformations and actions, into a logical directed acyclic graph (DAG). At this stage, it also performs optimizations such as pipelining transformations. It then converts the logical DAG into a physical execution plan with a set of stages, and creates physical execution units called tasks under each stage. The tasks are then bundled to be sent to the cluster.
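As a minimal Scala sketch (assuming Spark is on the classpath, with a hypothetical input file input.txt), the example below shows that transformations only build the logical DAG, while the action at the end triggers planning into stages and tasks; the narrow transformations are pipelined into one stage, and the reduceByKey shuffle starts a new stage:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DagExample {
  def main(args: Array[String]): Unit = {
    // Local setup for illustration only.
    val sc = new SparkContext(
      new SparkConf().setAppName("dag-example").setMaster("local[*]"))

    // Transformations are lazy: they only build the logical DAG.
    val counts = sc.textFile("input.txt")  // hypothetical input path
      .flatMap(_.split("\\s+"))            // narrow: pipelined with the read
      .map(word => (word, 1))              // narrow: pipelined into the same stage
      .reduceByKey(_ + _)                  // wide: shuffle, so a new stage begins

    // The action triggers the scheduler: logical DAG -> stages -> tasks.
    counts.collect().foreach(println)

    sc.stop()
  }
}
```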
Now the driver talks to the cluster manager and negotiates for resources. The cluster manager launches executors on worker nodes on behalf of the driver.
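One way to see this negotiation is through the application's configuration. The sketch below uses a hypothetical standalone master URL, and the exact resource keys vary by cluster manager; it describes the executors the driver will request when the SparkContext is created:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// The resource request that the driver forwards to the cluster manager.
// Exact keys vary by cluster manager; these are common illustrative ones.
val conf = new SparkConf()
  .setAppName("resource-negotiation-example")
  .setMaster("spark://master-host:7077")   // hypothetical standalone master URL
  .set("spark.executor.instances", "4")    // how many executors to launch
  .set("spark.executor.cores", "2")        // cores per executor
  .set("spark.executor.memory", "2g")      // heap size per executor

// Creating the SparkContext starts the driver program, which contacts the
// cluster manager; the manager then launches executors on worker nodes.
val sc = new SparkContext(conf)
```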
At this point, the driver sends tasks to the executors based on data placement. When executors start, they register themselves with the driver, so the driver has a complete view of all executors. The executors then start executing the tasks assigned by the driver program. While the application is running, the driver program monitors the set of executors that run.
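As a small illustration of that registration, the Scala SparkContext exposes the driver's view of the block managers that have registered with it (this includes the driver's own block manager alongside the executors); a quick sketch:

```scala
// The driver's view of registered block managers (executors plus the driver
// itself), mapping "host:port" to (maximum memory, remaining memory) in bytes.
sc.getExecutorMemoryStatus.foreach { case (endpoint, (max, free)) =>
  println(s"$endpoint -> max=$max bytes, free=$free bytes")
}
```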
The driver also schedules future tasks in appropriate locations based on data placement. The user program may cache data at certain locations (using the cache or persist method). The driver tracks the location of cached data and uses it to schedule future tasks that access that data.
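As a short sketch of caching (reusing the SparkContext sc from above, with a hypothetical events.log input), marking an RDD as persistent materializes it on the executors the first time an action computes it, and the driver then prefers those executors when scheduling later tasks over the same data:

```scala
import org.apache.spark.storage.StorageLevel

val errors = sc.textFile("events.log")       // hypothetical input path
  .filter(_.contains("ERROR"))

errors.persist(StorageLevel.MEMORY_AND_DISK) // or errors.cache() for memory-only
errors.count()   // first action: computes the RDD and caches its partitions
errors.take(10)  // later action: tasks are scheduled near the cached blocks
```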
In the next video, we will learn about the RDD, Spark's core abstraction.