Apache Spark Internals: Understanding Physical Planning (Stages, Tasks & Pipelining)

Let's explore how a logical plan is transformed into a physical plan in Apache Spark. The logical plan consists of RDDs, Dependencies, and Partitions - it's our DAG. To schedule and execute it on a cluster, we need to transform it into a physical plan composed of Stages and Tasks. The Spark scheduler knows how to schedule tasks on our workers (much like what we saw in MapReduce).
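To make this concrete, here is a minimal sketch in Scala using the RDD API. The app name, local master, and sample data are illustrative assumptions, not taken from the video. Narrow transformations like map and filter get pipelined into a single stage, while the shuffle introduced by reduceByKey creates a stage boundary, and the scheduler launches one task per partition in each stage:

import org.apache.spark.sql.SparkSession

object PhysicalPlanDemo {
  def main(args: Array[String]): Unit = {
    // Local session just for experimentation; the app name is an arbitrary choice.
    val spark = SparkSession.builder()
      .appName("physical-plan-demo")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // 4 input partitions means 4 tasks in the first stage.
    val words = sc.parallelize(Seq("spark", "stages", "tasks", "spark"), numSlices = 4)

    // map and filter are narrow dependencies: each output partition depends on
    // exactly one input partition, so Spark pipelines them into one stage.
    val pairs = words.map(w => (w, 1)).filter { case (w, _) => w.nonEmpty }

    // reduceByKey is a wide dependency: it requires a shuffle, which forces a
    // new stage. The map/filter side runs as one stage, the reduce side as another.
    val counts = pairs.reduceByKey(_ + _)

    // toDebugString prints the RDD lineage; the indentation steps mark the
    // shuffle boundaries, i.e. the stage boundaries of the physical plan.
    println(counts.toDebugString)

    counts.collect().foreach(println)
    spark.stop()
  }
}

Running this and reading the toDebugString output, the indented ShuffledRDD marks where one stage ends and the next begins, mirroring the map and reduce phases of MapReduce.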

Let's go on a journey and explore how Apache Spark works internally.

Understanding these internals will help us write much better code.

00:00 Intro
00:36 Recap: Logical plan, DAG, Dependencies
01:39 Transforming the Logical Plan into a Physical Plan
03:26 Pipelining: The key optimization
05:22 Shuffles & The Relation to MapReduce
06:52 Summary & Outro