Apache Spark Internals: Understanding Physical Planning (Stages, Tasks & Pipelining)

Let's explore how a logical plan is transformed into a physical plan in Apache Spark. The logical plan consists of RDDs, Dependencies, and Partitions - it's our DAG. To schedule and execute it on a cluster, we need to transform it into a physical plan composed of Stages and Tasks. The Spark scheduler knows how to schedule tasks on our workers (much like what we saw in MapReduce).
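To make this concrete, here is a minimal sketch in Scala using the RDD API. The app name, local master, and sample data are illustrative assumptions, not taken from the video. Narrow transformations like map and filter get pipelined into a single stage, while the shuffle introduced by reduceByKey creates a stage boundary, and the scheduler launches one task per partition in each stage:

import org.apache.spark.sql.SparkSession

object PhysicalPlanDemo {
  def main(args: Array[String]): Unit = {
    // Local session just for experimentation; the app name is an arbitrary choice.
    val spark = SparkSession.builder()
      .appName("physical-plan-demo")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // 4 input partitions means 4 tasks in the first stage.
    val words = sc.parallelize(Seq("spark", "stages", "tasks", "spark"), numSlices = 4)

    // map and filter are narrow dependencies: each output partition depends on
    // exactly one input partition, so Spark pipelines them into one stage.
    val pairs = words.map(w => (w, 1)).filter { case (w, _) => w.nonEmpty }

    // reduceByKey is a wide dependency: it requires a shuffle, which forces a
    // new stage. The map/filter side runs as one stage, the reduce side as another.
    val counts = pairs.reduceByKey(_ + _)

    // toDebugString prints the RDD lineage; the indentation steps mark the
    // shuffle boundaries, i.e. the stage boundaries of the physical plan.
    println(counts.toDebugString)

    counts.collect().foreach(println)
    spark.stop()
  }
}

Running this and reading the toDebugString output, the indented ShuffledRDD marks where one stage ends and the next begins, mirroring the map and reduce phases of MapReduce.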

Let's go on a journey and explore how Apache Spark works internally.

Understanding these internals will help us write much better code.

00:00 Intro
00:36 Recap: Logical plan, DAG, Dependencies
01:39 Transforming the Logical Plan into a Physical Plan
03:26 Pipelining: The key optimization
05:22 Shuffles & The Relation to MapReduce
06:52 Summary & Outro