Understanding Databricks & Apache Spark Performance Tuning: Lesson 01 - Spark Architecture

How do you tune and optimize Spark queries? It is a popular interview question and a critical topic for all Databricks and Spark developers. This video provides a conceptual understanding of where things can go wrong, as a starting point for understanding performance tuning and optimization.

Comments

5:54, better comedian than half the comedians in the world

sarthakmane

I don't know if it's always true, but I've recently discovered that Python can be significantly faster than some Spark SQL operations such as joins. I'll check, but do you have a video about monitoring cluster performance? I kind of miss the Ganglia UI. Thanks, Bryan. As always, you're a great teacher and explainer of things. ❤

mfdba

Is it possible to run Spark nodes on an already concurrent HDFS?

Andy-rwhn

11:50 I actually thought that the data for the query in the black box does not have to be distributed/indexed by City, and that the select/group-by can easily be made concurrent by itself.

Andy-rwhn
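
The point raised in the 11:50 comment can be sketched in plain Python: each partition is aggregated on its own, and the partial results are merged afterwards, so the rows do not need to be laid out by City beforehand. This is only an illustrative sketch and not Spark code; the sample rows and the names `partitions`, `partial_sum`, and `merge` are assumptions made up for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows of (city, amount), split into partitions
# with no regard to City.
partitions = [
    [("London", 10), ("Paris", 5), ("London", 1)],
    [("Paris", 2), ("Tokyo", 7)],
    [("London", 4), ("Tokyo", 3)],
]

def partial_sum(rows):
    """Aggregate a single partition independently of the others."""
    totals = Counter()
    for city, amount in rows:
        totals[city] += amount
    return totals

def merge(partials):
    """Combine the per-partition totals into one final result."""
    final = Counter()
    for partial in partials:
        final.update(partial)  # Counter.update adds counts per key
    return final

# Each partition is processed concurrently, then the partials are merged.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(partial_sum, partitions))

result = dict(merge(partials))
print(result)
```

The merge step is where totals for the same city from different partitions have to meet. In a distributed engine that meeting requires moving data between nodes, which is why how the data is distributed by the grouping key can still affect performance even though the per-partition work runs concurrently.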