21 Broadcast Variable and Accumulators in Spark | How to use Spark Broadcast Variables

This video explains: What are distributed variables in Spark, and how do they work? What is a broadcast variable? What are accumulators?

Chapters
00:00 - Introduction
02:24 - Broadcast Variable
06:57 - Accumulators

The series provides a step-by-step guide to learning PySpark, the Python API for Apache Spark, a popular open-source distributed computing framework used for big data processing.

New video every 3 days ❤️

#spark #pyspark #python #dataengineering
Comments

@8:50, I have one small doubt: we have already filtered for department_id == 6, so there won't be any department other than 6. Do we really need groupBy(department_id) after filtering?

sureshraina

Hi sir, what is the difference between a broadcast join and a broadcast variable?
In a broadcast join, a copy of the smaller DataFrame is also stored on each executor, so no shuffling happens across the executors.

devarajusankruth

One doubt, sir: when I did a direct where and sum, it took 0.8 s for both stages, whereas the accumulator took 3 s. Is that because the use case was forced for the demonstration? Can you give an example where an accumulator would actually help? Even compute-wise, the accumulator went row by row, whereas filter and exchange seem to use less compute.

ayyappahemanth

Hi Subham, I have a few questions on cache and broadcast:

1. Can we un-broadcast DataFrames or variables, like we unpersist?
2. When our cluster is terminated and restarted, do the broadcast variables or cached data still exist, or do they vanish every time the cluster is terminated?

NiteeshKumarPinjala

In the last video you mentioned that we should avoid UDFs, but here you used one to get the broadcast value. Will it impact performance?

TechnoSparkBigData

Can accumulator variables be used to calculate an average as well? A sum can be computed on each executor and added up, but an average doesn't combine the same way.

sushantashow

Could you please provide a link to download the sample data?

at-cvky