Part 4: PySpark Transformations - Repartition and Coalesce

Welcome back to the PySpark Transformations and Actions series.
In this video, we continue with two more important transformations: repartition and coalesce.

Repartition:
repartition is a PySpark transformation that increases or decreases the number of partitions used to process an RDD or DataFrame. It performs a full shuffle, so rows are redistributed across the new partitions.
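To make the idea concrete, here is a toy model in plain Python, not Spark's actual implementation. It treats a dataset as a list of partitions and redistributes every row round-robin, which is roughly what a full shuffle does. The function name and sample data are made up for illustration; in PySpark itself the corresponding call is df.repartition(n), and df.rdd.getNumPartitions() reports the current partition count.

```python
from itertools import chain

def repartition(partitions, num_partitions):
    """Toy full shuffle: every row may end up in a different partition."""
    rows = list(chain.from_iterable(partitions))
    new_partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        # Round-robin assignment spreads the rows evenly.
        new_partitions[i % num_partitions].append(row)
    return new_partitions

# Hypothetical data: three uneven partitions holding nine rows.
partitions = [[1, 2, 3, 4, 5, 6], [7], [8, 9]]

more = repartition(partitions, 4)   # increase the partition count
fewer = repartition(partitions, 2)  # decrease the partition count

print([len(p) for p in more])   # [3, 2, 2, 2]
print([len(p) for p in fewer])  # [5, 4]
```

The partitions come out evenly sized, but only because every row was moved. That data movement is the cost of a full shuffle.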

Coalesce:
The coalesce function reduces the number of partitions in a PySpark DataFrame. Because it merges existing partitions instead of redistributing every row, it avoids the full shuffle that repartition performs. It cannot increase the number of partitions.
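The merge-without-shuffle behaviour can be sketched with a toy model in plain Python. This is an illustration under simplifying assumptions, not Spark's algorithm (Spark also considers data locality when grouping partitions). The function name and sample data are hypothetical; in PySpark the corresponding call is df.coalesce(n).

```python
def coalesce(partitions, num_partitions):
    """Toy narrow merge: whole partitions are grouped, never split."""
    if num_partitions >= len(partitions):
        # Asking for more partitions than exist leaves the count unchanged.
        return [list(p) for p in partitions]
    merged = [[] for _ in range(num_partitions)]
    for i, part in enumerate(partitions):
        # Map each old partition to one target partition, keeping neighbours together.
        merged[i * num_partitions // len(partitions)].extend(part)
    return merged

# Hypothetical data: four partitions holding seven rows.
partitions = [[1, 2], [3], [4, 5, 6], [7]]

print(coalesce(partitions, 2))       # [[1, 2, 3], [4, 5, 6, 7]]
print(len(coalesce(partitions, 8)))  # 4
```

Each original partition lands intact in exactly one output partition, which is why no full shuffle is needed. The trade-off is that the resulting partitions can be uneven in size.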
Comments
Author

How did you get this jupyter screen to run queries?

swagatikatripathy
Author

Can you suggest a specific course on data engineering on GCP?

rakeshd