coalesce vs repartition vs partitionBy in Spark | Interview Question Explained

Hi All,
In this video, I explain the concepts of coalesce, repartition, and partitionBy in Apache Spark.

To become a GKCodelabs Extended plan member, check the links below and purchase the Big Data end-to-end pipeline course in your preferred language, Python or Scala.

PySpark course available at

Spark + SCALA course available at

End to End pipeline Introduction Videos:
Pyspark End to End Pipeline

Spark + Scala End to End Pipeline

Starter Pack available at just: ₹549 (For Indian Payments) or $9 (For non-Indian payments)
Extended Pack available at just: ₹1299 (For Indian Payments) or $19 (For non-Indian payments)

Comments

When you call repartition() and then partitionBy(), the data is already being partitioned by the partitionBy column, so why does the number of part files still depend on repartition()?

NikhileshwarYanamanagandla
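
One way to reason about the question above is with a small plain-Python model. This is a sketch, not Spark code. It assumes that repartition(n) without a column spreads rows round-robin over n in-memory partitions, and that write.partitionBy(col) lets each write task emit one file per distinct column value it holds. The function names and the sample rows are hypothetical and exist only for illustration.

```python
from collections import defaultdict

def round_robin_repartition(rows, num_partitions):
    """Model of repartition(n): spread rows evenly over n in-memory partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions

def hash_repartition(rows, num_partitions, column):
    """Model of repartition(n, col): rows with the same key share a partition.
    Python's hash() stands in for Spark's own hash function (an assumption)."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[column]) % num_partitions].append(row)
    return partitions

def files_per_directory(partitions, column):
    """Model of write.partitionBy(col): every task writes one file for each
    distinct column value it holds. Returns {value: number_of_files}."""
    counts = defaultdict(int)
    for part in partitions:
        for value in {row[column] for row in part}:
            counts[value] += 1
    return dict(counts)

# Hypothetical data: 12 rows, three age groups.
rows = [{"id": i, "age": age} for i, age in enumerate([20, 30, 40] * 4)]

# repartition(2) then partitionBy("age"): both tasks hold every age.
round_robin_files = files_per_directory(round_robin_repartition(rows, 2), "age")

# repartition(2, "age") then partitionBy("age"): each age lives in one task.
hashed_files = files_per_directory(hash_repartition(rows, 2, "age"), "age")

print(round_robin_files)
print(hashed_files)
```

In this model the round-robin case yields two files in each age directory, because each of the two tasks holds rows for every age, while repartitioning on the age column itself yields one file per directory. That is why the part-file count follows the in-memory partitioning and not just the partitionBy column.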

Good explanation. I have a question: as you mentioned, partitioning by the age column creates 3 partitions because we have three age groups here. Let's assume I have 1000 unique IDs in a dataset and I partition by the Id column. How many partitions will it create, and on what basis? Could you please explain this briefly if you have time?

Thanks
Srikanth kita

srikanthk
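
The 1000-ID case in the question above can be worked through with a plain-Python model. This is a sketch under stated assumptions, not Spark code: partitionBy("id") is modeled as creating one output directory per distinct id value, and each write task as emitting one file per id it holds. The constants (3 rows per id, 4 write tasks) and the round-robin row layout are hypothetical.

```python
from collections import defaultdict

NUM_IDS = 1000      # distinct values in the partition column (from the question)
ROWS_PER_ID = 3     # hypothetical: each id appears in 3 rows
NUM_TASKS = 4       # hypothetical: number of in-memory partitions at write time

rows = [{"id": i, "value": k} for i in range(NUM_IDS) for k in range(ROWS_PER_ID)]

# Assumed layout: rows spread round-robin over the write tasks.
tasks = [rows[t::NUM_TASKS] for t in range(NUM_TASKS)]

# Each task writes one file per distinct id it holds.
files_in_directory = defaultdict(int)
for task_rows in tasks:
    for id_value in {row["id"] for row in task_rows}:
        files_in_directory[id_value] += 1

num_directories = len(files_in_directory)       # one directory per distinct id
num_files = sum(files_in_directory.values())    # depends on the task layout

print(num_directories, num_files)
```

In this model the number of directories equals the number of distinct values in the column, so 1000 unique IDs give 1000 directories. The number of files inside each directory depends on how many write tasks hold rows for that id, which is why partitioning on a high-cardinality column tends to produce many small files.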