Spark Basics | Partitions

Spark is a distributed computing system that is used within Foundry to run data transformations at scale. This series covers the core Spark concepts you need to know for working with data in Foundry.

In this video we introduce partitions, discuss the importance of partition sizing, demonstrate how to find the count and size of partitions for a dataset in Foundry, and describe methods for changing the number of partitions in a Spark DataFrame.
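As a rough illustration of the partition sizing idea mentioned above, here is a minimal sketch in plain Python. It is not taken from the video. The 128 MiB target size is an assumption (a commonly quoted rule of thumb), and the helper names `suggest_partition_count` and `choose_method` are hypothetical.

```python
MIB = 1024 * 1024

def suggest_partition_count(total_bytes: int, target_partition_bytes: int = 128 * MIB) -> int:
    """Estimate how many partitions keep each one near the target size.

    The 128 MiB default is an assumed rule of thumb, not a value from the video.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if target_partition_bytes <= 0:
        raise ValueError("target_partition_bytes must be positive")
    # Integer ceiling division; always return at least one partition.
    return max(1, (total_bytes + target_partition_bytes - 1) // target_partition_bytes)

def choose_method(current_partitions: int, target_partitions: int) -> str:
    """Pick a DataFrame method name for moving to the target partition count."""
    if target_partitions < current_partitions:
        # coalesce merges existing partitions and avoids a full shuffle.
        return "coalesce"
    if target_partitions > current_partitions:
        # repartition performs a full shuffle and can increase the count.
        return "repartition"
    return "none"

# Example: a 10 GiB dataset currently split into 2000 small partitions.
target = suggest_partition_count(10 * 1024 * MIB)
method = choose_method(2000, target)
```

In PySpark, the current count can be read with `df.rdd.getNumPartitions()`, and the result would then be applied with `df.coalesce(target)` or `df.repartition(target)`. The right target size depends on the workload, so treat the number as a starting point.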
Comments
Author

Please keep this series going. Your Spark tutorials are very useful! They are making me love your product more and more.

curiousMe
Author

Hi Team,
I found this video really informative. I'd be really grateful if you could cover more data partitioning concepts and methods, along with some advanced best practices for working with Spark.

I'm new to Spark and want to learn it very thoroughly.

Thanks

mactech
Author

This video gave me ideas about my recurring driver OOM problems. The cause: too many small partitions.

ENNAJIHamza
Author

I have a requirement to include a space in a partition. But when I write the data to S3 in Parquet format with a space in the partition, it fails.
Can I please have a solution?

devaharshaveerla
Author

Great video! More Hadoop videos, please :)

MinecraftGamer
Author

Can we get into detail on the methods for repartitioning?

thousandsunny
Author

The video quality is quite good, but I'd appreciate it if the videos were more beginner-friendly. 😀

adib
Author

Use Delta Lake 2.0 and the OPTIMIZE command, and you'll never have to worry about the headache of managing partition sizes or counts again.

gardnmi