Why Spark is Faster Than Hadoop MapReduce

In this video I talk about Apache Spark's in-memory processing, which is why Spark is so much faster than MapReduce or other analytics frameworks. It's simple but awesome for both stream processing and batch processing, so I first explain what stream and batch processing are.


#ApacheSpark #DataEngineering #PlumbersofDataScience #bigdata
Comments
Author

In short: MapReduce materializes intermediate state to disk, whereas dataflow engines like Spark do not; they operate in memory. Another important point is that sorting is implicit in MR, so mappers always sort their output. That is not the case with Spark, which sorts only when it is needed.

qwaszx
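
The two points in the comment above can be illustrated with a small word count. This is only a sketch in plain Python, not Spark or Hadoop code: the function names `map_words`, `mapreduce_style`, and `dataflow_style` are made up for illustration, and a temporary file stands in for the disk writes between stages.

```python
import json
import os
import tempfile
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_words(lines):
    """Map stage: emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def mapreduce_style(lines):
    """Sort the mapper output and write it to disk before reducing."""
    pairs = sorted(map_words(lines))  # sorting happens whether needed or not
    with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
        for pair in pairs:
            f.write(json.dumps(pair) + "\n")  # materialize intermediate state
        path = f.name
    try:
        with open(path) as f:
            loaded = (tuple(json.loads(row)) for row in f)
            # Reduce stage: relies on the sorted order to group equal keys.
            return {
                word: sum(count for _, count in group)
                for word, group in groupby(loaded, key=itemgetter(0))
            }
    finally:
        os.remove(path)


def dataflow_style(lines):
    """Stream pairs from the map stage into an in-memory aggregation."""
    counts = defaultdict(int)
    for word, count in map_words(lines):  # no sort, no intermediate file
        counts[word] += count
    return dict(counts)
```

Both functions return the same counts; the difference is only in how the intermediate pairs travel between the stages, which is the cost the comment is pointing at.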
Author

I still don't quite understand the picture at 6:54. If the difference between Spark and MapReduce is simply that Spark uses RAM to store data and MapReduce uses the hard disk, how come the MapReduce guys didn't think of it (everyone knows RAM is much faster than a hard disk)? And it seems one can't say Spark is any different from MapReduce if RAM is the only difference (just like running the same quicksort algorithm on a slow and a fast computer: it is still the same algorithm).

leecharlie
Author

Thanks for the video! It's really helping me get the concept.

fajarnadril
Author

Thank you! Could have been a 4-minute video too!

onewithsixonewithsix
Author

A good 7 minutes totally wasted. I wish YouTube would bring back dislike stats.

louuuuuu
Author

Should be a 2-minute video for this content.

shirsendubasu