WTF is MapReduce?? [Batch Processing] | Systems Design Interview 0 to 1 with Ex-Google SWE

I'd post this to Blind but I think they're more in need of FapReduce instead of MapReduce
Comments
Author

I like the humor you incorporate into these informative videos, def subbing 🔥

fallencheeto
Author

Do not believe this man's lies. He did not choose to leave Google for HFT; he was fired for taking an excessive number of Fairlife milks from the cafeteria, after multiple warnings. It was determined that the net loss from his excessive milk consumption exceeded his positive contributions to the company.

brendansullivan
Author

Also, if I am understanding correctly, the reason we want sorted keys is that, once we shuffle the keys to their respective nodes, we can perform that O(n) merge join on multiple lists of key-value pairs, correct?

So for example, we take all (k3, v) pairs from node 1 and all (k9, v) pairs from node 3. Their hashes determine that they will all be sent to node 3, where a merge join will occur on those 2 lists. Is this accurate?

Great video!

timothyh
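
To make the question above concrete, here is a minimal sketch of the linear-time merge it describes, assuming two mappers have each sent node 3 a list of pairs already sorted by key. The function names (`merge_sorted`, `group_adjacent`) and the sample values are made up for illustration and are not taken from the video or from any framework.

```python
def merge_sorted(left, right):
    """Two-pointer merge of two key-sorted lists of (key, value) pairs, O(n)."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take from the left on ties so pairs keep their arrival order.
        if left[i][0] <= right[j][0]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def group_adjacent(pairs):
    """Collapse a key-sorted list into (key, [values]); equal keys are adjacent."""
    groups = []
    for key, value in pairs:
        if groups and groups[-1][0] == key:
            groups[-1][1].append(value)
        else:
            groups.append((key, [value]))
    return groups


# Hypothetical sorted runs arriving at node 3 from two different mappers.
from_node_1 = [("k3", 1), ("k3", 2), ("k9", 5)]
from_node_3 = [("k3", 7), ("k9", 8), ("k9", 9)]
reducer_input = group_adjacent(merge_sorted(from_node_1, from_node_3))
```

Because both runs are sorted, each pair is looked at once, and every key's values end up next to each other without a hash table on the reducer.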
Author

@3:20 should be Reducer: (key, List[value]) -> (key, value)

John-nhoJ
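
The signature in the correction above can be illustrated with a small word count. This is only a sketch: `mapper`, `reducer` and `run_job` are hypothetical names, the grouping step stands in for shuffle and sort on a single machine, and real frameworks generally let a reducer emit zero or more pairs rather than exactly one.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: record -> zero or more (key, value) pairs."""
    for word in line.split():
        yield (word.lower(), 1)


def reducer(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: (key, List[value]) -> (key, value)."""
    return (key, sum(values))


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Single-process stand-in for map, shuffle/sort, and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    # Sorting the keys mimics the sorted order in which a reducer sees them.
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


counts = run_job(["a b a", "B c"])
```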
Author

In the MapReduce architecture, should the shuffle step happen before the sort? My sense is that shuffle basically groups the same key (say k3) from different nodes onto a single node (node 3 in this case). Then, by sorting those shuffled keys, we take advantage of the reducer mechanism you demonstrated.

PavanBommana-
Author

I found it a bit unclear how keys go from Sort to Shuffle. Do keys get redistributed first and then sorted locally? Or do keys get globally sorted first (e.g. with an n-way merge sort) and then redistributed based on key ranges? I think the first flow sounds more reasonable, but it conflicts with the merge join mentioned in the video, because in that flow a merge join wouldn't be necessary.

zhonglin
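
The two orderings asked about in the last two comments can be compared directly. The sketch below assumes a simple `key % n` partitioner and integer keys, both chosen only for illustration. As far as I know, Hadoop's flow is closer to the second function (map output is partitioned and sorted on the mapper side, and each reducer merges the sorted runs it fetches), but treat that as background rather than something the video states.

```python
import heapq
from itertools import groupby
from operator import itemgetter

KEY = itemgetter(0)


def part(key: int, n: int) -> int:
    """Toy partitioner (an assumption): reducer index = key mod n."""
    return key % n


def group(sorted_pairs):
    """Turn key-sorted pairs into (key, [values]) groups."""
    return [(k, [v for _, v in g]) for k, g in groupby(sorted_pairs, key=KEY)]


def shuffle_then_sort(mapper_outputs, n):
    """Send raw pairs to reducers first; each reducer sorts what it received."""
    buckets = [[] for _ in range(n)]
    for output in mapper_outputs:
        for k, v in output:
            buckets[part(k, n)].append((k, v))
    return [group(sorted(bucket, key=KEY)) for bucket in buckets]


def sort_then_shuffle(mapper_outputs, n):
    """Each mapper sorts per-reducer runs; each reducer only merges sorted runs."""
    results = []
    for r in range(n):
        runs = [
            sorted((p for p in output if part(p[0], n) == r), key=KEY)
            for output in mapper_outputs
        ]
        results.append(group(heapq.merge(*runs, key=KEY)))
    return results


outputs = [[(3, "a"), (9, "b"), (1, "c")], [(9, "d"), (3, "e")], [(4, "f")]]
a = shuffle_then_sort(outputs, 3)
b = sort_then_shuffle(outputs, 3)
```

Both orderings hand each reducer the same grouped input. The difference is where the sorting work happens: in the second version it is spread across the mappers, and the reducer is left with a cheap merge, which is where the merge step from the video fits.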
Author

Are nodes 1, 2, and 3 supposed to be replicas of each other? Or simply 3 nodes storing three different data files?

timothyh
Author

Hey, loving the videos so far. Just one question: at 4:12, if these are nodes of a Hadoop cluster, aren't they supposed to be replicas and hence hold roughly the same data?
How do they begin with different raw data on each node?
Are these some sort of partitions?
I googled and, on the surface, found that there's no partition system in Hadoop as such.

varundubey
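
On the replicas-versus-partitions questions above: HDFS splits a file into blocks and stores each block on several DataNodes (the default replication factor is 3), so different nodes hold different blocks rather than full copies of each other. The toy sketch below shows that idea only. The round-robin placement, the tiny block size and the node names are assumptions for illustration and are not how HDFS actually chooses nodes.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Round-robin placement is a simplification, NOT the real HDFS policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def blocks_on(node: str, placement: dict[int, list[str]]) -> list[int]:
    """List the block ids that a given node stores."""
    return sorted(b for b, holders in placement.items() if node in holders)


blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=2)
```

In this sketch `node1` and `node4` hold completely different blocks, which matches the picture of mappers starting from different raw data, while every block still exists on more than one node.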