Introduction to Map/Reduce (Part 1/3)

In this three-part tutorial, Prof. Patterson shows how to get a Java program running in the Hadoop Map/Reduce framework used by the Amazon Web Services platform.

Part 1 is an overview of Map/Reduce and how it is used as a dataflow architecture to run Big Data jobs.

Part 2 is an example of how to configure and program Eclipse to create a Java jar that can be uploaded to Amazon's Elastic Map/Reduce (EMR) service.

Part 3 demonstrates how to configure an Amazon cluster so that EMR works with EC2 and S3 to run a distributed data processing job.
Comments

It was an excellent presentation. If possible and time permits, please also make a tutorial on connecting to a database (say, MySQL) and inserting data into it after filtering the input data.

sudiptaghosh

Great talk!
What happens when a certain word appears twice in the same line? Does the mapper output (word, 2) or [(word, 1), (word, 1)]?

eNtrozx
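
On the question of repeated words: in the usual word-count formulation, the mapper emits one (word, 1) pair per occurrence, so a word that appears twice in a line produces two pairs, and the summing happens later in the reducer (or in an optional combiner). The mapper shown in the video may differ. Below is a minimal plain-Java sketch of that behaviour; it does not use any Hadoop classes, and the class name `WordCountSketch`, its `map` and `reduce` methods, and the sample sentence are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of word count; no Hadoop classes are used (assumption:
// this only imitates the map and reduce steps in a single process).
final class WordCountSketch {

    // Map step: emit one (word, 1) pair per token, duplicates included.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle and reduce steps combined: group pairs by key and sum the values.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = map("to be or not to be");
        System.out.println(pairs);          // [to=1, be=1, or=1, not=1, to=1, be=1]
        System.out.println(reduce(pairs));  // {be=2, not=1, or=1, to=2}
    }
}
```

A mapper could instead pre-aggregate within a line and emit (word, 2); the reducer's sum would be the same either way, which is why emitting one pair per occurrence is the simpler default.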