AWS EMR Big Data Processing with Spark and Hadoop | Python, PySpark, Step by Step Instructions

In this video, I gave an overview of what EMR is and its benefits in the big data and machine learning world. I then provided step-by-step instructions on how to spin up an EMR cluster and run a spark-submit job on it to process data from a Stack Overflow survey.
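The spark-submit step can also be scripted rather than clicked through the console. The sketch below assumes boto3's EMR client. `build_spark_step` is a hypothetical helper that produces a step in the shape `add_job_flow_steps` accepts, using EMR's `command-runner.jar` to run `spark-submit`. The bucket, script path, and the `--data_source`/`--output_uri` script arguments are placeholders, not the video's actual values.

```python
"""Build (and optionally submit) an EMR step that runs spark-submit.

All S3 paths, the cluster ID, and the script arguments are hypothetical placeholders.
"""
from typing import Any, Dict, List


def build_spark_step(name: str, script_uri: str, script_args: List[str]) -> Dict[str, Any]:
    """Return a step definition in the shape boto3's add_job_flow_steps expects."""
    return {
        "Name": name,
        # keep the cluster running even if this step fails
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets an EMR step run a command such as spark-submit
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, *script_args],
        },
    }


def submit(cluster_id: str, step: Dict[str, Any]) -> str:
    """Add the step to a running cluster and return its step ID (needs AWS credentials)."""
    import boto3  # imported lazily so building steps works without boto3 installed

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


if __name__ == "__main__":
    step = build_spark_step(
        "stackoverflow-survey",
        "s3://my-example-bucket/scripts/main.py",  # hypothetical script location
        [
            "--data_source", "s3://my-example-bucket/data/survey_results_public.csv",
            "--output_uri", "s3://my-example-bucket/output",
        ],
    )
    print(step["HadoopJarStep"]["Args"])
```

Only `build_spark_step` runs without AWS access; `submit` needs credentials and an existing cluster ID.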

Comments
Author

Thanks! Your video is far, far better than most out there!

Note to other viewers: If you decide to try this yourself, you should download the 2020 survey results, and not something newer. The fields are different in newer versions of the survey.

privatestuff
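Because the survey's field names changed after 2020, a job written for the 2020 file can fail partway through on a newer download. A minimal fail-fast header check, using only the standard library, is sketched below. The names in `REQUIRED_COLUMNS` are assumptions based on the 2020 public results file; replace them with whatever columns your script actually reads.

```python
import csv
import io
from typing import Iterable, List, TextIO

# Assumed 2020-era column names; adjust to the columns your job actually uses.
REQUIRED_COLUMNS = ["Respondent", "Country", "ConvertedComp", "LanguageWorkedWith"]


def missing_columns(csv_file: TextIO, required: Iterable[str] = REQUIRED_COLUMNS) -> List[str]:
    """Return the required columns that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])  # empty file -> empty header
    present = set(header)
    return [col for col in required if col not in present]


if __name__ == "__main__":
    # A header shaped like a newer survey, where several fields were renamed
    sample = io.StringIO("ResponseId,Country,ConvertedCompYearly,LanguageHaveWorkedWith\n")
    print(missing_columns(sample))  # ['Respondent', 'ConvertedComp', 'LanguageWorkedWith']
```

Running this against the downloaded CSV before uploading it to S3 makes a version mismatch obvious without spinning up a cluster.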
Author

Awesome, one of the few 30-minute tutorials for big data on AWS that actually worked!

brothermalcolm
Author

Wow, so informative!!! Thanks so much for teaching me how to do this!!!

maulikpatel
Author

Good stuff! Your videos are always so detailed and informative 👍🏻

aimeeyu
Author

Wow, Felix, you are a fantastic teacher. Really happy to find your channel. Thanks

jasper
Author

Crisp and to the point. Thanks, Felix, please keep it up.

kkb_now_i_have_a_handle
Author

Very good and undervalued content. Keep up the good work, man :) You are helping a lot! I bet your channel will start growing soon.

BlazinEdit
Author

This is what I was looking for. Thanks for the video.

mrmen
Author

Thank you, sir, for this session... Love from India 🙏

sakshi
Author

Thanks for the AWS EMR configuration details. How do the underlying S3 or HDFS storage layers distribute data blocks for parallel processing? How can redundancy and parallelism be configured? I have logs from airline equipment covering the last 30 years, totaling 1 PB, and I want to use all of it to identify failures from their indicators.

ParijatKar
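A rough sketch of the two settings this question is about, assuming the cluster is configured through EMR configuration classifications. `dfs.replication` (in `hdfs-site`) sets how many copies HDFS keeps of each block; S3 manages its own durability, so this setting only affects data stored in HDFS. For parallelism, Spark packs file input into partitions of at most `spark.sql.files.maxPartitionBytes` (128 MB by default), so the partition count for 1 PB can be estimated with simple arithmetic. The replication factor of 3 and the 256 MiB partition size are illustrative choices, not recommendations, and the helper names are hypothetical.

```python
import json

MIB = 1024 ** 2
PIB = 1024 ** 5


def estimate_input_partitions(total_bytes: int, max_partition_bytes: int = 128 * MIB) -> int:
    """Rough estimate of Spark input partitions for large, splittable files.

    Uses ceiling division and ignores per-file rounding and
    spark.sql.files.openCostInBytes, so treat it as an order-of-magnitude figure.
    """
    return (total_bytes + max_partition_bytes - 1) // max_partition_bytes


def emr_configurations(replication: int, max_partition_bytes: int) -> list:
    """Classifications in the shape EMR accepts for its Configurations setting."""
    return [
        {
            "Classification": "hdfs-site",
            # number of copies HDFS keeps of every block stored on the cluster
            "Properties": {"dfs.replication": str(replication)},
        },
        {
            "Classification": "spark-defaults",
            # upper bound on bytes packed into one input partition when reading files
            "Properties": {"spark.sql.files.maxPartitionBytes": str(max_partition_bytes)},
        },
    ]


if __name__ == "__main__":
    print(estimate_input_partitions(PIB))             # 8388608
    print(estimate_input_partitions(PIB, 256 * MIB))  # 4194304
    print(json.dumps(emr_configurations(3, 256 * MIB), indent=2))
```

At 128 MiB per partition, 1 PiB works out to 2^50 / 2^27 = 2^23 = 8,388,608 input partitions; doubling the partition size halves the count, trading scheduling overhead for larger per-task memory use.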
Author

Sir, your explanation is very clear... I request you to make end-to-end project videos on AWS ETL.

WolfmaninKannada
Author

Really great stuff... excellent presentation!!

VamsiKrishna-vfgm
Author

Good stuff, Felix. Do you have a video on migrating an on-premises big data cluster to the cloud?

chaithanyamannem
Author

Thank you so much, sir, for the detailed explanation; it will be very useful to us ❤❤ Thanks a lot!

electricalsir
Author

Very informative. Can you do a deep dive on AWS EMR?

santoshraju
Author

Great video!! However, I think you now have to set up an IAM role to access your S3 bucket, don't you?

nkovrkh
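On the IAM question, creating the cluster from code makes the roles explicit. In this minimal boto3 sketch, `EMR_EC2_DefaultRole` is the instance profile the cluster nodes run as (its policies decide which S3 buckets a job can read and write), and `EMR_DefaultRole` is the role the EMR service uses to manage the cluster; both are the default names created by `aws emr create-default-roles`. The `Applications` list is also where the software set is chosen when scripting. `cluster_request` is a hypothetical helper, and the release label, instance type, and log bucket are example values.

```python
from typing import Any, Dict, List


def cluster_request(name: str, release_label: str, log_uri: str,
                    applications: List[str], instance_type: str = "m5.xlarge",
                    instance_count: int = 3) -> Dict[str, Any]:
    """Keyword arguments for boto3's EMR run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # keep the cluster up after steps finish so more jobs can be submitted
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # instance profile for the EC2 nodes; its policies govern S3 access from jobs
        "JobFlowRole": "EMR_EC2_DefaultRole",
        # role the EMR service assumes to provision and manage the cluster
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request: Dict[str, Any]) -> str:
    """Create the cluster and return its ID (requires AWS credentials and the roles above)."""
    import boto3  # imported lazily so the request can be built without boto3

    return boto3.client("emr").run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    req = cluster_request("survey-demo", "emr-6.10.0",
                          "s3://my-example-bucket/logs/", ["Spark", "Hadoop"])
    print(req["Applications"], req["JobFlowRole"], req["ServiceRole"])
```

If a job cannot read the bucket, the policies attached to the EC2 instance profile role are usually the place to look, not the service role.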
Author

Do you have any documentation on mapping the property graph model to Hadoop?

chandnimirchandani
Author

Hi, what is the difference between the 4 applications? When you create the cluster there are 4 options; how do you know which one to select?

hikariuchiha
Author

Thanks for the video... Can you help me with dependency files, e.g. when the Python script uses other modules? How do I go about that when I want to submit with spark-submit on EMR?

tfilgsw
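For the dependency question, one common approach for pure-Python modules is to zip them and pass the archive to `spark-submit --py-files`, which ships it with the job and places it on the Python path of the driver and executors. A minimal sketch follows; `mypkg`, `main.py`, and the helper names are hypothetical. Libraries with compiled extensions (for example pandas) are usually installed on every node instead, such as through an EMR bootstrap action.

```python
import os
import tempfile
import zipfile
from typing import List


def zip_package(package_dir: str, zip_path: str) -> str:
    """Zip the .py files of a local package so spark-submit can ship it with --py-files."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for fname in files:
                if fname.endswith(".py"):
                    full = os.path.join(root, fname)
                    # store paths relative to the package's parent so "import mypkg" works
                    zf.write(full, os.path.relpath(full, parent))
    return zip_path


def spark_submit_args(main_script: str, py_files: List[str]) -> List[str]:
    """spark-submit arguments; options must come before the main script."""
    return ["spark-submit", "--deploy-mode", "cluster",
            "--py-files", ",".join(py_files), main_script]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = os.path.join(tmp, "mypkg")  # hypothetical local package
        os.makedirs(pkg)
        with open(os.path.join(pkg, "__init__.py"), "w") as fh:
            fh.write("")
        archive = zip_package(pkg, os.path.join(tmp, "deps.zip"))
        # In practice, upload the zip to S3 and pass its s3:// URI instead of a local path
        print(spark_submit_args("main.py", [archive]))
```

Zipping from the package's parent directory matters: the archive must contain `mypkg/__init__.py`, not a bare `__init__.py`, for `import mypkg` to resolve on the executors.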