Intro to Amazon EMR - Big Data Tutorial using Spark

Edit*
Make sure you encrypt your Spark script when you upload it to S3 (timestamp: 13:42).
There's a small typo on line 41 of the code; it should be "add_argument".
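
For reference, "add_argument" is the method on Python's `argparse.ArgumentParser` that registers a command-line flag. The video's actual script isn't reproduced here, so the following is only a minimal sketch of how a Spark script might parse its arguments; the flag names `--data_source` and `--output_uri` and the bucket paths are hypothetical placeholders, not taken from the video.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line flags a Spark job script might accept."""
    parser = argparse.ArgumentParser(description="Example Spark job arguments")
    # Note the spelling: add_argument (hypothetical flag names below).
    parser.add_argument("--data_source", required=True,
                        help="S3 URI of the input data")
    parser.add_argument("--output_uri", required=True,
                        help="S3 URI where results are written")
    return parser.parse_args(argv)


# Passing an explicit list stands in for real command-line arguments.
args = parse_args([
    "--data_source", "s3://example-bucket/input.csv",  # placeholder path
    "--output_uri", "s3://example-bucket/output/",     # placeholder path
])
print(args.data_source, args.output_uri)
```

In a real job, `parse_args()` would be called with no arguments so that it reads the flags passed on the `spark-submit` command line.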

Intro
Today we're going to talk about a popular tool in data engineering. Amazon EMR is an industry-leading big data platform. It's a mature service, launched back in 2009, and it draws heavily on the Apache Hadoop project. EMR is used for processing terabytes of data and for training machine learning models. In this tutorial, we'll dive deep into EMR's architecture, walk through a live demo of triggering jobs using Steps, and show how to use Spark to extract data from Amazon S3. Hope you enjoy this one!

Timestamps ⏰
0:00 Intro
1:16 Overview of Amazon EMR
5:10 Create filesystem, VPC, and configure EMR cluster
9:04 Writing our Spark script
13:42 3 ways to Trigger Steps in EMR
18:32 SSH into Resource Manager in YARN
19:50 Enable EMR managed auto-scaling
20:57 Summary
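
For the "Trigger Steps" segment, one programmatic route is the boto3 EMR client's `add_job_flow_steps` call. This may or may not be one of the three ways shown in the video; it is a sketch under stated assumptions. The step name, S3 paths, script flags, and default region below are hypothetical placeholders.

```python
def build_spark_step(name, script_uri, script_args=()):
    """Build one EMR step that runs spark-submit through command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # keep the cluster running if the step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_uri, *script_args],
        },
    }


def submit_step(cluster_id, step, region_name="us-east-1"):
    """Submit the step to a running cluster (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    emr = boto3.client("emr", region_name=region_name)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


# Placeholder values for illustration only.
step = build_spark_step(
    "example-spark-step",
    "s3://example-bucket/scripts/main.py",
    ["--data_source", "s3://example-bucket/input.csv",
     "--output_uri", "s3://example-bucket/output/"],
)
print(step["HadoopJarStep"]["Args"])
```

Calling `submit_step("<your-cluster-id>", step)` would then queue the step on the cluster; it is left uncalled here because it requires a live cluster and credentials.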

Notes from video 📝

Who am I? 🙋🏻‍♂️
I'm Jay, and I love making videos about travel, self-help and tech. I currently work in New York City as a data engineer, but I grew up in Malaysia and lived in the UK when I was 19. Back then, I had no idea what life was about, moving between so many places and navigating a career in tech. Today, I've learned a lot and wanna share my perspective through filmmaking.

Socials 📱

Sub Count: 4,539
Comments

I hope you create more videos about AWS services. Loved the way you explain things, perfect for beginners.

jovelynobias

So sad your channel doesn't have more tutorials like this :( thank you so much!

Munk-tttz

This is an outstanding tutorial. Thank you for making this!

JeffSylvan-yj

We need more videos Jaaay 🙏🏻💪🏻 You're awesome dude!

miguelhermar

Absolutely enjoyed watching the entire video. I feel this video is going to be a great start for understanding EMR. Thanks for making it, Jay.

harishchitluri

awesome explanation, simple, subtle and to the point!

vineethdas

thank you!! I watched the YouTube demo and it was really helpful. I also want to study spark on eks

jeahyunkim

Hey, thank you so much!!.. you really explain very well!

vmmismagic

Very clear! Thank you for sharing this excellent tutorial!

yutao

This is so goood :). Please keep making these kind of videos! Hello from Seattle

isaaclee

great tutorial! can’t wait to see more

sunnyzhong

this is crazy ❤❤❤ wish i had seen this earlier! is this what a whole amazon product looks like in an actual workflow? and also could you maybe make another one showing the azure system? pleaaase

StartDataLate

impressive and informative video, good job, go on doing tutorials plss :) Would be very interesting to see a video about spark and snowflake on your channel!

Ярослав-юнз

Your video is very interesting!
Hope you release many new videos :)

thanhchien

the type of video that makes me wanna quit the field because of how bad i feel about the level I am at, but it's a very helpful video though

mahmoudfadaly

Could you share more about projects for data engineering beginners? I have recently started learning to be a DE and I'd like to know more about personal projects that could help me improve my skills. Thank you so much for sharing, and I'm waiting for your next video :> Have a good day

NhungNguyen-whuf

great video! can you make also for AWS Glue? Thank you!

errrbrrr

You killed it. Loved it! Extremely useful

martinghiena

More videos on Streaming, Airflow and Spark

_its_ck

Fantastic tutorial indeed! I did as instructed and I got two fails in deploying the 'add step' part of the EMR Cluster stage, any insights would be appreciated.

bishop