Part 1 - Project Overview | What is AWS EMR | Extract and Transform Redfin data with AWS EMR

#dataengineering #emr #spark #pyspark #jupyterlab #jupyternotebook #aws #emrstudio #etlpipeline #redfin
In this video, I explain what Amazon EMR (Elastic MapReduce) is and how it helps with processing big data. I then show how to create a VPC and spin up an EMR cluster within that VPC. Next, I show how to set up Amazon EMR Studio and JupyterLab, and how to attach the Jupyter notebook to the provisioned cluster. Finally, I show how to write PySpark code in that notebook to extract data from the Redfin data source, process it, and load the transformed data as a Parquet file into an S3 bucket.
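As a rough illustration of the extract, transform, and load steps described above, here is a minimal PySpark sketch. It is not the code from the video: the function names `parquet_target` and `run_etl`, the tab-separated input format, the `period_end` column, and the `transformed` prefix are all assumptions made for the example, so swap in the real source path, columns, and bucket.

```python
def parquet_target(bucket: str, prefix: str) -> str:
    """Build the S3 URI the transformed data is written to (hypothetical layout)."""
    return f"s3://{bucket.strip('/')}/{prefix.strip('/')}/"


def run_etl(source_path: str, bucket: str, prefix: str = "transformed") -> str:
    """Read a delimited file, clean it, and write it to S3 as Parquet."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # In a notebook attached to an EMR cluster, this returns the existing session.
    spark = SparkSession.builder.appName("redfin-etl").getOrCreate()

    # Extract: assumes a tab-separated file with a header row.
    df = spark.read.csv(source_path, header=True, inferSchema=True, sep="\t")

    # Transform: drop rows that contain nulls.
    df = df.na.drop()

    # Assumed column name; the derived year column is only added if it exists.
    if "period_end" in df.columns:
        df = df.withColumn("period_end_year", F.year(F.col("period_end")))

    # Load: write the result as Parquet, replacing any previous output.
    target = parquet_target(bucket, prefix)
    df.write.mode("overwrite").parquet(target)
    return target
```

Calling `run_etl("<source path>", "<your bucket>")` from the notebook would run all three steps and return the output location. Keeping the path helper separate from the Spark work makes it easy to check the target URI before anything is written.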

Please don’t forget to LIKE, SHARE, COMMENT and SUBSCRIBE to our channel for more AWESOME videos.

**Books I recommend**

***************** Commands used in this video *****************
Check out my GitHub repo

***************** USEFUL LINKS *****************

DISCLAIMER: This video and description contain affiliate links. This means that when you buy through one of these links, we receive a small commission at no cost to you. This helps support us in continuing to make awesome and valuable content for you.
Comments
Author

Incredible, this is the third of your projects that I have replicated. I think it was time to leave this comment: you have helped me a lot. I hope this channel grows a lot.

vfontefontechavez
Author

Hey, thanks for the project, it helped me a lot to nail an interview :). How does EMR compare to Databricks on AWS? Which would you recommend? It would be great if you could make a video on how to process streaming data (Kinesis, Firehose, Kafka?).

daviddean