How to build AWS Glue ETL with Python shell | Data pipeline | Read data from S3 and load Redshift

In this video, we will develop an AWS Glue ETL script using a Python shell job. We can now use Python scripts in AWS Glue to run small to medium-sized ETL (extract, transform, and load) workflows. Previously, AWS Glue jobs were limited to the Apache Spark environment.
Python shell jobs in AWS Glue support scripts that are compatible with Python 2 and 3 and come pre-loaded with libraries such as Boto3, NumPy, SciPy, pandas, and others. We can also install other libraries via a .whl file.
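
As a rough illustration of the kind of script a Python shell job might run, the sketch below reads a CSV object from S3, cleans it, and builds a Redshift `COPY` statement that is submitted through the Redshift Data API. It is not the script from the video: the function names, the cleaning rule, and every bucket, table, cluster, and role name passed in are hypothetical placeholders. It also assumes the cleaned file has already been written back to S3 before the `COPY` runs.

```python
import csv
import io


def read_csv_from_s3(bucket, key):
    """Download an S3 object and return its body as text."""
    import boto3  # imported lazily so the pure helpers below run without AWS access

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return body.decode("utf-8")


def transform(csv_text):
    """Strip whitespace from names and values; drop rows with any empty value."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cleaned = {
            name.strip(): (value or "").strip()
            for name, value in row.items()
            if name is not None  # ignore surplus columns on malformed rows
        }
        if all(cleaned.values()):
            rows.append(cleaned)
    return rows


def to_csv(rows):
    """Serialize cleaned rows back to CSV text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_copy_sql(table, s3_uri, iam_role_arn):
    """Build a Redshift COPY statement for a CSV file that has a header row."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )


def load_to_redshift(sql, cluster_id, database, db_user):
    """Submit the statement through the Redshift Data API (asynchronous call)."""
    import boto3

    client = boto3.client("redshift-data")
    return client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, DbUser=db_user, Sql=sql
    )
```

Keeping `transform` and `build_copy_sql` free of AWS calls means they can be exercised on a laptop with in-memory data before the script is uploaded.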

#Python #ETL #AWS

Topics covered in this video:
0:00 - Introduction: ETL with Python shell
0:53 - Prerequisites
1:30 - Create Python .whl file
2:35 - Python ETL script
4:15 - Upload scripts to AWS
5:11 - AWS Glue ETL Job
6:33 - AWS Redshift table
6:49 - Execute Glue ETL Job
7:17 - Review Data & logs
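
The job setup steps listed above could also be scripted instead of clicked through in the console. The following is a minimal sketch using Boto3, assuming the ETL script and the .whl file have already been uploaded to S3. The helper names and all argument values are hypothetical, and the Python version and capacity settings are illustrative choices rather than the ones used in the video.

```python
def build_job_definition(name, role_arn, script_s3_uri, wheel_s3_uri):
    """Assemble keyword arguments for Glue's create_job call (Python shell job)."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",  # selects a Python shell job rather than Spark
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3.9",
        },
        # Extra libraries packaged as a .whl are passed via this job parameter.
        "DefaultArguments": {"--extra-py-files": wheel_s3_uri},
        "MaxCapacity": 0.0625,  # Python shell jobs accept 0.0625 or 1 DPU
    }


def create_and_run_job(definition):
    """Create the job and start one run; returns the run ID."""
    import boto3  # imported lazily so the definition can be built without AWS access

    glue = boto3.client("glue")
    glue.create_job(**definition)
    run = glue.start_job_run(JobName=definition["Name"])
    return run["JobRunId"]
```

Separating the definition from the API call makes it easy to inspect or version the job settings before anything is created in the account.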
Comments

This was extremely helpful! I really like that you are able to compress such valuable information into just 8 minutes! I think it would be really useful to see how to build an ETL pipeline in an IaC framework. I haven't seen many on the web! Thanks!

GiovanniDeCillis

You're a hero for the well-explained content and then answering everyone's comments. :)

calvinbutler

Subscribed!!! Thank you so much for the great content!! Can you please make dedicated videos on how to use AWS Glue, Triggers, Lambda functions, and Athena for an ETL pipeline?

satishmajji

Thanks, great video! Other examples I have seen used a crawler to write the schema of the Redshift table to the Data Catalog before loading with a Glue job.
If I wanted to do this using only a visual Glue job and without a crawler, is it possible?

kofio

Thanks!!! But I have a question: if my data comes from an API, is S3 not necessary?

ArniFuentes

Hi, how can we read the credentials from connections or secrets in an AWS Glue Python shell job? It is not working for me.

koyalmudi

Hi - thanks for such concise content! I noticed that you deployed to S3 without debugging locally. Suppose I wanted to test the ETL script before deploying it. Is there a way to execute the etl.py script on the local host using aws_cli?

joegenshlea

Hi, can we transfer 1 TB of data from S3 to Redshift using Glue or Lambda + Glue?

PawanKumar-glyw