Airflow for Beginners: Build Amazon books ETL Job in 10 mins

Hey Data Engineering Enthusiasts!!
In this video we will build an ETL data pipeline using Apache Airflow. The pipeline extracts data engineering books from Amazon and stores them in a Postgres database. It runs on a schedule and pulls fresh data from the website on each run.

This video will help you build a basic data pipeline and also get a repository of data engineering books, all at the same time.
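
For readers who want a feel for the transform and load steps before watching, here is a minimal sketch in plain Python. It is not the code from the video: the function names `transform_books` and `load_books`, the column names, and the sample records are all hypothetical, and an in-memory SQLite database stands in for Postgres so the snippet runs without any services. In an Airflow DAG, each function would typically become its own task.

```python
import sqlite3


def transform_books(raw_books):
    """Drop records without a title and de-duplicate by title (first one wins)."""
    seen = set()
    cleaned = []
    for book in raw_books:
        title = (book.get("title") or "").strip()
        if not title or title in seen:
            continue
        seen.add(title)
        cleaned.append(
            {
                "title": title,
                "author": (book.get("author") or "").strip(),
                "price": (book.get("price") or "").strip(),
                "rating": (book.get("rating") or "").strip(),
            }
        )
    return cleaned


def load_books(conn, books):
    """Create the books table if needed, insert the rows, return the row count."""
    # Assumption: SQLite is used here only as a stand-in for the Postgres database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, "
        "author TEXT, price TEXT, rating TEXT)"
    )
    conn.executemany(
        "INSERT INTO books (title, author, price, rating) "
        "VALUES (:title, :author, :price, :rating)",
        books,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]


# Hypothetical sample standing in for scraped search results.
sample = [
    {"title": "Example Book A", "author": "Author A", "price": "10.00", "rating": "4.5"},
    {"title": " Example Book A ", "author": "Author A", "price": "10.00", "rating": "4.5"},
    {"title": "Example Book B", "author": "Author B", "price": "20.00", "rating": "4.0"},
]

conn = sqlite3.connect(":memory:")
row_count = load_books(conn, transform_books(sample))
print(f"Loaded {row_count} books")
```

The duplicate title is filtered out in the transform step, so only distinct books reach the table. Swapping SQLite for Postgres would mean using a Postgres connection (for example through the Airflow connection created in the video) and adjusting the SQL placeholders to that driver's style.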

Timestamps:
00:00 - Intro
01:00 - Pipeline Design
03:16 - Install Airflow
04:49 - Install PGAdmin
05:44 - Create Books db
06:45 - Create Postgres connection from Airflow
07:27 - Build DAG
09:32 - Define functions
10:43 - Add Tasks
11:26 - Dependencies
11:48 - Manually Trigger DAG
12:14 - Query data on PGAdmin
12:42 - Conclusion

Links:
Airflow Documentation
Code for PG Admin:
"""
postgres:
  ports:
    - "5432:5432"

pgadmin:
  container_name: pgadmin4_container2
  image: dpage/pgadmin4
  restart: always
  environment:
    PGADMIN_DEFAULT_PASSWORD: root
  ports:
    - "5050:80"
"""

Hope you enjoy this video :)

Let me know in the comments about what you think of this video!!
Comments
Author

Absolutely great content ! Short crisp and to the point.

adityatomar
Author

Good content, crisp and to the point. Thank you for this!

DamaleSomnath
Author

This is my first Airflow implementation, and it's the best explanation I have ever seen. Are there any other data engineering projects available on your channel?

rohankupate
Author

Beautifully explained. Extremely clear. Love how you explained :) Thank You :)

KunaalNaik
Author

Thank you for this amazing work. This is the first data engineering tutorial that has worked for me... very well performed and organized. I got confused sometimes, but that's a skill issue xD

OlafKoch-jy
Author

Great content. To the point explanation!

NidaShaikh-zx
Author

Great explanation, dear. You make Airflow easy.

Smruti..-
Author

You did a great job with explanation. But can you go a little slow next time? I think it's okay if the video duration becomes 20 mins. But some of us can't consume information when you go fast lol

tmrecords
Author

Amazing skill at explaining the flow. Can you please create the next video that you mentioned at the end of this one?

HardikKanak-ju
Author

Hello madam,
Thank you for sharing such good knowledge on the above skills for data science projects.
Madam, please do one end-to-end project using AWS cloud.
Thank you

vinodsagar
Author

Hey Sunjana! Thanks for sharing this, but for a complete beginner like me it's hard to follow along. Is there a step-by-step approach?

anishnair
Author

Super, sister. I have a doubt about Apache Airflow integration and connections.

BlockRock-ro
Author

Hi, I lose the database every time I run docker compose down. Is there any way to retain the db?

tink
Author

What if I want to add an Oracle database, since it is not available by default? How can I test the connectivity using Python?

RumiAnalytics
Author

Good one 👍. Which tool did you use for the pipeline architecture design?

munna
Author

Great content. I am currently stuck at the step where I enter the IP address in pgAdmin; it says "unable to connect to server: connection timeout". What can I do?

osayiprecious
Author

Hey! Which camera did you use to record the vid? Great quality <3

adityatomar
Author

You are using some kind of filter or LUTs; it looks synthetic.

Piyush-xctd
Author

Can you address the IP address change?

tink
Author

Your video is very fast; please make the next one at regular speed.

HardikKanak-ju