Zillow Data Analytics (RapidAPI) | End-To-End Python ETL Pipeline | Data Engineering Project | Part 2

This is part 2 of the Zillow data analytics end-to-end data engineering project.

In this data engineering project, we will learn how to build and automate a Python ETL process that extracts real estate property data from the Zillow Rapid API and loads it into an Amazon S3 bucket. That upload triggers a series of AWS Lambda functions that transform the data, convert it into CSV format, and load it into another S3 bucket. Apache Airflow uses an S3KeySensor operator to confirm that the transformed data has landed in that S3 bucket before attempting to load it into Amazon Redshift.
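To make the orchestration concrete, here is a minimal sketch of what the sensor-then-load portion of such a DAG could look like. The bucket, key, schema, table, and connection names below (transformed-csv-bucket, zillow_data.csv, aws_s3_conn, conn_id_redshift, zillowdata) are placeholders, not necessarily the exact values used in the video:

from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

with DAG(
    dag_id="zillow_analytics_dag",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Wait until the Lambda-transformed CSV lands in the destination bucket.
    is_file_in_s3_available = S3KeySensor(
        task_id="is_file_in_s3_available",
        bucket_key="s3://transformed-csv-bucket/zillow_data.csv",  # placeholder URI
        aws_conn_id="aws_s3_conn",  # connection configured in the Airflow UI
        timeout=60,                 # give up after 60 seconds
        poke_interval=5,            # re-check every 5 seconds
    )

    # COPY the CSV from S3 into a Redshift table once the sensor succeeds.
    transfer_s3_to_redshift = S3ToRedshiftOperator(
        task_id="transfer_s3_to_redshift",
        aws_conn_id="aws_s3_conn",
        redshift_conn_id="conn_id_redshift",
        s3_bucket="transformed-csv-bucket",
        s3_key="zillow_data.csv",
        schema="PUBLIC",
        table="zillowdata",
        copy_options=["csv IGNOREHEADER 1"],
    )

    is_file_in_s3_available >> transfer_s3_to_redshift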

After the data is loaded into Amazon Redshift, we will connect Amazon QuickSight to the Redshift cluster to visualize the Zillow (Rapid API) data.

Apache Airflow is an open-source platform used for orchestrating and scheduling workflows of tasks and data pipelines. This project will be carried out entirely on the AWS cloud platform.

In this video I will show you how to install Apache Airflow from scratch and schedule your ETL pipeline. I will also show you how to use a sensor in your ETL pipeline. In addition, I will show you how to set up an AWS Lambda function from scratch, as well as AWS Redshift and AWS QuickSight.
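As a rough illustration of the transformation step, a Lambda that reads the raw JSON from the source bucket, flattens it with pandas, and writes a CSV into the bucket the sensor watches might look like the sketch below. The target bucket name, the "results" field, and the selected columns are assumptions about the API payload rather than the video's exact code, and note that pandas is not in the default Lambda runtime, so it must be supplied via a layer or deployment package:

import json
import boto3
import pandas as pd

s3_client = boto3.client("s3")

TARGET_BUCKET = "transformed-csv-bucket"  # placeholder name

def lambda_handler(event, context):
    # Identify the S3 object that triggered this invocation.
    source_bucket = event["Records"][0]["s3"]["bucket"]["name"]
    object_key = event["Records"][0]["s3"]["object"]["key"]

    # Read the raw JSON that the extraction step dropped into S3.
    response = s3_client.get_object(Bucket=source_bucket, Key=object_key)
    data = json.loads(response["Body"].read().decode("utf-8"))

    # Flatten the listings into a DataFrame; "results" and the column
    # names are assumptions about the Rapid API response shape.
    df = pd.DataFrame(data["results"])
    df = df[["bathrooms", "bedrooms", "city", "price", "zipcode"]]

    # Write the CSV into the bucket the S3KeySensor watches.
    csv_key = object_key.replace(".json", ".csv")
    s3_client.put_object(
        Bucket=TARGET_BUCKET,
        Key=csv_key,
        Body=df.to_csv(index=False),
    )
    return {"statusCode": 200, "body": f"Wrote {csv_key} to {TARGET_BUCKET}"}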

As this is a hands-on project, I highly encourage you to first watch the video in its entirety without typing along, so that you better understand the concepts and the workflow. After that, either try to replicate the example without watching, consulting the video when you get stuck, or watch the video a second time in its entirety while typing along.

Remember, the best way to learn is by doing it yourself. Get your hands dirty!

If you have any questions or comments, please leave them in the comment section below.

Please don’t forget to LIKE, SHARE, COMMENT and SUBSCRIBE to our channel for more AWESOME videos.

**Books I recommend**

***************** Commands used in this video *****************
# update system packages and install pip plus the venv module
sudo apt update
sudo apt install python3-pip
sudo apt install python3.10-venv
# create and activate a virtual environment for the project
python3 -m venv endtoendyoutube_venv
source endtoendyoutube_venv/bin/activate
# install the AWS CLI and Airflow (no sudo, so the install stays inside the venv)
pip install --upgrade awscli
pip install apache-airflow
# launch the Airflow webserver, scheduler, and a default admin user in one process
airflow standalone
# install the Amazon provider so AWS operators, sensors, and connection types are available
pip install apache-airflow-providers-amazon
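If the Amazon Web Services connection type does not show up in the Airflow UI afterwards, make sure the provider package was installed in the same virtual environment that runs Airflow, then restart airflow standalone. As an alternative to the UI, a connection can also be created from the CLI; a sketch with placeholder credentials and connection name:

airflow connections add 'aws_s3_conn' \
    --conn-type 'aws' \
    --conn-login '<AWS_ACCESS_KEY_ID>' \
    --conn-password '<AWS_SECRET_ACCESS_KEY>' \
    --conn-extra '{"region_name": "us-east-1"}'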
***************** USEFUL LINKS *****************

DISCLAIMER: This video and description have affiliate links. This means when you buy through one of these links, we will receive a small commission, at no cost to you. This helps support us in continuing to make awesome and valuable content for you.
#dataengineering #airflow
Comments

A wonderful in-depth explanation of lambda for DE. God bless you, Sir.
We really appreciate your beautiful soul.

RxLocum

Hi, thanks for the video. It definitely cleared up my points of confusion, but I still have one doubt.
Why Airflow? I mean, we could've used Lambda for that work too, right?

coolkid

Absolutely wonderful tutorial, thank you for the great content man! I learned a ton from your videos and have now subscribed to your channel.

Jair

Thanks for this great project. I am getting stuck on the waiter object "Forbidden" error. Can you please explain the reason, at the timestamp 28:13?

SivaKumar-ttdz

The second part of my error is:

The scheduler does not appear to be running. Last heartbeat was received 5 days ago.

The DAGs list may not update, and new tasks will not be scheduled.

I don't know the solution. I don't know where to run the code either.

Basically I am stuck in the queued state when hitting the rerun button in Airflow. My previous comment is gone.

michealdmouse

Great project and great teacher. I seem to be stuck on part 2 right at the end, the part where you add a connection to S3 in the Airflow UI. I can't for the life of me get Amazon Web Services to appear in the dropdown list of providers. Any tips/clues? I pip-installed it, which makes my Python code work, but the provider isn't appearing. Is it that the Airflow UI expects to pick these up from one place and I'm installing them somewhere else?

AlanThompson-jf

Hi @tuplespectra, I am stuck connecting Airflow to AWS. Even though I ran the pip install command in the terminal, I can't get the Amazon Web Services connection type to show up. Please help me.

marvinarismendiz

Hey guys, I have this error when importing pandas in my Lambda function:
[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'pandas'
Traceback (most recent call last):

assieneolivier

Sometimes Airflow is not launching properly, throwing errors.

manojkumaar

I've been waiting for this! Thanks!

sangiansang