Apache Airflow for Data Science #4 - Migrate Airflow MetaData DB to Postgres and Enable Parallelism

Apache Airflow doesn't run tasks in parallel by default - but there's an easy fix. Learn how to migrate MetaDB to Postgres and enable parallel execution.

00:00 Introduction
00:58 Modify Airflow configuration file
02:33 Initialize Airflow database and create the user
04:55 Restart Apache Airflow
06:06 Outro
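The chapters above boil down to a couple of config changes and a handful of commands. A minimal sketch of the workflow, assuming a local Postgres install; the database name, user, and password below are placeholders, and the exact config section and paths may differ between Airflow versions:

```shell
# 1. Create a Postgres database and user for Airflow (placeholder names).
psql -U postgres -c "CREATE DATABASE airflow_db;"
psql -U postgres -c "CREATE USER airflow_user WITH PASSWORD 'airflow_pass';"
psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;"

# 2. Edit ~/airflow/airflow.cfg to point Airflow at Postgres and
#    switch to an executor that supports parallelism:
#      sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db
#      executor = LocalExecutor
#    The Postgres driver comes from the extras package:
pip install "apache-airflow[postgres]"

# 3. Initialize the metadata database and create an admin user.
airflow db init
airflow users create \
  --username admin --password admin \
  --firstname First --lastname Last \
  --role Admin --email admin@example.com

# 4. Restart the webserver and scheduler so the new config takes effect.
airflow webserver --port 8080
airflow scheduler
```

With `LocalExecutor` and a Postgres backend, tasks that have no dependency on each other can run concurrently instead of one at a time as under the default SQLite + `SequentialExecutor` setup.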

Comments

Thank you, much appreciated. Piecing this together from the documentation is not straightforward.

Managed to connect my Airflow cluster on WSL2 to Postgres on the host, and all is now smooth.

obiradaniel

Hello! Thanks for this amazing video. I am new to Airflow, and I did the initial default setup locally using SQLite. But the scheduler goes down frequently and my task fails with the following error: [2022-06-01, 02:30:24 IST] {local_task_job.py:154} INFO - Task exited with return code Negsignal.SIGKILL

[2022-06-01, 02:30:24 IST] {taskinstance.py:1267} INFO - Marking task as FAILED. dag_id=iris-pkg, task_id=split, execution_date=20220531T144529, start_date=20220531T152455, end_date=20220531T153024

[2022-06-01, 02:30:24 IST] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check

While reading about this error, I found that I should switch to Postgres and the LocalExecutor. This video has helped me understand the theory. Can you please help me set up Airflow with Postgres from scratch? I don't have a Postgres DB installed locally, so would installing airflow[postgres] serve the purpose, or do I need to install Postgres locally?

aditya