Event-driven Data Pipelines with Apache Airflow - Airflow Summit 2024

Presented by John Jackson at Airflow Summit 2024

Airflow is all about schedules: we use cron strings and Timetables to define them, and the Airflow Scheduler component manages those timetables, and a lot more, to ensure that DAGs and tasks run according to those schedules.

But what do you do if your data isn’t available on a schedule? What if data is coming from many sources, at varying times, and your job is to make sure it’s all as up-to-date as possible? An event-driven data pipeline may be the answer.

An event-driven architecture (or EDA) is an architectural pattern that uses events to decouple an application’s components. It relies on external events, not an internal schedule, to create loosely coupled data pipelines that determine when to take action, and what actions to take. In this session, we will discuss the design considerations when using Airflow in an EDA and the tools Airflow has to make this happen, including Datasets, the REST API, Dynamic Task Mapping, custom Timetables, Sensors, and queues.
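
As a rough illustration of the REST API option mentioned above, here is a minimal sketch of how an external event handler might ask Airflow to start a DAG run. It is not taken from the presentation. It assumes the Airflow 2.x stable REST API path (/api/v1/dags/{dag_id}/dagRuns) and that basic authentication is enabled on the webserver; the base URL, the DAG id "ingest_on_event", the credentials, and the event payload are all hypothetical. The function only builds the request, so it can be inspected without a running Airflow instance.

```python
import base64
import json
import urllib.request


def build_trigger_request(base_url, dag_id, event, username, password):
    """Build (but do not send) a POST request that asks Airflow to start a DAG run.

    The event payload is passed as the run's "conf", so tasks in the DAG
    can read what happened (for example, which file arrived).
    """
    # Assumed endpoint: Airflow 2.x stable REST API. Other versions may differ.
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": event}).encode("utf-8")
    # Assumption: the deployment accepts HTTP basic auth for API calls.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    req = build_trigger_request(
        "http://localhost:8080",
        "ingest_on_event",
        {"path": "s3://example-bucket/incoming/file.csv"},
        "user",
        "pass",
    )
    print(req.get_method(), req.full_url)
    # To actually send it against a running Airflow instance:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

In practice, something like a queue consumer or a cloud function would call this when an event arrives, which keeps the event source decoupled from the DAG itself.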
Comments

Been using EDA for close to 10 years now, and I was wondering how to do it correctly in Airflow. This presentation really helped me a lot. Thanks! :)

PhilippeGrohrock

Can we access the code base used in this presentation anywhere?

pranaygawas