Building Data Pipelines Part 1: Airbnb's Airflow Vs Spotify's Luigi
Learn Airflow Here:
We recently wrote about ETLs and why they’re important. We wanted to provide an outline for what ETL tools are. You could refer to these ETL tools as workflow tools that help manage moving data from point A to point B.
Two of these popular workflow tools are Luigi by Spotify and Airflow by Airbnb. Both of these workflow engines have been developed to help in the design and execution of computationally heavy workflows that are used for data analysis.
If you need data consulting help, then reach out to our team here:
Also, if you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.
Check out my Medium here:
Watch our video on the skills you need to be a data engineer, then watch the one below.
What Is a DAG?
Before comparing Airflow to Luigi, it helps to understand a concept both libraries have in common. Both essentially build what is known as a directed acyclic graph (DAG). A DAG is a collection of tasks that run in a specific order, where each task may depend on tasks that ran before it and the dependencies never loop back on themselves.
For example, if we had three tasks named Foo, Bar, and FooBar, it might be the case that Foo runs first and Bar and FooBar depend on Foo finishing.
This would create a basic graph with a clear path: Foo at the top, with Bar and FooBar branching from it. Now imagine this with tens or hundreds of tasks.
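To make the Foo, Bar, and FooBar example concrete, here is a minimal sketch in plain Python. It uses neither Airflow nor Luigi, only the standard library's `graphlib` module (Python 3.9+), and the `dag` dictionary layout is just an assumption chosen for illustration.

```python
from graphlib import TopologicalSorter

# Map each task name to the set of tasks it depends on.
# Foo has no dependencies; Bar and FooBar both wait on Foo.
dag = {
    "Foo": set(),
    "Bar": {"Foo"},
    "FooBar": {"Foo"},
}

# static_order() yields the tasks so that every dependency
# appears before the tasks that rely on it.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Foo always comes first. Bar and FooBar can appear in either order, since neither depends on the other.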
Large data organizations have massive DAGs with dependencies on dependencies. Having clear access to the DAG lets companies track where things are going wrong. It also helps keep bad data out of their data ecosystems: if something fails, the tasks downstream are often forced to wait until their dependencies are complete.
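The idea that a failure holds back downstream tasks can be sketched in a few lines of plain Python. This is not how Airflow or Luigi implement it; `run_dag`, the `actions` mapping, and the status labels are all hypothetical names used only for this illustration.

```python
from graphlib import TopologicalSorter

def run_dag(dag, actions):
    """Run tasks in dependency order; hold back tasks whose upstream did not succeed."""
    status = {}
    for task in TopologicalSorter(dag).static_order():
        # Dependencies always come earlier in the order, so their status is known.
        if any(status[dep] != "success" for dep in dag[task]):
            status[task] = "blocked"  # hypothetical label: upstream did not finish
            continue
        try:
            actions[task]()
            status[task] = "success"
        except Exception:
            status[task] = "failed"
    return status

def ok():
    pass

def broken():
    raise ValueError("bad input data")

dag = {"Foo": set(), "Bar": {"Foo"}, "FooBar": {"Foo"}}
status = run_dag(dag, {"Foo": broken, "Bar": ok, "FooBar": ok})
print(status)
```

Because Foo fails here, Bar and FooBar never run, so nothing built on Foo's bad output reaches the rest of the pipeline.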
This is where tools like Airflow and Luigi come in handy.