Building Data Pipelines Part 1: Airbnb's Airflow Vs Spotify's Luigi

Learn Airflow Here:

We recently wrote about ETLs and why they’re important. We wanted to provide an outline for what ETL tools are. You could refer to these ETL tools as workflow tools that help manage moving data from point A to point B.

Two popular workflow tools are Luigi by Spotify and Airflow by Airbnb. Both workflow engines were developed to help design and execute the computationally heavy workflows used for data analysis.

If you need data consulting help, then reach out to our team here:

Also, if you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.

Check out my Medium here:

Watch our video on the skills you need to be a data engineer, then watch the one below.

What Is a DAG?

Before comparing Airflow to Luigi, we need to understand a concept both libraries have in common. Both essentially build what is known as a directed acyclic graph (DAG). A DAG is a collection of tasks that run in a specific order, where each task can depend on earlier tasks and no chain of dependencies ever loops back on itself.

For example, if we had three tasks named Foo, Bar, and FooBar, it might be the case that Foo runs first and Bar and FooBar depend on Foo finishing.

This would create a basic graph like the one below. As you can see, there’s a clear path. Now imagine this with tens or hundreds of tasks.
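As a rough illustration of the Foo, Bar, and FooBar example, here is a minimal sketch using only Python's standard library (`graphlib`, available from Python 3.9). It is not how Airflow or Luigi define DAGs; the `dependencies` and `cyclic` dictionaries are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Map each task to the set of tasks it depends on (illustrative names).
dependencies = {
    "Foo": set(),        # Foo has no upstream tasks, so it can run first
    "Bar": {"Foo"},      # Bar waits for Foo
    "FooBar": {"Foo"},   # FooBar also waits for Foo
}

# A topological sort yields an order in which every task
# appears after all of the tasks it depends on.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)

# "Acyclic" matters: if the dependencies loop back on themselves,
# no valid run order exists and the sorter raises CycleError.
cyclic = {"Foo": {"FooBar"}, "Bar": {"Foo"}, "FooBar": {"Bar"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print("cycle detected:", has_cycle)
```

Foo always comes first in `run_order`; Bar and FooBar have no dependency on each other, so a scheduler is free to run them in either order or in parallel.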

Large data organizations have massive DAGs with dependencies on dependencies. Having clear access to the DAG lets companies track where things are going wrong, and it helps keep bad data out of their data ecosystems: if a task fails, the tasks downstream of it are typically held back until their dependencies complete successfully.
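To make the failure behavior concrete, here is a simplified sketch of a runner that refuses to execute a task when any of its upstream tasks did not succeed. It is an assumption-laden toy, not Airflow's or Luigi's scheduler: the `run_dag` helper, the status labels, and the `load_bad_data` task are all hypothetical, and instead of making downstream tasks wait it simply marks them as blocked.

```python
from graphlib import TopologicalSorter


def run_dag(dependencies, actions):
    """Run tasks in dependency order, blocking tasks whose upstream failed."""
    status = {}
    for task in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(task, set())
        # Upstream tasks always appear earlier in the order,
        # so their status is already recorded here.
        if any(status[dep] != "success" for dep in upstream):
            status[task] = "upstream_failed"  # illustrative label
            continue
        try:
            actions[task]()
            status[task] = "success"
        except Exception:
            status[task] = "failed"
    return status


def load_bad_data():
    # Hypothetical task that fails, e.g. because the input is malformed.
    raise ValueError("source file is malformed")


dependencies = {"Foo": set(), "Bar": {"Foo"}, "FooBar": {"Foo"}}
actions = {"Foo": load_bad_data, "Bar": lambda: None, "FooBar": lambda: None}

status = run_dag(dependencies, actions)
print(status)
```

Because Foo fails, neither Bar nor FooBar ever runs, which is the property that keeps a bad upstream result from flowing into everything built on top of it.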

This is where tools like Airflow and Luigi come in handy.
Comments

This is amazing! Thank you for explaining things in a clear and concise manner. I'm hooked and look forward to more videos in this series.

hahah

Seattle Data Guy, you are doing great work in the data engineering world. Your videos are clear and are helping me a lot in the DE field. Thank you, I appreciate your work.

kiranmudradi

Thanks man, every other video talks about ETL and gives the same old theory again without actually discussing and showing the actual tools and processes.

singhsandeep

Great video man, I'm only recently trying to gain a foothold in data engineering, but your videos are always a great help. I hope you don't get discouraged by the smaller popularity of DE videos compared to DS and keep it up like this! :)

fs_

Thank you so much for explaining. Before this I didn't know Luigi existed...
My org just moved from crontab to Airflow.

StartDataLate

Nice one! Would love to see advantages/disadvantages, OS, and how to deploy it. Also other tools! (Databricks, Dagster)

joeeeee

Awesome video Data Guy! Always learn something new watching your channel. I was wondering, it might have been mentioned in a previous video- where you talk about a possible reason for data engineers being paid slightly less at the big companies but still having a higher average salary. I think you said one of those reasons was because the tech giants already have their data infrastructure in place, and building that infrastructure might be some of the more valuable work that a data engineer does.

Would this type of work consist of like choosing what tools/stack to use for the data engineering of a company? And would building etls/pipelines not fall under that? Or what type of work were you thinking of when you said that the large companies often already have their data infrastructure in place, which could be the more valuable work sometimes?

Chiefnice

I really liked this video. I am in the first year of my data engineering job. Please make more videos like this.

SumitKumar-pgfs

Wondering your thoughts on a modular data science framework called Kedro. It seems young and still growing, but I have had some good experience with it as a means for writing modular code where modules are wired together into a DAG such that modules have no knowledge of predecessors or successors and are essentially just a function pipeline.

joeycarson

Airflow looks much cleaner than Luigi.

metaller_alex

Thank you, great explanation :) Could you please let me know whether it's possible to have a "Profile Picture" in the left navigation using the LUIGI framework? Or is it restricted to navigation elements only? (The vision is to have a picture, with the navigation list below it.)

melvink.

OK... I'm an old-style guy. When I hear DAG, my first question is "can this tool make loops between DAG nodes?" :) Also, ETL/ELT... Are you sure that it has real possibilities? It's just a good scheduling tool that very often gives you the opportunity to overengineer a problem.

yustas

Awesome video and explanation, thank you.
Also appreciated that you didn't tell us to like, comment, and subscribe (which I did anyway, btw).

lucaguarro

Question. I see this phenomenon all the time and I'm always so surprised by it. If you go to timestamp 12:16, you'll see what I'm talking about: coders on YouTube walking through code, but the font size of the code is tiny (like ~2pt). Surely the code is one of the most important parts, right? Like, why would you not choose to choke up on that a little and zoom in so we could see it easily? Maybe just show sections of code at a time; at least it would be legible. I mean, you've dedicated time to talk about it, so it must be important. If it is important, why not make it visible and legible? Someone will have to explain this to me one day. I think it is some sort of secret professional convention among YouTube coders to only show code in minuscule font sizes. Hey, I wonder... If I start making YouTube videos with the code too small to see, would that certify me as a professional coder? Asking for a friend.

EVUTube

Please check out Apache Beam as well, which I found much more helpful than these tools for data pipelines. It would be great if you could do some videos on it...

thbatman