Airflow Vs. Dagster: The Full Breakdown!

In this video I'll give you a full breakdown of the differences between Airflow and Dagster so that you can make an informed decision on which solution is best for you! Hope this helps anyone out there who is trying to decide between the two!
Comments

This video is right on point! I got my first job as a DE recently and was tasked with migrating all the cron jobs to an orchestration tool. I was looking for the best option, and now I'm pretty sure that we'll be better off with Airflow.
Thank you, and keep up the good work, my man.

thanhbinh

This is an interesting video, but it is fairly inaccurate about Dagster. I'm sure that's not out of malice, but probably because OP is more familiar with Airflow.

For example, Dagster is open source, it is super extensible and modular, etc.

I'd also point out a pretty important difference between Dagster and Airflow: Dagster enables a local-to-production test, build, and deploy cycle, which is not really possible with Airflow. Also, Dagster comes with a ton of automation capabilities that just aren't possible with an imperative orchestrator like Airflow.

This is a pretty deep subject that requires a fair amount of knowledge on the author's part to give a fair comparison, and that knowledge is somewhat lacking in this video.

jarredthedataengineer

Coming here as someone who uses Dagster daily and wants to know if Airflow is worth it, so I appreciate this comparison.

A few things on the Dagster side: for the first example, you can do in Dagster exactly what you have in Airflow. You can create branching logic by having an Op with multiple outputs (not all of them required) that emits only the one for the current day of the week. You can wrap this branching Op and the day-specific Ops in a graph and build that graph into one of the assets shown. If guitar lessons, family dinner, etc. produce assets, you can just make them their own assets with a similar not-required setup, so they only fire on their specific day of the week. In the UI you can expand the assets into their Ops and Graphs to see the branching logic.
For example, I use this to train an ML model every Monday and then run predictions with it. Every other day of the week, we just use the previous model for predictions without retraining.
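
The Monday-retrain pattern described above can be sketched in plain Python. This is a framework-agnostic illustration, not Dagster's API: `choose_branch`, `retrain`, and `run` are hypothetical names, and the generator that yields exactly one named output only mimics the idea of an Op whose outputs are optional.

```python
from datetime import date


def choose_branch(run_date):
    """Emit exactly one named output, mimicking an op with optional outputs."""
    if run_date.weekday() == 0:  # Monday
        yield ("retrain", run_date)
    else:
        yield ("reuse", run_date)


def retrain(run_date):
    # Placeholder for real training; returns a label for the new model.
    return f"model-{run_date.isoformat()}"


def run(run_date, previous_model):
    """Run only the downstream step whose input was actually emitted."""
    emitted = dict(choose_branch(run_date))
    if "retrain" in emitted:
        model = retrain(emitted["retrain"])
    else:
        model = previous_model  # skip training, keep the last model
    return model, f"predictions from {model}"
```

Calling `run` with a Monday date produces a fresh model label, while any other day passes the previous model straight through to the prediction step.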

I don't really understand the point about testing in Dagster. You can add assertions or raise errors in Dagster assets. There are also hooks, which are separate functions that run after an asset completes (they can send messages to Slack, do quality checks, etc.; it's just a Python function), which is nicer because it keeps things separate. Most of the logs you're seeing in Dagster are user-specified, since the logger gets passed into the asset function. I log debug info, errors, warnings, etc.

I don't really understand the last point about the Dagster API. You can run anything in Dagster. For example, if you want to trigger something in Fivetran or dbt Cloud, the Dagster code just hits the endpoint and polls while the computation is done elsewhere. You can set up your own APIs to do a similar thing. I don't really like how much Dagster couples compute and orchestration, but it seems like Airflow is doing a similar thing, and you don't have to use Dagster this way. There are IO managers to manage the data passing between assets. This doesn't have to be JSON data from an API; it can be any Python variable. I run Dagster on Kubernetes, where each asset runs in its own pod, so I use S3, GCS, etc. to pickle the Python objects and pass them between pods. My understanding is that this is an advantage Dagster has, because it type-checks the data going between pods. There are other tasks where my assets just run CLI commands, one example being running scripts in R.
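
The idea of pickling Python objects to shared storage and type-checking them on the way back in can be sketched with the standard library alone. This is only an illustration under stated assumptions: `PickleStore` is a hypothetical class, a temporary directory stands in for S3 or GCS, and none of this is Dagster's actual IO manager interface.

```python
import pickle
import tempfile
from pathlib import Path


class PickleStore:
    """Hypothetical stand-in for an IO manager backed by S3 or GCS."""

    def __init__(self, root):
        self.root = Path(root)

    def save(self, key, obj):
        # Serialize the upstream step's return value under its key.
        with (self.root / f"{key}.pkl").open("wb") as f:
            pickle.dump(obj, f)

    def load(self, key, expected_type):
        # Deserialize, then check the type before handing it downstream.
        with (self.root / f"{key}.pkl").open("rb") as f:
            obj = pickle.load(f)
        if not isinstance(obj, expected_type):
            raise TypeError(
                f"{key}: expected {expected_type.__name__}, got {type(obj).__name__}"
            )
        return obj


def upstream():
    return {"customers": 3, "rows": [1, 2, 3]}


def downstream(stats):
    return sum(stats["rows"]) / stats["customers"]


# Each step could run in a separate process; only the store is shared.
with tempfile.TemporaryDirectory() as tmp:
    store = PickleStore(tmp)
    store.save("stats", upstream())
    result = downstream(store.load("stats", dict))
```

Because the steps communicate only through the store, they never need to share memory, which is what makes one-pod-per-step execution workable.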

baja

Would Airflow be a good fit to orchestrate a couple of Python scripts that send marketing emails to our customers based on certain criteria?

Is there something better for this application?

ricardomalla

Thank you. I feel privileged that you made the video at my request. I know, I know, I will take the whole of the credit :D

datalearningsihan

Dagster is open source according to the homepage

ofnotandi

Hey, thanks a lot for the insightful overview! And your channel is awesome for Airflow content.
I'd love to see similar comparisons with Flyte and Kestra.

luiztauffer

Awesome stuff, bro. Question: is there any reason not to just use these things as schedulers and have them spin up containers that hold the code? I feel like you get tied to a specific framework and it turns into a nightmare...

nixbruh

Great content... (horrid audio, was your landlady vacuuming?)

joshuasmith

Love the content! Audio could be better, squeaky chair and booming background noise are a little distracting

christophergutknecht

I'm pretty sure Dagster is open source

StefanoMessina-uxmj