Why Data Engineers LOVE/HATE Airflow (FT. @mehdio, @startdataengineering and more!)

Airflow is a favorite tool of many data engineers. But some data engineers dislike it.

It can be tricky to scale and hard to manage if set up incorrectly.

Let's talk about it.

If you enjoyed this video, check out some of my other top videos.

Top Courses To Become A Data Engineer In 2022

What Is The Modern Data Stack - Intro To Data Infrastructure Part 1

If you would like to learn more about data engineering, then check out Google's GCP certificate

If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.

Or check out my blog

And if you want to support the channel, then you can become a paid member of my newsletter

Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio

_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.

*I do participate in affiliate programs, if a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.
Comments

Thank you for making this video. I don't want to over-promote Airflow because I'm obviously a little biased, but I do think a lot of people still know Airflow from version 1.10.x and haven't tried 2.x yet. Many things have been fixed (performance, DAG authoring, UI, etc.). The gap is just huge. Also, I would say the flexibility/freedom that Airflow brings is a double-edged sword: you can do a lot, configure many things, and tweak any detail to fit your needs perfectly, but the deeper you go, the steeper the learning curve. It's easy to get lost in all the features and configurable options Airflow offers. However, IMHO it's relatively easy if you just want to run data pipelines and execute a few tasks.❤

MarcLamberti

We've adopted Argo Workflows, which is a Cloud Native Computing Foundation project built on top of Kubernetes.

rdean

Awesome video, a very balanced perspective without focusing only on strengths or weaknesses of any single tool 👍

anna__geller

This was very interesting; glad to hear the different insights. Hope to see more collaborations in the community.

miguelvera

I, for one, didn't face any issues while working with XComs, especially with large datasets, using a custom backend on Azure Blob Storage. And Airflow is by design an orchestrator, so offloading computation is more sensible.

chetansurwade

Cool journalistic approach, glad to have others' opinions included! 👍

mehdio

Thanks a lot, much appreciated. I plan to use Apache Kafka for a log system. To make my ETL more maintainable (transforming in Kafka before loading into Elasticsearch), I'd like to add Airflow. But Apache Kafka Connect looks pretty good too. Between these two solutions, which would you choose for an ELK + Kafka pipeline?

nashaeshire

Great video! I would love to hear your opinion about Apache Kedro.

mohamedyasser

I've only used Airflow in a narrow capacity for handling scheduling and dependencies. What's the k8s drama about? I've never heard Airflow and k8s used in the same sentence before.

brettstoddard

I've used it for years, and I also tried the later 2.x versions. I still don't like it, and I think there are better ways of architecting pipelines. But yeah, I was amazed when I saw Airflow for the first time, and it did solve a lot of problems, but I still think it is a tool of the past. I hope I am wrong!

DataPains

Can you review the Meta Database Engineer Professional Certificate on Coursera when it comes out?

gava

You should summarize the pros and cons at the beginning.

janswee

Interesting points of view, thanks for the video. As I see it, technology evolves, but tech stacks are getting crazy complicated. In the end, it mostly comes down to budget: hire someone cheap (an overpromising data engineer) and you get a headache, you can't move beyond the dev environment, and most of the data pipelines end up being SQL anyway. But I could be wrong.

peterbizik

I don't think there is a best ETL pipeline, and I would not bother trying to find one. Each company and team operates differently depending on its skill set, line of business, and priorities. I never had problems working in SSIS and rarely have problems working in Data Factory either. Yes, each tool has lots of limitations, but you will find a way to overcome them.
One thing I like about Azure Data Factory is its no-code ease of use, and it's extremely cheap to maintain. Yes, I like to code in Python and work with Airflow, which gives me extreme flexibility that I couldn't have in ADF, but if ADF gives me headaches, I will go with this tool anyway. I onboarded a junior dev who had never worked with any ETL tool within a week. It's that easy.
Many times we data engineers spend our time tweaking and searching for the best tools on the market, but companies hired us to deliver results.

anildangol

I'm halfway through this video and I still don't know wtf Airflow is. I know it has a k8s operator, but I have no idea what it is or why I would use it. Maybe this video is for advanced people.

robot

What is the difference between Airflow and Astronomer? Can you help me, sir?

mauludinrohman

Hi
I’m a new subscriber and I just saw your video “roadmap to become a data engineer”, and I wonder if you could advise me on a course to learn Python.
Your channel is awesome

Emanuel-ybqk

What are people's thoughts on data engineer career progression, given that you don't gain a qualification, only work experience?

sana-szue

As a beginner in the Modern Data Stack, should I learn Prefect or Airflow? What do you recommend?

datawitharslan

Is Pentaho PDI used for different purposes than Airflow?

rguez