Apache Airflow meetup: Data lakes and Cloud Composer

First talk: Building data lakes with Apache Airflow by Bas Harenslak & Julian de Ruiter

A data lake provides data analysts and data scientists with a playground for developing new insights, models and products. Filling a data lake requires coordinating imports from a wide variety of data sources, and Airflow fits this purpose nicely. In this talk we show how we use Airflow to manage batch imports and exports between many different systems, whilst maintaining a clean, shared codebase that allows us to write concise DAGs in a consistent manner. In particular, we will discuss building custom hooks and operators with clean, well-defined interfaces, whilst also touching on topics such as data (pseudo-)anonymization, testing and CI/CD.

Bas is a Big Data Hacker at GoDataDriven. Before joining the team he finished a Master's in Computer Science cum laude. He has hands-on experience with Hadoop and programming languages such as Python and Java.
Julian is a data scientist at GoDataDriven who also enjoys dabbling in data engineering. He previously studied at the Delft University of Technology, where he completed his Bachelor's in Computer Science and his Master's in Bioinformatics cum laude. After Delft, he spent his PhD exploring breast cancer, after which it made sense for Julian to use his skills in a more applied setting at GoDataDriven.

Second talk: Google Cloud Composer, managed Apache Airflow on the Google Cloud

Tahir Fayyaz, Big Data Specialist at Google Cloud, will give an introduction to Cloud Composer and how it interacts with Google Cloud.
Comments

Great presentation! Thank you for sharing!

justinmillertech