Airflow, Spark, EMR - Building a Batch Data Pipeline by Emma Tang

Robust and user-friendly data pipelines are the foundation of powerful analytics and machine learning, and are at the core of allowing companies to scale with their data. In this talk, we will walk through how to get started building a batch processing data pipeline end to end using Airflow and Spark on EMR. Through real code and live examples, we will explore one of the most popular OSS data pipeline stacks.
Comments

I appreciate how ambitious this presentation is. Spinning up all of these resources and using them in a live demo -- nicely done!

matpataki

Halfway in and I already admire you taking on the daunting task of spinning up things live. Also, the way you look at the screens up top is super cute.

AmanGarg

Emma Tang,
Can you please add the details of the Airflow and GitHub resources to the description?

saranyaelumalai

Can you please put the GitHub link in the description?

dsinghr

@Emma, can you post the relevant GitHub repos too? I'm trying to follow along with this tutorial.

musicisglobal

Awesome content. Is there a GitHub repo for it?

PrakashReddyK

The presentation has a misleading title, it should have been titled - "let's play with Kubernetes and see what that is....and you also can run dockerized airflow on it. The end". So much time wasted.

dmitrysemenov