Apache Airflow in the Cloud: Programmatically orchestrating workloads w/ Python - Satyasheel, Kaxil Naik
PyData London 2018
Apache Airflow is a pipeline orchestration tool for Python, initially built by Airbnb and later open-sourced. It lets data engineers configure multi-system workflows that are executed in parallel across any number of workers. A single pipeline may contain one or more operations, such as running Python code, executing Bash commands, or submitting a Spark job to the cloud. Airflow itself is written in Python, and users can write their own custom operators in Python.
A data pipeline is a critical component of an effective data science product, and orchestrating pipeline tasks enables simpler development and more robust and scalable engineering.
In this tutorial, we will give a practical introduction to Apache Airflow.
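As a rough illustration of the idea (not code taken from the talk itself), a minimal Airflow DAG with a Bash task feeding a Python task might look like the sketch below. The DAG id, task names, callable, and schedule are placeholders, and the imports follow the Airflow 1.x API that was current at the time of this talk.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

# Defaults applied to every task in the DAG (placeholder values).
default_args = {
    "owner": "airflow",
    "start_date": datetime(2018, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# A DAG groups tasks and defines how often the pipeline runs.
dag = DAG(
    dag_id="example_pipeline",
    default_args=default_args,
    schedule_interval="@daily",
)


def transform():
    # Placeholder for arbitrary Python logic, e.g. cleaning a dataset.
    print("transforming data")


# One shell operation and one Python operation.
extract = BashOperator(
    task_id="extract",
    bash_command="echo 'pulling raw data'",
    dag=dag,
)

process = PythonOperator(
    task_id="process",
    python_callable=transform,
    dag=dag,
)

# Dependency: extract must finish before process starts.
extract >> process
```

Dropping a file like this into the Airflow DAGs folder is enough for the scheduler to pick it up and run the two tasks in order on the configured workers.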
---
PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.