Dask + Pandas for Parallel ETL

Dask and Pandas work together to provide intuitive data processing at very large scale. This video loads a few hundred gigabytes of Parquet data from Amazon S3 and then does some basic analysis, giving a sense of how Dask and Pandas are used together.

Key Moments
00:00 Intro
00:40 Load Parquet Data
02:32 Explore Data
08:13 Dataframe Documentation
09:10 Get Machines
11:45 Next Steps

---
Scale Your Python Workloads with Dask and Coiled.
Coiled is a Dask company. With Coiled's rock-solid infrastructure, you can quickly and securely create Dask clusters in your cloud account.

Learn more about Coiled and get started for free

More content on our blog:
Comments

Was the source only one big Parquet file? Did Dask set the partitions by itself?

FabioRBelotto

If I run Dask without importing the client, does it not run on many workers?

FabioRBelotto