filmov
tv
Dask in 8 Minutes: An Introduction

Показать описание
This video gives a general overview of the Dask project.
What is Dask?
Dask is a flexible library for parallel computing in Python.
Dask is composed of two parts:
1. Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
2. “Big Data” collections like parallel arrays, DataFrames, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of dynamic task schedulers.
Dask emphasizes the following virtues:
Familiar: Provides parallelized NumPy array and Pandas DataFrame objects
Flexible: Provides a task scheduling interface for more custom workloads and integration with other projects.
Native: Enables distributed computing in pure Python with access to the PyData stack.
Fast: Operates with low overhead, low latency, and minimal serialization necessary for fast numerical algorithms
Scales up: Runs resiliently on clusters with 1000s of cores
Scales down: Trivial to set up and run on a laptop in a single process
Responsive: Designed with interactive computing in mind, it provides rapid feedback and diagnostics to aid humans
Share your feedback with us in the comments and let us know:
- Did you find the video helpful?
- Have you used Dask before?
KEY MOMENTS
00:00 - Intro
00:08 - What does Dask do?
01:08 - Dask Array
01:43 - Where is Dask used?
02:58 - Examples of application
05:46 - How Does Dask Work?
06:15 - Where is Dask run?
00:06:48 Dask Open Source Community
What is Dask?
Dask is a flexible library for parallel computing in Python.
Dask is composed of two parts:
1. Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
2. “Big Data” collections like parallel arrays, DataFrames, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of dynamic task schedulers.
Dask emphasizes the following virtues:
Familiar: Provides parallelized NumPy array and Pandas DataFrame objects
Flexible: Provides a task scheduling interface for more custom workloads and integration with other projects.
Native: Enables distributed computing in pure Python with access to the PyData stack.
Fast: Operates with low overhead, low latency, and minimal serialization necessary for fast numerical algorithms
Scales up: Runs resiliently on clusters with 1000s of cores
Scales down: Trivial to set up and run on a laptop in a single process
Responsive: Designed with interactive computing in mind, it provides rapid feedback and diagnostics to aid humans
Share your feedback with us in the comments and let us know:
- Did you find the video helpful?
- Have you used Dask before?
KEY MOMENTS
00:00 - Intro
00:08 - What does Dask do?
01:08 - Dask Array
01:43 - Where is Dask used?
02:58 - Examples of application
05:46 - How Does Dask Work?
06:15 - Where is Dask run?
00:06:48 Dask Open Source Community
Комментарии