Dask in 8 Minutes: An Introduction

This video gives a general overview of the Dask project.

What is Dask?

Dask is a flexible library for parallel computing in Python.

Dask is composed of two parts:

1. Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.

2. “Big Data” collections like parallel arrays, DataFrames, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of dynamic task schedulers.
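A minimal sketch of the two parts described above, assuming Dask and NumPy are installed. The helper functions `inc` and `add` are hypothetical names chosen for illustration; `dask.delayed` and `dask.array` are the library's own interfaces.

```python
import dask
import dask.array as da


@dask.delayed
def inc(x):
    # Hypothetical helper: wrapped so that calling it records a task
    # instead of running immediately.
    return x + 1


@dask.delayed
def add(x, y):
    # Hypothetical helper combining two earlier task results.
    return x + y


# Part 1: task scheduling. Calling delayed functions only builds a task
# graph; nothing runs until .compute() hands the graph to a scheduler.
total = add(inc(1), inc(2))
task_result = total.compute()

# Part 2: a parallel collection. The array is split into 4 chunks of 250
# elements, and the NumPy-style expression is evaluated chunk by chunk.
x = da.arange(1_000, chunks=250)
array_result = (x ** 2).sum().compute()

print(task_result, array_result)  # 5 332833500
```

The array expression reads like ordinary NumPy code, but it is translated into many small tasks that run on the same kind of scheduler as the delayed functions.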

Dask emphasizes the following virtues:

Familiar: Provides parallelized NumPy array and Pandas DataFrame objects

Flexible: Provides a task scheduling interface for more custom workloads and integration with other projects.

Native: Enables distributed computing in pure Python with access to the PyData stack.

Fast: Operates with low overhead, low latency, and minimal serialization necessary for fast numerical algorithms

Scales up: Runs resiliently on clusters with 1000s of cores

Scales down: Trivial to set up and run on a laptop in a single process

Responsive: Designed with interactive computing in mind, it provides rapid feedback and diagnostics to aid humans
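As a rough illustration of the "Flexible", "Scales down", and "Responsive" points, the sketch below runs one small custom workload on two local schedulers and prints a progress bar. It assumes only a core Dask installation; the `square` function is a hypothetical example, not something from the video.

```python
import dask
from dask.diagnostics import ProgressBar


@dask.delayed
def square(n):
    # Hypothetical unit of work for a custom workload.
    return n * n


# Eight independent tasks feeding a single sum.
squares = [square(n) for n in range(8)]
total = dask.delayed(sum)(squares)

# "synchronous" runs every task in the calling thread, which is convenient
# for debugging on a laptop.
serial = total.compute(scheduler="synchronous")

# "threads" uses a local thread pool; ProgressBar reports progress for the
# local schedulers while the graph executes.
with ProgressBar():
    threaded = total.compute(scheduler="threads")

print(serial, threaded)  # 140 140
```

The graph itself does not change between the two runs; only the scheduler does. Running on a cluster goes through the separate `dask.distributed` scheduler, which is not shown here.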

Share your feedback with us in the comments and let us know:

- Did you find the video helpful?
- Have you used Dask before?

KEY MOMENTS
00:00 - Intro
00:08 - What does Dask do?
01:08 - Dask Array
01:43 - Where is Dask used?
02:58 - Examples of application
05:46 - How Does Dask Work?
06:15 - Where is Dask run?
06:48 - Dask Open Source Community

Comments

Accidentally browsed to this video. The best library intro in my opinion. Beautiful

ivangorshkov

Thank you Dask Team, will explore this and join the community

arkadipbasu

Had some issues with Ray, but Dask worked out of the Box! Congratulations to the Developers!

AlverGant

Great intro.
Also, how do I show those additional panes on the right, as shown at 2:05, that display memory usage, progress, etc.? That is pretty awesome.
Thanks so much

aria_nukil

This lib is awesome!!! Thanks a lot 😍😍

ghz

You are absurdly beautiful, my computer is literally in love with you 🥺💕

chadwooloo

Thank you for having Portuguese subtitles.

rodrigoluca

I do not see Dask in my Anaconda Navigator. Maybe that is because I have Miniconda.

myabakhova
