Intro to Python Dask: Easy Big Data Analytics with Pandas!

In this video, you will learn how to use Dask, a Python library that lets pandas code run in parallel on your local machine or scale out to multiple machines. No DataFrame or NumPy size limits, and super-fast execution. Just pip install and go! It's that easy! Does it really work? Find out!

Join my Patreon Community and Watch this Video without Ads!

Twitter: @BryanCafferky

Notebook with Code at:

See my Master Databricks and Apache Spark series:
Comments

15:30 The rounding difference is probably because you use the full dataset in ddf and only part of it in pdf. Great introduction to Dask! At the moment I work only with NumPy for data engineering (deep learning with images). Would you say it makes sense to store images in pandas DataFrames? It would probably make a lot of things easier and, with Dask, even faster because of the parallelization.

arturkunz

Thanks for the awesome introduction to Python Dask.

atanu

How reliable is it to use in production for data ingestion?

sawantamang

When do you realize you have to leverage Dask on a DataFrame? What error message would you get?

KOMPAJAM

Thanks for your video! I've recently been exploring Dask and realized that it only has read_sql_table, but no read_sql_query function. I used to read SQL queries into Python using pyodbc/SQLAlchemy, but it looks like that's not possible with Dask.

jamiew

Do I need to know Python before taking this course?

I

Great video as usual, thank you. But after about a week of hammering the subject, I could not load a data table from an Azure SQL database ☹️… back to pandas… (having to do loops to deal with the memory limits 🤦‍♂️)

ericxls