Data Science on GPU clusters with Dask and RAPIDS

Data scientists spend their time combing through data sources, generating insights, and experimenting with models. While Python has a vast ecosystem of tools to do all of this, challenges arise when data sizes and computational complexity grow. Best case, a data scientist is stuck waiting for code to run overnight or even over several days. Worst case, certain analyses or models are just not possible.
This talk presents how to accelerate data science in Python using Dask and RAPIDS. Dask is a parallel computing framework that scales from your laptop to a cluster of thousands of machines. RAPIDS is a GPU-computing framework that pushes traditional CPU workloads to the GPU. Dask and RAPIDS together allow you to scale to clusters of GPU machines. This talk will help you navigate this exciting new world, and show how easy it is to get your workloads running faster.
There is a live demo of distributed machine learning model training using Dask and RAPIDS across a GPU cluster.
Aaron Richter is a software developer turned data engineer and data scientist. He has pioneered the development and implementation of large-scale data science infrastructure in both business and research environments. Inevitably, he has spent a lot of time finding efficient ways to clean data, run pipelines, and tune models. Aaron is currently a Senior Data Scientist at Saturn Cloud, where he works to make data scientists faster and happier. He holds a PhD in machine learning from Florida Atlantic University.