Much Faster Pandas with cuDF GPU Processing - CPU vs GPU Speed Benchmarks

What if I told you that all this time we've been using Pandas wrong? 🐼 🐼 🐼
We keep running it on our CPU and wondering why it's slow - but what happens when we switch to GPU processing? 🤔
In this tutorial we will explore the brand new technology behind cuDF Pandas Accelerator Mode, which lets us use our graphics cards to make Pandas MUCH faster! We will:

1️⃣ Install RAPIDS cuDF via Windows Subsystem for Linux @ 01:58 - 05:03
2️⃣ Code a Data Science Workflow with cuDF Pandas & Sentiment140 @ 05:03 - 11:28
3️⃣ Learn Basic Feature Engineering with Regex @ 11:28 - 15:12
4️⃣ Conduct a CPU Pandas versus GPU cuDF Pandas Speed Test @ 15:12 - 19:08
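To give a feel for steps 2 and 3, here is a minimal sketch of regex-based feature engineering on tweet text. It is not the code from the video: the tiny in-memory table stands in for Sentiment140, and the column names (`target`, `text`) and feature names (`n_mentions`, `n_hashtags`, `has_link`, `label`) are assumptions chosen for illustration. The code is ordinary Pandas, so it should run unchanged whether or not cuDF Pandas Accelerator Mode is enabled.

```python
import pandas as pd

# Tiny stand-in for Sentiment140 (illustrative data, assumed column names).
tweets = pd.DataFrame(
    {
        "target": [0, 4, 4],
        "text": [
            "@bob my flight got cancelled again :( #fail",
            "Loving the new album!!! http://example.com",
            "@amy @joe coffee time #monday #mood",
        ],
    }
)

# Regex features: count @mentions and #hashtags, and flag tweets with a link.
tweets["n_mentions"] = tweets["text"].str.count(r"@\w+")
tweets["n_hashtags"] = tweets["text"].str.count(r"#\w+")
tweets["has_link"] = tweets["text"].str.contains(r"https?://\S+", regex=True)

# Sentiment140 marks negative tweets with 0 and positive tweets with 4.
tweets["label"] = tweets["target"].map({0: "negative", 4: "positive"})

print(tweets[["n_mentions", "n_hashtags", "has_link", "label"]])
```

On the real dataset the same string operations would simply run over many more rows, which is where the GPU is expected to help.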

Throughout the tutorial we will talk about the relationship between processors, memory and graphics cards with lots of colorful visualizations and examples. We will see how our software processes manifest on the hardware level and explore the effects of GPU parallel programming in the realm of sentiment analysis.

⭐ More cuDF Pandas Resources ⭐
---------------------------------------------------------------
📝 Official cuDF Pandas Colab Notebook Code Example (beginner friendly):
💻 cuDF Pandas Virtual Summit Sessions:

🎥 Related Videos of Mine 🎥
---------------------------------------------------------------
⭐ CUDA Simply Explained - GPU vs CPU Parallel Computing:
⭐ Basic Guide to Pandas - Tricks, Shortcuts, Must Know Commands:
⭐ FASTER Inference with Torch TensorRT - CPU vs CUDA:
⭐ Anaconda Beginners Guide for Linux and Windows:

💻 Installation and Download Links 💻
---------------------------------------------------------------
⭐ cuDF Installation Guide @ 2:59:
⭐ Sentiment140 Homepage @ 5:50:

Once the homepage is back - you'll be able to copy the link from their student section.

⏰ TIMESTAMPS ⏰
---------------------------------------------------------------
00:00 - 00:43 | intro
00:43 - 01:10 | what is CUDA?
01:10 - 02:25 | install WSL (Windows Subsystem for Linux)
02:25 - 03:42 | install Anaconda in WSL
03:42 - 05:03 | Install RAPIDS cuDF
05:03 - 05:30 | what is Sentiment Analysis?
05:30 - 07:12 | download and unzip Sentiment140 with code
07:12 - 07:51 | import cuDF Pandas Accelerator Mode
07:51 - 09:46 | loading and processing operations on GPU
09:46 - 10:24 | GPU and CPU commands profiling
10:24 - 10:40 | why do we need CPUs if GPUs run faster?
10:40 - 11:28 | cuDF CPU fallback
11:28 - 13:27 | feature extraction with Regex patterns (Regular Expressions)
13:27 - 15:11 | feature reduction
15:11 - 19:09 | CPU Pandas versus GPU cuDF Pandas Speed Test
19:09 - 19:35 | Challenge
19:35 - 19:57 | Thanks for watching!
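If you want to try a speed test of your own, the sketch below shows one possible way to time a Pandas operation. It is an assumption-laden illustration rather than the benchmark from the video: the helper `time_it`, the synthetic three-row table and the repeat count are all made up for this example. Run it once with plain Pandas and once with cuDF Pandas Accelerator Mode enabled, and compare the printed times.

```python
import time

import pandas as pd


def time_it(label, fn, repeats=3):
    """Run fn a few times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    print(f"{label}: {best:.4f} s")
    return best


# Synthetic stand-in data, repeated to get a larger table (hypothetical sizes).
base = pd.DataFrame({"text": ["@a hello #x", "plain tweet", "@b @c hi"]})
big = pd.concat([base] * 10_000, ignore_index=True)

elapsed = time_it("count mentions", lambda: big["text"].str.count(r"@\w+"))
```

Taking the best of several runs reduces noise from warm-up effects, which matters on a GPU because the first call often includes one-time setup cost.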

🤝 Connect with me 🤝
----------------------------------------------------------------
🔗 Github:
🔗 Discord:
🔗 LinkedIn:
🔗 Twitter:
🔗 Blog:

💳 Credits 💳
----------------------------------------------------------------
⭐ Beautiful titles, transitions, sound FX:
⭐ Beautiful icons:
⭐ Beautiful graphics:

#python #pythonprogramming #machinelearning #pandas #pythonpandas #pythonpattern #pattern #regex #regularexpression #gpu #cpu #processor #graphicscard #graphiccard #hardware #computerhardware #encoding #benchmark #benchmarks #cuda #rapids #nvidia #artificialintelligence #datascience #programming #coding #neuralnetworks #ml #ai #technology #computer #computerscience #data #dataanalytics #datastructures #gpucomputing #multiprocessing #rtx #rtx4080 #jupyterlab #sentimentanalysis #featureengineering #database #datasets #twitter
Comments

Hi, can you make a short video on how to use cuDF in Python scripts? There are some solutions on the internet that don't work, so it would be really good if you made this video.

theMintyRaven
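For readers with the same question, here is a minimal sketch of two ways to use cuDF Pandas Accelerator Mode in a plain Python script. It assumes RAPIDS cuDF is installed and an NVIDIA GPU is available; `my_script.py` and the variable names are placeholders. The `try`/`except` fallback is an addition for this example so the script still runs on a machine without cuDF.

```python
# Option 1 (no code changes): run the script through the cudf.pandas module.
#   python -m cudf.pandas my_script.py    # my_script.py is a placeholder name
#
# Option 2: enable the accelerator at the very top of the script,
# before pandas is imported anywhere else.
try:
    import cudf.pandas

    cudf.pandas.install()
    gpu_enabled = True
except Exception:  # cuDF is missing or no usable NVIDIA GPU was found
    gpu_enabled = False

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
total = int(df["a"].sum())
print(f"GPU accelerated: {gpu_enabled}, sum = {total}")
```

The order matters in option 2: if `pandas` is imported before `cudf.pandas.install()` is called, the script will most likely keep using regular CPU Pandas.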

CPU R9 5900x - 1min 3 sec,
GPU RTX 3090 - 1.6 sec.

Furthermore, Pandas is single-threaded by default, but there is a project/library called "modin" that lets you scale pandas so that it uses all CPU cores and threads. This helps especially on large data sets and when a CUDA device is not available. To do this, simply replace the import statement in the script with "import modin.pandas as pd" (after installing the library, of course). For the task in the video, this cut the execution time by more than half compared to ordinary pandas.

Thank you for the video!

oleg.mammoth
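The import swap described in this comment could look like the sketch below. It assumes modin is installed together with an execution engine such as Ray or Dask; the fallback to regular pandas and the names `backend` and `mentions` are additions for illustration, not part of the comment.

```python
try:
    import modin.pandas as pd  # needs modin plus an engine such as Ray or Dask

    backend = "modin"
except ImportError:
    import pandas as pd  # fall back to single-threaded pandas

    backend = "pandas"

df = pd.DataFrame({"text": ["@a hi", "no mention", "@b @c"]})

# The rest of the script stays the same regardless of the backend.
mentions = [int(n) for n in df["text"].str.count(r"@\w+")]
print(backend, mentions)
```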

It's amazing, I started to work on it more than a decade ago. But now, everyone can do it in seconds. Thank you.

kamertonaudiophileplayer

Excellent tutorial as always!
My Alienware R11 has a 10th-gen i9, only 16 GB of RAM, and a 1660 Ti. With regular pandas the kernel crashed on a couple of tries because it couldn't handle that amount of data (I think), but with cuDF it did the labelling task on the 25 million rows in 1 min 35 s.

slademeister

i love your way of teaching, this video came just in time

tanishaness

amazing! always knew GPU would speed it up and you proved it.. with cuDF

JamesLee-lqqb

You're always amazing M, great video!!! 🤘😸

soultribe

3:18 "We'll of course carefully read the license agreement" 😆

grasshopper

Hi Maria, Love the way you teach and I can understand everything you teach.

anilkrishna

Love the idea of moving processing onto the GPU!

Great video! ❤

scrumtuous

For me, CPU: 9.64 s and GPU: 798 ms.
Thank you so much. I love your videos.

akashmahmud

Hi Maria.
Have you tried the Polars library? It is inspired by Pandas and works similarly, but it is much faster because it is written in Rust.
Greetings from Ukraine.

vasylpavuk

I love your explanation, I love how you simplify things.

AlexShoyhit

this tutorial is awesome, thanks for the guide

kehaujung

A geek beautiful woman talking about very interesting topics. Like! 😀

deAraujoAndre

Dude, you finally discovered cuDF <3

renanmonteirobarbosa

My WSL is set up as Debian, and I have Python 3.11 installed on it, so I wasn't able to replicate this. Apparently 3.11 isn't supported yet 😢.
I will try some other time to set up a second WSL with Ubuntu and see what happens.
I have a Threadripper Pro 3945WX with 256GB RAM, and an RTX 3080Ti, and generally speaking, CUDA is about 4x faster than CPU. Only having 12GB of VRAM is a major problem though. Typically the datasets I work with are around 100m rows.

I reduced the execution time of one Pandas project from 10 minutes to 0.1 seconds by re-coding it in NumPy, and Polars gives me not quite as good a speedup for a lot less effort.

katrinabryce

Are you planning a setup tutorial of cuDF for Linux? I've been looking all over for one using CUDA 12.3…

Ggorre-kzmd

Thank you. It's very helpful to me. :)

starlightknights

Love you Maria, my best Python teacher.

shakils