Pandas DataFrames on your GPU w/ cuDF

An overview and some quick examples of using cuDF's pandas accelerator, and how much faster it can be than vanilla pandas for data analysis.
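For context, the "single flag" enablement that the comments below refer to looks roughly like this (a sketch based on the RAPIDS `cudf.pandas` documentation; without a GPU and cuDF installed, the import below is just ordinary pandas):

```python
# In a Jupyter notebook, enable the accelerator BEFORE importing pandas:
#   %load_ext cudf.pandas
# Or run an unmodified script through the accelerator from the command line:
#   python -m cudf.pandas my_script.py
import pandas as pd  # with the extension loaded, this is transparently GPU-backed

df = pd.DataFrame({"city": ["A", "B", "A"], "price": [10.0, 20.0, 30.0]})
print(df["price"].sum())  # same pandas API either way
```

The point of the accelerator is that the script itself needs no changes: operations cuDF supports run on the GPU, and anything unsupported falls back to CPU pandas.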

Comments
Author

Enabling cuDF using a single flag is insane! However, I just wanted to point out (especially for new pandas users) that the proper way to calculate the average price per city in pandas is with groupby. Running that in plain pandas is blazing fast (a few ms), nothing compared to 19 minutes. That doesn't mean cuDF isn't useful, but don't forget that using plain pandas properly can get you a long way.
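The groupby approach this comment describes can be sketched as follows (the `city` and `price` column names and the sample values are assumptions, standing in for the video's dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "city":  ["Austin", "Boston", "Austin", "Boston"],
    "price": [100.0, 200.0, 300.0, 400.0],
})

# One vectorized pass instead of a Python-level loop over each city:
avg_price = df.groupby("city")["price"].mean()
print(avg_price)  # Austin -> 200.0, Boston -> 300.0
```

Because groupby dispatches to compiled code, it avoids the per-row Python overhead that makes a hand-written loop take minutes on large data.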

HarrisBallis
Author

Hey Sentdex, can you take this video down so my manager doesn't find out that I sped up the entire codebase by 200 fold with just one line and I end up getting appreciation bonuses??

Jokes aside, this is absolutely wild. What a gamechanger. Thanks a lot as always, Kevin!

bantaibaman
Author

Awesome video! I encountered a similar issue where I had to process ~8 GB of data using an AWS Lambda (limited RAM and time). I used Polars (a pandas alternative written from scratch in Rust for performance) and found it to be blazing fast. It's really, really useful, especially on non-NVIDIA devices like my Raspberry Pi and the AWS Lambda function. You should definitely check it out!

harshvaragiya
Author

Wonderful! Thank you. A comparison with the Polars library would be interesting as well.

perryholman
Author

Once again thank you for sharing :-) You are appreciated.

kenchang
Author

Missing your tutorials man, trying to install this on Windows...

Brickkzz
Author

For the read_csv operation, I would be curious what is actually taking the most time with the pandas object. I suspect it's building the Python string objects, and if so, I wonder whether it would be much faster if you have PyArrow installed and enabled.

And in general it makes sense that strings are slow in pandas, because it falls back to looking up a Python object via a reference; it's actually a much more interesting comparison for numeric or datetime data types. For strings, it would be much more interesting if you had used the PyArrow string data type.

damianshaw
Author

Thanks for sharing... I would be curious about a comparison between the accelerated version of pandas and Polars.

__python__
Author

Outstanding. Thank you for this information.

jameslucas
Author

Thanks a lot for sharing. super useful <3

usamatahir
Author

Hello Sentdex, I am reaching out to you regarding your Neural Networks from Scratch series.
Any updates on that? You left off at part 9.
Please do continue, it's an awesome series.

And any updates on book discounts for Black Friday?

Please do help.

noormohammedshikalgar
Author

the kubota warrior is back with the heat 🗣🗣🗣

acelaox
Author

3:38 — it doesn't have the prices "in quotes like a string"; it's a properly exported CSV that has ALL fields quoted. Your pd.read_csv is missing quoting=csv.QUOTE_ALL (or just quoting=1) and optionally quotechar='"'. The only "magic" pandas is doing is interpreting that column as quoted. If you add those options, I'm guessing cuDF will run just as well, since the ingest portion will still be using the Python standard library, or at least pandas' C implementation.
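For reference, `pd.read_csv` does accept the csv module's quoting constants, so the commenter's suggestion can be sketched on a tiny fully-quoted file like this (column names and values are hypothetical; treat the claim that the flag changes parsing behavior as the commenter's hypothesis, since pandas' default parser also strips quotes):

```python
import csv
import io
import pandas as pd

# A "properly exported" CSV where ALL fields are quoted:
quoted_csv = io.StringIO('"city","price"\n"Austin","100.5"\n"Boston","200.25"\n')

# Tell the parser every field is quoted, as the comment suggests:
df = pd.read_csv(quoted_csv, quoting=csv.QUOTE_ALL, quotechar='"')

print(df.dtypes)  # the quoted numbers parse as float64, not as strings
```

Either way, the quotes are stripped during parsing and normal type inference runs afterward, which is why the "magic" the video attributes to pandas is really just standard CSV quoting rules.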

Mil-Keeway
Author

Great video. Just one thing: instead of comparing cuDF with vanilla Pandas, wouldn’t a comparison with Modin be a more appropriate one?

thetdg
Author

Please post videos more often, Harrison

oguzhanyldrm
Author

What if my RAM (128GB) is larger than my VRAM (32GB)? Is normal pandas still faster for data that's larger than the VRAM?

maurice
Author

@sentdex sir, please make videos on 3D deep learning; it's really exciting to see your work on point clouds

shashisaini
Author

How about Mojo? Mojo can use the GPU to accelerate computation too; currently Mojo supports NumPy and pandas on the CPU. It would be fun to make a comparison with cuDF. Mojo is more like a superset of Python.

BohonChina
Author

What's the reasoning for not using groupby in this demo? Wouldn't that be the more natural and faster pandas method to use, instead of looping over everything?

It feels a little disingenuous to compare against poorly optimised pandas code that no one would actually write.

EarlZMoade
Author

jesus christ my life has totally changed

AlignmentLabAI