Learn to Use a CUDA GPU to Dramatically Speed Up Code In Python

I explain the end of exponential growth in computing power and the rise of application-specific hardware like GPUs and TPUs. Includes a demo of using the Numba library to run Python code on Nvidia GPUs orders of magnitude faster via just-in-time compilation and CUDA.
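To give a flavour of the approach, here is a minimal sketch of the pattern the demo is built around (the function name and array size are illustrative, not the exact code from the video): decorate an ordinary Python function with @cuda.jit and launch it across many GPU threads.

import numpy as np
from numba import cuda

@cuda.jit
def double_kernel(arr):
    # one GPU thread per element; guard against threads past the end of the array
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] *= 2

data = np.arange(1_000_000, dtype=np.float32)
threads_per_block = 256
blocks_per_grid = (data.size + threads_per_block - 1) // threads_per_block
double_kernel[blocks_per_grid, threads_per_block](data)  # Numba copies data to the GPU and back
print(data[:4])  # [0. 2. 4. 6.]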

00:00 Start of Video
00:16 End of Moore's Law
01:15 What is a TPU and ASIC
02:25 How a GPU works
03:05 Enabling GPU in Colab Notebook
04:16 Using Python Numba
05:40 Building Mandelbrots with and without GPU and Numba
07:49 CUDA Vectorize Functions
08:27 Copy Data to GPU Memory
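The 07:49 chapter covers numba.vectorize with target='cuda', which turns an element-wise Python function into a GPU ufunc. A minimal sketch (the function name and array sizes here are illustrative assumptions, not the video's code):

import numpy as np
from numba import vectorize

@vectorize(['float32(float32, float32)'], target='cuda')
def add_cuda(a, b):
    # compiled into a CUDA kernel and applied element-wise
    return a + b

x = np.arange(10_000_000, dtype=np.float32)
y = np.ones_like(x)
z = add_cuda(x, y)  # arrays are shipped to the GPU, result comes back as a NumPy array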

📝 Guided Projects:
Comments

Holy shit, I was looking into this to speed up my Mandelbrot zooms, and that's exactly what you use as an example! This is a dream come true!

pkeric

Hello,

Thank you for this great introduction to Numba and, more specifically, Numba + CUDA.
It is indeed a very easy way to harness the power of CUDA in simple Python scripts.

There is a mistake in the "cuda" example:
you are calling the regular "create_fractal" instead of the CUDA version, "mandel_kernel".
And if you do call the CUDA version, "mandel_kernel", you also have to specify the size of the grid (be careful: x and y are reversed).
Therefore, the final version of the call for the CUDA Mandelbrot is:

import numpy as np
from timeit import default_timer as timer
from matplotlib.pyplot import imshow, show

image = np.zeros((1024, 1536), dtype=np.uint8)
start = timer()
# launch configuration: 1536 blocks of 1024 threads each
mandel_kernel[1536, 1024](-2.0, 1.0, -1.0, 1.0, image, 20)
dt = timer() - start
print("Mandelbrot created in %f s" % dt)
imshow(image)
show()
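For context, this launch configuration assumes a 2-D, grid-stride kernel along the lines of the standard Numba CUDA Mandelbrot example (a sketch, not necessarily the exact code from the video; the mandel() per-pixel device function is assumed from the notebook and not reproduced here). The first launch dimension walks x, the 1536-pixel width, which is why x and y look reversed relative to the array shape:

from numba import cuda

@cuda.jit
def mandel_kernel(min_x, max_x, min_y, max_y, image, iters):
    height = image.shape[0]  # 1024 rows
    width = image.shape[1]   # 1536 columns
    pixel_size_x = (max_x - min_x) / width
    pixel_size_y = (max_y - min_y) / height
    start_x, start_y = cuda.grid(2)
    stride_x = cuda.gridDim.x * cuda.blockDim.x
    stride_y = cuda.gridDim.y * cuda.blockDim.y
    # grid-stride loops: launch dimension 0 walks x (width), dimension 1 walks y (height)
    for x in range(start_x, width, stride_x):
        real = min_x + x * pixel_size_x
        for y in range(start_y, height, stride_y):
            imag = min_y + y * pixel_size_y
            image[y, x] = mandel(real, imag, iters)  # mandel() is the per-pixel device function from the notebook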

ChristopheKumsta

This is amazing! Thank you for taking the effort to make it!

somefriday

Nice demo - I am getting into CUDA/GPU programming and have a workstation built with a 1950X 16-core CPU and two RTX 2080 Ti GPUs. I would like to try this demo on that machine and observe the results without using Colab - I will definitely check this out today. By the way, with a Python 3 notebook environment, do I just use pip to install the Numba library as shown, or do I have to create a new virtual environment? I am curious about that. Thank you.
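For what it's worth, a plain pip install into the existing Python 3 environment is normally enough, and a separate virtual environment is only a matter of preference for isolation. A quick, hedged sanity check that Numba can see the local GPUs:

# After "pip install numba", confirm that Numba can see the CUDA driver and GPUs.
from numba import cuda
print(cuda.is_available())  # True if a usable CUDA GPU is visible
cuda.detect()               # prints the detected devices (e.g. the two RTX 2080 Ti cards)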

gjubbar

I tried to follow this on my Windows 10 machine. The function you call at 7:16 is still create_fractal() and not mandel_kernel(), so I don't see why it would be faster. When I changed it to mandel_kernel(), it complained that I had to provide a launch configuration telling the GPU how many grids and blocks to create. I added it like so (after first properly setting grid and block variables): mandel_kernel[grid, block](-2.0, 1.0, -1.0, 1.0, image, 20). It then worked and really was nearly 100x faster than the jit version.
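For anyone reproducing this, grid and block can be chosen to give one thread per pixel of the 1536x1024 image, for example (illustrative values, not necessarily what this commenter used):

block = (32, 8)                              # threads per block (x, y)
grid = (1536 // block[0], 1024 // block[1])  # 48 x 128 blocks -> one thread per pixel
mandel_kernel[grid, block](-2.0, 1.0, -1.0, 1.0, image, 20)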

ramoni

This is very helpful. Most people don't realize the overhead and code refactoring necessary to take advantage of GPUs. I am going to refactor a simple MNIST training program I have which currently uses only NumPy and see if I can get meaningful improvements in training time.

vallurirajesh

6:41 - except for the first time you run the function; all the later runs will be fast.
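The first call pays the just-in-time compilation cost; a small sketch (illustrative function, not from the video) makes the effect visible:

from timeit import default_timer as timer
import numpy as np
from numba import njit

@njit
def total(arr):
    s = 0.0
    for v in arr:
        s += v
    return s

data = np.random.rand(10_000_000)
for run in range(3):
    start = timer()
    total(data)
    print(run, timer() - start)  # run 0 includes compilation; later runs measure execution only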

alexzander__

Can I use this in an app that has a Kivy GUI?

agnichatian

A great and unique video. Thanks a lot for sharing.

bernietgn

Thanks for this. I was able to replicate it locally using a Jupyter Notebook with Nvidia and WSL2; it worked like a charm.

ShaunPrince

Good stuff on here :)
I like how you made a website documenting the video notes for reference later.

chetana

Is the GPU script correct? There are no to_device and copy_to_host calls to copy the image to and from the GPU, and the script uses the create_fractal function rather than mandel_kernel.
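For reference, explicit transfers around the same kernel would look roughly like this (a sketch that assumes the notebook's mandel_kernel; keeping the copies outside the timed region is the usual motivation for doing it explicitly):

from numba import cuda
import numpy as np

image = np.zeros((1024, 1536), dtype=np.uint8)
d_image = cuda.to_device(image)                       # copy the empty image to GPU memory once
mandel_kernel[(48, 128), (32, 8)](-2.0, 1.0, -1.0, 1.0, d_image, 20)
image = d_image.copy_to_host()                        # copy the finished fractal back to the host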

QuantumWormhole

Hi, can you show the same problem solved in code on the CPU, to compare CPU vs. GPU performance?

PP-tczp

Can I use Numba for training models with the sklearn libraries?

yogeshwarshendye

How can I speed up my machine learning code (sklearn and TensorFlow)? It's very slow, ahhh 😡

saebifar

Can you use this to speed up k-means?
I have 60 million rows to cluster; on 16 cores it has been running for hours.

knowledgelover

I don't understand the example with the NumPy array sum - why do it this way? You can just add the two arrays directly with NumPy: df = df2 + df2. The effect is the same without the GPU, and it's immediate.
So why use the GPU for this operation? I don't see any advantage in the array example.
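That is a fair point for a single addition: NumPy's df2 + df2 is already a compiled, vectorized operation, and for work this simple the host-to-device copies often eat any GPU gain. A rough way to compare the two (a sketch; the function and sizes are illustrative):

from timeit import default_timer as timer
import numpy as np
from numba import vectorize

@vectorize(['float32(float32, float32)'], target='cuda')
def add_on_gpu(a, b):
    return a + b

a = np.random.rand(50_000_000).astype(np.float32)
b = np.random.rand(50_000_000).astype(np.float32)

start = timer()
c_numpy = a + b
print('numpy:', timer() - start)

start = timer()
c_cuda = add_on_gpu(a, b)   # includes host<->device copies on every call
print('cuda :', timer() - start)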

PP-tczp

Is there something other than CUDA that I can use? I don't plan to use any Nvidia GPUs, so CUDA is useless for me. In addition, unless you work in game development or some kind of niche research, work computers will not have an Nvidia-based GPU. I own several computers and none of them use Nvidia.

ajflink

Sir, I am still having some doubts. Can you please share your contact number / mail ID?

Actually, I have downloaded 2 files from GitHub: one is a .cu file and the other is a .sh file.
The thing is, both files are interconnected - the .cu file takes its input from the .sh file. I don't know how to run them or how to upload them.
I request you to please guide me. I will be highly thankful to you. My project review is coming up.

summercamp

My head is gonna explode from all of this, but I feel like if I learn this, I will become powerful... still no idea how to make my program run on the GPU, even when it's HIGHLY parallel stuff...

jakubkahoun