Striding CUDA like I'm Johnnie Walker

💡 Giveaway steps:
✅ 2. Wait for #GTC23 to start and join the Keynote livestream.
✅ 3. Attend GTC sessions (there’s really a lot of sessions going on - just pick one you’re interested in) 😄

⏱Outline:
00:00 Intro
01:45 4080 RTX Giveaway steps
02:42 Jupyter notebook preparations
02:47 GPU in use
03:27 Simple array declaration
03:43 numba's vectorize decorator
03:56 CUDA kernel
05:12 Grids and blocks
05:57 Caveat
06:27 Striding kernels
06:58 Example
07:32 Example: 1 block of 4 threads
07:39 Elements processed by each thread
08:02 Performance analysis
10:18 Outro

📚 CUDA things to know:
CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA for accelerating computation on graphics processing units (GPUs). CUDA programming extends the C and C++ languages, letting developers offload computationally intensive tasks to the GPU, which can perform calculations in parallel with much greater efficiency than a traditional CPU. This makes it a popular choice for high-performance computing in a wide range of fields, including scientific computing, machine learning, and computer vision.

To write CUDA programs, developers use special keywords and functions that express parallelism and manage data movement: they launch kernels (small, self-contained units of code that execute on the GPU) and handle memory allocation and data transfers between the CPU and GPU. In short, CUDA provides a powerful way to harness the parallel computing capabilities of GPUs for high-performance computing tasks.
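To make the kernel idea concrete, here is a minimal plain-Python sketch (not actual GPU code, and all names are made up for illustration) of how a kernel maps each thread to one array element via blockIdx * blockDim + threadIdx, the indexing scheme covered in the video:

```python
# Plain-Python emulation of a CUDA kernel launch. On a real GPU the kernel
# body runs once per thread in parallel; here we just loop over every
# (block, thread) pair on the CPU to show the index arithmetic.

def emulate_kernel_launch(kernel, blocks, threads_per_block, *args):
    """Call `kernel` once for every simulated thread."""
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            # global index = blockIdx.x * blockDim.x + threadIdx.x
            global_idx = block_idx * threads_per_block + thread_idx
            kernel(global_idx, *args)

def add_kernel(idx, x, y, out):
    # Bounds check, just as a real kernel guards threads past the array end.
    if idx < len(out):
        out[idx] = x[idx] + y[idx]

x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]
out = [0] * 5
# 2 blocks of 4 threads = 8 threads covering 5 elements; 3 threads are idle.
emulate_kernel_launch(add_kernel, 2, 4, x, y, out)
print(out)  # [11, 22, 33, 44, 55]
```

The bounds check matters because the thread count is rounded up to a multiple of the block size, so some threads land past the end of the data.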

Now when I say "CUDA Python", this refers to using the CUDA parallel computing platform and programming model from the Python programming language. This lets Python developers harness the power of GPUs for high-performance computing without needing to learn a new language. CUDA Python is typically used with the Numba library, which compiles Python functions to CUDA kernels. Numba provides decorators that specify which functions should be compiled to run on the GPU, and it manages transferring arguments between the CPU and GPU.
With CUDA Python and Numba, Python developers can write high-performance code that takes advantage of the massively parallel nature of GPUs. This makes it a popular choice for scientific computing and data analysis, where performance is critical for working with large datasets or running complex simulations. That said, CUDA is not always the best choice for every task: consider the size of the problem, the available hardware, and the specific requirements of the application before deciding whether to use CUDA Python.
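The striding pattern from the outline (06:27 onward) is easy to emulate without a GPU. This is a hedged sketch in plain Python (the function name is mine, not Numba's): when there are fewer threads than elements, each thread starts at its global index and jumps ahead by the total thread count until it runs off the array.

```python
# Grid-stride loop, emulated in plain Python. Each simulated thread
# processes index thread_id, thread_id + total_threads, thread_id + 2 *
# total_threads, ... -- the same indices a striding CUDA kernel would visit.

def stride_indices(thread_id, total_threads, n):
    """Indices one thread handles under a grid-stride loop over n elements."""
    return list(range(thread_id, n, total_threads))

# 1 block of 4 threads over a 10-element array (the video's 07:32 example
# uses 4 threads; the array length here is my own choice):
for t in range(4):
    print(t, stride_indices(t, 4, 10))
# thread 0 -> [0, 4, 8]   thread 1 -> [1, 5, 9]
# thread 2 -> [2, 6]      thread 3 -> [3, 7]
```

Every index is visited exactly once and no thread needs to know the array length in advance, which is why the striding kernel handles arrays larger than the launched thread count.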

CUDA is fast because it offloads computation-intensive tasks to the highly parallel architecture of modern GPUs. GPUs were originally designed to render complex 3D graphics, but that same design makes them highly effective at parallel computation: a typical CPU has a small number of cores optimized for serial processing, whereas a GPU has thousands of smaller, more power-efficient cores that can perform many calculations simultaneously. This allows a GPU to perform many more computations per second than a CPU, especially for tasks that parallelize well, such as image and video processing, machine learning, and scientific simulations. By writing code that executes on the GPU through CUDA, applications can see significant speedups, sometimes by orders of magnitude, compared to traditional CPU-based computations.

🙏🏻 Credits:
Dan G. for directing
Moe D. for editing
Samer S. for brainstorming
Bensound for audio

This video is created under a Creative Commons license.

#gtc23 #cuda #stride
Comments

I just found this channel. I think it's amazing, and it's everything someone who wants to learn the basics ever needs. I am a true believer that the most important thing is to get a grasp of the intuition and then slowly dive deeper into any topic. And of course you have A LOT of questions when trying to learn something new, and I love the way you approached it from the newbie's point of view, focusing on what needs to be cleared up first. Unfortunately, sometimes it's really hard to find any tutorials like that. Especially at uni, where it doesn't work like that at all :))) So I'm glad I've found you, and I hope you keep posting.

wrestlerbrothers

11:00 YOU ARE WATCHING A MASTER AT WORK 😍😍😍 !!!

efehanpalta
RogerJohnson-cv

This guy is like the HITMAN of NVIDIA, he just simply murders all its competitors 🥶

emirtanseldeveci

I feel like Cuda has been demystified. Very glad I found your series.

kartal

It's very informative and a good intro to CUDA programming. Thanks very much!

Berrats

nice tutorial! excellent video and clear window-in-window effects to show what you are doing!!

miryusifhuseynov

I absolutely agree about the cooling! It's a key component, especially in overclocked systems. Good air circulation will ensure your hardware lasts much longer!

thedilan

Anyhow, thank you for your comment! I'll definitely talk about it in due time 😉 ...it's just too soon, it's only the first episode of this parallel computing series

recepduman

Question: In another video, someone said the way GPUs implement control flow is to push a mask over the other options, execute the true branch, then execute the false branch, etc. So wouldn't this be slower using sync threads? If you just use a cascading set of ifs, the race condition would be solved and you wouldn't need the syncs, right? Not sure if the mask thing is still, or was, true. Thanks.

sukrantektas

The software works with CUDA. I think their competitor that makes CST also uses video card memory for ultra fast calculations of insanely large matrices (inverting the matrix).

losko

This is the best introduction to CUDA I've seen, thanks a lot !

xwey_edits

would love to see a video on a few CUDA programming challenges

alyaselnuaymi

I have been looking into GPU programming using Numba and Python for a while, and this seems to be the best tutorial I was able to find so far... thank you

asmarmurad

now that I watched it again... I have no words to say more than FANTASTIC .. clarity, knowledge, and everything else ..

fxriyt

I use CUDA with HFSS - a numerically intensive electromagnetic solver (solves/satisfies Maxwell’s equations in 3D space).

_chefrlk

I usually start with a C++ version. It's a good starting point and it gives a benchmark to beat. I code up a basic port to CUDA and profile it with NVVP to see if it's got any helpful suggestions.

lyme

This had LTT / LMG levels of production value with one of the best / clearest explanations for what CUDA is and why it matters.

mohamedkhalil

Super clear explanation Ahmad, great video, thank you!

RiamuYumemiNM

Padding is magical. Awesome explanation!!!

bulletsndcarlovers