Striding CUDA like I'm Johnnie Walker

💡 Giveaway steps:
✅ 2. Wait for #GTC23 to start and join the Keynote livestream.
✅ 3. Attend GTC sessions (there’s really a lot of sessions going on - just pick one you’re interested in) 😄

⏱Outline:
00:00 Intro
01:45 4080 RTX Giveaway steps
02:42 Jupyter notebook preparations
02:47 GPU in use
03:27 Simple array declaration
03:43 numba's vectorize decorator
03:56 CUDA kernel
05:12 Grids and blocks
05:57 Caveat
06:27 Striding kernels
06:58 Example
07:32 Example: 1 block of 4 threads
07:39 Elements processed by each thread
08:02 Performance analysis
10:18 Outro

📚 CUDA things to know:
CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA for accelerating computation on graphics processing units (GPUs). CUDA programming extends the C and C++ languages, letting developers offload computationally intensive tasks to the GPU, which can perform calculations in parallel with much greater efficiency than a traditional CPU. This makes it a popular choice for high-performance computing in a wide range of fields, including scientific computing, machine learning, and computer vision.

To write CUDA programs, developers use special keywords and functions that express parallelism and manage data movement: they launch kernels (small, self-contained units of code that execute on the GPU) and handle memory allocation and data transfers between the CPU and GPU. In short, CUDA provides a powerful way to harness the parallel computing capabilities of GPUs for high-performance computing tasks.
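To make the kernel idea concrete, here is a minimal plain-Python sketch (not actual GPU code, and all names are made up for illustration) of how a kernel maps each thread to one array element via blockIdx * blockDim + threadIdx, the indexing scheme covered in the video:

```python
# Plain-Python emulation of a CUDA kernel launch. On a real GPU the kernel
# body runs once per thread in parallel; here we just loop over every
# (block, thread) pair on the CPU to show the index arithmetic.

def emulate_kernel_launch(kernel, blocks, threads_per_block, *args):
    """Call `kernel` once for every simulated thread."""
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            # global index = blockIdx.x * blockDim.x + threadIdx.x
            global_idx = block_idx * threads_per_block + thread_idx
            kernel(global_idx, *args)

def add_kernel(idx, x, y, out):
    # Bounds check, just as a real kernel guards threads past the array end.
    if idx < len(out):
        out[idx] = x[idx] + y[idx]

x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]
out = [0] * 5
# 2 blocks of 4 threads = 8 threads covering 5 elements; 3 threads are idle.
emulate_kernel_launch(add_kernel, 2, 4, x, y, out)
print(out)  # [11, 22, 33, 44, 55]
```

The bounds check matters because the thread count is rounded up to a multiple of the block size, so some threads land past the end of the data.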

Now when I say "CUDA Python", this refers to using the CUDA parallel computing platform and programming model from the Python programming language. This lets Python developers harness the power of GPUs for high-performance computing without needing to learn a new language. CUDA Python is typically used with the Numba library, which compiles Python functions to CUDA kernels. Numba provides decorators that specify which functions should be compiled to run on the GPU, and it manages transferring arguments between the CPU and GPU.
With CUDA Python and Numba, Python developers can write high-performance code that takes advantage of the massively parallel nature of GPUs. This makes it a popular choice for scientific computing and data analysis, where performance is critical for working with large datasets or running complex simulations. That said, CUDA is not always the best choice for every task: consider the size of the problem, the available hardware, and the specific requirements of the application before deciding whether to use CUDA Python.
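The striding pattern from the outline (06:27 onward) is easy to emulate without a GPU. This is a hedged sketch in plain Python (the function name is mine, not Numba's): when there are fewer threads than elements, each thread starts at its global index and jumps ahead by the total thread count until it runs off the array.

```python
# Grid-stride loop, emulated in plain Python. Each simulated thread
# processes index thread_id, thread_id + total_threads, thread_id + 2 *
# total_threads, ... -- the same indices a striding CUDA kernel would visit.

def stride_indices(thread_id, total_threads, n):
    """Indices one thread handles under a grid-stride loop over n elements."""
    return list(range(thread_id, n, total_threads))

# 1 block of 4 threads over a 10-element array (the video's 07:32 example
# uses 4 threads; the array length here is my own choice):
for t in range(4):
    print(t, stride_indices(t, 4, 10))
# thread 0 -> [0, 4, 8]   thread 1 -> [1, 5, 9]
# thread 2 -> [2, 6]      thread 3 -> [3, 7]
```

Every index is visited exactly once and no thread needs to know the array length in advance, which is why the striding kernel handles arrays larger than the launched thread count.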

CUDA is fast because it offloads computation-intensive tasks to the highly parallel architecture of modern GPUs. GPUs were originally designed to render complex 3D graphics, but that same design makes them highly effective at parallel computation: a typical CPU has a small number of cores optimized for serial processing, whereas a GPU has thousands of smaller, more power-efficient cores that can perform many calculations simultaneously. This allows a GPU to perform many more computations per second than a CPU, especially for tasks that parallelize well, such as image and video processing, machine learning, and scientific simulations. By writing code that executes on the GPU through CUDA, applications can see significant speedups, sometimes by orders of magnitude, compared to traditional CPU-based computations.

🙏🏻 Credits:
Dan G. for directing
Moe D. for editing
Samer S. for brainstorming
Bensound for audio

This video is created under a Creative Commons license.

#gtc23 #cuda #stride
Comments

I just found this channel. I think it's amazing, and it's everything someone who wants to learn the basics ever needs. I am a true believer that the most important thing is to get a grasp of the intuition and then slowly dive deeper into any topic. And of course you have A LOT of questions when trying to learn something new, and I love the way you approached it from the newbie's point of view, focusing on what needs to be cleared up first. Unfortunately, sometimes it's really hard to find any tutorials like that. Especially at uni, where it doesn't work like that at all :))) So I'm glad I've found you, and I hope you keep posting.

wrestlerbrothers

11:00 YOU ARE WATCHING A MASTER AT WORK 😍😍😍 !!!

efehanpalta
RogerJohnson-cv

This guy is like the HITMAN of NVIDIA, he just simply murders all its competitors 🥶

emirtanseldeveci

I feel like Cuda has been demystified. Very glad I found your series.

kartal

It's very informative and a good intro to CUDA programming. Thanks very much!

Berrats

nice tutorial! excellent video and clear window-in-window effects to show what you are doing!!

miryusifhuseynov

I absolutely agree about the cooling! It's a key component, especially in overclocked systems. Good air circulation will ensure your hardware lasts much longer!

thedilan

Anyhow, thank you for your comment! I'll definitely talk about it in due time 😉 ...it's just too soon, it's only the first episode of this parallel computing series

recepduman

Question: In another video, someone said the way GPUs implement control flow is to push a mask over the other options, execute the true branch, then execute the false branch, etc. So wouldn't this be slower using sync threads? If you just use a cascading set of ifs, the race condition would be solved and you wouldn't need the syncs, right? Not sure if the mask thing is still, or was, true. Thanks.

sukrantektas

The software works with CUDA. I think their competitor that makes CST also uses video card memory for ultra fast calculations of insanely large matrices (inverting the matrix).

losko

This is the best introduction to CUDA I've seen, thanks a lot !

xwey_edits

would love to see a video on a few CUDA programming challenges

alyaselnuaymi

I have been looking into GPU programming using Numba and Python for a while, and this seems to be the best tutorial I was able to find so far... thank you

asmarmurad

now that I watched it again... I have no words to say more than FANTASTIC .. clarity, knowledge, and everything else ..

fxriyt

I use CUDA with HFSS - a numerically intensive electromagnetic solver (solves/satisfies Maxwell’s equations in 3D space).

_chefrlk

I usually start with a C++ version. It's a good starting point and it gives a benchmark to beat. I code up a basic port to CUDA and profile it with NVVP to see if it's got any helpful suggestions.

lyme

This had LTT / LMG levels of production value with one of the best / clearest explanations for what CUDA is and why it matters.

mohamedkhalil

Super clear explanation Ahmad, great video, thank you!

RiamuYumemiNM

Padding is magical. Awesome explanation!!!

bulletsndcarlovers