Writing Code That Runs FAST on a GPU

In this video, we talk about why GPUs are better suited for parallelized tasks and how a GPU outperforms a CPU at certain workloads. Finally, we set up the NVIDIA CUDA programming packages to use the CUDA API in Visual Studio.

GPUs are a great platform for executing code that can take advantage of massive parallelism. For example, in this video we show the difference between adding vectors on a CPU versus adding vectors on a GPU. By taking advantage of the CUDA parallelization framework, we can perform the additions in parallel.
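As a rough sketch of the vector addition described above (the names `vectorAdd`, `ha`, `da`, etc. are illustrative, not taken from the video), the CUDA version replaces the CPU's for-loop with a kernel launched across many threads:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Each thread adds one pair of elements; the loop index becomes the thread index.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // guard: the last block may have surplus threads
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host (CPU) buffers
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device (GPU) buffers, plus host-to-device copies
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // round up to cover all n elements
    vectorAdd<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```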

Comments

You could make a series out of this: the basics of CUDA are trivial, but there are many, many performance traps in GPGPU

empireempire

Excellent tutorial. One minor thing I would have mentioned in your video is that copying between device and host (or host and device) is a relatively expensive operation, since you are moving data between the CPU and the GPU over the PCI Express bus, which, no matter how fast or modern your system is, is still a bottleneck compared to transfers between the CPU and memory or the GPU and its DRAM. So the performance advantage is only noticeable when the data-copying time is short relative to the task execution time.

peterbulyaki
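To make the transfer-cost point above concrete, copy time can be measured directly with CUDA events. This is a sketch, not code from the video; the buffer size is an arbitrary example:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Times one host-to-device copy with CUDA events. For small or simple
// workloads, this PCIe transfer can easily dominate the kernel's own runtime.
int main() {
    const size_t bytes = (size_t)1 << 26;     // 64 MiB
    float *host, *dev;
    cudaMallocHost(&host, bytes);             // pinned host memory for a fair measurement
    cudaMalloc(&dev, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("H2D copy: %.2f ms (%.2f GB/s)\n", ms, (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}
```

Comparing this number against the kernel's own elapsed time (timed the same way) shows when the copy, rather than the computation, is the bottleneck.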

This was super insightful, never would have thought it'd be that easy... I need to look more into CUDA programming now

shanebenning

Finally I can use my RTX 3060 Ti to do something useful...

dominikkruk

This fight at @7:30 with "*" placement was hilarious. I laughed so hard when you gave up :)

Borszczuk

Amazing video! I love the way you explain things thoroughly enough that a beginner can easily understand it without explaining *too* much and droning on. Thorough yet concise, great job :)

rezq

Amazing intro to CUDA man! For those interested in GPU programming, I'd also recommend learning OpenACC. Not as powerful as CUDA, but it gives you a nice "first working" GPU program to get an idea before suffering with low-level optimization hehe. Would be nice to see a follow-up to this using both MPI and CUDA to work with multiple GPUs :D

lucasgasparino

I discovered your channel recently and so far I am loving it.

herrxerex

This is a lot more straightforward than I thought it would be. Basically, replace all allocation operations and pointer operations with CUDA framework types and functions. 😅

xggbrnr

Your channel is amazing! Just found it, and I must say you have a great way of teaching. Kudos for that, and congrats on the amazing content

Shamysoza

This was a super cool video. I'm currently learning assembly so seeing how to operate at a pretty low level was very interesting to me.

mrmplatt

You explained it so well, thanks a lot

psevekar

Great video! Short and to the point, just enough to get me started!

thorasmund

Very nice tutorial. I really liked it. It's brief, to the point and very clear. Thanks. Could you please make a video for the same example but in Linux?

ramezanifard

Hey this is super useful! I elected High Performance Computing and Microprocessors and Embedded Systems modules for my degree, and this channel has become my go-to guide.

rampage_sl

Good video. It would be interesting to make the vectors huge and run some benchmarks comparing the CUDA function to the CPU version.

and

I'd really love to see more videos like these

Frost_Byte_Tech

Thanks a ton, very clear explanation 🙏

iyadahmed

Loved the video! Had to like and subscribe! Can't wait to see the rest of the project as well as what other projects you work on!

Rottingflare

Useful, but the discussion about the block size and grid size was avoided. I think there should be a video focused only on this topic as it's not easy to digest, especially for new CUDA programmers. A comparison with OpenCL would be even better :)

bogdandumitrescu
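Since block and grid size come up in the last comment: the usual pattern is to pick a fixed block size and round the grid size up so the grid covers every element. A sketch with illustrative numbers, not the video's code:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // bounds check: the grid usually overshoots n
}

int main() {
    int n = 1000000;
    int blockSize = 256;                             // typically a multiple of the 32-thread warp
    int gridSize = (n + blockSize - 1) / blockSize;  // ceil(n / blockSize)

    float *d;
    cudaMalloc(&d, n * sizeof(float));
    scale<<<gridSize, blockSize>>>(d, n);
    cudaDeviceSynchronize();

    printf("grid = %d blocks of %d threads = %d threads for %d elements\n",
           gridSize, blockSize, gridSize * blockSize, n);
    cudaFree(d);
    return 0;
}
```

Good block sizes depend on the kernel's register and shared-memory usage, which is exactly the per-kernel tuning the commenter is asking for; 128-256 threads per block is a common starting point.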