2678x Faster with CUDA C: Simple Matrix Multiplication on a GPU | Episode 1: Introduction to GPGPU

Parallel Matrix Multiplication on a GPU using CUDA C.

00:00 - Introduction
01:00 - Matrix Multiplication
01:52 - Sequential Matrix Multiplication in C
03:23 - Why use a GPU for this problem
04:01 - CPU vs GPU
04:56 - CUDA Programming
07:56 - Matrix Multiplication on a GPU
11:06 - Conclusion
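
For readers following the chapter list, here is a minimal sketch of what a sequential matrix multiplication in C usually looks like. It is not the code from the video: the function name `matmul_seq`, the `float` element type, the square `n x n` shape and the row-major layout are all assumptions made for illustration.

```c
#include <stddef.h>

/* Computes C = A * B for square n x n matrices stored in row-major order.
   Hypothetical helper: the name, float type and square shape are assumptions. */
void matmul_seq(const float *A, const float *B, float *C, size_t n)
{
    for (size_t i = 0; i < n; i++) {          /* row of C */
        for (size_t j = 0; j < n; j++) {      /* column of C */
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++) {  /* dot product of row i and column j */
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}
```

Each of the n * n output elements depends only on one row of A and one column of B, so the elements can be computed independently of one another. That independence is what makes the problem a natural fit for a GPU.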

0Mean1Sigma

Comments

When I clicked on the video, never in a million years could I have imagined that you would explain all of this in such a simple and comprehensive manner. Great work.

divyamxdeep

The real magic starts with cache tiling and shared memory optimization. Hope to see this in Episode 2!

ProjectPhysX

This was an amazing explanation, thanks for sharing.

dan_pal

Thank you for this video!! Great content and nice animations

illustrationvaz

Hi, Standard Normal, thanks for the great vid!😊

jakeaustria

How did you make these animations, like 3Blue1Brown's?
By the way, your name 0mean1sigma is quite standardized.

bilal_ali

Great introduction. One thing to add: each thread can also compute a small block of output elements rather than a single one.

sehbanomer
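
The idea in the comment above, one thread computing a small block of outputs, can be sketched in plain C as a stand-in for a GPU kernel. Everything here is hypothetical: `thread_tile` plays the role of one GPU thread with index `(tx, ty)`, `matmul_tiled` stands in for the kernel launch, and `TILE = 2` is an arbitrary choice. In real CUDA code the two outer loops would be replaced by the thread and block indices.

```c
#include <stddef.h>

#define TILE 2  /* outputs per "thread" along each dimension (assumed value) */

/* Work of one hypothetical GPU thread (tx, ty): it computes a
   TILE x TILE block of C instead of a single element. */
static void thread_tile(const float *A, const float *B, float *C,
                        size_t n, size_t tx, size_t ty)
{
    float acc[TILE][TILE] = {{0.0f}};   /* per-thread accumulators */
    size_t row0 = ty * TILE;
    size_t col0 = tx * TILE;

    for (size_t k = 0; k < n; k++)
        for (size_t r = 0; r < TILE; r++)
            for (size_t c = 0; c < TILE; c++)
                if (row0 + r < n && col0 + c < n)   /* guard partial edge tiles */
                    acc[r][c] += A[(row0 + r) * n + k] * B[k * n + (col0 + c)];

    for (size_t r = 0; r < TILE; r++)
        for (size_t c = 0; c < TILE; c++)
            if (row0 + r < n && col0 + c < n)
                C[(row0 + r) * n + (col0 + c)] = acc[r][c];
}

/* Stand-in for a kernel launch: on a GPU these iterations would run in parallel. */
void matmul_tiled(const float *A, const float *B, float *C, size_t n)
{
    size_t tiles = (n + TILE - 1) / TILE;   /* round up to cover the edges */
    for (size_t ty = 0; ty < tiles; ty++)
        for (size_t tx = 0; tx < tiles; tx++)
            thread_tile(A, B, C, n, tx, ty);
}
```

With this layout each worker reuses the values it loads from A and B across several outputs, which is the usual motivation for giving a thread more than one element.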

Crisp and clean explanation! I wonder if you could do a video on warps, thread tiling, different types of kernel reduction, and fusion in a simple application-based example?

plutoz

I loved this video. I wish it had kept going.

dtamien

Just two questions:
1. What if you want to use the GPU's power and efficiency without relying on CUDA, with general code that runs on any GPU (for AMD users, for example)? What code would you have to write?
2. Would the performance be the same?

finmat

In the matrix multiplications shown at 2:00, are the numbers of rows and columns in the matrices variable or fixed? If they are variable, what is the range of values? If they are fixed, what is the value? Also, how many bit operations do these matrix multiplications use?

empatikokumalar