2678x Faster with CUDA C: Simple Matrix Multiplication on a GPU | Episode 1: Introduction to GPGPU

Parallel Matrix Multiplication on a GPU using CUDA C.

00:00 - Introduction
01:00 - Matrix Multiplication
01:52 - Sequential Matrix Multiplication in C
03:23 - Why use a GPU for this problem
04:01 - CPU vs GPU
04:56 - CUDA Programming
07:56 - Matrix Multiplication on a GPU
11:06 - Conclusion
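
For reference, the sequential baseline named in the 01:52 chapter can be sketched in plain C roughly as follows. This is a minimal illustration, not the code from the video: the function name `matmul_seq`, the row-major layout, and the restriction to square n x n matrices are all assumptions.

```c
#include <stddef.h>

/* Hypothetical helper (not taken from the video).
 * Computes C = A * B for square n x n matrices stored in row-major order. */
void matmul_seq(const float *A, const float *B, float *C, size_t n)
{
    for (size_t i = 0; i < n; i++) {          /* row of C */
        for (size_t j = 0; j < n; j++) {      /* column of C */
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++) {  /* dot product of row i and column j */
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}
```

Each of the n x n output elements depends only on one row of A and one column of B, so the two outer loops carry no dependencies between iterations. That independence is what makes the problem a natural fit for one GPU thread per output element.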
Comments

When I clicked on the video, never in a million years could I have imagined that you would explain all of this in such a simple and comprehensive manner. Great work.

divyamxdeep

The real magic starts with cache tiling and shared memory optimization. Hope to see this in Episode 2!

ProjectPhysX

This was an amazing explanation, thanks for sharing.

dan_pal

Thank you for this video!! Great content and nice animations

illustrationvaz

Hi, Standard Normal, thanks for the great vid!😊

jakeaustria

How did you make these animations, in the style of 3Blue1Brown?
By the way, your name 0mean1sigma is quite standardized.

bilal_ali

Great introduction. One thing to add: each thread can also compute a small block of output elements rather than a single one.

sehbanomer
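
The idea in the comment above, one thread producing a small block of outputs, can be illustrated on the CPU in plain C. This is only a sketch under stated assumptions: the names `TILE`, `thread_body`, and `matmul_blocked` are hypothetical, and the two loops over `ty` and `tx` stand in for the grid of GPU threads that a real CUDA kernel would launch in parallel.

```c
#include <stddef.h>

#define TILE 2  /* hypothetical: each emulated thread computes a TILE x TILE block */

/* Work of one emulated thread (tx, ty): a TILE x TILE block of C = A * B.
 * Matrices are square (n x n) and stored in row-major order. */
static void thread_body(const float *A, const float *B, float *C,
                        size_t n, size_t tx, size_t ty)
{
    float acc[TILE][TILE] = {{0}};  /* per-thread accumulators */
    size_t row0 = ty * TILE;
    size_t col0 = tx * TILE;

    for (size_t k = 0; k < n; k++)
        for (size_t r = 0; r < TILE; r++)
            for (size_t c = 0; c < TILE; c++)
                if (row0 + r < n && col0 + c < n)  /* guard partial edge blocks */
                    acc[r][c] += A[(row0 + r) * n + k] * B[k * n + (col0 + c)];

    for (size_t r = 0; r < TILE; r++)
        for (size_t c = 0; c < TILE; c++)
            if (row0 + r < n && col0 + c < n)
                C[(row0 + r) * n + (col0 + c)] = acc[r][c];
}

/* Sequential stand-in for a kernel launch: visits every emulated thread once. */
void matmul_blocked(const float *A, const float *B, float *C, size_t n)
{
    size_t grid = (n + TILE - 1) / TILE;  /* emulated threads per dimension */
    for (size_t ty = 0; ty < grid; ty++)
        for (size_t tx = 0; tx < grid; tx++)
            thread_body(A, B, C, n, tx, ty);
}
```

With a block per thread, each value loaded from A or B can be reused for several outputs, which is the usual motivation for this layout on a GPU.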

Crisp and clean explanation! Could you do a video on warps, thread tiling, different types of kernel reduction, and fusion, using a simple application-based example?

plutoz

I loved this video. I wish it had kept going.

dtamien

Just two questions:
1- What if you want to use the GPU's power and efficiency without relying on CUDA, with general code that runs on any GPU (for AMD users, for example)? What code would you have to write?
2- Would the performance be the same?

finmat

In the matrix multiplications shown at 2:00, are the numbers of rows and columns in the matrices variable or fixed? If they are variable, over what range of values? If they are fixed, at what value? Also, how many bits do the operations on these matrices use?

empatikokumalar