Must Know Technique in GPU Computing | Episode 4: Tiled Matrix Multiplication in CUDA C

Tiled (general) Matrix Multiplication from scratch in CUDA C.

00:00 Introduction
00:41 Standard Matrix Multiplication
01:41 Tiled Matrix Multiplication Algorithm
03:24 Tiled Matrix Multiplication Code
05:53 General (Tiled) Matrix Multiplication
08:11 Demo
08:26 Next Video: Tensor Cores!
Comments
Author

This is the best material so far.

All the other videos failed to explain the concept of a "phase". In each phase, each block loads two tiles, subA and subB, from A and B into shared memory; each tile has the same dimensions as the block. This copy step costs extra time, but the subsequent computation reads from fast shared memory instead of global memory.

Looking forward to your future videos!

zijiali
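The phases described in the comment above can be sketched roughly as follows. This is not the code from the video, only a minimal illustration under stated assumptions: matrices are stored row-major, the tile width `TILE` is 16 and equals the block dimensions, and the names `tiledMatMul` and `maxAbsError` are hypothetical. `subA` and `subB` follow the naming used in the comment.

```cuda
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 16  // assumed tile width; must match the block dimensions

// C (M x N) = A (M x K) * B (K x N), all row-major.
__global__ void tiledMatMul(const float *A, const float *B, float *C,
                            int M, int K, int N)
{
    __shared__ float subA[TILE][TILE];  // tile of A for the current phase
    __shared__ float subB[TILE][TILE];  // tile of B for the current phase

    int tx = threadIdx.x, ty = threadIdx.y;
    int i = blockDim.y * blockIdx.y + ty;  // row of C owned by this thread
    int j = blockDim.x * blockIdx.x + tx;  // column of C owned by this thread

    float acc = 0.0f;
    int phases = (K + TILE - 1) / TILE;    // round up so partial tiles count
    for (int p = 0; p < phases; ++p) {
        int aCol = p * TILE + tx;
        int bRow = p * TILE + ty;
        // Each thread copies one element of each tile; pad with zeros
        // outside the matrices so any M, K, N is handled.
        subA[ty][tx] = (i < M && aCol < K) ? A[i * K + aCol] : 0.0f;
        subB[ty][tx] = (bRow < K && j < N) ? B[bRow * N + j] : 0.0f;
        __syncthreads();  // both tiles fully loaded before anyone reads

        for (int k = 0; k < TILE; ++k)
            acc += subA[ty][k] * subB[k][tx];
        __syncthreads();  // everyone done reading before the next overwrite
    }
    if (i < M && j < N) C[i * N + j] = acc;
}

// Hypothetical helper: runs the kernel and returns the largest absolute
// difference from a plain CPU triple loop. CUDA error checks are omitted
// for brevity.
float maxAbsError(int M, int K, int N)
{
    float *A, *B, *C;
    cudaMallocManaged(&A, M * K * sizeof(float));
    cudaMallocManaged(&B, K * N * sizeof(float));
    cudaMallocManaged(&C, M * N * sizeof(float));
    for (int n = 0; n < M * K; ++n) A[n] = (n % 7) * 0.5f;
    for (int n = 0; n < K * N; ++n) B[n] = (float)(n % 5) - 2.0f;

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    tiledMatMul<<<grid, block>>>(A, B, C, M, K, N);
    cudaDeviceSynchronize();

    float worst = 0.0f;
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float ref = 0.0f;
            for (int k = 0; k < K; ++k) ref += A[i * K + k] * B[k * N + j];
            worst = fmaxf(worst, fabsf(ref - C[i * N + j]));
        }
    cudaFree(A); cudaFree(B); cudaFree(C);
    return worst;
}

int main()
{
    // Sizes deliberately not multiples of TILE to exercise the padding.
    float err = maxAbsError(33, 20, 17);
    printf("max abs error: %g\n", err);
    return err < 1e-4f ? 0 : 1;
}
```

The two `__syncthreads()` calls mark the phase boundaries: the first separates the load from the compute, the second keeps a fast thread from overwriting a tile that slower threads are still reading.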
Author

This is awesome... Looking for fascinating stuff like this for my 12-year-old kid

prab
Author

What a well made video! Having both a code example along with a visual representation is awfully pleasant and I can't believe more aren't doing the same.

Combine that with the fact that your pacing, pronunciation, clarity and presentation are all great, all while feeling genuine and expressive.

I love to see this kind of thing. Just saying this to hopefully encourage you to make more!
Looking forward to seeing what else you do in the future!

fractergiftogod
Author

This was what I was waiting for!!! Thank you as always😊

Coolmd-itck
Author

One question: why does the code compute the row index as int i = blockDim.y * by + ty rather than blockDim.x * bx + tx? Why this change?

siddharth-gandhi
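A likely reason, assuming the video stores its matrices in row-major order (the thread does not confirm this): threads with consecutive `threadIdx.x` values sit in the same warp, so deriving the column `j` from the x dimension makes neighbouring threads touch neighbouring addresses, which lets the hardware coalesce the loads. The row `i` then falls to the y dimension. The sketch below only illustrates that mapping; `writeCoords` and `countMismatches` are hypothetical names, and `bx`, `by`, `tx`, `ty` mirror the variables quoted in the question.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Each thread records which (row, column) of an M x N row-major matrix
// it owns under the y -> row, x -> column convention.
__global__ void writeCoords(int *rowOf, int *colOf, int M, int N)
{
    int bx = blockIdx.x, by = blockIdx.y;
    int tx = threadIdx.x, ty = threadIdx.y;
    int i = blockDim.y * by + ty;  // row: y dimension
    int j = blockDim.x * bx + tx;  // column: x dimension (fastest varying)
    if (i < M && j < N) {
        // Threads that differ only in tx write to adjacent addresses.
        rowOf[i * N + j] = i;
        colOf[i * N + j] = j;
    }
}

// Hypothetical helper: counts elements whose recorded coordinates do not
// match the row-major layout. CUDA error checks are omitted for brevity.
int countMismatches(int M, int N)
{
    int *rowOf, *colOf;
    cudaMallocManaged(&rowOf, M * N * sizeof(int));
    cudaMallocManaged(&colOf, M * N * sizeof(int));

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    writeCoords<<<grid, block>>>(rowOf, colOf, M, N);
    cudaDeviceSynchronize();

    int bad = 0;
    for (int n = 0; n < M * N; ++n)
        if (rowOf[n] != n / N || colOf[n] != n % N) ++bad;
    cudaFree(rowOf); cudaFree(colOf);
    return bad;
}

int main()
{
    int bad = countMismatches(20, 37);
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Swapping the two lines (row from x, column from y) would still produce correct results for a suitably sized grid, but threads in a warp would then stride through memory by a full row, which typically hurts memory throughput.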