Analysis of a Tensor Core

A video analyzing the architectural makeup of an Nvidia Volta Tensor Core.

References:

Tensor Core overview:

Tensor Core + Volta Architecture Whitepaper:

CUDA programming details for Tensor Cores:

I am not affiliated with any of the companies mentioned in the video. This video is intended for educational purposes.
Comments

What a great shop and tour! I LOVE the detail and the thought process of creating something that will last for many, many decades.

jackiegammon

It would also be interesting to learn about the new schedulers used for dispatching ray-tracing routines (e.g. closest-hit/any-hit, which are
dynamically scheduled). Are they accessible directly (or at least indirectly) from CUDA cores (C++)?

AnatoliyRU

Thanks for the explanation. Is there a source you can recommend on warp scheduling and SMs?

cem_kaya

Fantastic overview. Any chance of a follow-up with some CUDA C samples?

dennisrkb

Funny to think how we see tessellation as triangles when it’s a triangle representing a pyramid, representing points.

jaxx

A systolic-array multiplier, like the TPU's MXU.

bhuvaneshs.k

Typo: should be ...+ A[0, 3]*B[3, 0]... at 1:32

pavlo
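
For reference, the term in the correction above follows from the ordinary dot-product expansion of one output element: D[0, 0] = A[0, 0]*B[0, 0] + A[0, 1]*B[1, 0] + A[0, 2]*B[2, 0] + A[0, 3]*B[3, 0] + C[0, 0]. Below is a minimal Python sketch, assuming the operation shown in the video is the 4x4 multiply-accumulate D = A*B + C. The function name `mma_4x4` and the sample matrices are illustrative only and are not taken from the video.

```python
def mma_4x4(A, B, C):
    """Return D = A*B + C for 4x4 matrices stored as nested lists."""
    n = 4
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = C[i][j]                  # start from the accumulator input
            for k in range(n):             # k runs 0..3, so the last term is A[i][3]*B[3][j]
                acc += A[i][k] * B[k][j]
            D[i][j] = acc
    return D


# Illustrative inputs: A holds 0..15, B is the identity, C is all ones.
A = [[float(4 * i + j) for j in range(4)] for i in range(4)]
B = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
C = [[1.0] * 4 for _ in range(4)]

D = mma_4x4(A, B, C)  # with B = identity, D equals A + 1 element-wise
```

The triple loop performs 4 x 4 x 4 = 64 multiply-adds for one 4x4 result.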

So why don't you just say "matrix operation core" or "matrix multiplication core"? Why make things complicated with differing terminology like "tensor"?

gsestream

Procrastination Is All You Need: Exponent Indexed Accumulators for Floating Point, Posits and Logarithmic Numbers
bfloat16 MAC, one addition and one multiplication per clock: ~100 LUTs + 1 DSP48E2 @ > 600 MHz, result accumulated in > 256 bits
A tensor core needs 64 of these => ~6,400 LUTs + 64 DSP48E2

blmac

I commented on another video about it sounding like a computer speaking. This video sounds like a human, but the mic quality is much lower.

Saturn

Tensors and matrices are not the same mathematical objects. There is some confusion here.

pperez

For normies: Tensor Core = DLSS + ray tracing = BETTER GAMING
For machine learning: Tensor Core = better and faster output
I see two different worlds...

cun_