All publications

Lecture 34: Low Bit Triton Kernels

Lecture 33: BitBLAS

Lecture 32: Unsloth

Lecture 31: Beginner's Guide to Metal

The History of CUDA MODE (Now GPU MODE)

Lecture 30: Quantized Training

GPU MODE IRL 2024 Keynotes

Lecture 29: Triton Internals

Lecture 28: Liger Kernel - Efficient Triton Kernels for LLM Training

Lecture 27: gpu.cpp - Portable GPU compute using WebGPU

Lecture 26: SYCL Mode (Intel GPU)

Lecture 25: Speaking Composable Kernel (CK)

Lecture 24: Scan at the Speed of Light

Lecture 23: Tensor Cores

Lecture 22: Hacker's Guide to Speculative Decoding in vLLM

Lecture 21: Scan Algorithm Part 2

Lecture 20: Scan Algorithm

Lecture 19: Data Processing on GPUs

Lecture 18: Fusing Kernels

Lecture 17: NCCL

Lecture 16: Hands-On Profiling

Bonus Lecture: CUDA C++ llm.cpp

Lecture 15: CUTLASS

Lecture 14: Practitioner's Guide to Triton