Lecture 1 How to profile CUDA kernels in PyTorch

Comments

'I believe what I see'

I'm in the right place. Thanks!!

mlock

Thanks for this course. It's very useful to me and my team.

burnessduan

At 30:40, where you change BLOCK_SIZE to 1024: how is it possible to reach 8000 GB/s when the max memory bandwidth of an A10G is only 600 GB/s? I think setting BLOCK_SIZE = 1024 makes Triton compute only the first 1024 columns of the matrix while ignoring the rest, so when you compute the GB/s, the "seconds" part is fixed while the "GB" grows linearly (128 * i) — that's why you're seeing the perf grow linearly. Also, the reason the little `torch.allclose` test didn't complain is that you're only testing a small matrix (1823, 781), whose n_cols <= 1024.
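The inflation effect this comment describes can be sketched in plain Python. The shapes and the 0.05 ms runtime below are made-up illustrative numbers, not measurements from the lecture: if the kernel's runtime stays fixed because it only ever touches the first 1024 columns, but the bandwidth formula credits it with the full matrix, the reported GB/s grows linearly with n_cols.

```python
def reported_gbps(n_rows, n_cols, ms, dtype_bytes=4):
    """Bandwidth as typically reported: one read + one write of the
    FULL matrix, divided by the measured runtime in milliseconds."""
    gb = 2 * n_rows * n_cols * dtype_bytes / 1e9
    return gb / (ms / 1e3)

# Hypothetical scenario: the kernel really only processes 1024 columns,
# so its runtime is ~constant (say 0.05 ms) no matter how wide the matrix is.
for n_cols in (1024, 4096, 16384):
    print(n_cols, round(reported_gbps(4096, n_cols, ms=0.05)))
```

With a fixed runtime, quadrupling n_cols quadruples the "bandwidth", which is how a 600 GB/s card can appear to hit 8000 GB/s. The fix in the Triton kernel is to either loop over column tiles or mask loads with `mask=offsets < n_cols`, so every column is actually read.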

loabrasumente

Nice walk-through, Mark!

So in practice, at a high level, one would profile the code, identify the perf bottlenecks, and then replace some of the functions associated with those bottlenecks with a direct CUDA/Triton implementation?

TheAIEpiphany

Do you have any suggestions for comprehensive resources or study materials that can help a beginner learn about CPUs and GPUs, particularly focusing on their roles and functions in Machine Learning and Deep Learning? I'm looking for in-depth yet accessible information to build a strong foundation in this area, which will enable me to understand the technical aspects discussed in certain videos related to ML/DL, especially this one :).

elliot

Oh no, now I have no excuse to be a productive member of my village. Oh, I accidentally subscribed — the terror.

zerotwo

I don't have a GPU at home. Where can I find the best access to a GPU that also gives access to NCU (Nsight Compute)? Getting the environment set up seems key.

vivekkaul