CUDA Developer Tools | Intro to NVIDIA Nsight Compute

Join NVIDIA’s Jackson Marusarz for an introduction to NVIDIA Nsight Compute, a tool for in-depth analysis of CUDA kernel performance on GPUs.

00:00 - Introduction
00:45 - NVIDIA Nsight Compute Replay Modes
01:13 - Setup Tips
02:05 - How to Profile Remotely
02:35 - NVIDIA Nsight Compute Activities
03:56 - Configure and Run the Profile
05:39 - Viewing Reports
06:18 - Source Code Page
06:48 - Conclusion

Highlights include:

Setting up Nsight Compute: Get insights into the capabilities of Nsight Compute, including setup tips and key features for performance analysis. Learn how to harness Nsight Compute to understand the performance of your GPU.

Collecting metrics: Discover how Nsight Compute collects performance and throughput metrics, including from hardware counters and code instrumentation.

Configuration: Learn about permissions for accessing GPU counters and how to enable source-level details without compromising performance. Get details about configuration options for non-interactive profiles.
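
As a rough illustration of the non-interactive configuration options mentioned above, the following Python sketch assembles an `ncu` command line. The helper name `build_ncu_command`, its parameters, the kernel name `conv2d`, and the binary `./my_app` are all hypothetical placeholders introduced for this example; only the command-line flags come from the Nsight Compute CLI. Source-level correlation is typically enabled at compile time with `nvcc -lineinfo`, which this sketch assumes has already been done.

```python
import shlex
from typing import List, Optional


def build_ncu_command(
    app: List[str],
    report: str,
    kernel_name: Optional[str] = None,
    launch_count: Optional[int] = None,
    section_set: str = "full",
) -> List[str]:
    """Assemble an argv list for a non-interactive Nsight Compute run.

    Hypothetical helper for illustration: it only builds the command
    and does not launch the profiler.
    """
    # --set picks a predefined group of sections; --export names the report file.
    cmd = ["ncu", "--set", section_set, "--export", report, "--force-overwrite"]
    if kernel_name is not None:
        # Restrict profiling to kernels with this name.
        cmd += ["--kernel-name", kernel_name]
    if launch_count is not None:
        # Stop collecting after this many profiled kernel launches.
        cmd += ["--launch-count", str(launch_count)]
    # The application and its arguments always come last.
    return cmd + app


if __name__ == "__main__":
    cmd = build_ncu_command(
        ["./my_app"], "conv_report", kernel_name="conv2d", launch_count=5
    )
    print(shlex.join(cmd))
```

Limiting the kernel name and launch count keeps the profiling run short, since each profiled kernel may be replayed several times to collect all requested metrics.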

Nsight Compute reports: Nsight Compute generates detailed reports with runtime information, speedup estimates, and more. You can even examine source-level profiling data.
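
Reports can also be inspected from the command line instead of the GUI. The sketch below, again a hypothetical helper rather than anything shown in the video, builds an `ncu` invocation that imports a saved report and prints one of its pages, optionally as CSV. The function name `build_report_dump_command` and the file name `conv_report.ncu-rep` are assumptions made for illustration.

```python
from typing import List


def build_report_dump_command(
    report_file: str, page: str = "details", csv: bool = True
) -> List[str]:
    """Assemble an argv list that prints a saved Nsight Compute report.

    Hypothetical helper for illustration: it builds the command only.
    """
    # --import reads an existing report; --page selects which page to print.
    cmd = ["ncu", "--import", report_file, "--page", page]
    if csv:
        # --csv switches the console output to comma-separated values.
        cmd.append("--csv")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_report_dump_command("conv_report.ncu-rep")))
```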

This video series will help get you started with NVIDIA Nsight Developer Tools for CUDA. Grow your proficiency with the tools and apply the examples to your own development environment. Or return to specific episodes for a refresher on certain features and functionalities. We walk through analyzing performance reports, offer debugging tips and tricks, and show you the best ways to optimize your CUDA code. The series will focus primarily on Nsight Compute and Nsight Systems.

Thanks for watching, and stay tuned for more episodes.

#CUDA #Nsight #developertools #NVIDIA #HPC #LLM #CUDAtutorials
Comments

Thanks for sharing. Can it profile a headless Vulkan compute kernel?

BramStolk

While profiling two kernels that perform convolution over 1000 iterations with Nsight Compute, the tool reports that the faster kernel is 10 us slower. Is it possible that the tool introduces an overhead of 2-3 us while profiling?

madhusudana.a.vmadhusudan.

I am running my kernel in a Docker container on a server. How can I connect the interactive profiler to the Docker environment?

kjkszpjab

Why are these tutorials so fast? There are no examples or deep explanations; it's just like reading a manual. You could've done a better job here.

omid