CUDA Developer Tools | Performance Analysis with NVIDIA Nsight Systems Timeline

In this episode of the CUDA Developer Tools tutorial series, Eyal Soha, senior software engineer at NVIDIA, introduces code performance analysis using the Timeline View in NVIDIA Nsight Systems. Basic knowledge of C++ CUDA programming is recommended.

Highlights include:

◽ Explore the Nsight Systems timeline, a powerful tool for analyzing GPU performance. Learn how the Timeline View helps you understand code processes and uncover optimization opportunities.

◽ Get an overview of the timeline interface and how to navigate the metrics Nsight Systems collects. Read the timeline, customize your view, and understand CPU and GPU utilization.

◽ Learn how NVTX markers in the code add annotations to the timeline, highlighting essential activities such as memory transfers and kernel executions.

◽ Learn how to use the timeline view to make improvements to your code. Understand concepts like "latency hiding" to identify bottlenecks and make informed optimizations. Explore the benefits of parallelism and uncover how asynchronous operations can impact code performance.
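The NVTX and latency-hiding highlights above can be illustrated with a minimal sketch. This is not the code shown in the video. The kernel `scale`, the chunk count and size, the range names, and the file name `overlap.cu` are all hypothetical choices made for illustration. It assumes a recent CUDA toolkit that ships the header-only NVTX v3 (`nvtx3/nvToolsExt.h`), and it uses pinned host memory so that `cudaMemcpyAsync` calls issued on different streams have a chance to overlap with kernel execution.

```cuda
// Assumed build command:   nvcc -o overlap overlap.cu   (Linux may also need -ldl)
// Assumed profile command: nsys profile --trace=cuda,nvtx -o overlap_report ./overlap
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <nvtx3/nvToolsExt.h>  // NVTX v3 C API: nvtxRangePushA / nvtxRangePop

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",             \
                         cudaGetErrorString(err_), __FILE__, __LINE__);  \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Hypothetical kernel: multiply every element by a constant factor.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int kChunks = 4;             // illustrative values, not tuned
    const int kChunkElems = 1 << 20;
    const size_t chunkBytes = kChunkElems * sizeof(float);

    float* host = nullptr;
    float* dev = nullptr;
    CHECK(cudaMallocHost(&host, kChunks * chunkBytes));  // pinned host memory
    CHECK(cudaMalloc(&dev, kChunks * chunkBytes));
    for (int i = 0; i < kChunks * kChunkElems; ++i) host[i] = 1.0f;

    cudaStream_t streams[kChunks];
    for (int c = 0; c < kChunks; ++c) CHECK(cudaStreamCreate(&streams[c]));

    // NVTX range: appears as a named bar on the Nsight Systems timeline.
    // On the CPU row it covers only the time spent enqueueing the work.
    nvtxRangePushA("enqueue chunks");
    for (int c = 0; c < kChunks; ++c) {
        float* h = host + static_cast<size_t>(c) * kChunkElems;
        float* d = dev + static_cast<size_t>(c) * kChunkElems;
        // Copy in, compute, copy out, all asynchronous and ordered within one
        // stream. Work in different streams may overlap, which is what lets a
        // transfer for one chunk hide behind the kernel of another.
        CHECK(cudaMemcpyAsync(d, h, chunkBytes, cudaMemcpyHostToDevice, streams[c]));
        scale<<<(kChunkElems + 255) / 256, 256, 0, streams[c]>>>(d, kChunkElems, 2.0f);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h, d, chunkBytes, cudaMemcpyDeviceToHost, streams[c]));
    }
    nvtxRangePop();

    nvtxRangePushA("wait for GPU");
    CHECK(cudaDeviceSynchronize());  // block until every stream has finished
    nvtxRangePop();

    // Sanity check: every element should have been doubled.
    bool ok = true;
    for (int i = 0; i < kChunks * kChunkElems; ++i) {
        if (host[i] != 2.0f) { ok = false; break; }
    }
    std::printf("%s\n", ok ? "ok" : "mismatch");

    for (int c = 0; c < kChunks; ++c) CHECK(cudaStreamDestroy(streams[c]));
    CHECK(cudaFree(dev));
    CHECK(cudaFreeHost(host));
    return ok ? 0 : 1;
}
```

In a profile of a program like this, the two NVTX ranges should show up alongside the CUDA API calls, and the per-stream rows indicate whether copies and kernels actually overlapped. How much overlap you see depends on the GPU's copy engines and on the relative cost of transfers and compute, so treat the timeline as the source of truth rather than this sketch.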

0:00 - Introduction
0:58 - Nsight Systems Timeline
3:19 - Correlating CPU and GPU Activity
3:43 - NVTX Markers
4:17 - CUDA Memcpy
6:40 - Optimized Code and Latency Hiding
11:18 - More Optimized Code

This video series will help get you started with NVIDIA Nsight Developer Tools for CUDA. Grow your proficiency with the tools and apply the examples to your own development environment. Or return to specific episodes for a refresher on certain features and functionalities. We walk through analyzing performance reports, offer debugging tips and tricks, and show you the best ways to optimize your CUDA code. The series will focus primarily on Nsight Compute and Nsight Systems.

Thanks for watching, and stay tuned for more episodes.

#CUDA #Nsight #developertools #NVIDIA #HPC #LLM #CUDAtutorials

NVIDIA Developer

Related videos

Comments

Thank you for this awesome tutorial series! I would personally enjoy an expert analyzing the performance of a pipeline that calls CUDA kernels from within a Python script.

Shurgath

what's the reason to buy nvidia's expensive chip, and write cuda-code to help nvidia sell more expensive chips?

yuan.pingchen

Enunciate. Clearly. Don't mumble.

You are presenting to an audience. This is a "formal" interaction. You are not snubbing your mother's attempts to feed you vegetables at a high chair as a toddler.

Speak slowly, clearly.

MrNewAmerican

CUDA Developer Tools | Intro to NVIDIA Nsight Compute

CUDA Developer Tools | NVIDIA Nsight Tools Ecosystem

CUDA Developer Tools | Intro to NVIDIA Nsight Systems

CUDA Developer Tools | SOL Analysis with NVIDIA Nsight Compute

Nvidia CUDA in 100 Seconds

CUDA Tutorials I Profiling and Debugging Applications

CUDA: New Features and Beyond | NVIDIA GTC 2024

Latest Updates to CUDA Developer Tools

What's New in CUDA Developer Tools: Profiling NVIDIA Hopper and workflow enhancements

CUDA Developer Tools | Memory Analysis with NVIDIA Nsight Compute

Mythbusters Demo GPU versus CPU

Demystify CUDA Debugging and Performance with Powerful Developer Tools NVIDIA On Demand

Installing CUDA Toolkit on Windows [Published 2017 - See our playlist for more up-to-date trainings]

NVIDIA Performance Tools for A100 GPU Systems Q&A

Intro to CUDA - An introduction, how-to, to NVIDIA's GPU parallel programming architecture

CUDA Programming Course – High-Performance Computing with GPUs

CUDA Crash Course: GPU Performance Optimizations Part 1

Writing Code That Runs FAST on a GPU

Optimizing CUDA Memory Allocations Using NVIDIA Nsight Systems

GTC 2022 - CUDA: New Features and Beyond - Stephen Jones, CUDA Architect, NVIDIA

Performance Tuning the NVIDIA Grace CPU with NVIDIA Nsight Tools

Nsight Visual Studio Code Edition

Quantum Visualization Tools in CUDA-Q