Debugging and Performance Profiling for Frontier

This half-day tutorial walks through practical examples of debugging and profiling the performance of MPI programs on Crusher, an OLCF system available now whose nodes are identical to Frontier's. The tutorial covers the range of tools provided by both the HPE Cray Programming Environment and the AMD ROCm Platform. Topics include the following:

* Debugging

- Interpreting error messages.

- Finding errors using Abnormal Termination Processing (ATP) and the Stack Trace Analysis Tool (STAT).

- Using runtime debug logs.

- Using a full debugger on parallel GPU programs: gdb4hpc with rocgdb.

* Performance Profiling

- Building and running experiments with the Cray Performance Measurement and Analysis Tools (CrayPat).

- Interpreting results with pat_report and Apprentice2.

- Profiling and tracing MPI programs with rocprof.

This tutorial is for Exascale Computing Project (ECP) developers who will target Frontier. Attendees should already be familiar with parallelism on GPU-accelerated, distributed-memory computers. The tutorial assumes a working knowledge of Linux and MPI.