CppCon 2019: Gordon Brown “Efficient GPU Programming with Modern C++”

Computer system architecture trends are constantly evolving to provide higher performance and computing power and to support the increasing demand from high-performance computing domains, including AI, machine learning, image processing and automotive driving aids. The most recent trend is the move towards heterogeneity, where a system has one or more co-processors, often a GPU, working alongside it in parallel. These kinds of systems are everywhere, from desktop machines and high-performance computing supercomputers to mobile and embedded devices.

The many-core GPU has been shaped by the fast-growing video game industry, which expects a tremendous number of floating-point calculations per video frame. The motive was to find ways to maximize the chip area and power budget dedicated to floating-point calculations, and the solution is to optimize for the execution throughput of a massive number of threads. The design saves chip area and power by allowing pipelined memory channels and arithmetic operations to have long latency. The reduced area and power spent on memory and arithmetic allow designers to place more cores on a chip, increasing execution throughput.

At CppCon 2018, we presented "A Modern C++ Programming Model for GPUs using Khronos SYCL", which provided an introduction to GPU programming using SYCL.

This talk will take this further. It will present the GPU architecture and the GPU programming model, covering the execution and memory models. It will describe parallel programming patterns and common parallel algorithms and how they map to the GPU programming model. Finally, through this lens, it will look at how to construct the control flow of your programs and how to structure and move your data to achieve efficient utilisation of GPU architectures.
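As an illustrative sketch (not code from the talk), the data-parallel "map" pattern below shows how such patterns typically map onto the GPU programming model: one work-item per element. It is written against a SYCL 1.2.1-style API, as used at the time of this talk; the kernel name scale_kernel and the problem size are arbitrary placeholders.

// Map pattern: one work-item per element, scaling a vector in place.
#include <CL/sycl.hpp>
#include <vector>

int main() {
  std::vector<float> data(1024, 1.0f);

  cl::sycl::queue q;  // default device selection
  {
    cl::sycl::buffer<float, 1> buf(data.data(), cl::sycl::range<1>(data.size()));

    q.submit([&](cl::sycl::handler& cgh) {
      auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh);
      cgh.parallel_for<class scale_kernel>(
          cl::sycl::range<1>(data.size()),
          [=](cl::sycl::id<1> idx) { acc[idx] = acc[idx] * 2.0f; });
    });
  }  // buffer destruction waits for the kernel and copies results back to 'data'
}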

This talk will use SYCL as the programming model for demonstrating the concepts presented; however, the concepts can be applied to any other heterogeneous programming model, such as OpenCL or CUDA. SYCL allows users to write standard C++ code which is then executed on a range of heterogeneous architectures including CPUs, GPUs, DSPs, FPGAs and other accelerators. On top of this, SYCL also provides a high-level abstraction which allows users to describe their computations as a task graph with data dependencies, while the SYCL runtime performs data-dependency analysis and scheduling. SYCL also supports a host device, which executes on the host CPU with the same execution and memory model guarantees as OpenCL for debugging purposes, and provides a fallback mechanism which allows an application to recover from failure.
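As a hedged sketch of the task-graph model described above (again, not code from the talk), the two kernels below are linked by a data dependency: the second kernel's read accessor on buffer a tells the SYCL runtime that it must run after the first kernel, which writes a. The runtime derives the ordering and the data movement from the accessor declarations alone; the kernel names and sizes here are illustrative.

#include <CL/sycl.hpp>
#include <vector>

int main() {
  constexpr size_t n = 1024;
  std::vector<float> out(n);

  cl::sycl::queue q;
  {
    cl::sycl::buffer<float, 1> a{cl::sycl::range<1>(n)};
    cl::sycl::buffer<float, 1> b{out.data(), cl::sycl::range<1>(n)};

    // Kernel 1: produce values into buffer 'a'.
    q.submit([&](cl::sycl::handler& cgh) {
      auto wa = a.get_access<cl::sycl::access::mode::discard_write>(cgh);
      cgh.parallel_for<class produce>(
          cl::sycl::range<1>(n),
          [=](cl::sycl::id<1> i) { wa[i] = static_cast<float>(i[0]); });
    });

    // Kernel 2: consume 'a' and write 'b'. The read access to 'a' creates
    // the edge in the task graph; the runtime schedules kernel 2 after kernel 1.
    q.submit([&](cl::sycl::handler& cgh) {
      auto ra = a.get_access<cl::sycl::access::mode::read>(cgh);
      auto wb = b.get_access<cl::sycl::access::mode::discard_write>(cgh);
      cgh.parallel_for<class consume>(
          cl::sycl::range<1>(n),
          [=](cl::sycl::id<1> i) { wb[i] = ra[i] + 1.0f; });
    });
  }  // 'b' is copied back into 'out' when the buffer is destroyed
}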

Gordon Brown
Codeplay Software
Principal Software Engineer, SYCL & C++
Edinburgh, United Kingdom

Gordon Brown is a principal software engineer at Codeplay Software specializing in heterogeneous programming models for C++. He has been involved in the standardization of the Khronos standard SYCL and the development of Codeplay's implementation of the standard from its inception. More recently he has been involved in the efforts within SG1/SG14 to standardize execution and to bring heterogeneous computing to C++.


Comments

Please pass on my appreciation to the speaker for repeating each question, so it is certainly preserved for posterity.

morthim

Excellent talk. How have I never heard of SYCL before?

jorispeeters

His accent is very cute. Also, what an interesting talk!

emiliadaria

Why does it not work on Windows with AMD graphics cards?

micham

As someone who has worked extensively on scientific computation using CUDA: the point of an API that follows standard C++, rather than CUDA's weird macros and launch syntax, is that any standard C++ compiler will be able to understand it and compile it to some form. But if SYCL requires its own compiler to emit the device IR, doesn't that make the initial point invalid?
I understand that to do otherwise would require every C++ compiler to add support for SYCL specifically, which is absurd. But then what exactly is the advantage of using SYCL's toolchain over CUDA's toolchain (which only requires manual execution of nvcc), other than the higher number of supported device types?

indrajitbanerjee

Good talk. I'm understanding a little bit more about running code on the GPU now.

OperationDarkside