Introduction to GPU Programming in Chapel

Chapel’s parallel-first design allows users to seamlessly use GPUs in a vendor-neutral way. In this demo, we:

* quickly recap how Chapel arrays are created and processed
* demonstrate how these operations can be done on the GPU via ‘on’ statements and remote variable declarations
* introduce the ‘locale’ concept and ‘config’ variables for GPU/CPU portability
* discuss GPU-based reductions and ‘Math’ module operations in GPU kernels
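
The points above can be put together in a minimal sketch. This is not the code from the demo. The names `n`, `useGpu`, and `target` are made up for illustration, and the sketch assumes a recent Chapel release built with GPU support (when no GPU is present, it falls back to the CPU locale).

```chapel
use Math;

// Hypothetical config names; they can be overridden at run time,
// e.g. ./demo --n=1000 --useGpu=false
config const n = 1_000_000;
config const useGpu = here.gpus.size > 0;

// Pick where to run: the first GPU sublocale, or the CPU locale itself.
const target = if useGpu then here.gpus[0] else here;

on target {
  // Declared inside the 'on' block, so A is allocated on 'target'.
  var A: [1..n] real;

  // Order-independent loop; eligible to be compiled into a GPU kernel
  // when 'target' is a GPU. 'sin' comes from the 'Math' module.
  foreach i in 1..n do
    A[i] = sin(i: real);

  // Sum reduction over the array.
  const total = + reduce A;
  writeln("sum = ", total);
}
```

Because the target locale is chosen through a `config const`, the same source can be run on the CPU or the GPU without recompiling.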

Comments

Is it possible to do all the usual CUDA kernel optimization tricks, like fetching data into shared memory, tiling, vectorized access, etc.? I mean, would the performance of Chapel matrix multiplication code be close to a naive CUDA kernel implementation, or to cuBLAS?

denisstepanenko