std::simd: How to Express Inherent Parallelism Efficiently Via Data-parallel Types - Matthias Kretz

---

C++26 is en route to shipping `std::simd`, a facility for expressing data parallelism via the type system, based on experience with `std::experimental::simd` (Parallelism TS v2). Data-parallel types have the potential to replace many uses of built-in arithmetic types with their `simd` counterparts in compute-intensive workloads, promising speed-ups by substantial factors without algorithmic changes.
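
A minimal sketch of what such a replacement can look like, written in the Parallelism TS v2 spelling (`std::experimental::simd`) that the C++26 facility is based on; the final C++26 names may differ slightly, and the scaling example and function names are illustrative, not from the talk:

```cpp
#include <cstddef>
#include <experimental/simd>
namespace stdx = std::experimental;

// Scalar version: one element per iteration.
void scale_scalar(float* data, std::size_t n, float factor) {
  for (std::size_t i = 0; i < n; ++i)
    data[i] *= factor;
}

// simd version: the same algorithm, one chunk of V::size() lanes per iteration.
void scale_simd(float* data, std::size_t n, float factor) {
  using V = stdx::native_simd<float>;
  std::size_t i = 0;
  for (; i + V::size() <= n; i += V::size()) {
    V v(&data[i], stdx::element_aligned);  // load V::size() elements
    v *= factor;                           // element-wise multiply
    v.copy_to(&data[i], stdx::element_aligned);
  }
  for (; i < n; ++i)                       // scalar epilogue for the remainder
    data[i] *= factor;
}
```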

This talk presents how data-parallel types are designed to be more than just a thin wrapper around SIMD registers and instructions. They are designed to facilitate generic code, integrate with standard algorithms, and more, all while translating into efficient use of parallel execution capabilities. More importantly, data-parallel types are not "just another way to express data-parallel execution"; they also provide new ways to design data structures for efficient memory access (high throughput without sacrificing locality) using data-structure vectorization. The talk features examples of efficient use of `std::simd`.
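
The "data-structure vectorization" mentioned above can be sketched, under the same TS v2 assumption, as a struct templated on its element type, so that one definition describes a single record (scalar instantiation) or a whole batch of records processed lane-wise (simd instantiation); `Point` and `norm()` are illustrative names, not from the talk:

```cpp
#include <cmath>
#include <experimental/simd>
namespace stdx = std::experimental;

template <class T>
struct Point {
  T x, y, z;
  T norm() const {
    using std::sqrt;                  // scalar sqrt; the simd overload is found via ADL
    return sqrt(x * x + y * y + z * z);
  }
};

using ScalarPoint = Point<float>;                     // one point
using PointBatch  = Point<stdx::native_simd<float>>;  // native_simd<float>::size() points at once
```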
---

Matthias Kretz

Matthias Kretz began programming with C++ as a high-school student, when he joined the development of the KDE 2 Desktop in its Alpha stages. He worked on GUI applications, the KDE core libraries, and developed the multimedia subsystem of the KDE Plasma Desktop (4.0), which became the first external contribution to Qt. At the same time he finished his studies in Physics in Heidelberg, Germany. For his thesis he worked on porting parts of the online-reconstruction software for the ALICE experiment at CERN to the Intel Larrabee GPU. This motivated his work on SIMD and Vc, an abstraction for expressing data-parallelism via the type system. Vc was the first solution of this kind released as free software. His PhD in computer science at the University of Frankfurt was a continuation of his SIMD work, developing higher level abstractions and vectorization of challenging problems.

Matthias has been contributing his SIMD work and his expertise in HPC and scientific computing to the C++ committee since 2013. Since 2022 he has chaired SG6 Numerics of the C++ committee. He is also a contributor to GCC and has founded and chaired C++ User Groups at the Frankfurt Institute for Advanced Studies and at the GSI Helmholtz Center for Heavy Ion Research.

---

#cppcon #cppprogramming #cpp
Comments

28:17 [slides 31-32] I think "old-school C developers" would define Pixel as a union of a single uint32_t and a struct of 4 uint8_t, and try to use this union as a way of simplifying the read/write code. Such approaches are undefined in C++ (they break strict aliasing rules, I believe).
I'm not sure that C-style state of mind should guide us when designing how C++ should do it. Perhaps we should allow std::simd<T> for T's that are aggregates of same-type "vectorizable" member variables? Perhaps this is a generalization that could implicitly allow simd<complex>, mentioned at 48:22.
Great talk, thanks Matthias!

Roibarkan
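
For reference, a hedged illustration of the union-based Pixel described in the comment above: writing one member and reading the other is the C idiom, but in C++ reading the inactive union member is undefined behaviour, and C++20 `std::bit_cast` is the well-defined replacement:

```cpp
#include <bit>
#include <cstdint>

// The "old-school C" approach: type punning through a union (UB in C++).
union PixelPun {
  std::uint32_t packed;
  struct Channels { std::uint8_t r, g, b, a; } channels;
};

// The well-defined C++20 alternative: convert between representations explicitly.
struct Pixel { std::uint8_t r, g, b, a; };
inline std::uint32_t pack(Pixel p)        { return std::bit_cast<std::uint32_t>(p); }
inline Pixel unpack(std::uint32_t packed) { return std::bit_cast<Pixel>(packed); }
```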

I absolutely love this; for my current project/library this is a game changer for portability.

scion

It's a cool concept, but in practice that will mean even more spoon-feeding the compiler to get the code you want.

dat_

Great talk! It seems that exploiting ILP when using simd can be very beneficial. Will library/compiler vendors be allowed to "do it for us", i.e. is the default size() of std::simd<T> strictly mandated by the hardware, or will specific compiler/library vendors be allowed to choose a larger size() (perhaps based on compiler flags) to exploit ILP? Perhaps the ABI tag that was mentioned can support this.

Roibarkan
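
For context, a hedged sketch (again in TS v2 spelling) of the ABI-tag mechanism the comment asks about: the element count of a `simd` type is a property of its ABI tag, so code can explicitly request a width larger than one hardware register, which is one way to expose more independent work to the compiler. Whether an implementation may silently choose a wider default is a separate question not answered here:

```cpp
#include <experimental/simd>
namespace stdx = std::experimental;

// Width chosen by the implementation for the current target (one register's worth).
using Native = stdx::native_simd<float>;

// Explicitly requested width, independent of the target; may map to several
// registers and give the scheduler more independent operations per iteration.
using Wide = stdx::fixed_size_simd<float, 16>;

static_assert(Wide::size() == 16);
```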

At 4:00: I'm curious why fake_modify/fake_read are used instead of passing the initial value of x as a parameter and returning the result.

PaulJurczak
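
One plausible answer (an editor's guess, not from the talk): fake_modify/fake_read look like optimization barriers in the spirit of Google Benchmark's DoNotOptimize/ClobberMemory. Passing x in and returning the result also works, but the compiler may then fold or hoist the computation across the timed loop; the barriers force the value to be produced and consumed every iteration. A minimal GCC/Clang-style sketch, treating the slide's names as hypothetical:

```cpp
// Pretend x is modified by something the optimizer cannot see.
template <class T>
inline void fake_modify(T& x) {
  asm volatile("" : "+r,m"(x) : : "memory");
}

// Pretend x is read by something the optimizer cannot see.
template <class T>
inline void fake_read(const T& x) {
  asm volatile("" : : "r,m"(x) : "memory");
}
```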

Intel's left hand: pushes SIMD into every language it can, including many mask-defined operations.

Intel's right hand: doesn't give us simple people AVX-512 for 10 years.

blacklion

If you actually care about the performance of your data-parallel code, your PC has a special, massively powerful hardware component that's specifically designed to maximize throughput for exactly this kind of task. It's called a GPU.

Alexander_Sannikov