std::simd: How to Express Inherent Parallelism Efficiently Via Data-parallel Types - Matthias Kretz

---

C++26 is en route to shipping `std::simd`, a facility for expressing data parallelism via the type system, based on experience with `std::experimental::simd` (Parallelism TS v2). Data-parallel types have the potential to replace many uses of built-in arithmetic types with their `simd` counterparts in compute-intensive workloads, promising speed-ups by substantial factors without algorithmic changes.
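
A minimal sketch of what such a replacement can look like, written in the Parallelism TS v2 spelling (`std::experimental::simd`) that the C++26 facility is based on; the final C++26 names may differ slightly, and the scaling example and function names are illustrative, not from the talk:

```cpp
#include <cstddef>
#include <experimental/simd>
namespace stdx = std::experimental;

// Scalar version: one element per iteration.
void scale_scalar(float* data, std::size_t n, float factor) {
  for (std::size_t i = 0; i < n; ++i)
    data[i] *= factor;
}

// simd version: the same algorithm, one chunk of V::size() lanes per iteration.
void scale_simd(float* data, std::size_t n, float factor) {
  using V = stdx::native_simd<float>;
  std::size_t i = 0;
  for (; i + V::size() <= n; i += V::size()) {
    V v(&data[i], stdx::element_aligned);  // load V::size() elements
    v *= factor;                           // element-wise multiply
    v.copy_to(&data[i], stdx::element_aligned);
  }
  for (; i < n; ++i)                       // scalar epilogue for the remainder
    data[i] *= factor;
}
```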

This talk presents how data-parallel types are designed to be more than just a thin wrapper around SIMD registers and instructions. They are designed to facilitate generic code, integrate with standard algorithms, and more, all while translating into efficient use of parallel execution capabilities. More importantly, data-parallel types are not "just another way to express data-parallel execution"; they also provide new ways to design data structures for efficient memory access (high throughput without sacrificing locality) using data-structure vectorization. The talk features examples of efficient use of `std::simd`.
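
The "data-structure vectorization" mentioned above can be sketched, under the same TS v2 assumption, as a struct templated on its element type, so that one definition describes a single record (scalar instantiation) or a whole batch of records processed lane-wise (simd instantiation); `Point` and `norm()` are illustrative names, not from the talk:

```cpp
#include <cmath>
#include <experimental/simd>
namespace stdx = std::experimental;

template <class T>
struct Point {
  T x, y, z;
  T norm() const {
    using std::sqrt;                  // scalar sqrt; the simd overload is found via ADL
    return sqrt(x * x + y * y + z * z);
  }
};

using ScalarPoint = Point<float>;                     // one point
using PointBatch  = Point<stdx::native_simd<float>>;  // native_simd<float>::size() points at once
```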
---

Matthias Kretz

Matthias Kretz began programming with C++ as a high-school student, when he joined the development of the KDE 2 Desktop in its Alpha stages. He worked on GUI applications, the KDE core libraries, and developed the multimedia subsystem of the KDE Plasma Desktop (4.0), which became the first external contribution to Qt. At the same time he finished his studies in Physics in Heidelberg, Germany. For his thesis he worked on porting parts of the online-reconstruction software for the ALICE experiment at CERN to the Intel Larrabee GPU. This motivated his work on SIMD and Vc, an abstraction for expressing data-parallelism via the type system. Vc was the first solution of this kind released as free software. His PhD in computer science at the University of Frankfurt was a continuation of his SIMD work, developing higher level abstractions and vectorization of challenging problems.

Matthias has been contributing his SIMD work and his expertise in HPC and scientific computing to the C++ committee since 2013. Since 2022 he has chaired SG6 Numerics of the C++ committee. He is also a contributor to GCC and has founded and chaired C++ User Groups at the Frankfurt Institute for Advanced Studies and at the GSI Helmholtz Center for Heavy Ion Research.

---

#cppcon #cppprogramming #cpp
Comments

28:17 [slides 31-32] I think "old-school C developers" would define Pixel as a union of a single uint32_t and a struct of 4 uint8_t, and try to use this union as a way of simplifying the read/write code. Such approaches are undefined in C++ (they break strict aliasing rules, I believe).
I'm not sure that C-style state of mind should guide us when designing how C++ should do it. Perhaps we should allow std::simd<T> for T's that are aggregates of same-type "vectorizable" member variables? Perhaps this is a generalization that could implicitly allow simd<complex>, mentioned at 48:22.
Great talk, thanks Matthias!

Roibarkan
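
For reference, a hedged illustration of the union-based Pixel described in the comment above: writing one member and reading the other is the C idiom, but in C++ reading the inactive union member is undefined behaviour, and C++20 `std::bit_cast` is the well-defined replacement:

```cpp
#include <bit>
#include <cstdint>

// The "old-school C" approach: type punning through a union (UB in C++).
union PixelPun {
  std::uint32_t packed;
  struct Channels { std::uint8_t r, g, b, a; } channels;
};

// The well-defined C++20 alternative: convert between representations explicitly.
struct Pixel { std::uint8_t r, g, b, a; };
inline std::uint32_t pack(Pixel p)        { return std::bit_cast<std::uint32_t>(p); }
inline Pixel unpack(std::uint32_t packed) { return std::bit_cast<Pixel>(packed); }
```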

I absolutely love this; for my current project/library this is a game changer for portability.

scion

It's a cool concept, but in practice that will mean even more spoon-feeding the compiler to get the code you want.

dat_

Great talk! It seems that exploiting ILP when using simd can be very beneficial. Will library/compiler vendors be allowed to "do it for us", i.e. is the default size() of std::simd<T> strictly mandated by the hardware, or will specific compiler/library vendors be allowed to choose a larger size() (perhaps based on compiler flags) to exploit ILP? Perhaps the ABI tag that was mentioned can support this.

Roibarkan
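
For context, a hedged sketch (again in TS v2 spelling) of the ABI-tag mechanism the comment asks about: the element count of a `simd` type is a property of its ABI tag, so code can explicitly request a width larger than one hardware register, which is one way to expose more independent work to the compiler. Whether an implementation may silently choose a wider default is a separate question not answered here:

```cpp
#include <experimental/simd>
namespace stdx = std::experimental;

// Width chosen by the implementation for the current target (one register's worth).
using Native = stdx::native_simd<float>;

// Explicitly requested width, independent of the target; may map to several
// registers and give the scheduler more independent operations per iteration.
using Wide = stdx::fixed_size_simd<float, 16>;

static_assert(Wide::size() == 16);
```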

At 4:00: I'm curious why fake_modify/fake_read are used instead of passing the initial value of x as a parameter and returning the result.

PaulJurczak
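
One plausible answer (an editor's guess, not from the talk): fake_modify/fake_read look like optimization barriers in the spirit of Google Benchmark's DoNotOptimize/ClobberMemory. Passing x in and returning the result also works, but the compiler may then fold or hoist the computation across the timed loop; the barriers force the value to be produced and consumed every iteration. A minimal GCC/Clang-style sketch, treating the slide's names as hypothetical:

```cpp
// Pretend x is modified by something the optimizer cannot see.
template <class T>
inline void fake_modify(T& x) {
  asm volatile("" : "+r,m"(x) : : "memory");
}

// Pretend x is read by something the optimizer cannot see.
template <class T>
inline void fake_read(const T& x) {
  asm volatile("" : : "r,m"(x) : "memory");
}
```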

Intel's left hand: pushes SIMD into every language it can, including many mask-defined operations.

Intel's right hand: doesn't give us simple people AVX-512 for 10 years.

blacklion

If you actually care about the performance of your data-parallel code, your PC has a special, massively powerful hardware component that's specifically designed to maximize throughput for exactly this kind of task. It's called a GPU.

Alexander_Sannikov