CppCon 2018: Jefferson Amstutz “Compute More in Less Time Using C++ SIMD Wrapper Libraries”

Leveraging SIMD (Single Instruction, Multiple Data) instructions is an important part of fully utilizing modern processors. However, using SIMD hardware features from C++ can be difficult, as it requires an understanding of how the underlying instructions work. Furthermore, there is not yet a standardized way to express C++ code that guarantees such instructions are used effectively to increase performance.

Lastly, this talk will also seek to unify the greater topic of data parallelism in C++ by connecting the SIMD parallelism concepts demonstrated to other expressions of parallelism, such as SPMD/SIMT parallelism used in GPU computing.
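
For a flavor of what such wrapper libraries let you write, here is a minimal sketch (not from the talk) using std::experimental::simd from the Parallelism TS v2, one of the wrapper approaches in this space; GCC ships it as <experimental/simd>:

#include <experimental/simd>
#include <cstddef>
namespace stdx = std::experimental;

// Multiply two float arrays element-wise, one full SIMD register at a time.
// For brevity, assumes n is a multiple of the hardware SIMD width.
void mul(const float* a, const float* b, float* out, std::size_t n) {
    using floatv = stdx::native_simd<float>;  // e.g. 8 lanes with AVX
    for (std::size_t i = 0; i < n; i += floatv::size()) {
        floatv va(a + i, stdx::element_aligned);  // vector load
        floatv vb(b + i, stdx::element_aligned);
        (va * vb).copy_to(out + i, stdx::element_aligned);  // multiply and store
    }
}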

Jefferson Amstutz, Software Engineer
Intel
Jeff is a Visualization Software Engineer at Intel, where he leads the open source OSPRay project. He enjoys all things ray tracing, high performance computing, clearly implemented code, and the perfect combination of git, CMake, and modern C++.


Comments

As for GPU kernels vs CPU kernels, the difference is in the relative cost of memory operations compared to register calculation speed, as well as the size of the register file. GPUs tend to have an order of magnitude faster calculation while memory is on par or slower, due to relatively smaller per-thread caches - so you have to be even more sparing with memory bandwidth.
Also, GPUs prefer bigger block operations than CPUs due to memory/cache architecture. That's about it.

AstralSorm

28:50 From what I understand, trig functions are available only with AVX-512, which exists only on a few Xeons and, so far, very few consumer-grade CPUs?

GeorgeTsiros

You don't have to modify your code at all with "vertical" vectorization... Just apply SIMD to all operations and enjoy the free speed upgrade...
Meanwhile, with horizontal vectorization you have to rewrite your code completely for ray tracing: handle pointers to materials, reduction of the closest hit, and above all recursion, where the paths and steps of each vectorized ray are very different.

panjak
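
To illustrate panjak's distinction, a sketch (mine, not from the talk, using std::experimental::simd; terminology as in the comment above): vertical vectorization just widens each arithmetic operation, while horizontal packing puts one ray per lane and forces control flow to be rewritten as masks.

#include <experimental/simd>
#include <cstddef>
namespace stdx = std::experimental;
using floatv = stdx::native_simd<float>;

// "Vertical": the scalar algorithm is unchanged; each operation simply
// processes a full register of consecutive elements (tail handling omitted).
void scale(const float* in, float* out, std::size_t n, float s) {
    for (std::size_t i = 0; i < n; i += floatv::size())
        (floatv(in + i, stdx::element_aligned) * s)
            .copy_to(out + i, stdx::element_aligned);
}

// "Horizontal" (ray packets): one ray per lane. Branches become masked
// operations because each lane may take a different path.
struct RayPacket { floatv oz, dz, tHit; };  // z components only, for brevity

void intersectPlane(RayPacket& r, float planeZ) {
    floatv t = (planeZ - r.oz) / r.dz;  // per-lane hit distance
    auto hit = t > 0.f && t < r.tHit;   // mask of lanes hitting closer
    stdx::where(hit, r.tHit) = t;       // update only those lanes
}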

Very good information, thank you.

The examples could be a bit more realistic. Neural networks use the fundamental linear transformation Ax + b (A is a matrix, x and b are vectors); 3D graphics uses vectors of {x, y, z, w} (w is needed for transforms and perspective projection) along with 4x4 transform matrix multiplications.

maxxba
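
Picking up maxxba's suggestion, here is a sketch of the {x, y, z, w} case with a 4-wide fixed-size SIMD type (my example, not from the talk; column-major storage assumed):

#include <experimental/simd>
namespace stdx = std::experimental;
using vec4 = stdx::fixed_size_simd<float, 4>;  // exactly 4 lanes

struct mat4 { vec4 col[4]; };  // column-major 4x4 matrix

// M * v: broadcast each component of v across the matching column, then sum.
vec4 transform(const mat4& m, const vec4& v) {
    return m.col[0] * v[0] + m.col[1] * v[1]
         + m.col[2] * v[2] + m.col[3] * v[3];
}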

20:48 Sorry if this is noob talk, but wouldn't the .insert()/.extract() methods defeat the purpose by adding a function call just to get the element? Not trying to hate, but I don't get it.

Theandrey
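
For context (my note, not a reply from the speaker): accessors like these in SIMD wrappers are tiny inline functions, so under optimization the call itself disappears and only the actual lane access remains. A sketch with a hypothetical minimal wrapper:

// Hypothetical 8-wide wrapper: extract()/insert() are trivial inline
// member functions, so at -O2 they compile down to a single lane access
// (e.g. an extract/blend instruction), not an out-of-line call.
struct float8 {
    float v[8];
    float extract(int i) const { return v[i]; }
    void insert(int i, float x) { v[i] = x; }
};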

Why don't we have an "AsmCon"? That could teach a few lessons to all the modern C++ hipsters.

llothar

“I can code faster in assembly” is the equivalent flex of “I can shift faster than your automatic”

tc

In the examples there isn't any handling of tails (the leftover elements when the data size isn't a multiple of the SIMD width).

ilnurKh
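
True; the usual pattern (my sketch, not from the talk) is a full-width main loop plus a scalar epilogue for the tail:

#include <experimental/simd>
#include <cstddef>
namespace stdx = std::experimental;
using floatv = stdx::native_simd<float>;

// y[i] += x[i], handling n that is not a multiple of the SIMD width.
void add(const float* x, float* y, std::size_t n) {
    std::size_t i = 0;
    for (; i + floatv::size() <= n; i += floatv::size()) {
        floatv vy(y + i, stdx::element_aligned);
        vy += floatv(x + i, stdx::element_aligned);
        vy.copy_to(y + i, stdx::element_aligned);
    }
    for (; i < n; ++i)  // scalar epilogue for the remaining elements
        y[i] += x[i];
}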

10:46 AMD GPUs do exactly 64 floats side by side though, right? Well, it's a bit iffy if you'd call that a SIMD register anyway.

msqrt

What would be the benefit of using the library over just letting the compiler autovectorize the code? Modern compilers already do a pretty good job at that.

decayl
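
One common answer (mine, not from the talk): autovectorizers are fragile around data-dependent branches and give no guarantee, while a wrapper makes the vector code explicit. A sketch of a branchy loop written with an explicit mask:

#include <experimental/simd>
#include <cstddef>
namespace stdx = std::experimental;
using floatv = stdx::native_simd<float>;

// Clamp negative values to zero. As a scalar loop with an `if`, this may or
// may not autovectorize depending on compiler heuristics; with an explicit
// mask the blend happens by construction.
void clampNegatives(float* x, std::size_t n) {
    std::size_t i = 0;
    for (; i + floatv::size() <= n; i += floatv::size()) {
        floatv v(x + i, stdx::element_aligned);
        stdx::where(v < 0.f, v) = 0.f;  // masked assignment instead of a branch
        v.copy_to(x + i, stdx::element_aligned);
    }
    for (; i < n; ++i)  // scalar tail
        if (x[i] < 0.f) x[i] = 0.f;
}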

36:46 "Saxpy is nonsense as well" - pardon me, but SAXPY is at the core of most artificial neural networks: input*weight + bias. Just sayin'.

totalermist
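
For reference, SAXPY is the BLAS routine y = a*x + y; a sketch of it with a SIMD wrapper (mine, again assuming std::experimental::simd):

#include <experimental/simd>
#include <cstddef>
namespace stdx = std::experimental;
using floatv = stdx::native_simd<float>;

// SAXPY: y = a*x + y ("single-precision a times x plus y").
void saxpy(float a, const float* x, float* y, std::size_t n) {
    std::size_t i = 0;
    for (; i + floatv::size() <= n; i += floatv::size()) {
        floatv vy(y + i, stdx::element_aligned);
        vy = a * floatv(x + i, stdx::element_aligned) + vy;
        vy.copy_to(y + i, stdx::element_aligned);
    }
    for (; i < n; ++i)  // scalar tail
        y[i] = a * x[i] + y[i];
}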