Parallel C++: SIMD Intrinsics

In this video we look at the basics of manually vectorizing with SIMD intrinsics!

Comments

Hi Nick, I do not have a question, but I would like to highlight again that your channel is remarkable. As far as I know, following your channel is the only way to keep up consistently with the latest achievements in software, especially in excellent C++. It is a huge distinction to be here. Additionally, I appreciate the effort you put into preparing a video each day. I have been using C++ for over two decades, mostly within the robotics domain. Your impressive work gives all of us (the community) a new look at this beauty and encourages us to study more. Thank you so much. Have a nice day!

markusbuchholz

Hey Nick, amazing videos as always! Compiling with -ffast-math seems to unlock SIMD code generation for the transform_reduce baseline as well. By the way, your videos are very inspirational, keep it up!

vwexwfe
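
For readers following the -ffast-math remark, here is a minimal sketch of what a transform_reduce dot-product baseline could look like. The function name `dot_baseline` is hypothetical and this is not necessarily the code from the video. Floating-point addition is not associative, so compilers such as GCC and Clang generally will not reorder the sum into SIMD lanes unless a flag like -ffast-math permits it.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>

// Hypothetical baseline: dot product of two float arrays of length n.
// std::transform_reduce multiplies element pairs and sums the products.
// Whether the loop is vectorized depends on the compiler and flags,
// e.g. something like: g++ -std=c++17 -O3 -march=native -ffast-math
float dot_baseline(const float* a, const float* b, std::size_t n) {
    return std::transform_reduce(a, a + n, b, 0.0f,
                                 std::plus<>{}, std::multiplies<>{});
}
```

Comparing the generated assembly with and without -ffast-math is the most direct way to check what the compiler actually did.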

The performance benefit of using SIMD intrinsics is really impressive! I wonder how often SIMD instructions could speed up everyday computing tasks.
My really blind guess would be that they are very underused, even in compute-intensive software.
Thanks Nick for this fantastic series so far!

shaytal

Hi Nick, I just read about alignment, and I would like to know why it is an improvement to align at 32 and not 64. A 64 alignment (on a 64-bit system) would mean a worst case of 4 cache misses and a read of 64 bytes, while an alignment of 32 would mean a worst case of 6 cache misses.

Unless we are talking about a 32-bit system.

Again, I might be wrong about how I understand the cache, but I figured I would just ask while I am still reading about it.
Thanks a lot

eladon
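
One way to look at the 32 versus 64 question is to ask when a single vector load touches two cache lines. The sketch below assumes 64-byte cache lines and 32-byte (256-bit AVX) loads; neither number is stated in the thread, and the names `kCacheLine`, `kAvxWidth` and `crosses_cache_line` are made up for illustration. The alignment requirement follows the register width, not whether the system is 32-bit or 64-bit.

```cpp
#include <cstddef>
#include <cstdint>

// Assumptions for illustration only:
constexpr std::size_t kCacheLine = 64;  // bytes per cache line (typical x86-64)
constexpr std::size_t kAvxWidth  = 32;  // bytes in one 256-bit AVX register

// Returns true if a load of `width` bytes starting at `addr`
// spans two cache lines.
constexpr bool crosses_cache_line(std::uintptr_t addr,
                                  std::size_t width = kAvxWidth) {
    return (addr % kCacheLine) + width > kCacheLine;
}

// A 32-byte-aligned address is at offset 0 or 32 within a cache line,
// so a 32-byte load starting there always fits inside one line.
static_assert(!crosses_cache_line(0));
static_assert(!crosses_cache_line(32));
// A misaligned load can straddle two lines.
static_assert(crosses_cache_line(48));

// Example of requesting 32-byte alignment for a buffer.
alignas(32) inline float aligned_buffer[16] = {};
```

Under these assumptions, 64-byte alignment of the buffer start is also fine, but it gives no extra guarantee for individual 32-byte loads beyond what 32-byte alignment already provides.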

Why did the compiler not recognize that it could use the vdpps instruction? You did mention something about the compiler implementation, but a dot product seems like something it should be able to figure out...

juancolmenares
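
For context, here is a hedged sketch of a dot product written with `_mm256_dp_ps`, the intrinsic that corresponds to vdpps. The function name `dot_product` is hypothetical and this is not necessarily the code shown in the video. Note that vdpps works on each 128-bit lane separately, so the two lane results still have to be combined by hand. The scalar fallback keeps the sketch compilable when AVX is not enabled.

```cpp
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: dot product of two float arrays of length n.
float dot_product(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    std::size_t i = 0;
#if defined(__AVX__)
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  // unaligned loads, 8 floats each
        __m256 vb = _mm256_loadu_ps(b + i);
        // Mask 0xF1: multiply all four floats in each 128-bit lane and
        // write the lane's sum to element 0 of that lane.
        acc = _mm256_add_ps(acc, _mm256_dp_ps(va, vb, 0xF1));
    }
    // Add element 0 of the low lane to element 0 of the high lane.
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    sum = _mm_cvtss_f32(_mm_add_ss(lo, hi));
#endif
    // Scalar tail (or the whole loop when AVX is not available).
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Even in this form the summation order differs from a plain left-to-right loop, which is one reason a compiler may decline to make the transformation on its own without a flag that permits floating-point reassociation.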

Is it possible for you to also cover Arm Neon intrinsics?

This is a good topic and a good video :)

ahmedazeem

It’s unfortunate that SIMD doesn’t fit so many practical scenarios.

anm