SIMD and vectorization using AVX intrinsic functions (Tutorial)

The best parallel programming technique you're probably not using: intrinsic functions that force SIMD parallelism on each CPU core, for speedups of roughly 4x to 16x on top of any gains from threading and other techniques.

Includes examples of how to use intrinsic functions to accelerate your numerical code.
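
As a rough illustration of the kind of speedup described above, here is a minimal sketch of a float summation in plain C next to an AVX version. It is not taken from the video: the function names `sum_scalar` and `sum_avx` are hypothetical, the `target("avx")` attribute assumes GCC or Clang on x86-64 (with other compilers you would drop it and enable AVX with a compiler flag instead), and the CPU is assumed to support AVX.

```c
#include <assert.h>
#include <immintrin.h>  /* AVX intrinsics and the __m256 vector type */
#include <stddef.h>

/* Plain C reference: one addition per loop iteration. */
float sum_scalar(const float *x, size_t n) {
    assert(x != NULL || n == 0);
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

/* AVX sketch: eight float additions per loop iteration.
 * Assumption: GCC/Clang, which accept a per-function target attribute. */
__attribute__((target("avx")))
float sum_avx(const float *x, size_t n) {
    assert(x != NULL || n == 0);
    __m256 acc = _mm256_setzero_ps();          /* eight partial sums */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* loadu has no alignment requirement; _mm256_load_ps needs 32-byte alignment */
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));
    }
    /* Reduce the eight partial sums to one value. */
    __m128 lo = _mm256_castps256_ps128(acc);   /* lower 128-bit lane */
    __m128 hi = _mm256_extractf128_ps(acc, 1); /* upper 128-bit lane */
    __m128 s = _mm_add_ps(lo, hi);             /* 4 partial sums */
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));    /* 2 partial sums */
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1));/* final sum in element 0 */
    float total = _mm_cvtss_f32(s);
    for (; i < n; ++i) total += x[i];          /* scalar tail, fewer than 8 elements */
    return total;
}
```

Note that the AVX version adds the elements in a different order than the scalar loop, so for general floating-point data the two results can differ by rounding error.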

Introductory Material (skip if you know what SIMD and intrinsics are)
00:00 Introduction
03:37 Intro to SIMD
05:17 SIMD instruction sets on x86
10:58 What are compiler intrinsics?
12:58 Simple comparison of standard C vs. AVX intrinsic summation
Basic setup of AVX for use in C/C++
15:11 Header files
16:25 Vector datatypes
18:19 Allocating memory
21:02 Intrinsic function naming 'convention'
23:55 Summary of AVX intrinsic functionality
Examples of AVX intrinsics
27:28 Intro
27:45 Arithmetic (e.g. addition, subtraction, multiplication, division) [_mm256_add_ps, _mm256_mul_ps, _mm256_div_ps]
30:53 Fused-multiply add [_mm256_fmadd_ps]
33:52 Math functions (e.g. max,min,sqrt) [_mm256_max_ps, _mm256_sqrt_ps, _mm256_rsqrt_ps]
34:33 Logical (e.g. and, or, xor) [_mm256_and_ps]
35:06 Load/store [_mm256_load_ps, _mm256_loadu_ps]
36:18 Comparisons (e.g. greater than, equals, less than) [_mm256_cmp_ps]
39:05 Branchless programming (approximating an 'if' statement in SIMD)
41:57 Permute/shuffle (rearranging elements within a vector) [_mm256_permutevar8x32_ps, _mm256_permute4x64_pd, _mm256_permute_ps]
46:20 What's a 'lane'?
49:10 Insert/extract [_mm256_insertf128_ps, _mm256_extractf128_ps]
49:51 Blend [_mm256_blend_ps]
50:30 Gather/scatter [_mm256_i32gather_ps]
52:22 Horizontal add [_mm256_hadd_ps]
53:12 Conversion (e.g. float32 to int32) [_mm256_cvtepi32_ps, _mm256_cvtps_epi32, _mm256_cvtps_pd, _mm256_cvtepi32_epi64]
53:34 Set (pseudo-intrinsic) [_mm256_set_ps, _mm256_set1_ps]
Programming example
54:45 Complex dot product
63:14 Vector reduction
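
The comparison, branchless programming and blend chapters listed above fit together in a common pattern, which can be sketched as follows. This is an illustrative example under assumptions, not code from the video: the function name `piecewise_avx` and the piecewise function itself are made up, the variable blend `_mm256_blendv_ps` is used rather than the immediate-mask `_mm256_blend_ps` named in the chapter list, and the `target("avx")` attribute assumes GCC or Clang on an AVX-capable x86-64 CPU.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Computes out[i] = (x[i] < 0) ? 0.5f * x[i] : 2.0f * x[i] without a branch
 * in the vector loop: both results are computed, then selected per element. */
__attribute__((target("avx")))
void piecewise_avx(const float *x, float *out, size_t n) {
    assert((x != NULL && out != NULL) || n == 0);
    const __m256 zero = _mm256_setzero_ps();
    const __m256 half = _mm256_set1_ps(0.5f);
    const __m256 two  = _mm256_set1_ps(2.0f);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(x + i);
        /* mask element is all ones where v < 0, all zeros elsewhere */
        __m256 mask = _mm256_cmp_ps(v, zero, _CMP_LT_OQ);
        __m256 if_neg = _mm256_mul_ps(v, half);   /* the "then" result */
        __m256 if_pos = _mm256_mul_ps(v, two);    /* the "else" result */
        /* blendv takes the second operand where the mask's sign bit is set */
        __m256 r = _mm256_blendv_ps(if_pos, if_neg, mask);
        _mm256_storeu_ps(out + i, r);
    }
    for (; i < n; ++i)                            /* scalar tail */
        out[i] = (x[i] < 0.0f) ? 0.5f * x[i] : 2.0f * x[i];
}
```

The trade-off is that both sides of the condition are always evaluated, so the pattern pays off mainly when each side is cheap relative to a mispredicted branch.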

Comments

This is the best SIMD intro I've seen. And the bonus (relevant) humor really helped give the topics some breathing room to let them sink in. Just an overall outstanding presentation.

matias-eduardo

This video and your channel generally need much, much more attention! I'm just starting out on SIMD and man, the serious yet funny and very clear explanation of how everything works is amazing!
I rarely subscribe to channels just by watching one video... I think it has happened less than the number of fingers I have on one hand but your channel was one of them!
Amazing job! Keep it up!

PBlague

At the moment, this is the best SIMD video I've seen! Thank you very much!

my_stackoverflow

Literally the best lecture I've ever seen in my entire life. Very good job, and a very big thank you for this!

AugasDopesugerwin

This was such an awesome video! I assume this is for some kind of university course (which I'm not part of) but I never used AVX instructions before and yet I could easily follow the video. I enjoyed all the jokes and the examples really helped. It really didn't feel like an entire hour video! I might give AVX a try in the future :)

anonymouscommentator

Educational, funny, and engaging for such a topic. I don't know what more one could ask for.

amj

Was stuck on the alignment issue since yesterday. Finally understood what the issue was and solved it. Thank you so much my friend.

Quancept

This was entertaining and insightful at the same time! Cleared so many confusions I had since I mainly work with Python and deep learning. The production quality (audio+video) too is outstanding! Looking forward to more videos :)

deeps-ny

Thanks for your wonderful video. I'm a new CPU engineer, responsible for the uarch of the FP part. This video truly gives me a new angle for understanding vectorization and SIMD. Thank you!!!

ngissac