Adding Nested Loops Makes this Algorithm 120x FASTER?

In the last video, I introduced the concepts of compute-bound and memory-bound tasks. This video goes a step further and uses the theory we discussed to optimize a famous memory-bound algorithm.

Many of these tricks are counterintuitive but highly effective. By the end of the video, you'll find we can make the algorithm around 120x faster than the naive implementation.
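
The video's own code is not reproduced on this page. As a rough illustration only, and assuming the "nested loops" in the title refer to loop tiling (blocking) of matrix multiplication, here is a minimal C sketch. The names matmul_naive, matmul_tiled, and TILE are hypothetical, and the tile size of 32 is a placeholder rather than a tuned value.

```c
#include <stddef.h>

/* Hypothetical tile edge; a real value depends on the target's cache sizes. */
#define TILE 32

/* Naive triple loop: C = A * B for row-major n x n matrices. */
void matmul_naive(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j]; /* strided walk down B */
            C[i * n + j] = sum;
        }
}

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Tiled version: three extra outer loops step over TILE x TILE blocks so the
 * inner loops keep reusing a small working set that can stay in cache. */
void matmul_tiled(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0f;

    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE) {
                /* Clamp block bounds so n need not be a multiple of TILE. */
                size_t i_end = min_size(ii + TILE, n);
                size_t k_end = min_size(kk + TILE, n);
                size_t j_end = min_size(jj + TILE, n);

                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        float a = A[i * n + k];
                        /* Contiguous accesses to B and C in the innermost loop. */
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Both functions compute the same product; the tiled one only changes the order in which memory is touched. Any speedup depends on the matrix size, the compiler flags, and the hardware, so it should be measured rather than assumed.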

Comments

dude, this video is so beautifully animated. i can't wrap my head around how much time you spent on this.

bernardcrnkovic

The fact that optimizing for cache hits and enabling SIMD was able to bring a given matrix multiplication operation from 2000ms to 15ms is wild.

Zullfix

This video flew entirely above my head (still getting into this math in Uni haha), but the presentation is stunning, and I still watched through all of it because these optimizations are weirdly beautiful to me.
Awesome job!

Affax

I saw your video on ThePrimeTime, and it is epic. Very well explained for such high-level topics.

RoyerAdames

The final part where you say "Oh but there is that library that does everything for you for that case" just made me laugh my head off, great vid full of deep commentary, really great job!

Val-vmqu

You can get better performance by unrolling the loops (do it aggressively; for example, unroll the whole loop by the block size, in GCC with #pragma GCC unroll 8). However, it would still be 3 times slower than Intel MKL.

foolriver
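
For readers unfamiliar with the pragma mentioned in the comment above, here is a minimal C sketch of where it would go. The kernel name block_kernel, the BLOCK constant, and the row stride parameter ld are hypothetical and are not taken from the video or the comment; the only real construct is GCC's #pragma GCC unroll, which must appear directly before the loop it applies to.

```c
#include <stddef.h>

/* Hypothetical block size, chosen to match the unroll factor in the comment. */
#define BLOCK 8

/* Accumulate one BLOCK x BLOCK tile: C += A * B.
 * ld is the row stride (leading dimension) of all three matrices. */
void block_kernel(const float *A, const float *B, float *C, size_t ld)
{
    for (size_t i = 0; i < BLOCK; i++)
        for (size_t k = 0; k < BLOCK; k++) {
            float a = A[i * ld + k];
            /* Ask GCC to unroll the innermost loop 8 times. Compilers that
             * do not know this pragma ignore it, possibly with a warning. */
#pragma GCC unroll 8
            for (size_t j = 0; j < BLOCK; j++)
                C[i * ld + j] += a * B[k * ld + j];
        }
}
```

The pragma is only a request to the compiler and does not change the result. Whether it helps, and by how much, depends on the compiler version, the optimization flags, and the target CPU.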

That's awesome! I've been studying parallel programming, and I had no idea a lot of these strategies were possible. I wish my university had courses on HPC/parallel programming like yours. The course you mentioned at the end of the video seems great.

shimadabr

great animations and visual style! you've found yourself a great side hobby :)

trustytrojan

I love how this in-depth video makes me feel smart. I know that only a few people could make sense of content like this. But you make it feel like even more people can get close to it.

cefcephatus

I've been teaching myself how to use SIMD for the past 3 weeks and I can't even tell you how helpful this video was. I was baffled when my serial implementation of some image processing code was 10x faster than my naive SIMD implementation. It took me quite a while to understand how that was possible. This video has made me greatly appreciate the simplicity of Rust's (experimental) portable SIMD library. Also, I did not know what OpenMP was, and it seems somewhat similar to Rust's library. Absolutely incredible video!

percent

I love it when videos like these give me inspiration to delve into a topic I'm not familiar with. I admire people like you for coming up with such elegant and beautiful ways to communicate these concepts.

giordano

Subscribed. I can't believe you don't have more subscribers already. Any software engineer dealing with matrix math should watch this video.

fenril

I did this exact thing in my intern project this summer at TI. Their processor even had a special instruction to do the matrix multiplication of two small blocks. I was able to achieve around 80% of the rated capacity of the processor.

Dayal-Kumar

Until now, I only considered the efficiency of the algorithm when optimizing program execution. Although I admit I didn't understand everything in this video, it demonstrates that knowing the computer's hardware and taking advantage of it can significantly speed up execution.

brucea

Great video! But I feel like I will never get to do this kind of work during my job, since we use a scripting language, and all bets are off when you store everything on the heap.

megaxlrful

I loved the presentation and the animations really helped me collect the concepts.

MahdeenSky

Very well done! I make a living optimizing BLAS routines; this will probably become my default “what do you do” to send people.

camofelix

Very good. I haven't thought about optimization for a long time since I was doing Assembly in the 90's. This brings back memories and the feeling. Very good!

naj

Visualizations were freaking amazing! Loved those! Could you make a video of your process for editing these videos?

tansanDOTeth

Probably one of the best programming videos I have ever seen. As a more senior developer, I find there is a lack of content at this level of production quality when explaining complex ideas.

joshpauline