C++ cache locality and branch predictability

Cache me outside, how bout that?

People always talk about Big O time for analyzing speed, but Big O isn't the only important factor in writing performant code. Two important things to keep in mind are cache locality (locality of reference) and branch predictability. In this video, we go over these ideas and through examples we see the huge impact that they can have on performance.
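
To make the cache locality point concrete, here is a minimal sketch (not the code from the video) that sums the same matrix in two loop orders. The function names and the flat row-major layout are assumptions chosen for illustration. Both functions do the same number of additions, so Big O cannot tell them apart. The difference is only in the order memory is touched.

```cpp
#include <cstddef>
#include <vector>

// Assumed layout: a rows x cols matrix stored row-major in one flat vector,
// so element (r, c) lives at index r * cols + c.

// Cache-friendly order: the inner loop walks adjacent elements, so every
// cache line that is fetched gets fully used before moving on.
long long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += m[r * cols + c];
        }
    }
    return total;
}

// Cache-unfriendly order: the inner loop jumps ahead by cols elements on
// each step, so consecutive accesses tend to land on different cache lines.
long long sum_col_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            total += m[r * cols + c];
        }
    }
    return total;
}
```

Both functions return the same result. On a matrix much larger than the cache, the row-major loop is typically faster, but the size of the gap depends on the hardware and compiler, so it is worth timing on your own machine.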

SUPPORT ME ⭐
---------------------------------------------------

Top patrons and donors: Jameson, Laura M, Dragos C, Vahnekie, John Martin, Casey G, Pieter G, Krisztian M, Sigmanificient

CHAPTERS
---------------------------------------------------
0:00 Sorting
1:26 Cache locality
2:41 Vector traversal
4:39 Matrix multiplication
7:04 Branch predictability
9:56 Branchless optimization
COMMENTS
---------------------------------------------------

Why does the ternary expression at the end not count as branching code?

spaghettihair

It's a proper coding channel. Every other coding channel I've seen is like 'pop culture' coding. But this is different. Great work, man.

ronyWeeb

I watched the video 10 times. Now, my code is going 1024 times faster. Thanks a lot man!

LennyTheSniper

I love the simplicity of your sorting algorithm implementations.

szaszm_

Always nice to get something a little closer to the metal than Python from time to time.

QuantumHistorian

Great point. When optimizing, *always* measure, people! Merely picking a particular algorithm, because it's supposed to be fast, is nothing more than superstition. You *can't* know without measuring.

Spiderboydk
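
One way to act on the "always measure" advice is a small timing helper. This is a rough sketch, not a proper benchmark harness: `time_ms` and `time_std_sort` are hypothetical names introduced here, and a single run like this is noisy, so repeat it several times and compare the results before drawing conclusions.

```cpp
#include <algorithm>
#include <chrono>
#include <utility>
#include <vector>

// Hypothetical helper: run a callable once and return the elapsed wall time
// in milliseconds. steady_clock is used because it never jumps backwards.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Example use: time std::sort on a private copy of the input, so the
// caller's data stays unsorted and the run can be repeated.
double time_std_sort(std::vector<int> data) {
    return time_ms([&data] { std::sort(data.begin(), data.end()); });
}
```

Build with optimizations enabled (for example `-O2`) when measuring, and make sure the result of the timed work is actually used somewhere, otherwise the compiler may remove the work entirely.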

C++20 added the attributes [[likely]] and [[unlikely]] so the compiler can do some of its own predictive branching optimizations. It makes me wonder if CPUs will start to include an instruction to make even more use of this in a few years.

mytech
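
For reference, the C++20 attributes mentioned above are written `[[likely]]` and `[[unlikely]]` and attach to a statement. The sketch below uses a made-up function, `count_negatives`, and assumes the caller knows negative values are rare. The attributes are hints to the compiler about which path to optimize for. They do not directly program the CPU's branch predictor, and a compiler is free to ignore them.

```cpp
#include <vector>

// Hypothetical example: count negative values in data where the caller
// expects negatives to be rare. Requires C++20 for the attribute.
int count_negatives(const std::vector<int>& values) {
    int count = 0;
    for (int v : values) {
        if (v < 0) [[unlikely]] {
            // Marked as the cold path: the compiler may lay out the code so
            // that the common case (v >= 0) falls straight through.
            ++count;
        }
    }
    return count;
}
```

Whether this helps at all depends on the compiler and the data, so treat it as something to try and measure rather than a guaranteed speedup.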

Since you are doing Python and C++ topics, it would be nice to see a video about developing C/C++ extension modules for Python. Is there an easy-to-use framework that allows just-in-time compilation of C++ code snippets inside Python?

philipp

9:20 On some AMD CPUs (I don't know about the latest Intel ones yet), the CPU actually computes both multiplications at the same time and then waits for the condition to resolve before choosing which result to use. I'm unsure whether this was introduced in Zen or only in Zen 2.

brunoais

I loved seeing std::sort laughing at all the other implementations.
It's a good call to use the standard library when you can, as it's often more optimised than what you'd write yourself (with some exceptions).

Windeycastle

My first thought seeing this code: This is some weird ass python library, it almost looks like C++.
Several seconds later:

Broniath

It would have been nice to show the Compiler Explorer output for the branch prediction example, as a ternary is technically a branch that might get optimized into a conditional move.

wChris_

Man, I wish you did more of these low level optimization videos. I don't work with Python but I subscribed just because of these types of videos

fabriciop

I love how I only know two languages and these happen to be the only two you cover, haha.
Great video as always!

lefttraces

This guy offers such good content for free. This guy is awesome.

prashantrana

What is missing is that YOU (the coder) should prefer branchless code yourself. Your example could have been written as an arithmetic formula. Compilers are smart, but we can help them. They're even better at optimizing algebra than at optimizing branches.

ciCCapROSTi
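
The "arithmetic formula" idea from the comment above can be sketched as follows. This is not the example from the video: the task (summing the elements that are at least some threshold) and both function names are assumptions made for illustration. The second version replaces the `if` with a multiplication by a 0-or-1 value.

```cpp
#include <cstdint>
#include <vector>

// Branchy version: on unpredictable data the CPU has to guess the outcome
// of the comparison for every element.
std::int64_t sum_at_least_branchy(const std::vector<int>& v, int threshold) {
    std::int64_t total = 0;
    for (int x : v) {
        if (x >= threshold) {
            total += x;
        }
    }
    return total;
}

// Branchless version: the comparison yields 0 or 1, and multiplying by it
// either keeps x or turns it into 0, so the loop body has no jump on x.
std::int64_t sum_at_least_branchless(const std::vector<int>& v, int threshold) {
    std::int64_t total = 0;
    for (int x : v) {
        total += static_cast<std::int64_t>(x >= threshold) * x;
    }
    return total;
}
```

An optimizing compiler may already turn the branchy version into a conditional move or vectorized code, in which case the two perform the same. Checking the generated assembly (for example in Compiler Explorer) and timing both is the only way to know.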

1:11 > _"sorting small things is actually very common"_

I didn't know. Thanks.

yash

Holy smokes I already knew it, first time on this channel :DD

aratakarkosh

These will age like wine. Thanks for the vid.

efenestration

Awesome, thank you for an educational video straight to the point.

mauricio-poppe