Branchless Programming: Why 'If' is Sloowww... and what we can do about it!

In this video we look at branchless programming. This is a technique for gaining speed in both high-level and low-level code by avoiding branches as much as possible.

Software used to make this vid:

Comments

I love this, but I'm reminded of Kernighan's law: “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”

jeremyseay

DISCLAIMER FOR ALL BEGINNER PROGRAMMERS!
Do not attempt it at home! Unless you really, really have to. In most cases you won't outsmart your compiler. For high level programming, you should focus on keeping your code clean and easy to read and maintain. Usually you'll notice speedup in your code by using correct algorithms and data structures, not by doing micro-optimizations. And if speed is your concern - benchmark first! Do not try optimizing code that may not need it.

szymoniak

I LOVE how you introduced the concept by showing a way to manually optimize and how it actually fails to be better than the straightforward if-based version. That's such an important point to share up front any time we talk about clever optimization.

Mackinstyle

When a normal person says something is slow: 5-10 minutes, maybe an hour or two
When a programmer says something is slow: 2 nanoseconds

RyanTosh

So what we've learnt: If you write normal code and normal patterns that everybody knows, most likely it will be a pattern that the developers of gcc (or other compilers) thought of. That way you have a higher chance of the compiler doing good things.

barmetler

I've designed CPUs for over 30 years and we pay a LOT of attention to breaks in code streams. Branch prediction, speculative execution, and even register reassignments are all about minimizing branch and pipeline impacts. What you may be missing in the above examples is that what seems to be avoiding branches ... doesn't. Every conditional test is a branch. Every conditional assignment is a branch. It is hard to beat the compiler optimization that understands the pipeline of the processor it's targeting. Clever manual decimation can be effective for long pipelines, such as in GPUs, but even then compilers are pretty smart. The most efficient code is when tasks are parallelized so that multiple operations can be done based off of one or a small number of decisions.

In trying to avoid branches, the cost is paid in non-maintainable code, and that cost is very high. If you really want to improve efficiency, don't rely on multiple software packages which may actually include interpreted languages somewhere underneath the shiny API. Layering of packages and the use of interpreted languages (Python, Perl, etc.) waste much of the increasing performance of processors. Yes, it means recompiling for a new processor, but one does that if efficiency is required.

In one example, we recoded a numeric-heavy program that was synthesizing frequencies for an electronic piano. Recasting it to vector operations allowed the program to run on a very inexpensive BeagleBone Black (BBB) instead of a Mac. On the Mac it consumed 35% of the processor. On the BBB, after vectorizing the operations, it used 15% of a much less powerful processor. These are the sorts of changes that matter.

randyscorner

I think it should be noted that branchless programming is VERY important in cryptography. It turns out that if you use conditional statements, your code sometimes runs faster and sometimes slower (specific branches are simply faster or slower), and an attacker can use that timing to get information about your private key. So all cryptographically sound functions must use branchless programming.

Very interesting topic

Caesim
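To illustrate the timing concern raised above, here is a minimal sketch of a comparison whose loop does the same work regardless of where the inputs differ. The name `ct_equal` is hypothetical, and this is only an illustration: a compiler is free to transform the loop, so real cryptographic code should rely on a vetted library routine rather than on this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the two buffers are equal, 0 otherwise.
   Unlike an early-exit loop, it always visits all n bytes, so the loop's
   running time does not depend on the position of the first mismatch. */
static int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);  /* accumulate any differing bits */
    return diff == 0;
}
```

An early-exit version (returning as soon as a byte differs) would typically be faster on average, which is exactly the data-dependent timing the comment warns about.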

05:00 The lesson here: never over-optimize your code beforehand. Start by writing natural and simple code. Let the compiler do its tricks. Check the result and only look for optimizations in the parts that the compiler didn't already optimize.

christianbarnay

Cryptographic libraries also use branchless programming a lot. This is to prevent timing side channel attacks, where the timing of an operation can provide a hint as to what was encrypted.

kylek.

I want to add that the advantage of branchless programming is many times bigger if you are programming for the GPU and not the CPU. GPUs have multiple cores that share an instruction pipeline, where each core runs the same instruction but on different data. That often leads to both sides of a branch being executed, with one result then discarded. I have seen performance improvements above 100x by going branchless!

MultiMrAsd

Shaders. Branchless techniques are mandatory in those!

programaths

As a long time software developer, in virtually every case, code maintainability is far more important than speed.
I've written programs that processed millions of transactions - and branching was the least of the performance concerns. Paging and I/O typically are much more a problem. Except in toy problems, memory management will be a much larger concern.
Introducing repeatedly executed multiplication generally is not good for performance.
"And"-ing an ASCII lowercase letter with a mask that clears bit 0x20 converts it to uppercase just as well as subtraction does and, to my mind, is much clearer.
Unless there is a genuine need, don't make your code ugly to eke out a few cycles.

vanlepthien
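The AND-based conversion mentioned above can be sketched as follows. The function name `to_upper_ascii` is hypothetical, and the sketch assumes plain ASCII input. Note that an unconditional AND would also alter non-letters (for example, the digit '1' has bit 0x20 set), so this version first computes a 0-or-1 flag and clears the bit only for 'a'..'z'. Whether the compiler emits a branch for the comparison depends on the target and optimization level.

```c
/* Hypothetical helper: uppercase one ASCII byte with no `if` in the source.
   ASCII lowercase letters differ from their uppercase forms only in bit 0x20. */
static unsigned char to_upper_ascii(unsigned char c)
{
    /* 1 for 'a'..'z', 0 otherwise; values below 'a' wrap to a large unsigned */
    unsigned is_lower = ((unsigned)(c - 'a') < 26u);
    /* clear bit 0x20 only when is_lower is 1 */
    return (unsigned char)(c & ~(is_lower << 5));
}
```

For instance, `to_upper_ascii('q')` yields `'Q'`, while `to_upper_ascii('1')` leaves the digit unchanged.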

Leading with a negative example was smart - I think a lot of beginners would have tried applying this to homework assignments that really didn't need it if you hadn't.

rorytulip

This is really good fundamental information for some of us "spoiled" high-level programmers who don't typically think beyond compilation

brandonchurch

A fast branchless way to calculate ToUpper is to use a lookup table. The table is 256 bytes long, and easily fits in L1 cache, so character conversion takes a single instruction and will be a fast memory access. I think this is what is done in the standard library.

josiahmanson
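A rough sketch of the lookup-table idea described above, assuming ASCII. The names `upper_table`, `init_upper_table`, and `to_upper_buf` are hypothetical, and this is not a claim about how any particular standard library implements its conversion.

```c
#include <stddef.h>

/* 256-entry table: every possible byte value maps to its uppercase form. */
static unsigned char upper_table[256];

/* Fill the table once before use (assumes ASCII letter layout). */
static void init_upper_table(void)
{
    for (int i = 0; i < 256; i++)
        upper_table[i] = (unsigned char)i;               /* identity by default */
    for (int c = 'a'; c <= 'z'; c++)
        upper_table[c] = (unsigned char)(c - 'a' + 'A'); /* letters map up */
}

/* Convert a buffer in place: one table load per byte, no per-byte test.
   The loop itself still has its usual loop-condition branch. */
static void to_upper_buf(unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        s[i] = upper_table[s[i]];
}
```

Call `init_upper_table()` once at startup, then `to_upper_buf(buf, len)` for each buffer to convert.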

It's useful when you do shader programming, where branching is very expensive. I had to "invent" some of those examples by myself when I did shader programming. Great video!

unformedvoid

For someone who started coding in assembly before pipelining and speculative execution were a thing, and when multiplications were super expensive, the idea of multiplying by a boolean to make a selection always feels slightly wrong. And a part of me still wants to replace every multiplication with shifts + adds, or look-up tables. :-P

RFC-
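For readers unfamiliar with "multiplying by a boolean to make a selection", here is a minimal sketch next to the plain if-based version. The function names are hypothetical. As several comments in this thread point out, an optimizing compiler may well generate better code for the straightforward version, so measure before preferring the second form.

```c
/* Straightforward version: pick the smaller of two ints with an if. */
static int smaller_branch(int a, int b)
{
    if (a < b)
        return a;
    else
        return b;
}

/* Selection by multiplication: in C a comparison evaluates to 1 or 0,
   so exactly one of the two products survives. */
static int smaller_mul(int a, int b)
{
    return a * (a < b) + b * (b <= a);
}
```

Both functions return the same value for every pair of inputs; only the way the choice is expressed differs.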

I want to buy this guy a mouse pad and maybe a second monitor...

MM-

Awesome!
Branchless programming would be even faster if the CPU had an instruction that filled all the bits of a register with the zero flag.
These "computer science approach to C++ programming" videos are getting really good and I suspect are unique on YouTube. You are improving (possibly, even saving) our mental health with these conceptual challenges and I, for one, am so grateful.

willofirony
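The "fill all the bits of a register" idea above can be approximated in portable C by negating a 0-or-1 value, which gives an all-ones or all-zeros mask. This is a sketch with a hypothetical name, `select_u32`. How the compiler materializes the comparison result is target-dependent, so no claim is made about the exact instructions produced.

```c
#include <stdint.h>

/* Hypothetical helper: returns a if cond is nonzero, otherwise b. */
static uint32_t select_u32(uint32_t cond, uint32_t a, uint32_t b)
{
    /* (cond != 0) is 1 or 0; 0 - 1 wraps to 0xFFFFFFFF, 0 - 0 stays 0 */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    /* keep a where the mask bits are set, b where they are clear */
    return (a & mask) | (b & ~mask);
}
```

Compared with selection by multiplication, this variant uses only subtraction and bitwise operations.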

I was doing this back in the late 1970s, on a Cray-1 supercomputer with pipelined vector registers. Complex single-instruction operations were broken down into a sequence of simpler operations, each performed by a separate functional subunit. After one clock cycle, the first functional subunit was free to begin operating on the second element of the vector, while the second functional subunit began operating on the intermediate result in the first element of the vector that was left there by the first functional subunit. Complex operations like floating point division could then produce a result every clock cycle, even though the operation could take a total of six or more clock cycles to perform on a single operand beginning-to-end. Like stations on a car assembly line, the game was to keep as many functional subunits busy as possible at any given time. We would carefully map out each tick of each component of the CPU on a big chart before implementing in assembler. The thing was blazing fast, even by today's standards. Nice vid!

bandogbone