Introduction to Branchless Programming

preview_player
Показать описание
A brief introduction to branchless programming, a rarely seen technique which attempts to write code which eliminates conditional jump instructions. In this video, I attempt to explain the rationale behind this, why it "works" and why you probably don't need it.

Twitter: /dev/null
Рекомендации по теме
Комментарии
Автор

The benchmarks are entirely wrong. Being the complete moron that I am, I managed to re-run the benchmark without re-compiling (and ran the wrong program for one of them).


As you can see, minimal but noticeable difference under an intentionally inflated, artificial situation.

In addition, what I said about "shorter usually being better" etc. is not a good golden rule to follow. One machine instruction does not equal one clock cycle. In fact, in x86, there is a single instruction that calls a hardware true RNG. This can take upward of 10ms for a single call. For context, we usually measure instruction execution time in microseconds. In general, however, concise code is a good indicator of quality.

ej_v
Автор

One of the most important areas for branchless programming is in embedded development. I work with ATMega's and the less clock cycles I can run the better. I'm very anal about wasting clock cycles, and thus power.

codehorror
Автор

Slight clarification:

Pipelining and out-of-order execution are *not* the exact same thing. You could have a pipelined CPU that always executed in-order, which is what most early super-scalar processors did (e.g. Playstation 2 CPU). The reverse is also possible (e.g. CDC 6600 in the mid-1960s).

Out-of-order execution is a nice optimization, but pipelining still pays off without it. Both are just a means to instruction-level parallelism and avoiding latency, which are just two ways of saying "keep as much of the CPU always busy doing useful work."

frenchmarty
Автор

How does target % 10 == 0 equate to one? It should be true or false. Also did you use a web cam to record this video? I can barely make out any of the text, because it is so blurry.

bcr-cvn
Автор

I feel like the argument about monomorphization hurting readability is really c++ specific. In Rust dynamic and static dispatch are equally pretty.

martingeorgiev
Автор

What program do you use to disassemble the binary? Doesn’t look like gdb

jfk
Автор

I've heard if you really going for optimization, you should use ++i instead of i++ when order of increase doesn't matter, because in postfix form value of i should be saved before increasing. Don't know if it's true, but sounds reasonable IMHO.

kvolt
Автор

Then I guess there is also the separate topic of avoiding unnecessary branching to make your code easier to read. Code with lots of nested control flow is difficult to read.

BosonCollider