How Branch Prediction Works in CPUs - Computerphile

How does branch prediction speed up operations? Matt Godbolt continues the deep dive into the inner workings of the CPU.

This video was filmed and edited by Sean Riley.

Comments

I like the little anecdote at the end about the ray tracer and changing a test to gain a big speed boost.

llamallama

In awe of the people who came up with such simple ideas for branch prediction. And the people who work out how to build that logic in silicon, so that it can run in one clock tick, are gods!

axelBr

Kudos to the host for tending to ask very good questions about the topic being discussed.

kevincozens

Being able to explain a complex technical subject in a way I can understand is an amazing skill.

rudiklein

During my CS degree we had some classes about CPU architecture and pipelines. I was always impressed by how complicated the things we take for granted actually are, and what we studied was very, very basic, not even close to the magic that is branch prediction.

elirane

I've been programming for a very long time but I didn't realise how sophisticated these branch predictors could get. The idea that it can compute a simple hash in a single clock cycle and use that to capture patterns is fascinating. Now that makes me want to go look into the details of some of these open CPU designs :)

aaronr.
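
For the curious, here is a rough C++ sketch of the kind of mechanism described above: a small table of 2-bit saturating counters indexed by a hash of the branch address and the recent outcome history (gshare-style). The table size, hash, and history length are made up for illustration; real predictors are far more elaborate.

#include <array>
#include <cstdint>
#include <cstdio>

struct ToyPredictor {
    static constexpr unsigned kBits = 12;              // 4096-entry counter table
    std::array<uint8_t, 1u << kBits> counters{};       // 0..3, 0 = strongly not taken
    uint32_t history = 0;                               // last kBits branch outcomes

    uint32_t index(uint64_t pc) const {
        return (static_cast<uint32_t>(pc) ^ history) & ((1u << kBits) - 1);
    }
    bool predict(uint64_t pc) const { return counters[index(pc)] >= 2; }
    void update(uint64_t pc, bool taken) {
        uint8_t& c = counters[index(pc)];
        if (taken && c < 3) ++c;
        if (!taken && c > 0) --c;
        history = ((history << 1) | (taken ? 1u : 0u)) & ((1u << kBits) - 1);
    }
};

int main() {
    ToyPredictor p;
    int hits = 0;
    const int total = 100000;
    for (int i = 0; i < total; ++i) {
        bool taken = (i % 8) != 7;                      // inner-loop branch: taken 7 of every 8 times
        hits += (p.predict(0x400123) == taken);         // would we have guessed right?
        p.update(0x400123, taken);                      // then learn the actual outcome
    }
    std::printf("prediction accuracy: %.1f%%\n", 100.0 * hits / total);
}

With a few bits of outcome history folded into the hash, the toy predictor learns the taken-7-of-8 loop pattern almost perfectly, which is the "capturing patterns" part the comment is marvelling at.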

This was an amazing explanation of branch prediction. I've been in tech for more of my life than not, and I've known that branch prediction was a thing but could not fathom how it worked even after some reading online; this made it approachable. Thank you :)

JamesHarr

Branch prediction is why there are a lot of algorithms that work faster on sorted data, even if the order of the elements theoretically doesn't matter to the algorithm.

eloniusz
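
The classic demonstration of that effect, as a rough sketch (the numbers vary wildly by machine, compiler, and flags, and an optimizer that turns the branch into a conditional move will erase the gap entirely):

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

// Same data, same work; only the element order changes. The "v >= 128" branch is
// essentially a coin flip on shuffled data, so the predictor misses about half the time.
static long long sum_big(const std::vector<int>& data) {
    long long sum = 0;
    for (int v : data)
        if (v >= 128) sum += v;        // data-dependent branch
    return sum;
}

static double time_ms(const std::vector<int>& data) {
    auto t0 = std::chrono::steady_clock::now();
    volatile long long sink = sum_big(data);
    (void)sink;
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    std::vector<int> data(1 << 22);
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 255);
    for (int& v : data) v = dist(rng);

    std::printf("shuffled: %6.1f ms\n", time_ms(data));
    std::sort(data.begin(), data.end());               // same values, now predictable order
    std::printf("sorted:   %6.1f ms\n", time_ms(data));
}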

Side note: Branch prediction is incompatible with keeping memory secret. Disable branch prediction when handling secrets.

baltakatei

love this explanation - plain and simple!

rm-

IBM managed to slow down their mainframe using branch prediction. How often do you have a JMP (else branch)? DSPs just had zero-overhead loop instructions, similar to the one in the 80186: at the start of the loop there is an instruction whose immediate value says where to jump back to. It only needs a single register, not a table, and it works on inner loops.

And then there is hyper-threading, where you fill the pipeline with the lower-priority thread instead.

No need for speculation or attack vectors.

The ARM7TDMI in the GBA had a flag to instruct it to follow up branches. But how does it know there is a branch before decoding? So it still needs a cache: 1 bit per memory address to remember an up branch. At least this is well documented, and the compiler can optimize for it.

Even with this nice prediction: why not follow both paths, with a ratio? One cycle to advance this path, three for the other. Stop at a load/store to avoid hacks or inconsistencies.

The PS3 showed the way: more cores, no CISC-like optimization per core. Similar today with GPUs and their RISC-V cores.

ArneChristianRosenfeldt

Very cool - great, understandable explanation!

nefex

Anybody else amazed by the fact that Matt wrote the Fibonacci sequence in x86 and just knew the sizes of the instructions?

henriquealrs

Is that Ray Tracing video at the end soon to be released? Can't find it via search by name

vadrif-draco

As a software developer I'm wondering how you optimize for branch prediction when the CPU is effectively a black box. I guess you can only speculate that you are getting branches wrong, or maybe there is a CPU setting that records branch prediction hits and misses?

bjryan
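
It is not a complete black box. On Linux, perf exposes the hardware counters directly (for example perf stat -e branches,branch-misses ./your_program), so you can measure rather than speculate. On the code side, GCC and Clang accept hints about which way a branch usually goes; here is a small sketch, where the function and its error path are invented for illustration (C++20 also offers the [[likely]]/[[unlikely]] attributes):

#include <cstdlib>

// Tell the compiler the null check almost never fires, so it can keep the common
// case as the straight-line, fall-through path and move the error path out of the way.
int increment_checked(const int* p) {
    if (__builtin_expect(p == nullptr, 0)) {   // "expect this condition to be false"
        std::abort();                          // cold error path
    }
    return *p + 1;                             // hot path
}

int main() {
    int x = 41;
    return increment_checked(&x) == 42 ? 0 : 1;
}

The usual workflow is measure first with the counters, and only then reach for hints or for data and algorithm changes.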

What I don't quite understand, and this is perhaps because the metaphor breaks down, is what the decoding robot is actually doing. It takes a piece of information and 'decodes' it into a different piece of information? But why is this information understood by the next robot where the original information wasn't?

I presume this has something to do with determining which physical circuitry actually executes the instruction, but I can’t really visualise how that happens.

scaredyfish
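
One way to picture the decode step, using a completely made-up 8-bit instruction format (not any real ISA): the information doesn't change, it gets unpacked from a packed bit pattern into the separate fields, or control signals, that the execution circuitry is wired to consume, which is why the next stage can act on it.

#include <cstdint>
#include <cstdio>

// Invented toy encoding, purely for illustration:
//   bits 7..6 = operation (00 add, 01 sub, 10 load, 11 store)
//   bits 5..3 = destination register, bits 2..0 = source register
enum class Op : uint8_t { Add, Sub, Load, Store };
struct Decoded { Op op; unsigned dest; unsigned src; };

// "Decode" = pull the packed bits apart into the signals the execute stage needs:
// which operation to perform and which registers to read and write.
Decoded decode(uint8_t raw) {
    return { static_cast<Op>(raw >> 6), (raw >> 3) & 7u, raw & 7u };
}

int main() {
    Decoded d = decode(0b01'101'010);  // "sub r5, r2" in this toy format
    std::printf("op=%u dest=r%u src=r%u\n",
                static_cast<unsigned>(d.op), d.dest, d.src);
}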

I'm _predicting_ that the one-character change was from a short-circuit && to a bitwise &. The former might be compiled as two branch instructions, while the latter as only one.

MateoPlavec
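
A sketch of what that guess would look like (the names are illustrative and this is only the commenter's speculation, not a confirmed detail of the video; whether it helps depends on how predictable each condition is and on what the compiler emits):

#include <cstdio>

// Interval test as it might appear in a ray tracer's hit routine.
// The short-circuit form only evaluates the second compare when the first passes,
// which the compiler may implement as two conditional branches.
bool hit_shortcircuit(double t, double t_min, double t_max) {
    return t > t_min && t < t_max;
}

// The bitwise form always performs both cheap compares and combines the results,
// which can compile down to one branch, or to no branch at all.
bool hit_bitwise(double t, double t_min, double t_max) {
    return (t > t_min) & (t < t_max);
}

int main() {
    std::printf("%d %d\n", hit_shortcircuit(0.5, 0.0, 1.0), hit_bitwise(0.5, 0.0, 1.0));
}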

I seem to have missed the original, but this guy seems great at explaining CPU stuff.

Any chance of a further video about how the Spectre class of vulnerabilities fits into this? (My limited understanding is that there are a few more things going on in between, but it seems like the extreme example of branch prediction going wrong.)

custard

My takeaway:
Branch prediction: when I see this, I will give you that, noted.

Zenas

What happens if the predictor makes the fetcher fetch both branches when it sees a branch at an address that is not in the table? Would that speed up the processor?

anata.one.