x86 Internals for Fun & Profit • Matt Godbolt • GOTO 2014

This presentation was recorded at GOTO Chicago 2014. #gotocon #gotochgo

Matt Godbolt - Low-latency C++ Developer @MattGodbolt

ABSTRACT
It's easy to treat the CPU that executes our code as a black box, but understanding what really goes on inside it can help you write more efficient code.

In this talk, Matt will lift the lid on modern x86 processors. He'll explain some of their features and how the code you write maps to those features. He'll give examples of how to diagnose and fix performance issues. Topics covered include memory, caching, out-of-order execution and branch prediction. [...]

Comments

Great video about a guy giving a presentation about the inner workings of an x86 execution pipeline. Wonderful gestures, I've seen few people point at slides like this presenter. And the way he sometimes moves his head up and down... Marvelous! The topic of the presentation seemed interesting too, but it is difficult to tell without actually seeing the slides :(

carvoloco

Very bad video composition: the speaker discusses something on the screen, but instead of the screen with the code we keep watching the speaker for a prolonged time, then get only a short glimpse of the screen.

test

The more I see about things involving x86 and RISC stuff, the more interesting it gets. x86 is really hard to decode, but the equivalent RISC encoding would be 4-6 times longer, and then x86 can turn around and convert the tiny CISC instructions into VLIW-level internal representations. TL;DR: it's just really cool.

mekafinchi

Great talk indeed. If I may suggest, please add the slides as a picture in picture view rather than switch to it. A lot of the time the speaker is making a reference that is lost because there's no way to see the slides.

Carutsu

It's simple, really: first read the 5000-page manual, then read the "RELATED LITERATURE", and then you're halfway there. Then just read the AMD manuals and realize it actually makes sense now, but you're also 10 years older.

LemonChieff

Excellent talk! Conveying CPU internals in such a clear way is a talent.

evilone

One little correction at the end: while it is true that electrons don't move anywhere near the speed of light (they "drift" centimeters per second or less), the electric signal does move at the speed of light. Specifically, the speed of light in the wire (of course for a very loose definition of light: wires are "transparent" to signals, which are electromagnetic, like light). This is about 75-90% of c (the speed of light in vacuum), depending on the specific material.

One note for the incurably curious: when he says "very few electrons" in a RAM cell, for modern RAM or Flash cells that is in the region of a few hundred to one thousand electrons. It used to be a few tens of thousands in the '90s. That is femtofarads in storage capacity. In comparison, for an LED to briefly light up, millions to billions of electrons have to move.

KonradTheWizzard
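
A quick back-of-the-envelope check of the figures in the comment above, as a minimal sketch. The 3 GHz clock is an assumption chosen for illustration (it does not come from the comment), and the function names are hypothetical. It shows how far a signal travelling at 75-90% of c gets in one clock cycle, and how much charge 1000 electrons carry.

```python
C_VACUUM = 299_792_458             # speed of light in vacuum, m/s
ELEMENTARY_CHARGE = 1.602176634e-19  # charge of one electron, coulombs


def distance_per_cycle_cm(fraction_of_c, clock_hz):
    """Distance a signal covers in one clock cycle, in centimeters."""
    return fraction_of_c * C_VACUUM / clock_hz * 100


def charge_coulombs(electron_count):
    """Total charge carried by a given number of electrons."""
    return electron_count * ELEMENTARY_CHARGE


CLOCK_HZ = 3e9  # assumed 3 GHz clock, for illustration only

slow = distance_per_cycle_cm(0.75, CLOCK_HZ)
fast = distance_per_cycle_cm(0.90, CLOCK_HZ)
print(f"signal travels {slow:.1f} to {fast:.1f} cm per cycle")
print(f"1000 electrons carry {charge_coulombs(1000):.2e} C")
```

Under these assumptions a signal covers only a few centimeters per cycle, which gives a feel for why physical distance matters at these clock rates.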

The slides aren't visible half of the time :/

eXcalibooor

That was really a great talk, highly recommended. Thanks, Matt.

angelsbmartin

Skip to 10:56 if you already know Intel assembly.

johnmccrary

The person responsible for this "montage" should change profession and stay away from any video production related jobs. Where are the slides? Did the guy even watch it? *Sigh*

PingPong-empg

Regarding instruction decoding: you start by deciding, in parallel for N bytes at a time, how long the instruction would be if location X were the start of an instruction. The cycle after, you do this for the next N bytes (and so on). In the second stage, you sequentially add up the lengths to find the boundaries of the valid instructions and ship those instructions off to the actual decoder, with the links between them (their dependencies) still only implied. The step after that outputs a bitfield of the register changes each instruction makes, so that an instruction with an actual dependency on a previous one can be delayed.

I'm more impressed that they made this kind of stuff work with self-modifying code - which is like throwing a giant wrench into this entire machine & making it do one instruction per full pipeline flush.

dascandy
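
The two-stage length decoding described in the comment above can be sketched as follows. This is a toy model, not real x86: it assumes a hypothetical encoding where the low two bits of an instruction's first byte give its length (1 to 4 bytes), and the function names are made up for illustration. Stage 1 computes a speculative length at every byte offset (independent work that hardware could do in parallel); stage 2 chains those lengths sequentially to find the real instruction starts.

```python
def speculative_lengths(window):
    """Stage 1: for every byte offset, compute the length an instruction
    would have IF one started there (toy encoding: low 2 bits + 1)."""
    return [(byte & 0b11) + 1 for byte in window]


def mark_boundaries(window, start=0):
    """Stage 2: walk the speculative lengths sequentially.

    Returns (starts, carry): the offsets where instructions really begin,
    and the offset into the NEXT window where the next instruction begins.
    """
    lengths = speculative_lengths(window)
    starts = []
    pos = start
    while pos < len(window):
        starts.append(pos)
        pos += lengths[pos]
    return starts, pos - len(window)


# A 16-byte window of made-up bytes in the toy encoding.
window = bytes([0x02, 0xAA, 0xBB, 0x00, 0x01, 0xFF, 0x03, 0x10,
                0x20, 0x30, 0x00, 0x00, 0x01, 0x07, 0x00, 0x02])
starts, carry = mark_boundaries(window)
print("instruction starts:", starts)
print("next window starts at offset:", carry)
```

Most of the speculative lengths are thrown away, because most offsets are not instruction starts. The `carry` value is what one window would hand to the next so that it knows where to begin chaining.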

Great discussion of the tradeoffs of RAM vs cache @ ~48 mins

JohnSmith-hexg

Nice talk (though I've not seen all of it yet). Branch prediction first came up for me in this relatively well-known Stack Overflow question: "Why is processing a sorted array faster than an unsorted array?"

andrewmartin
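
The effect behind that Stack Overflow question can be illustrated with a small simulation. This is a minimal sketch that assumes a single 2-bit saturating counter as the predictor, which is a textbook model and far simpler than what real x86 processors use. It counts mispredictions for the branch `value >= 128` over the same data, unsorted and sorted.

```python
import random


def mispredictions(outcomes):
    """Count mispredictions of one 2-bit saturating counter.

    Counter states 0-1 predict "not taken"; states 2-3 predict "taken".
    """
    counter = 1  # start at weakly "not taken"
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


rng = random.Random(0)  # fixed seed so runs are repeatable
data = [rng.randrange(256) for _ in range(10_000)]

# The branch condition from the question: `if value >= 128`.
unsorted_misses = mispredictions([v >= 128 for v in data])
sorted_misses = mispredictions([v >= 128 for v in sorted(data)])

print("unsorted mispredictions:", unsorted_misses)
print("sorted mispredictions:  ", sorted_misses)
```

With sorted input the branch outcome changes only once, so this predictor is wrong only briefly around that transition. With random input the outcome is close to a coin flip and the predictor is wrong roughly half the time.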

Great talk. Needs to be watched several times to grasp the vast information in there. I made it all the first time:)

frutiboy

How the hell did that ISA end up being our standard?

walterbz

You mentioned the fetcher reads 16 bytes... that would make sense to me even though the max instruction size is 15, but if you put a valid 15-byte instruction on the last 15 bytes of an executable page (the next page is PAGE_NOACCESS or non-existent), it executes normally, i.e. without causing a page fault, whereas if your process tried to read the 16 bytes it would trigger a page fault from the 16th byte being on the invalid page. So maybe it just reads 1 byte at a time and determines if it needs to read more? Or maybe it still tries to read 16 bytes but simply doesn't trigger faults during fetch.

DaveWhoa
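
One possible explanation for the observation above is alignment. The sketch below assumes that fetch windows are 16-byte aligned, which the comment does not state, and uses a 4096-byte page size; the function names are hypothetical. Under that assumption a fetch window can never straddle a page boundary, because 4096 is a multiple of 16, so the 15-byte instruction at the end of a page is covered by a single window that lies entirely inside that page.

```python
PAGE_SIZE = 4096   # common x86 page size
FETCH_SIZE = 16    # fetch window size mentioned in the comment


def fetch_blocks(start, length):
    """Base addresses of the ALIGNED fetch blocks covering
    the byte range [start, start + length)."""
    first = start - start % FETCH_SIZE
    end = start + length - 1
    last = end - end % FETCH_SIZE
    return list(range(first, last + 1, FETCH_SIZE))


def crosses_page(block_base):
    """True if the fetch block at block_base spans two pages."""
    last_byte = block_base + FETCH_SIZE - 1
    return block_base // PAGE_SIZE != last_byte // PAGE_SIZE


# A 15-byte instruction in the last 15 bytes of the page at 0x1000.
page_base = 0x1000
insn_start = page_base + PAGE_SIZE - 15
blocks = fetch_blocks(insn_start, 15)

print("fetch blocks:", [hex(b) for b in blocks])
print("any block crosses a page:", any(crosses_page(b) for b in blocks))
```

An unaligned 16-byte read starting at the instruction's first byte would touch the next page, which matches what the commenter sees when the process itself reads those 16 bytes.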

Huh, at 31:04 it sounded like you said "most instruction sets have a decrement and jump if not zero, but x86 doesn't". I'm confused because that's exactly what loop does, it decrements CX and jumps if not zero.

eformance
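
The behaviour of `loop` that the comment above describes can be modelled in a few lines. This is a sketch of the instruction's semantics with a 16-bit CX register, not an emulator; `run_loop` is a hypothetical helper name. It models the pattern `label: <body>; loop label`.

```python
def run_loop(cx, body):
    """Model `label: <body>; loop label` with a 16-bit CX register.

    LOOP decrements CX first, then jumps back to the label if the
    result is nonzero. Returns how many times the body ran.
    """
    iterations = 0
    while True:
        body()
        iterations += 1
        cx = (cx - 1) & 0xFFFF  # decrement with 16-bit wraparound
        if cx == 0:
            break               # fall through: loop is finished
    return iterations


print(run_loop(5, lambda: None))
print(run_loop(0, lambda: None))
```

Because the decrement happens before the test, entering the loop with CX equal to zero wraps around and runs the body 65536 times, which is why code often guards the loop with `jcxz`.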

I'm way out of my element here, but couldn't these chips expose some API or control mechanism for telling them about branching patterns in the compiled code, which compiler back-end people could target?

This only comes to mind because the task of looking for patterns in branch behavior seems like something naturally in the domain of software and not hardware.

AlexanderBollbach

I will be watching this video one minute at a time, before having a lie down, in order to avoid nosebleeds. I'm two minutes in and it seems good so far - although the bloke doing the talk seems a bit shifty

jubbernaut