x86 Internals for Fun & Profit • Matt Godbolt • GOTO 2014

This presentation was recorded at GOTO Chicago 2014. #gotocon #gotochgo

Matt Godbolt - Low-latency C++ Developer @MattGodbolt

ABSTRACT
It's easy to treat the CPU that executes our code as a black box, but understanding what really goes on inside it can help you write more efficient code.

In this talk, Matt will lift the lid on modern x86 processors. He'll explain some of their features and how the code you write maps to those features. He'll give examples of how to diagnose and fix performance issues. Topics covered include memory, caching, out-of-order execution and branch prediction. [...]

#x86 #MattGodbolt #x86Processors #Processor #Programming #SoftwareEngineering

Comments

Great video about a guy giving a presentation about the inner workings of an x86 execution pipeline. Wonderful gestures, I've seen few people point at slides like this presenter. And the way he sometimes moves his head up and down... Marvelous! The topic of the presentation seemed interesting too, but it is difficult to tell without actually seeing the slides :(

carvoloco

A very poorly composed video: the speaker discusses something on the screen, but instead of the screen with the code we keep watching the speaker for a long time, followed by a short glimpse of the screen.

test

The more I see about things involving x86 and RISC stuff the more interesting it gets, x86 is really hard to decode but the equivalent RISC encoding would be 4-6 times longer, and then x86 can turn around and convert the tiny CISC into VLIW level internal representations. tldr it's just really cool

mekafinchi

Great talk indeed. If I may suggest, please add the slides as a picture in picture view rather than switch to it. A lot of the time the speaker is making a reference that is lost because there's no way to see the slides.

Carutsu

It's simple, really: first read the 5000-page manual, then read the "RELATED LITERATURE", and then you're halfway there. Then just read the AMD manuals and realize it actually makes sense now, but you're also 10 years older.

LemonChieff

Excellent talk! Conveying CPU internals in such a clear way is a talent.

evilone

One little correction at the end: while it is true that electrons don't move anywhere near the speed of light (they "drift" at centimeters per second or less), the electric signal does move at the speed of light. Specifically, the speed of light in the wire (of course, for a very loose definition of light: wires are "transparent" to signals, which are electromagnetic, like light). This is about 75-90% of c (the speed of light in vacuum), depending on the specific material.

One note for the incurably curious: when he says "very few electrons" in a RAM cell, for modern RAM or Flash cells that is in the region of a few hundred to one thousand electrons; it used to be a few tens of thousands in the '90s. That is femtofarads in storage capacity. In comparison, for an LED to briefly light up, millions to billions of electrons have to move.

KonradTheWizzard
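The electron counts in the comment above can be turned into charge and capacitance with Q = N·e and C = Q/V. The following is a minimal sketch of that arithmetic; the function names are hypothetical and the 1 V cell voltage used in the usage note is an assumption for illustration, not a figure from the talk or the comment.

```cpp
// Elementary charge in coulombs (exact value in the SI since 2019).
constexpr double kElementaryCharge = 1.602176634e-19;

// Total charge, in coulombs, carried by `electrons` electrons.
double cell_charge_coulombs(double electrons) {
    return electrons * kElementaryCharge;
}

// Capacitance, in femtofarads, that holds `electrons` electrons
// at `volts` volts (C = Q / V, then scaled from farads to fF).
double capacitance_femtofarads(double electrons, double volts) {
    return cell_charge_coulombs(electrons) / volts * 1e15;
}
```

Under the assumed 1 V, `capacitance_femtofarads(1000.0, 1.0)` comes out at roughly 0.16 fF, and tens of thousands of electrons at the same voltage land in the range of a few femtofarads.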

The slides aren't visible half of the time :/

eXcalibooor

That was really a great talk, highly recommended. Thanks, Matt.

angelsbmartin

Skip to 10:56 if you already know Intel assembly.

johnmccrary

The person responsible for this "montage" should change profession and stay away from any video production related jobs. Where are the slides? Did the guy even watch it? *Sigh*

PingPong-empg

Regarding instruction decoding: you start by deciding, in parallel for N bytes at a time, how long the instruction would be if location X were the start of an instruction. The cycle after, you do this for the next N bytes (and so on). In the second cycle, you sequentially add up the lengths to find the boundaries of the valid instructions and ship those instructions off to the actual decoder, with the dependencies between them still only implied. The step after that outputs a bitfield of the register changes each instruction makes, so that if there's an actual dependency on a previous instruction it can be delayed.

I'm more impressed that they made this kind of stuff work with self-modifying code - which is like throwing a giant wrench into this entire machine & making it do one instruction per full pipeline flush.

dascandy
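The two-stage length decoding described in the comment above can be sketched as a toy model. Everything here is an assumption made for illustration: the encoding is NOT x86 (the low two bits of the first byte simply give the length), and the function names are hypothetical. The point is only the shape of the technique: an independent "what if an instruction started here?" guess for every byte, followed by a sequential walk that keeps the guesses that turn out to be real starts.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy encoding (an assumption, not x86): the low two bits of the first
// byte hold the instruction length minus one, so lengths are 1..4 bytes.
std::size_t toy_length(std::uint8_t first_byte) {
    return (first_byte & 0x3u) + 1;
}

// Stage 1: for every offset, compute the length an instruction would have
// if one started there. Each entry is independent of the others, which is
// what lets hardware do this step for many bytes at once.
std::vector<std::size_t> speculative_lengths(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::size_t> lengths(bytes.size());
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        lengths[i] = toy_length(bytes[i]);
    }
    return lengths;
}

// Stage 2: starting from a known instruction start, add up lengths
// sequentially to find the offsets where real instructions begin.
std::vector<std::size_t> instruction_starts(const std::vector<std::size_t>& lengths,
                                            std::size_t first) {
    std::vector<std::size_t> starts;
    std::size_t pos = first;
    while (pos < lengths.size()) {
        starts.push_back(pos);
        if (lengths[pos] == 0) break;  // guard against malformed input
        pos += lengths[pos];
    }
    return starts;
}
```

For the bytes `01 FF 00 03 AA BB CC 02`, stage 1 produces a guess for all eight offsets, while stage 2 starting at offset 0 keeps only offsets 0, 2, 3 and 7; the guesses at the other offsets are discarded.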

Great discussion of the tradeoffs of RAM vs cache @ ~48 mins

JohnSmith-hexg

Nice talk (though I've not seen all of it yet). Branch prediction first came up for me in this relatively well-known Stack Overflow question: "Why is processing a sorted array faster than an unsorted array?"

andrewmartin
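The effect behind the question cited in the comment above can be sketched as follows. This is an illustrative reconstruction, not the code from that question or from the talk; `make_data` and `time_sum_us` are hypothetical helpers. The loop contains a data-dependent branch that is hard to predict on random data and easy to predict once the data is sorted.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Sum every element >= 128. The `if` is a data-dependent branch: on random
// input its outcome looks random to the predictor, on sorted input it is
// false for one long run and then true for another.
std::int64_t sum_large(const std::vector<int>& data) {
    std::int64_t sum = 0;
    for (int v : data) {
        if (v >= 128) {
            sum += v;
        }
    }
    return sum;
}

// Hypothetical helper: n pseudo-random values in [0, 255].
std::vector<int> make_data(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<int> data(n);
    for (int& v : data) {
        v = dist(rng);
    }
    return data;
}

// Hypothetical helper: wall-clock time of one sum_large call, in microseconds.
double time_sum_us(const std::vector<int>& data, std::int64_t& result) {
    auto start = std::chrono::steady_clock::now();
    result = sum_large(data);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

To try it, time `sum_large` on `make_data(n, seed)` and again on a copy passed through `std::sort`; both calls return the same sum. Whether the sorted run is actually faster depends on the compiler and flags, since an optimizer may replace the branch with a conditional move or vectorize the loop, in which case the difference largely disappears.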

Great talk. Needs to be watched several times to grasp the vast amount of information in there. I made it through all of it the first time :)

frutiboy

How the hell did that ISA end up being our standard?

walterbz

You mentioned the fetcher reads 16 bytes... that would make sense to me even though the max instruction size is 15, but if you put a valid 15-byte instruction on the last 15 bytes of an executable page (the next page is PAGE_NOACCESS or non-existent), it executes normally, i.e. without causing a page fault, whereas if your process tried to read the 16 bytes it would trigger a page fault from the 16th byte being on the invalid page. So maybe it just reads 1 byte at a time and determines if it needs to read more? Or maybe it still tries to read 16 bytes but simply doesn't trigger faults during fetch.

DaveWhoa

Huh, at 31:04 it sounded like you said "most instruction sets have a decrement and jump if not zero, but x86 doesn't". I'm confused because that's exactly what loop does, it decrements CX and jumps if not zero.

eformance
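The behaviour of `loop` that the comment above refers to can be modelled in a few lines. This is a sketch of the instruction's semantics with a 16-bit CX counter, written as ordinary C++ rather than assembly; `run_loop` is a hypothetical name, and the function only counts how often the loop body would run.

```cpp
#include <cstdint>

// Model of a body followed by `loop label` using the 16-bit CX register.
// Returns how many times the body executes for a given initial CX.
std::uint32_t run_loop(std::uint16_t cx) {
    std::uint32_t iterations = 0;
    do {
        ++iterations;   // stand-in for the loop body
        --cx;           // LOOP first decrements the counter (wrapping modulo 2^16)
    } while (cx != 0);  // then jumps back while the counter is not zero
    return iterations;
}
```

`run_loop(5)` returns 5. Because the decrement happens before the test, an initial count of 0 wraps around to 65535 and the body runs 65536 times, which is why hand-written code usually guards `loop` with a zero check on the counter.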

I'm way out of my element here, but couldn't these chips expose some API or control mechanism for informing them about branching patterns in the compiled code, which compiler back-end people could target?

This only comes to mind because the task of looking for patterns in branch behavior seems like something naturally in the domain of software and not hardware.

AlexanderBollbach

I will be watching this video one minute at a time, before having a lie down, in order to avoid nosebleeds. I'm two minutes in and it seems good so far - although the bloke doing the talk seems a bit shifty

jubbernaut
