CppCon 2017: Chandler Carruth “Going Nowhere Faster”



You care about the performance of your C++ code. You have followed basic patterns to make your C++ code efficient. You profiled your application or server and used the appropriate algorithms to minimize how much work is done and the appropriate data structures to make it fast. You even have reliable benchmarks to cover the most critical and important parts of the system for performance. But you're profiling the benchmark and need to squeeze even more performance out of it... What next?

This talk dives into the performance and optimization concerns of the important, performance-critical loops in your program. How do modern CPUs execute these loops, and what influences their performance? What can you do to make them faster? How can you leverage the C++ compiler to do this while keeping the code maintainable and clean? What optimization techniques do modern compilers make available to you? We'll cover all of this and more, with piles of code, examples, and even a live demo.

While the talk will focus somewhat on x86 processors and the LLVM compiler, everything will be broadly applicable, and basic mappings for other processors and toolchains will be discussed throughout. However, be prepared for a lot of C++ code and assembly.

Chandler Carruth: Google, Software Engineer

Chandler Carruth leads the Clang team at Google, building better diagnostics, tools, and more. Previously, he worked on several pieces of Google’s distributed build system. He makes guest appearances helping to maintain a few core C++ libraries across Google’s codebase, and is active in the LLVM and Clang open source communities. He received his M.S. and B.S. in Computer Science from Wake Forest University, but disavows all knowledge of the contents of his Master’s thesis. He is regularly found drinking Cherry Coke Zero in the daytime and pontificating over a single malt scotch in the evening.


Comments

For the last question: the CPU's store buffer and register renaming make it possible to hide the results of operations from external observers (other cores, or memory). The changes are only made visible once they are known to be program-correct: the CPU really will execute past the array bounds, but before actually "publishing" those changes it checks whether the branch prediction was correct and discards the incorrect results. That's why it all works.
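
A minimal sketch of the kind of loop that question was about (hypothetical code, not the talk's): the loop-back branch is predicted taken one time too many, so the core may speculatively start the iteration that runs past the end.

// Hypothetical illustration: the loop-exit branch is predicted taken for the
// final iteration, so the core may speculatively begin iteration i == n and
// issue the data[n] load/store past the end of the array. Those results live
// only in rename registers and the store buffer, and are discarded once the
// branch resolves, so they never become architecturally visible.
void scale_by_two(int* data, int n) {
  for (int i = 0; i < n; ++i) {
    data[i] *= 2;
  }
}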

MrKatoriz

If only he had dug into that last question, we might have known about Spectre that much earlier.

eauxpie

I have no idea about assembly language or the finer details, but I can still feel the passion of the lecturer and the audience. I'm satisfied, and I cheered like everyone else did at 35:10 during my lunch break. Good job!

jackwang

In his 2018 talk, he discusses the last question in great detail.

osere

Chandler helps me realize that I know nothing about benchmarking.

echosystemd

I am a simple man, I see a good talk, I press like.

hanneshauptmann

20:48

Going from .01 to .04 is a 300% increase in cache misses (i.e. 4x the amount), not .03%. When you look at it that way, dramatic changes in performance aren't that surprising.

Zeturic

That command prompt is just pure madness!

ldmnyblzs

cmov is for cases where the condition is close to a random 50/50, since branching performs absolutely horribly there. In the presentation, the random numbers generated for the test are in the range 0 to INT_MAX (roughly 2 billion), while the clamp limit is 255, so the probability of a mispredict is ~1.18e-7, which is why cmov is slower in the example.
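
For illustration, a minimal sketch (not the talk's actual benchmark) of the two shapes being compared: the branchy clamp leans on the predictor, while the ternary form is what compilers commonly lower to a cmov.

#include <cstdint>

// Branchy clamp: with inputs uniform over [0, INT_MAX], v > 255 is almost
// always true, so if the compiler keeps the branch it is predicted almost
// perfectly.
inline int32_t clamp_branch(int32_t v) {
  if (v > 255) return 255;
  return v;
}

// Branchless clamp: the ternary is the shape compilers commonly lower to a
// cmov, which pays its data dependency on every element no matter how
// predictable the condition is.
inline int32_t clamp_branchless(int32_t v) {
  return v > 255 ? 255 : v;
}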

MrKatoriz

How I wish I had 1/1000 of Carruth's knowledge about compilers.

kamilziemian

I'm not a C++ developer (I have a background in C), and I don't really know x86, but these talks by Chandler Carruth are so interesting to me. This is like crack! :D

emanuellandeholm

The reason for the tight-loop speedup is that the branch is never mispredicted after the first pass, so it is predicted essentially perfectly and the memory store is simply avoided altogether.

ZeroPlayerGame

In the clamp example, after the first iteration, all values will already be clamped. I guess that's why branch prediction will always be right after some point.
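
A minimal sketch of that effect (hypothetical benchmark shape, not the talk's exact code): clamping the same buffer repeatedly means that after the first pass no element exceeds 255, so the branch resolves the same way on every later iteration.

#include <cstdint>
#include <vector>

// Hypothetical benchmark shape: clamp the same buffer in place over and over.
// On the first pass some elements exceed 255 and get stored; on every later
// pass the condition is never true, so the branch is predicted perfectly and
// the store never even executes.
void clamp_repeatedly(std::vector<int32_t>& data, int passes) {
  for (int p = 0; p < passes; ++p) {
    for (auto& v : data) {
      if (v > 255) v = 255;  // taken only during the first pass
    }
  }
}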

christophrcr

That's what I love about Chandler's talks: he goes to the nub of the topics, and hard ones at that, rather than just glossing over them.

abc

10:00 Oh my gosh, I'm using the same nvim setup, shell, and even colorscheme as Chandler Carruth. I will never change my ~/.config again!

tobiasfuchs

"I don't even know. Magic and unicorns?"


Great talk!

bluecircle

The most important thing you can get out of this talk is that Magic and Unicorns keeps the processor from crashing! :D

JackAdrianZappa

TSO doesn't allow reordering anything to after a store, but it does allow a later load to move ahead of an earlier store. That's why a regular load doesn't have to flush the store buffer: the load is being reordered ahead of the buffered writes, not past them.
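
For reference, the classic store-buffering litmus test this is describing, sketched with relaxed atomics (illustration only, not code from the talk):

#include <atomic>
#include <thread>

// Each thread stores to one flag, then loads the other. Under TSO each load
// may execute before that thread's own store has drained from the store
// buffer, so r1 == 0 && r2 == 0 is a permitted outcome; ruling it out is
// exactly what would require draining the store buffer (e.g. a fence or
// seq_cst operations).
std::atomic<int> x{0}, y{0};
int r1 = 0, r2 = 0;

void writer_x_reader_y() {
  x.store(1, std::memory_order_relaxed);
  r1 = y.load(std::memory_order_relaxed);  // may move ahead of the buffered store to x
}

void writer_y_reader_x() {
  y.store(1, std::memory_order_relaxed);
  r2 = x.load(std::memory_order_relaxed);  // likewise
}

int main() {
  std::thread t1(writer_x_reader_y), t2(writer_y_reader_x);
  t1.join();
  t2.join();
}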

AlexeySalmin

6:08 Did anything ever come of the Efficiency Sanitizer?

warrenhenning

So what if he had used $255 instead of %ebx on the cmove too?

NbyEdge