CppCon 2014: Chandler Carruth 'Efficiency with Algorithms, Performance with Data Structures'

--
Why do you write C++ code? There is a good chance it is in part because of concerns about the performance of your software. Whether they stem from needing to run on ever smaller mobile devices, squeezing the last few effects into a video game, or because every watt of power in your data center costs too much, C++ programmers throughout the industry have an insatiable desire for writing high performance code.

Unfortunately, even with C++, this can be really challenging. Over the past twenty years, processors, memory, software libraries, and even compilers have radically changed what makes C++ code fast. Even measuring the performance of your code can be a daunting task. This talk will dig into how modern processors work, what makes them fast, and how to exploit them effectively with modern C++ code. It will teach you how modern C++ optimizers see your code today, and how that is likely to change in the coming years. It will teach you how to reason better about the performance of your code, and how to write your code so that it performs better. You will even learn some tricks about how to measure the performance of your code.
--
Chandler Carruth leads the Clang team at Google, building better diagnostics, tools, and more. Previously, he worked on several pieces of Google’s distributed build system. He makes guest appearances helping to maintain a few core C++ libraries across Google’s codebase, and is active in the LLVM and Clang open source communities. He received his M.S. and B.S. in Computer Science from Wake Forest University, but disavows all knowledge of the contents of his Master’s thesis. He is regularly found drinking Cherry Coke Zero in the daytime and pontificating over a single malt scotch in the evening.
--

--
Comments
--

My takeaway: cache locality is everything.

coderxff

I watch nearly all talks from CppCon and Meeting C++. This is my favourite subject. This talk encapsulates 70% of what it is to write efficient code. To cover 70% in an hour is incredible. Best talk to date on the subject.

fritsgerms

26:16 Very good, Chandler:
"Why do I bring up these simple examples? I bring up these simple examples because it is critical that you always do these things, OK. Not some of the time, not after a profile discovers a problem; you won't find a profile that points to these problems, and that's part of what makes them really tricky, OK. The key here is that you want to always do these things, because they don't actually make your code worse."

Essentially we're talking about efficiency death by a thousand performance hits, with micro-optimization zealotry called to battle. I like it.

cosmicallyderived

Excellent talk, and even more accessible than Mike Acton's Data-Oriented Design talk (also good) without being so low-level oriented.

soulstudiosmusic

Thanks! This talk is still very useful in 2023, though it could be half as long. The gist is: be cache-friendly, keep your data locality high, use contiguous containers. Instead of std::map you can often use std::vector plus some housekeeping, and sort when necessary. Lists are evil.

Earlier I saw that, in synthetic examples, when not much else is going on, the performance of std::vector and std::map is essentially the same when the container sizes are below ~500. The tricky part here is the "when not much else is going on". So I decided to carry out a real-life experiment. Recently I wrote a lightweight node profiler for a tree of ~400 CPU-heavy nodes. I used maps to sort measurements by two distinct properties. It was OK, but in debug builds the performance hit from profiling was noticeable, comparable to 1-2 heavy nodes. I spent 2 hours rewriting the code to std::vector plus some additional housekeeping. Gathering and sorting the data in debug builds became ~3.5x faster, and filtering the data became ~8-10x faster. Surprisingly, the code also became 1.5x smaller. Now, it is entirely possible that the first version of my code was simply shit. But anyway, now my profiler really is a lightweight one.

eyedl

"Performant" is very much a valid word in marketing pidgin.

sparkloweb

I love the table of memory latency at 37:15

echosystemd

No, he didn't say to simply replace map with vector. He said that a sorted vector (where find() becomes a binary search) has the same functionality as a map but much higher performance.

morbo

There are a few funny points about this:

Caches: at the time of the talk, Intel Haswell was already released, and it had one of the best experimental changes Intel made in that long 4-core era: they added an L4 cache. For some applications (my work included) that was a huge speedup (~20% in many instances). Sadly they did not pursue it further.

Java: I mostly had the opposite experience, with people claiming it to be oh so slow. Kinda funny when you can then show them that it is one of the most used languages for programs running on supercomputers. Sure, Java was extremely slow in its first version, when it was basically an interpreted language. And even nowadays there are scenarios where it crawls along like a sloth, but those are specific cases: the JVM has a rather long startup time, so starting a small program thousands of times from the command line will be really slow.
On the other hand, writing Java is easy, and given the same amount of work put into a C++ program and a Java program, the Java version is very likely to perform better, because it tends to be easier to write Java and harder to write high-performance C++.

And performance:
Well, yeah, sadly programs are objectively becoming less and less efficient and slower at completing the same objectives, generally speaking.
More time and resources are focused on other aspects, and performance is very often treated as "Speed? We've got faster hardware." That is especially noticeable with smaller games: 2D side-scrollers with maybe a few dozen objects on screen won't run smoothly on a 10-year-old CPU while looking worse than games that came out 15 years ago.
I have seen extremes like SimCity clones with less functionality and less going on that would fully utilise one core of a current CPU and still only compute about 10 timesteps a second.

And at my work it is also rather strange.
Yes, I do use std::unordered_map for a few things, as it made the algorithm easier. Funnily enough, I initially used a normal std::map, and during code review I was told to switch to unordered_map for better performance, but the change gave me exactly zero improvement in execution time, because in our software most of the time is wasted in an incredibly archaic and inefficient tracing tool. Changing the format in which I traced the data (constructing and formatting the string on my own) gave me a 3x speedup.
The whole base system has so many problems that the execution time of the actual business logic is not even a tenth of the total.

Best thing ever was the request to improve the interface I had written for communicating with a particular hardware system.
It was a block of ~2500 bytes, with the first couple of bytes being base information that was always required, plus some control info telling us which fields were filled. Reading everything in one go took about 1 second. The initial request was for me to read those few marker bytes first and then only read the individual fields that were actually useful, and that increased the time to over 2 seconds, because that hardware only polled incoming requests about 50 times a second. Changing it to read opportunistically, with a large initial block, reduced it to 500 ms on average.
And of those 500 ms, I checked that my code was only running for ~10 ms: 9 ms for the slow tracing, and less than 1 ms for the actual code.

ABaumstumpf

My definition of "performant" would be: the point at which further improving a program's performance no longer justifies the cost of the improvement.

jacob_s

The real sentence is "Rush to idle", Copyright Mooly Eden ... ;)

FougaFrancois

It is a pity that too many constraints in the C++ specification force the standard library containers into specific implementations. It seems we already miss out on an SBO vector (like Boost's small_vector) and a hash map with faster lookup. Perhaps the latter would also address the construction cost of unordered containers, which can be pretty heavy due to allocations.

gast

24:39. That code is NOT linear. Yes, you hash the key 4 times, but you always hash it 4 times regardless of how many entries are in the cache. If it were linear, you would do more work with bigger caches. That code may not be efficient, but it is still O(1).

paulpach

Very interesting, but I'm confused... why doesn't the C++ committee give us a range of useful vector-based data structures that we know will be faster in practice (given some bounds)?

robbie_

He was talking about sorted vectors, and in that case you can use std::lower_bound, which will be faster than map::find.

pecholt

I dislike camera work where they jump away from the diagrams, like the CPU one, because I like to look at them longer and study them (obviously I can pause, though, so it's kind of a moot point anyway).

Kavukamari

The comparison of Java and C++ performance he does is good, but I think the problem is his conclusion that it is hard to get Java to perform better than C++. In reality it is a bit difficult to make a C++ program run in an optimized way, so if you are not super skilled it is a bit of a dice roll. To simplify the situation: C++ is like rolling a die with the numbers 1-20 on it, while Java is like rolling a die with, let's say, 5-15, because a higher amount of control also means a higher risk of messing up.

_TheDudeAbides_

The compiler can do return-value optimization if you return by value, eliding the copy or move entirely. An explicit move in the return statement, by contrast, cannot be elided, and a move is not exactly free.

auffiechumi

Although I agree on the effect of cache locality on performance, I don't really buy the argument for vectors over maps and lists. The problem is that you normally need to traverse (almost) random locations in memory in order to do anything with the objects your collection references, unless those objects are actually contained in the vector itself. In every other case, where you put (smart or otherwise) pointers in the vector and you want, for example, to locate an object (e.g. with binary search), cache misses will be frequent.

sgsfak

The compulsion to return a vector by value is very suspect. Returning by value in this case forces the user to always allocate a new vector, even though it's often cheaper to reuse a vector you already have, especially if the new contents would fit in the old vector's capacity. But return by value prevents that, because RVO doesn't work for reassignment.

sirhenrystalwart