CppCon 2016: Chandler Carruth “High Performance Code 201: Hybrid Data Structures”



Modern programs’ performance characteristics are often dictated by their data: the cache locality of data accesses, the size of the working set, and the overhead of costly memory allocation. Unfortunately, the standard C++ library data structures range from adequate to terrible at controlling these aspects, and they don’t provide any of the core mechanisms needed for extremely efficient data structure design.

This talk will present the core concepts of designing high performance data structures in C++. It is based on years of experience in the LLVM compiler as well as several other large code bases. From these principles, the talk will propose a suite of data structures that provide performance without loss of generality or functionality. As much as this talk will present specific data structure designs, its primary intent will be to give an understanding of what makes these structures have greater performance than more naive approaches.

Chandler Carruth
Google
C++ Lead
San Francisco Bay Area
Chandler Carruth leads the Clang team at Google, building better diagnostics, tools, and more. Previously, he worked on several pieces of Google’s distributed build system. He makes guest appearances helping to maintain a few core C++ libraries across Google’s codebase, and is active in the LLVM and Clang open source communities. He received his M.S. and B.S. in Computer Science from Wake Forest University, but disavows all knowledge of the contents of his Master’s thesis. He is regularly found drinking Cherry Coke Zero in the daytime and pontificating over a single malt scotch in the evening.


Comments

Chandler Carruth's talks are great. I try to watch every one of them available on YouTube.

kamilziemian

man, Chandler makes the cameraman work really hard for his money xD

MarcusAseth

Chandler is my favorite speaker after Herb, good talk!

kewqie

I always enjoy Chandler's talks - this one has great video/audio quality as well.

I would like to emphasize the fact that this pointer-packing stuff is implementation-defined, and generally pointless outside of a performance bottleneck. So, some of the code you don't see in the slides likely contains a lot of platform-specific configuration macros (but I haven't looked at these parts of the Clang code base yet).

TL;DW: Cater to the cache, pack your pointers.

barrettellisadair
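The pointer-packing idea this comment refers to can be sketched in a few lines. This is a simplified stand-in for LLVM's PointerIntPair, not its actual code; the name PackedPtr is made up here, and as the comment notes, the pointer/integer round-trip is implementation-defined, so real implementations guard it with platform-specific checks.

```cpp
#include <cassert>
#include <cstdint>

// Minimal sketch of pointer packing: a pointer to a type aligned to at
// least 2^Bits bytes always has its low Bits bits clear, so a small
// integer can be stored in them "for free".
template <typename T, unsigned Bits>
class PackedPtr {
    static_assert(alignof(T) >= (1u << Bits),
                  "alignment must leave Bits low bits free");
    uintptr_t Value = 0;

public:
    PackedPtr(T *P, unsigned I) { set(P, I); }

    void set(T *P, unsigned I) {
        assert(I < (1u << Bits) && "integer does not fit in the low bits");
        Value = reinterpret_cast<uintptr_t>(P) | I;
    }

    T *pointer() const {
        return reinterpret_cast<T *>(Value & ~uintptr_t((1u << Bits) - 1));
    }

    unsigned integer() const {
        return static_cast<unsigned>(Value & ((1u << Bits) - 1));
    }
};
```

The payoff is that a pointer plus a 2-bit tag occupies one machine word instead of two, which halves the size of tagged-pointer-heavy data structures.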

@20:00 Microsoft has had a very similar implementation in their ATL since 2003. Their implementation is called CAtlPlex, and it is used internally by many ATL containers.

soonts

As usual, it is a joy to listen to Chandler's talks. I am working on an open-source library of data structures for high-performance computing, and one of the first things I had to do was rebuild a custom std::vector. One of the many reasons I had to do that is that std::vector<double> v(n); fills the vector with zeros, which trashes the cache and prevents you from using the "first touch" policy, which is extremely important for NUMA architectures such as dual-socket nodes. Alignment is also an issue for getting efficient vectorization in HPC. These things could be "solved" with allocators, but I just hate them, as I don't want them to change the type of my containers.

I have a few questions though:
- Do you have an llvm::Vector<T>, or do you use llvm::SmallVector<T, 0> as a replacement for std::vector<T>? Do you also use llvm::Vector<T, n> as a replacement for std::array<T, n>?
- There is a BumpPtrAllocator, but it does not seem possible to use it with SmallVector. Do you have any design ideas for allocators? I just can't get them to work the way I want. I hate both the way they are done in the C++ standard library and the Bloomberg implementation.

insideloop
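The zero-fill complaint above can be illustrated with plain std::vector: reserve() allocates capacity without writing any elements, so with a typical malloc the pages stay untouched until the thread that will use them writes them first. A minimal sketch; make_untouched is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// std::vector<double> v(n) value-initializes all n elements (writing n
// zeros up front). Reserving capacity instead leaves the elements
// unwritten, which preserves the NUMA "first touch" opportunity.
std::vector<double> make_untouched(std::size_t n) {
    std::vector<double> v;
    v.reserve(n); // capacity >= n, size 0: no element is written yet
    return v;
}
```

The trade-off is that the caller must then append with push_back/emplace_back (or resize later) rather than indexing into a pre-sized vector.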

Is there a significant performance benefit to using the hybrid 32-bit-pointers/64-bit-registers ABI, namely the x32 ABI? I don't mean for the generated code, but specifically for the compiler itself.

davidforgeas

I'm quite sure that the parameterized-allocators problem is easily solvable with C++20's `concept` feature now.

Cons-Cat

The SmallVector slide at 5:26 is confusing me because of omissions: 1) Can all small-size vectors be upgraded to heap storage on overflow? 2) Is SmallVectorImpl what you use for function params? I assume so, it just sounds like a weird / wordier name.

spudtaters

6:17 Slide 6 -- How does SmallVectorImpl::push_back know whether to free the old buffer? Without knowing the original N, it can't tell whether the vector has already been reallocated to the heap or is still using the small buffer.

JohnDlugosz
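One plausible answer is the layout trick LLVM uses: the derived class's inline buffer is placed immediately after the base subobject, so the base can detect "am I still small?" by comparing its data pointer against a fixed offset from `this`, with no need to know N. Below is a toy sketch under that assumption (SmallVectorImplSketch and SmallVectorSketch are made-up names, not LLVM's code); it assumes a trivially copyable T and the common ABI layout with no padding before the derived member.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Base class that owns the growth logic but not the inline buffer.
template <typename T>
class SmallVectorImplSketch {
protected:
    T *Begin;
    unsigned Size = 0, Capacity;

    // The derived class's inline buffer starts right after this subobject.
    void *getFirstEl() { return this + 1; }

    explicit SmallVectorImplSketch(unsigned N)
        : Begin(static_cast<T *>(getFirstEl())), Capacity(N) {}

public:
    // True while we are still using the inline buffer.
    bool isSmall() { return static_cast<void *>(Begin) == getFirstEl(); }

    void push_back(const T &V) {
        if (Size >= Capacity) {
            unsigned NewCap = Capacity * 2;
            T *NewBegin = static_cast<T *>(std::malloc(NewCap * sizeof(T)));
            std::memcpy(NewBegin, Begin, Size * sizeof(T));
            if (!isSmall())
                std::free(Begin); // only free once we have left the inline buffer
            Begin = NewBegin;
            Capacity = NewCap;
        }
        Begin[Size++] = V;
    }

    unsigned size() const { return Size; }
    T &operator[](unsigned I) { return Begin[I]; }

    ~SmallVectorImplSketch() {
        if (!isSmall())
            std::free(Begin);
    }
};

// Derived class that contributes only the inline storage and N.
template <typename T, unsigned N>
class SmallVectorSketch : public SmallVectorImplSketch<T> {
    alignas(T) char InlineStorage[N * sizeof(T)]; // must be the first member
public:
    SmallVectorSketch() : SmallVectorImplSketch<T>(N) {}
};
```

This also answers the earlier question about function parameters: a function can take a `SmallVectorImplSketch<T>&` and accept a vector of any inline size, which is the role SmallVectorImpl plays in LLVM.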

12:00 Can alias templates be used to avoid those mistakes?

enhex

Hi Chandler,
are you implying that modern CPUs do indirect memory prefetching?

e.g. are you implying that CPUs will work out that you are iterating through an array of pointers, read one or more elements ahead in that array, and then prefetch the memory pointed to by those pointers?

onosendaiAAB

OK, I see: so the idea behind SmallVector is that you can allocate some space beforehand, instead of allocating when pushing something onto the vector?

Shockszzbyyous

Bug in slide 21: `(Size >= (End - CurPtr))` should be `(Size > (End - CurPtr))`.

maxxba
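The boundary condition this comment points out is easy to see in a minimal bump-pointer sketch (the names CurPtr/End follow the slide; this is not LLVM's actual code): a request for exactly the remaining bytes still fits, so the "does not fit" test should be `Size > (End - CurPtr)`, not `>=`.

```cpp
#include <cassert>
#include <cstddef>

// One slab of a bump-pointer allocator: allocation is just a bounds
// check and a pointer increment, with no per-object bookkeeping.
struct BumpSlab {
    char *CurPtr;
    char *End;

    // Returns nullptr when the request does not fit in this slab;
    // a real allocator would then allocate a new slab and retry.
    void *allocate(std::size_t Size) {
        if (Size > static_cast<std::size_t>(End - CurPtr))
            return nullptr;
        void *Result = CurPtr;
        CurPtr += Size; // "bump" the pointer past the new object
        return Result;
    }
};
```

With `>=` instead, an allocation that exactly fills the slab would spuriously trigger the slow path and waste the remaining bytes.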

I don't get the point of the SmallVector class having a fixed size, whereas a container's goal is to change it. I can only guess that 'N' is the maximum size the "simulated container" will ever get, right? But how would you know that?

MrAbrazildo

_Value.template_: I don't even know what this means.

MrAbrazildo
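The `.template` in the slides is the dependent-name disambiguator: when the object's type depends on a template parameter, the parser cannot know that a member name is itself a template, so `Value.template get<...>()` is required to treat `<` as the start of a template argument list rather than a less-than operator. A self-contained sketch (Holder and dependent_call are made-up names):

```cpp
#include <cassert>

// A type with a member function template.
struct Holder {
    template <int N> int get() const { return N * 10; }
};

template <typename T>
int dependent_call(const T &Value) {
    // Without `.template`, `Value.get<2>()` would parse as
    // (Value.get < 2) > (), which is ill-formed.
    return Value.template get<2>();
}
```

In a non-dependent context the compiler already knows `get` is a template, so no disambiguator is needed there.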

> 5:00, in the derived class, you could have used the base constructor instead (C++11 and above): _using_
> 8:36, at least for GCC, _std::pair_ cuts performance in half compared to 2 separate values. It was shocking to me (absurd geek moment)! However, I guess that a std::array<T, 2> with first and second f()s would fix this issue.
_Edit: nope. This array is as slow as std::pair._

MrAbrazildo
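The "use the base constructor" suggestion refers to C++11 inheriting constructors: `using Base::Base;` pulls the base class's constructors into the derived class, so no hand-written forwarding constructor is needed. A minimal sketch (Base and Derived are made-up names):

```cpp
#include <cassert>

struct Base {
    int Value;
    explicit Base(int V) : Value(V) {}
};

struct Derived : Base {
    using Base::Base; // inherits Base(int); no forwarding ctor required
};
```

This keeps derived wrappers terse, which matters when a template like SmallVector derives from a common base purely to add storage.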

I just got lost on xkcd.com for 10 minutes

jonesconrad

Contrary to what Chandler says, bit packing is slow. You won't find it in use very often in modern code. Bit packing makes sense only if you save a substantial amount of data (megabytes).

alexeiz