Enhancing STL containers - Viktor Korsun [ CppCon 2015 ]

preview_player
Показать описание


STL has a big history. Due to class paradigm of C++, many STL containers have a flat layout in memory, i.e. containers naturally store objects and address objects rather than their indexing entities such as smart pointers (which are, strictly speaking, objects too). This model has a benefit of lower memory fragmentation, that causes better CPU cache performance. However, many tasks in real life require reordering of objects, that requires compexity of O(n*q), where q is a size of an object and n is a compexity measured in operations. Thus, some imlementations of operations with reordering objects in some containers are much slower that they could be. This problem could sometimes be solved by storing “pointers” as index entities in containers instead of the objects by themselves, what improves performance by q times, but causes memory fragmentation and worse cache performance. Algorithmically, the solution with pointers is obviously better, but it is still not the most efficient one. In the presentation I will show myimplementations of containers using both approaches and having the best of two worlds. I will compare these methods with classical and modern approaches and draw some conclusion, encouraging everybody to use the power of algorithms with C++.

Viktor graduated from Moscow Technical University and Yandex Data Analysis School.
In 2006 - 2008 Viktor worked for Galaktika corpotation, participated in the creation of a 4'th generation OO-programming language.
In 2008 - 2010 Viktor worked for Alta-Soft LLC as a technical architect of a distributed customs database.
Since 2012 Viktor has been working for Zeptolab LLC, creating modern mobile games. At the moment the company is drastically updating the technologies and codebase, including an internal framework.
Viktor is always making an emphasis on algorithms, fundamental data structures and robust object model.




*-----*
*--*
*-----*
Рекомендации по теме
Комментарии
Автор

Thanks for a great question. Sure, the address of the "last" element is not the address of the last element in the block. At the same time, the last index points on this object. After pop_back() the last index points to the gap, and the amount of valid indexes decreases by 1. Thus, removing the last object, i.e. pop_back, is still O(1). Moreover, once another object is added into the vector, it will be placed directly into the gap, as the next available index refers there to.

bitekas
Автор

It comes down to the access pattern. In some cases, the overhead of reordering entire objects, as opposed to reordering only pointers or indexes, may result in better overall performance due to cache locality. For a given situation, one must measure to know.

jamescoulter
Автор

You basically need good allocator for your objects, and then store some_ptr's in ordinary vector. Your implementation does not separate concerns of memory managment and structuring objects, and on top of that adds complexity for end user.

IsmeGenius
Автор

However, as we do not need to maintain contagiousity, pop_back() for heterogeneous containers could be an amortized constant, with the help of the same strategy as push_back uses (that is an amortized constant as well). The gaps in this case would be eliminated once their total size equals to a constant part of a whole data.

bitekas
Автор

I agree, an object vector, containing objects with different size, would have pop_back() as O(n*W') in general case, where W' is an average size of an object.

bitekas
Автор

How do you remove objects from your vector? If you store the objects in a single block, don't you have to move the following elements to close the gap? And wouldn't that mean that `pop_back` isn't O(1) anymore since `back()` is not necessarily stored at the en of the block?

MartinVejnar
Автор

The assembly code you showed seems very inefficient. Did you forget to compile with optimizations? When I compile here, the assembly is exactly the same (except for "mov byte ptr [esp+eax+10h], al", which is turned to "mov byte ptr [esp+eax*4+14h], al") and the overaligned version takes more time (which isn't surprising, since more hardware stores have to be made). Not to mention that I had to make the array volatile so that the assignment wouldn't be optimized away completely.

MartinVejnar
Автор

But if the container is heterogeneous and each object can have different size, then you're solving the same problem the free store is solving, aren't you?

MartinVejnar