In search of the perfect dynamic array growth factor

This video has a page on 0DE5 with exercises and resources

Chapters
00:00 - Intro
03:29 - Golden Ratio
05:11 - 1.5 Factor
07:08 - Lankinen Algorithm
09:53 - Larger simulations
14:18 - Origins
17:36 - Exercise
Comments

You overlooked a key point here: if there's enough space to accommodate the resized array, nearly all allocators, at least the good ones, will merely update the recorded size of the allocation and return the same pointer to you. In that situation you don't need to copy the elements of the array at all, because the array was resized in place. When you have the room to keep growing in place like that, a growth factor of 2 does significantly better, and if the allocator places chunks far enough apart, multiple array allocations can all work in that more efficient way.
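A minimal sketch of what that looks like from the program's side, using plain C realloc; whether any given growth actually happens in place is entirely up to the allocator, so the printed outcome will vary from run to run and system to system:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Grow a buffer with realloc and report whether the allocator
   extended it in place (same address) or had to move/copy it. */
int main(void) {
    size_t cap = 16;
    int *data = malloc(cap * sizeof *data);
    if (!data) return 1;

    for (int step = 0; step < 8; step++) {
        uintptr_t old_addr = (uintptr_t)data;
        size_t new_cap = cap * 2;                    /* growth factor 2 */
        int *grown = realloc(data, new_cap * sizeof *grown);
        if (!grown) { free(data); return 1; }

        printf("%5zu -> %5zu elements: %s\n", cap, new_cap,
               (uintptr_t)grown == old_addr ? "grown in place" : "moved (copied)");
        data = grown;
        cap  = new_cap;
    }
    free(data);
    return 0;
}
```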

anon_y_mousse

Real-world applications and memory allocators are actually much more complex (and smarter) than the one shown in the video, so some of the claims are less applicable in real life.

First off, as mentioned in the video, real-world applications tend to have many arrays allocating together. With a free-list allocator like the one in the video (e.g. glibc malloc), different reallocated arrays will fill in the gaps created during resizes, so total memory usage will not be as bad as in the initial examples.

Additionally, many modern allocators actively control and reduce memory fragmentation (unused gaps between allocations).

Some allocators (like jemalloc and mimalloc) have the notion of a "size class": requests of similar sizes are allocated together (wasting a little memory on padding) and requests of different classes go to completely different places. One page might be dedicated to chunks of 481 to 512 bytes, another to chunks of 513 to 640 bytes, and so on. Arrays of different sizes won't meet at all because they belong to different size classes, and the empty space within a page gets filled by other arrays or data structures of similar size (because there are plenty of them).
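A toy illustration of the rounding step; the class boundaries below are made up to roughly match the 481-512 / 513-640 example, not jemalloc's or mimalloc's actual tables:

```c
#include <stddef.h>
#include <stdio.h>

/* Toy size-class rounding in the spirit of jemalloc/mimalloc small classes.
   Eight classes per power-of-two range; boundaries are illustrative only. */
static size_t size_class(size_t request) {
    if (request <= 16) return 16;
    size_t limit = 32;                        /* upper power of two of the range */
    while (request > limit) limit <<= 1;
    size_t step = limit / 8;                  /* spacing between classes in this range */
    return (request + step - 1) / step * step;
}

int main(void) {
    size_t requests[] = { 481, 512, 513, 600, 640, 641 };
    for (size_t i = 0; i < sizeof requests / sizeof requests[0]; i++)
        printf("request %3zu B -> class %3zu B\n",
               requests[i], size_class(requests[i]));
    return 0;   /* 481 and 512 share a class; 513, 600 and 640 share the next one */
}
```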

Other allocators, while not having size classes, force allocations into specific patterns to reduce fragmentation, at the cost of slightly over-allocating. Buddy allocation is a famous technique first seen in the 1960s that only hands out memory in pre-divided, recursively halved partitions of a page, and it is still used today, e.g. inside jemalloc. Allocators like TLSF instead try to place each allocation into its best fit to reduce fragmentation under constrained memory and time limits.

For large enough allocations, for example those spanning multiple OS pages, a realloc() might be as simple as remapping the occupied pages onto a larger contiguous range of addresses. No data is copied at all; the only change happens in the virtual address mapping tables, which control which physical memory chunk the virtual addresses point to.

Finally, for garbage-collected languages/runtimes, many garbage collectors (like those in V8 JavaScript, the JVM, CoreCLR C#, and many more) actively compact memory when collecting garbage, so memory fragmentation is not a problem for them at all.

Not to deny the work done by the author -- it's a very good way to visualize and compare memory allocations. It's just that I happen to have read through some of these topics and noticed that some claims are oversimplifications :)

In the end, if you're designing a dynamic array yourself, you might as well just pick 1.5 or 2 based on how frequently you expect the user to expand it and how much memory you're willing to waste, and let the allocator handle the rest. If you're still in doubt -- just pick 2. It's good enough!



ryncomaekawa

Programs can also exploit modern architectures' virtual address translation, which is blazingly fast because it is done in hardware, so you don't need to copy data around. Because the virtual address space is always very large (128 to 256 TiB), you can reserve large enough areas of virtual memory for each array - even more than the total system RAM - and delay committing pages to physical memory until they are needed as the array grows. This approach has the added benefit that raw pointers always remain valid after the array is expanded. The disadvantages of using this approach right now are basically none, AFAIK, but in the future it may run into scalability issues as programs grow to consume ever larger amounts of RAM.
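A rough POSIX sketch of that reserve-then-commit idea (Linux-flavored mmap/mprotect; Windows would use VirtualAlloc with MEM_RESERVE/MEM_COMMIT). The reserve size and the doubling commit policy here are arbitrary choices for illustration:

```c
#define _DEFAULT_SOURCE               /* for MAP_ANONYMOUS on glibc */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define RESERVE_BYTES (1ull << 36)    /* reserve 64 GiB of address space up front */

int main(void) {
    /* Reserve a huge range of virtual addresses with no access and no backing. */
    unsigned char *base = mmap(NULL, RESERVE_BYTES, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    size_t committed = 0;
    size_t used = 0;

    /* "Grow the array": commit more pages whenever the used size crosses
       the committed boundary. The base pointer never changes. */
    for (int i = 0; i < 4; i++) {
        size_t need = used + (1u << 20);              /* pretend we append 1 MiB */
        if (need > committed) {
            size_t new_committed = committed ? committed * 2 : (1u << 20);
            while (new_committed < need) new_committed *= 2;
            if (mprotect(base, new_committed, PROT_READ | PROT_WRITE) != 0) {
                perror("mprotect"); return 1;
            }
            committed = new_committed;
        }
        memset(base + used, 0xAB, need - used);       /* touch the new elements */
        used = need;
        printf("used %zu bytes, committed %zu bytes, base %p\n",
               used, committed, (void *)base);
    }

    munmap(base, RESERVE_BYTES);
    return 0;
}
```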

fernandogiongo

Videos like this are so inspiring and I literally got into a mini rabbit hole about it. I was just thinking about it for like 2-3 hours and tried calculating stuff. In the end I accomplished nothing, but it never felt like wasted time. It was truly fun!

andrijor

I adore being a Rustacean, because for any given problem, where one can fall into a rabbit hole like this, I can trust that someone in the Rust team has explored the hole and come back with a solution I'd be hard pressed to surpass.

-Kirby-

I don't know if this was the original intention, but I interpreted Lankinen's comment as a way to do more copying while keeping memory totally compact. In other words, if the original allocation is N units stored starting at position 0, after the two copying steps we have an allocation of size 2N also starting at position 0. It's kind of hard to show the diagram in a youtube comment. Start with an allocation of N located at position 0. Then allocate 2N starting at position N and copy the first N elements into the *second half* of that new 2N allocation. Now there is empty space for 2N units starting at position 0. Copy the N units starting at position 2N into the N positions starting at 0. Now you have a contiguous 2N vector starting at position 0, with free space to grow starting at position 2N.
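Simulated over a flat byte "address space" just to draw the picture (a real allocator won't let you shuffle blocks around this freely), the two copies look like this:

```c
#include <stdio.h>
#include <string.h>

/* Visualize the layout described above: '#' = live elements, '.' = unused. */
#define SPACE 32

static void show(const char *step, const char *mem) {
    printf("%-36s |%.*s|\n", step, SPACE, mem);
}

int main(void) {
    char mem[SPACE + 1];
    memset(mem, '.', SPACE); mem[SPACE] = '\0';

    int n = 8;                                 /* current capacity N, at position 0 */
    memset(mem, '#', n);
    show("start: N elements at 0", mem);

    /* Step 1: allocate 2N at position N, copy the data into its second half,
       and free the old N-block. */
    memcpy(mem + 2 * n, mem, n);
    memset(mem, '.', n);
    show("copy N into [2N,3N), free old block", mem);

    /* Step 2: copy the data back down so it starts at position 0 again;
       the vector now owns [0,2N) and everything above 2N is free to grow into. */
    memcpy(mem, mem + 2 * n, n);
    memset(mem + 2 * n, '.', n);
    show("copy back down to [0,N)", mem);
    return 0;
}
```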

NathanWhitehead

"memory efficiency" is only really applicable on architecture that does not use virtual memory. Factor 2 maximizes amortized cost. The actual memory mapping is less important. As long as the memory allocator doesn't fragment over time then you will always get the block you need. Which is why i think factor 2 is picked by the high performance compilers, it exploits the one property that matters by the computers we actually use, i.e minimize bandwidth use, anything else is managed by the hardware and a non-problem.

Ptolemusa

Powers of two reign supreme, but in situations where memory efficiency is more of a concern, I do something slightly different that I haven't seen anywhere else.

Arrays are expanded up to the next "half" power of two. A "half" power of two I define to be either a power of two, or the value halfway between two powers of two. An example "half" power of two is 12, since it's halfway between 8 and 16. You can also think of this as the sum of the previous two powers of two (4 + 8 = 12).

So an example sequence of growing dynamic array sizes starting at 8 would be: 8, 12, 16, 24, 32, 48, 64, 96, 128, 192, 256, ...

You can also think of this as alternating between a growth factor of 1.5 (e.g. from 8 to 12) and 1 1/3 (e.g. 12 to 16).

Some advantages of this method:

1. The binary representation stays simple/nice for computers to work with. It has just one bit set if it's a power of two (0b10...0), or two consecutive bits set if it's a half power of two (0b110...0). This allows for certain optimizations.

2. When the size is not a whole power of two, the "gap" to the next power of two is always itself a power of two. E.g. if the size is 12, then the "gap" to 16 is 4. This can greatly simplify handling memory fragmentation in certain cases.
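A small sketch of that growth rule; the helper name next_half_pow2 is mine, not from any existing library:

```c
#include <stdint.h>
#include <stdio.h>

/* Round up to the next "half" power of two: either a power of two
   (one bit set, 0b10...0) or 1.5x a power of two (two consecutive
   bits set, 0b110...0). */
static uint64_t next_half_pow2(uint64_t n) {
    if (n <= 2) return 2;
    uint64_t p = 2;
    while (p < n) {
        uint64_t half = p + p / 2;            /* value halfway to the next power */
        if (n <= half) return half;
        p *= 2;
    }
    return p;
}

int main(void) {
    /* Growth sequence starting from 8: always the next half power of two. */
    uint64_t cap = 8;
    for (int i = 0; i < 10; i++) {
        printf("%llu ", (unsigned long long)cap);
        cap = next_half_pow2(cap + 1);        /* strictly larger than current cap */
    }
    printf("\n");                             /* 8 12 16 24 32 48 64 96 128 192 */
    return 0;
}
```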

Daskie

When I saw the title of this video I immediately thought "Golden ratio!". Glad to hear my intuition is right😂

boas_

FYI, modern memory allocators have not only allocate/deallocate operations (e.g. malloc/free) but also a reallocate operation (e.g. realloc). If the allocator knows the realloc can grow in place, it will do that and avoid the copy. So in real life there's more going on that will alter what a good growth factor is, and it also changes the trade-off for larger arrays, which are more likely to need a full reallocate-and-copy pass rather than a grow-in-place.

This gets even messier if you are using huge pages or some totally owned arena, for example from an mmap, so that you can always grow in place. And as you noted, if many different things are being allocated into the same arena, there are other factors in play.

Lastly, if you truly can't avoid an endlessly growing array-like structure, you almost certainly want a linked list or binary tree of arrays. If you store the block pointers in a separate array from the array of actual values, you can recover very good performance even with insertions and deletions. Anyway, I think I'm teaching you to suck eggs, so I'll shut up now.
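A rough sketch of that last idea (fixed-size blocks of values plus a small separate array of block pointers); the names and block size are made up, not from any particular library:

```c
#include <stdio.h>
#include <stdlib.h>

#define BLOCK 256                     /* values per block; arbitrary choice */

typedef struct {
    int   **blocks;                   /* small array of block pointers */
    size_t  nblocks;
    size_t  len;                      /* number of values stored */
} chunked_array;

/* Append one value. Only the pointer array is ever reallocated, so
   existing values never move and pointers to them stay valid. */
static int chunked_push(chunked_array *a, int value) {
    if (a->len == a->nblocks * BLOCK) {
        size_t n = a->nblocks + 1;
        int **p = realloc(a->blocks, n * sizeof *p);
        if (!p) return -1;
        a->blocks = p;
        p[n - 1] = malloc(BLOCK * sizeof **p);
        if (!p[n - 1]) return -1;
        a->nblocks = n;
    }
    a->blocks[a->len / BLOCK][a->len % BLOCK] = value;
    a->len++;
    return 0;
}

static int chunked_get(const chunked_array *a, size_t i) {
    return a->blocks[i / BLOCK][i % BLOCK];
}

int main(void) {
    chunked_array a = {0};
    for (int i = 0; i < 1000; i++)
        if (chunked_push(&a, i) != 0) return 1;
    printf("a[999] = %d\n", chunked_get(&a, 999));
    for (size_t b = 0; b < a.nblocks; b++) free(a.blocks[b]);
    free(a.blocks);
    return 0;
}
```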

mrpocock

This deserves more likes. Not just the conclusions, but also the visualization with real-time statistics (plus graphs) gives a really good indication of the tendencies.

jupiterbjy

Who would have ever expected that the Golden Ratio contains a hidden off-by-one error!

andrewdunbar

I have wondered about this for so long. I have always been someone who is scared of doing the "unoptimal" thing, especially when deciding how much a dynamic array should grow. This is a godsend for me, thx

MatthewElento-vvsb

16:10 that's one of the best single lines I've heard in a while

alexhiatt

I hope you don't ever stop making videos. They're so good!

mo

This is going to blow up. Fascinating exploration!

FocusAccount-ivxe

One thing to remember is that not all data types occupy the same number of bytes. So a variation on the growth factor of 2 is to round the required space in bytes up to the next power of two. It's not quite a factor of 2 if the element size is odd, but it has the effect that arrays of any type can reuse the "memory fragments" of other arrays. Not sure if any language does that.

If you know you will need more arrays (and are on a single thread), you don't even have to deallocate; just put the chunk in a list to reuse. I am using this in a wasm project, as the default allocator seems to be really bad for memory fragmentation.
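A quick sketch of the rounding step (bit-twiddling version; a simple loop would do just as well):

```c
#include <stdint.h>
#include <stdio.h>

/* Round a byte count up to the next power of two, so buffers of any
   element size fall into the same small set of allocation sizes. */
static size_t next_pow2(size_t n) {
    if (n <= 1) return 1;
    n--;
    n |= n >> 1;  n |= n >> 2;  n |= n >> 4;
    n |= n >> 8;  n |= n >> 16;
#if SIZE_MAX > 0xFFFFFFFF
    n |= n >> 32;
#endif
    return n + 1;
}

int main(void) {
    /* e.g. 10 elements of a 12-byte struct = 120 bytes -> one 128-byte chunk */
    size_t elem_size = 12, count = 10;
    printf("%zu bytes -> %zu-byte chunk\n",
           elem_size * count, next_pow2(elem_size * count));
    return 0;
}
```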

Marius

Due to how memory and allocations actually work, there is little to be gained from supposed "memory efficiency" unless you are running rather large applications that get close to exhausting the virtual address space. And that is rather unlikely: on normal consumer machines you can't even get 10 TB of memory, while the virtual address space is over 10 times larger. So a factor of 2 just doesn't have any particularly big drawbacks.

When talking performance, you also just want to minimise the number of allocations and, if possible, get nice cache alignment. If you have a rough ballpark for the expected number of elements, then pre-allocate that size. So if you expect something like 5-10 elements, just make it 15 and it will almost never need to reallocate.

ABaumstumpf

Aside from your technical prowess you seem like a very nice person, keep on!

eliancordoba

I remember reading a few old articles about optimizing Java programs, and most of them mentioned that one of the main causes of big memory consumption is allocating a lot of arrays and maps that have only a few elements or are completely empty (I never researched it, but I believe that might be why modern Java doesn't actually allocate memory for these arrays until you try to insert the first element).

And that got me thinking: what if we calculate the new capacity dynamically? A few ideas:
* Start with a growth factor of 2 and decrease it by 0.1 on each new allocation until reaching 1.5
* The same strategy as above, but delayed until a certain number of elements
* Start with a growth factor of 2 and switch to 1.5 after a certain limit
* Alternate between 2 and 1.5

My reasoning is: surely most arrays and maps in the real world have very few elements. By using different growth factors we can prioritize speed for small arrays and memory usage for large arrays.
And if someone needs very large arrays (say a few million elements), they probably already allocate them with enough capacity, so the smaller growth factor wouldn't be an issue in most cases.
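A tiny sketch of the first idea (start at a factor of 2 and ease down to 1.5); the 2.0 / 0.1 / 1.5 numbers come straight from the list above, not from any real implementation:

```c
#include <stdio.h>

/* New capacity for the first idea above: the growth factor starts at 2.0
   and drops by 0.1 on each reallocation until it settles at 1.5. */
static size_t grow(size_t cap, unsigned reallocs) {
    double factor = 2.0 - 0.1 * reallocs;
    if (factor < 1.5) factor = 1.5;
    size_t next = (size_t)(cap * factor);
    return next > cap ? next : cap + 1;       /* always make progress */
}

int main(void) {
    size_t cap = 4;
    for (unsigned i = 0; i < 12; i++) {
        printf("%zu ", cap);
        cap = grow(cap, i);
    }
    printf("\n");          /* small arrays grow fast, large ones grow leaner */
    return 0;
}
```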

It would be interesting to test this theory on real-life data, like Rust does.

DamnedVik