The Simple and Elegant Idea behind Efficient Dynamic Arrays

In this video, we reveal the simple yet elegant resize scheme for efficient implementations of dynamic arrays.

Comments

You are the 3b1b of programming, keep going 👍.

sainagachaitanyaporanki

Only a small improvement to your implementation:
You said that when you remove an element and the array drops below half its max size, you shrink it.
But imagine the case where the array has 4 elements with a max size of 4, and then you add an element, remove that element, add another element, remove it, and so on (a common use case that acts like this is a stack).
In this case you resize your array on every operation.
It is better to shrink your array in half only when it drops to 1/4 of its max size.

Btw, awesome channel :)

gabriellasso
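
A minimal Python sketch of the grow-when-full / shrink-at-a-quarter policy described in the comment above (the class and method names are mine, purely for illustration):

```python
class DynamicArray:
    """Toy dynamic array: double when full, halve only at 1/4 occupancy."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._slots = [None]

    def _resize(self, new_capacity):
        # Allocate a fresh block and copy the live elements over (the O(n) step).
        new_slots = [None] * new_capacity
        new_slots[:self._size] = self._slots[:self._size]
        self._slots, self._capacity = new_slots, new_capacity

    def append(self, value):
        if self._size == self._capacity:
            self._resize(2 * self._capacity)
        self._slots[self._size] = value
        self._size += 1

    def pop(self):
        self._size -= 1
        value = self._slots[self._size]
        self._slots[self._size] = None
        # Shrinking only at 1/4 occupancy leaves a gap between the grow and
        # shrink thresholds, so alternating append/pop at a boundary can no
        # longer force a resize on every single operation.
        if 0 < self._size <= self._capacity // 4:
            self._resize(max(1, self._capacity // 2))
        return value
```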

4:12 Another way to think about it is to convert to base 2.
1 + 2 + 4 + ... + 2^k is nothing but 11...1 (a string of k+1 ones) in binary,
and if we add one, it's trivial to see that all of the digits roll over and we get 100...0 = 2^(k+1). Therefore, the sum must be 2^(k+1) minus 1.

phlaxyr
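
Written out, the base-2 argument for the total copy cost is (my notation, with k doublings):

```latex
\[
  1 + 2 + 4 + \dots + 2^{k}
  \;=\; \underbrace{11\cdots1}_{k+1\ \text{ones (base 2)}},
  \qquad
  \underbrace{11\cdots1}_{k+1} \,+\, 1
  \;=\; 1\underbrace{00\cdots0}_{k+1}
  \;=\; 2^{k+1},
\]
\[
  \text{so the sum is } 2^{k+1} - 1, \text{ i.e. one less than the next power of two.}
\]
```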

I've often daydreamed of making a 3blue1brown-style YouTube channel but more focused on software and computer science topics... very glad to discover someone beat me to it! lol Subscribed!

It might be nice to add a third chapter on this topic, to cover what real runtimes and libraries do in pragmatic practice. Especially modern, garbage-collected runtimes... but even when I was writing C code 20 years ago, I remember almost never wanting to deal with copy-on-resize expanding arrays. E.g., knowing the OS virtual memory page size of 4 KB, we just allocated linked lists of (4KB/sizeof(T)) slabs, and encapsulated all access to the "array" behind an interface, which could walk the list to find the right memory address quickly enough.

Sure, that means lookups are technically O(n) instead of O(1), but in practice it just doesn't matter much for the vast majority of cases where n < 1000 and you're not iterating over random accesses repeatedly.

And if you use an expanding array of slab pointers instead of a linked list, you get better RAM locality (fewer page faults) walking through the list... in practice these things matter! It also lends itself well to a specialized implementation of merge sort, especially if each slab fits within the L1 cache of your CPU.

(And if you continue to generalize that array of slab-pointers .. you end up with a thing called a Skip List, which is probably a data structure worth doing a video on.)

arithex
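
A rough Python sketch of the slab idea above, using an expanding directory of slab pointers rather than a linked list, so that a lookup is two index computations instead of a walk (the slab size and all names here are illustrative):

```python
SLAB_SIZE = 512  # stand-in for "page size / sizeof(T)"; purely illustrative


class SlabList:
    """Append-only sequence stored as fixed-size slabs behind a directory."""

    def __init__(self):
        self._slabs = []   # the directory: one pointer per slab
        self._size = 0

    def append(self, value):
        if self._size % SLAB_SIZE == 0:
            # Full (or empty) structure: add one fresh slab, copy nothing.
            self._slabs.append([None] * SLAB_SIZE)
        self._slabs[self._size // SLAB_SIZE][self._size % SLAB_SIZE] = value
        self._size += 1

    def __getitem__(self, i):
        if not 0 <= i < self._size:
            raise IndexError(i)
        # Two divisions replace the pointer walk of a plain linked list of slabs.
        return self._slabs[i // SLAB_SIZE][i % SLAB_SIZE]
```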

Wow, I subscribed just after watching the previous one. Once I watched it, I moved directly to this one. I have never understood arrays better than I do today. Thank you, mate.

adnanemezrag

That's interesting; I was expecting the answer to be a linked list. This seems especially good for storing 1- or 2-byte items in terms of the space required.

alanhere

Your visual geometric summations are gold

dmnd

please keep posting consistently on data structures and algorithms

rabiaaghafoor

Got me thinking, and I came up with a solution that is always O(1) for appending new elements, but it results in 3N writes and takes 3N <= x < 6N space, whereas the one presented uses 2N writes and N <= x < 2N space.
It was still a fun experiment; even if the answer was a little anticlimactic, this video did get me thinking.

HamStar_

To avoid memory crawling, one should not grow by a factor of 2 every time, but by something more like 1.3.

TheOneMaddin
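
One common reading of this advice is about reuse of freed blocks: with a factor of 2, the combined space freed by all earlier blocks is always smaller than the next request, while a smaller factor eventually lets the new block fit into space that was already released. A back-of-the-envelope check of that arithmetic (it ignores how real allocators place blocks, so treat it purely as an illustration):

```python
def reuse_ever_possible(growth_factor, steps=30, initial=16):
    """Could the space freed by all earlier blocks ever satisfy the next request?"""
    size, freed = initial, 0
    ever = False
    for _ in range(steps):
        new_size = int(size * growth_factor) + 1  # +1 guards against truncation stalling growth
        if freed >= new_size:
            ever = True
        freed += size          # the old block is released once the copy is done
        size = new_size
    return ever


print(reuse_ever_possible(2.0))   # False: freed space always lags behind the next request
print(reuse_ever_possible(1.3))   # True: the freed space eventually exceeds the next request
```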

2:06 When you said 1 million here, my first thought was that virtual addressing dramatically alters this at scales above the page size. Once you exceed the page boundary, items never need to be copied when the array is resized. This means you can freely add pages with O(1) overhead and the 2x growth is unnecessary (though it is also harmless, since memory that is allocated but untouched is not actually wasted).

goldenalt

Really it was the first idea that came to my mind when I watched the last video

khengari

It is also worth noting that some programming languages support manually setting the capacity of dynamic arrays. In Swift, for example, you can call the reserveCapacity(_:) method on arrays to set the minimum capacity of a dynamic array by hand. This is favourable when you know the required size of the array in advance, because after the reserveCapacity call, inserting elements is always an O(1) operation until you run out of the reserved capacity.

elmomertens
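
reserveCapacity(_:) is indeed Swift's API; as a language-neutral illustration of what reserving buys you, here is a toy Python sketch with a hypothetical reserve() method (Python's built-in list exposes no public equivalent):

```python
class ReservableArray:
    def __init__(self):
        self._capacity, self._size = 1, 0
        self._slots = [None]

    def reserve(self, n):
        """Pre-allocate room for at least n elements, analogous to reserveCapacity(_:)."""
        if n > self._capacity:
            self._grow(n)

    def _grow(self, new_capacity):
        new_slots = [None] * new_capacity
        new_slots[:self._size] = self._slots[:self._size]
        self._slots, self._capacity = new_slots, new_capacity

    def append(self, value):
        if self._size == self._capacity:
            self._grow(2 * self._capacity)   # only reached if you outgrow the reservation
        self._slots[self._size] = value
        self._size += 1


arr = ReservableArray()
arr.reserve(10_000)            # one up-front allocation...
for i in range(10_000):
    arr.append(i)              # ...then every append is a plain O(1) write
```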

Imagine having to resize an array when inserting new elements.

This comment was made by the linked list gang.

jelleverest

Would like to see videos on BSTs, Topological Sort, and Minimum Spanning Trees!

rabiaaghafoor

This method is brilliant. But I did not know that the runtime of allocating memory is independent of the size of the allocation. Great!!

pedrovelazquez

I had a different idea for dynamic arrays: a "segmented" system where, instead of reallocating and copying the array every time you want to resize it, you take the pointer to it and add it to an auxiliary data structure that holds some extra information about the array at large, like how many "segments" there are, how large each one is (based on what its size was when you allocated it), etc.

For example, say that when you create the array you originally want it to have 4 elements. You could allocate n*2 elements like in the video (in this case 8), then take the pointer to that array and add it to the segment table (or whatever you would call it), along with the size and the current index into that segment (for adding/removing elements), which at the start would be 0, of course. Every time you want to increase the size of the array, you just get a new chunk/segment and repeat the process, over and over, until the array of segment information is full, in which case you THEN reallocate and collapse the array fully and start the process over again.

The pros of this idea are that resizes would be pretty fast, because most of the time you don't have to copy anything, and insertions/deletions would be pretty fast, because you don't need to move the values in the entire array, only the ones in the segment of the value you want to remove/insert. In fact, you could accelerate insertion and deletion even further by adding an array of bitmasks, one per segment, in the information part, telling whether each slot is open or not, so you can do in-place insertion and deletion and not have to move anything unless you explicitly need to, in which case you could just make another function for that.

Of course, the main downside is that it would take a lot of memory, and it disproportionately favours large allocations while disadvantaging small ones, which doesn't exactly help. Also, resizing the array down would be a bit more finicky than sizing up, because you would be tied to the size of the chunks you allocated originally, so your best option would probably be to take the slow route and collapse the array if you actually need to free up space and downsize, rather than just moving stuff over while keeping the chunks.

Maybe this is a really bad idea, but it's what I came up with 😂 It's probably not even original either. I don't think it's a linked list, because I'm pretty sure that in a linked list each entry contains information about the next, whereas in my idea the data and the information are separate, but I don't know much about the different types of dynamic arrays, so maybe it is.

jlewwis
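
One way to read the segment-table idea above, sketched in Python; the occupancy bitmasks and the eventual "collapse" step are left out, and the names are made up:

```python
class SegmentedArray:
    """Growable sequence kept as a table of segments instead of one block.

    Growing appends a new (larger) segment to the table, so existing elements
    are never moved; only when the segment table itself filled up would you
    collapse everything into one array and start over (not shown here).
    """

    def __init__(self, initial_capacity=4):
        self._segments = [[None] * (2 * initial_capacity)]  # n*2, as in the video
        self._used = [0]                                     # per-segment fill index

    def append(self, value):
        last = len(self._segments) - 1
        if self._used[last] == len(self._segments[last]):
            # Last segment is full: add a new one twice its size, copy nothing.
            self._segments.append([None] * (2 * len(self._segments[last])))
            self._used.append(0)
            last += 1
        self._segments[last][self._used[last]] = value
        self._used[last] += 1

    def __getitem__(self, i):
        # Walk the segment table; it stays short because segment sizes double.
        for segment, used in zip(self._segments, self._used):
            if i < used:
                return segment[i]
            i -= used
        raise IndexError("index out of range")
```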

I had something else in mind for making a dynamic array: we have one chosen size that is the minimum size of the array, and another chosen size that is the size of each extra space added when the existing space isn't enough.

It starts by creating the space using the formula (M + KA): M is the minimum size of the list, A is the size of the space added every time, and K is an integer chosen so that the length of the array is at least the number of variables it contains.

K can also be calculated with the formula ((L + 1 - M) / A): L is the number of variables to store.
Note: if the value of K does not come out as an integer, it has to be rounded up (the +1 will be explained later).

This can be used instead of trying every value of K, since the M and A variables are chosen earlier after doing some testing.

This doesn't work by copying the whole array into a bigger one and adding something; instead:

Option 1: it has an extra space (that explains the +1 from before) that is a terminator.
Option 2: the first value stores K or the length of the array, and another extra space at the end becomes the terminator, so you need a +2 instead of the +1 mentioned before (this ensures that a variable stored in there is not mistaken by the program for the terminator).

While the list doesn't need more space, the terminator is just a value marking the end of the array, but if the array is full and you try adding another value, the terminator becomes a pointer to another part of memory, where an array of length A is stored, ready to hold more variables.

The new array has the same properties as the first one, but now you access the data through the first one.

The way to delete a value and add a value in the array is pretty much the same as you said in the video.

The big O notation would be this:
O(1) to add a variable.
O(N) to access a variable.
O(1) or O(N) to delete a variable.
O(N) to add a variable at a specific index.
O(N) or O(2N) to delete a variable at a specific index.

All the parts I didn't specify, I thought of as working the way they are explained in the video.

Edit: A (the size of the space added every time) doesn't have to be constant; like in the video, it could be powers of 2.

elidoz
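
A hedged Python sketch of the (M + KA) scheme as I read it, with a terminator cell that turns into a pointer to the next A-sized block once the current one fills up; M, A and the helper names are arbitrary, and None stands in for an empty slot:

```python
M, A = 4, 4                  # chosen minimum size and chosen extension size
TERMINATOR = object()        # sentinel stored in the extra (+1) cell


def new_block(slots):
    """A block is `slots` data cells plus one extra cell for the terminator/pointer."""
    block = [None] * (slots + 1)
    block[-1] = TERMINATOR
    return block


def append(head, value):
    block = head
    while block[-1] is not TERMINATOR:
        block = block[-1]                 # the last cell points at the next block
    for i in range(len(block) - 1):
        if block[i] is None:              # None marks an empty slot in this sketch
            block[i] = value
            return
    # The block is full: its terminator cell becomes a pointer to a fresh A-sized block.
    nxt = new_block(A)
    nxt[0] = value
    block[-1] = nxt


def iterate(head):
    block = head
    while True:
        for cell in block[:-1]:
            if cell is not None:
                yield cell
        if block[-1] is TERMINATOR:
            return
        block = block[-1]


arr = new_block(M)
for x in range(10):
    append(arr, x)
print(list(iterate(arr)))                 # [0, 1, 2, ..., 9]
```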

In my experience, it's not so much the number of insertions that takes the most time, but rather waiting on the memory allocation function to return. Even so, I like the general idea here, and I ran into a similar problem when implementing my own dynamic arrays years ago. My solution was, rather than doubling, to simply make the array capacity equal to the nearest power of 2 that can contain every element in the array, i.e. an array containing 0 or 1 elements has a capacity of 1, an array containing 2, 3, or 4 elements has a capacity of 4, an array containing 5, 6, 7, or 8 elements has a capacity of 8, etc. This has the advantage that for large sizes the array always takes up a whole number of pages, since pages are a power of 2 in size.

kaltwarraith
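
The "nearest containing power of two" rule is essentially a one-liner; a small sketch (note that it gives a capacity of 2 for 2 elements, whereas the examples above jump straight from 1 to 4):

```python
def capacity_for(n: int) -> int:
    """Smallest power of two that can hold n elements (capacity 1 for n <= 1)."""
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()


for n in (0, 1, 3, 4, 5, 8, 9, 1000):
    print(n, capacity_for(n))   # 0->1, 1->1, 3->4, 4->4, 5->8, 8->8, 9->16, 1000->1024
```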

I like what you do, and you might find me needlessly picky, but "on average" and "amortized time" are not the same thing. Being O(f(n)) amortized is stronger than being O(f(n)) on average.

For example, if you start from an empty binary search tree and insert elements one by one, and the elements are chosen "at random", then each insertion takes, on average, O(log(n)) time. But this does not rule out the case in which you are really unlucky and each insertion takes O(n) time, so the total time is O(n²). This case is just very unlikely.

If your binary search tree had O(log(n)) amortized insertions, then the total running time would be O(n log(n)) regardless of the data.

As a matter of fact, in the scheme that you describe for both insertion and deletion, every operation is O(1) on average but not in amortized time. Indeed, suppose that you have an array of size 1024 which contains 1023 elements.
- You then insert 2 elements. You thus have to create an array of size 2048, which takes O(n) time.
- You then remove 2 elements. You end up using only 1023 elements in an array of size 2048. You thus create a new array of size 1024, which again takes O(n) time.
- Then you insert 2 elements, then you remove 2 elements...

Of course, this is very unlikely to happen, but it cannot be ruled out. The operations are only O(1) on average, not amortized.

Let N be the size of the array and n the number of elements used.
If you create a bigger array of size 2N only when the array is full and you create a smaller array of size N/2 only when n < N/3, then you can show that the operations become O(1) amortized.

fredericmazoit
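
The alternating workload above is easy to simulate. A small sketch that counts element copies under shrink-below-half versus the suggested shrink-below-a-third rule (the thresholds are my reading of the comment; the exact constants are not the point):

```python
def copies_for_workload(should_shrink, rounds=1000):
    """Start at 1023 elements in a 1024-slot array, then alternately add 2 / remove 2."""
    capacity, n, copies = 1024, 1023, 0
    for _ in range(rounds):
        for _ in range(2):                 # insert 2 elements
            if n == capacity:
                copies += n                # grow: copy everything into a 2N array
                capacity *= 2
            n += 1
        for _ in range(2):                 # remove 2 elements
            n -= 1
            if should_shrink(n, capacity):
                copies += n                # shrink: copy everything into an N/2 array
                capacity //= 2
    return copies


below_half = lambda n, cap: n < cap // 2   # shrink as soon as occupancy drops below 1/2
below_third = lambda n, cap: n < cap // 3  # hysteresis: shrink only below 1/3

print(copies_for_workload(below_half))     # ~2,000,000 copies: resizes on almost every round
print(copies_for_workload(below_third))    # 1024 copies: one grow, then a steady state
```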