The FASTEST sorting algorithm: Part 3 - Merging runs efficiently

This video from the Tim Sort series focuses on making the intermediate merge operations efficient. Tim Sort's performance depends heavily on merging sorted arrays efficiently.

A merge operation uses auxiliary memory equal to the size of the smaller chunk/run. This is an improvement over the standard Merge Sort approach which constructs the sorted array outside the other two arrays being merged.
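The buffer-the-smaller-run idea can be sketched as follows. This is a minimal illustration rather than the code from the video: the function name `merge_adjacent` and its `lo`/`mid`/`hi` parameters are assumptions made for this example, and the galloping optimisation covered later in the series is left out.

```python
def merge_adjacent(a, lo, mid, hi):
    """Stably merge the sorted runs a[lo:mid] and a[mid:hi] in place.

    Only the smaller of the two runs is copied into a temporary buffer.
    (Hypothetical helper for illustration; names are not from the video.)
    """
    if mid - lo <= hi - mid:
        # Left run is smaller: buffer it and merge from front to back.
        tmp = a[lo:mid]
        i, j, k = 0, mid, lo
        while i < len(tmp) and j < hi:
            if a[j] < tmp[i]:      # strict < keeps equal items in order
                a[k] = a[j]
                j += 1
            else:
                a[k] = tmp[i]
                i += 1
            k += 1
        # Leftover buffered items go last; leftover right items are in place.
        a[k:k + len(tmp) - i] = tmp[i:]
    else:
        # Right run is smaller: buffer it and merge from back to front.
        tmp = a[mid:hi]
        i, j, k = mid - 1, len(tmp) - 1, hi - 1
        while i >= lo and j >= 0:
            if a[i] > tmp[j]:      # strict > keeps equal items in order
                a[k] = a[i]
                i -= 1
            else:
                a[k] = tmp[j]
                j -= 1
            k -= 1
        # Leftover buffered items go first; leftover left items are in place.
        a[lo:lo + j + 1] = tmp[:j + 1]


data = [1, 4, 7, 2, 3, 5, 6, 8]   # runs: [1, 4, 7] and [2, 3, 5, 6, 8]
merge_adjacent(data, 0, 3, 8)
print(data)                        # [1, 2, 3, 4, 5, 6, 7, 8]
```

Merging towards the side where the buffered run used to live means the write position never overtakes an unread item of the other run, which is why a buffer the size of the smaller run is enough.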

Another optimisation in the algorithm is keeping the pending runs on an explicit stack managed by the program instead of relying on the system call stack. This avoids recursive calls and allows us to choose which two runs to merge.

The final improvement is to balance the sizes of the arrays being merged. This is done by maintaining invariants on the stack, so that run lengths increase from the top of the stack towards the bottom.
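A rough sketch of how such invariants can be enforced is shown below. It tracks run lengths only and is modelled on the commonly published form of Timsort's collapse loop, so the exact conditions may differ from the ones shown in the video. The name `merge_collapse`, the list-based stack (top at the end of the list) and the returned merge log are assumptions made for this example.

```python
def merge_collapse(run_lens):
    """Merge neighbouring runs until the stack invariants hold again.

    run_lens holds run lengths, bottom of the stack first, top last.
    For the top three runs A, B, C (C on top) the loop aims for
    A > B + C and B > C. Returns the (left, right) lengths merged.
    Illustrative sketch only: real code would also merge the data.
    """
    merges = []

    def merge_at(i):
        # Merge run i with run i + 1 (only the lengths, in this sketch).
        merges.append((run_lens[i], run_lens[i + 1]))
        run_lens[i] += run_lens[i + 1]
        del run_lens[i + 1]

    while len(run_lens) > 1:
        n = len(run_lens) - 2
        if (n > 0 and run_lens[n - 1] <= run_lens[n] + run_lens[n + 1]) or (
            n > 1 and run_lens[n - 2] <= run_lens[n - 1] + run_lens[n]
        ):
            # Merge the middle run with its smaller neighbour.
            if run_lens[n - 1] < run_lens[n + 1]:
                n -= 1
            merge_at(n)
        elif run_lens[n] <= run_lens[n + 1]:
            merge_at(n)
        else:
            break  # invariants hold
    return merges


stack = [100, 40, 20]      # invariants hold: 100 > 40 + 20 and 40 > 20
stack.append(25)           # a new run of length 25 is pushed
print(merge_collapse(stack))   # [(20, 25), (40, 45)]
print(stack)                   # [100, 85]
```

Because the stack is explicit, the loop is free to merge the middle run with either neighbour, which a recursive merge sort cannot do.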

The final video will be on using some artificial intelligence to merge runs even more efficiently!

Comments
Author

Part 1: 12k views
Part 2: 5k views
Part 3: 3k views

Those 3k are the ones shortlisted by Google.

pratikjain
Author

Hi Gaurav, at 11:12, if the length of runs is 2 when the while condition is checked, the next if condition would throw an array index out-of-bounds exception, as runs[2] would not be available. A huge thanks for putting so much effort into covering this algorithm in detail.

KKV
Author

At 11:09, if it meets the break criteria, won't the while loop stop?

SansWordHuang
Author

This is absolutely genius! You, my man, are the guy!

Saurav-rlcq
Author

umm so this is the video that no one apparently saw according to your "Why switch to NoSQL" video. lol

blasttrash
Author

Since we know the two arrays are contiguous, why not try to merge them in place as much as possible? This could reduce the writes substantially instead of having to copy an entire array at the outset.

We could use a temporary queue instead of a third temporary array, and only push to the queue if we know a value will have to be overwritten before we can determine its resting place. This temporary queue would be no larger than min(n, m) or n/2, but generally around the size of min(n, m)/2 or n/4, because we will probably be able to move the other n/4 without having to push them to the queue. The queue will naturally be ordered because we only push ordered elements to it, sequentially.

I wonder if there's possible improvement using such a method.

Also, we can re-use the queue to do future merges instead of having to make repeated calls to our memory manager to kill and resurrect it. The only downside is we may have to resize the queue if we do find that we need to write substantially more than n/4 elements to it (and I guess, at that point, we may as well resize it to the full size of min(n, m) to prevent further resizes). We could even add some "artificial intelligence" here as well to determine the likely appropriate size of future queues based on previous sizes.
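One possible reading of this proposal is sketched below. The function name `merge_with_queue`, its parameters and the returned peak queue size are assumptions made for this example. It only illustrates the mechanics and makes no claim about whether this beats copying the smaller run up front.

```python
from collections import deque


def merge_with_queue(a, lo, mid, hi):
    """Stably merge the sorted runs a[lo:mid] and a[mid:hi] in place.

    A left-run item is pushed to the queue only when its slot is about
    to be overwritten. Returns the largest queue size that was reached.
    (Hypothetical sketch of the commenter's idea, not a tuned version.)
    """
    queue = deque()
    j = mid        # next unread item of the right run
    peak = 0
    for k in range(lo, hi):
        # Smallest remaining left-run item: queue front, else a[k] itself.
        if queue:
            left = queue[0]
        elif k < mid:
            left = a[k]
        else:
            break  # only right-run items remain, and they are in place
        take_right = j < hi and a[j] < left
        if not queue and not take_right:
            continue  # a[k] is already in its final slot, nothing to move
        value = a[j] if take_right else queue.popleft()
        if take_right:
            j += 1
        if k < mid:
            queue.append(a[k])  # save the item whose slot is overwritten
            peak = max(peak, len(queue))
        a[k] = value
    return peak


data = [1, 4, 7, 2, 3, 5, 6, 8]   # runs: [1, 4, 7] and [2, 3, 5, 6, 8]
print(merge_with_queue(data, 0, 3, 8))   # 2
print(data)                              # [1, 2, 3, 4, 5, 6, 7, 8]
```

In this sketch the queue grows only when an item from the right run is written into the left run's territory, so its size stays within min(n, m).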

nicholasdavidowicz
Author

At 5:15, why do you say that merging arrays of size m and n requires n/2 space? We would require min(m, n) extra space to merge m and n, right? It would not be n/2, from what I understood from the video.

vinitp
Author

How does galloping in part 4 work with this method? Do we just use this method when the algo falls back to default merge sort?

neilteng
Author

What if the input array is [4, 5, 1, 258, 66, 75, 12, 8, 6, 5, 4, 3, 2, 1]? The sizes of the 3 chunks are 3, 3 and 8. We are essentially breaking out of the loop here while the array is still not sorted.

shrutikamboj
Author

A question about the mergeForceCollapse method: if I understood the mergeCollapse method correctly, then at the start of the mergeForceCollapse method all the invariants hold, that is:
* runLen[index] > runLen[index + 1] + runLen[index + 2] for each 0 <= index < stackSize - 2
* runLen[index] > runLen[index + 1] for each 0 <= index < stackSize - 1

Given this, how can it happen during mergeForceCollapse that, for n = stackSize - 2 (assume n > 0), the condition runLen[n - 1] < runLen[n + 1] becomes true?
It seems to me that the invariants above ensure that this can never happen. Did I miss something? A special case, maybe?

reneanonym
Author

Is it possible to do merging with constant space using two pointers and swapping?

nvjrane
Author

Hi Gaurav! Thanks so much. What you are doing is very inspiring, and you make it look like so much fun.

How do you manage your time for this?

thinklessdomore
Author

We form the stack when we have chunks of equal size. So for the stack, the condition should always be false, since we moved to the stack because we had arrays of the same size. Why do we have this condition check, then?

akashtyagi
Author

Hey, great video! I'm going to try to implement this in Visual Basic. Where can I get the code in Python or Java so I can compare it to your explanation?

faytimen
Author

Is it possible to do so in constant space?

SatuKing
Author

The complexity reduction equation is C[m] * {log(n) - x}, not C[m] * {log(n - x)}.

anandkulkarni