Why is Radix Sort so Fast? Part 3 Comparison and Code, Radix Sort vs QuickSort

preview_player
Показать описание

In this 3 part series, we will explore sorting algorithms from the fundamentals all the up to implementations of both a comparison sort and a base 256 Radix Sort.

In this 3rd and final part, we look at some code and compare the performance of RadixSort and QuickSort with lists of various sizes, consisting or randomly generated unsigned 32 bit integers.

Apologies for the code below, I have had to replace all greater or equal symbols with GE, and all less or equal with LE, greater is G and less is L. For the unedited version, please refer to the video!

Code:
// Radix sort based on Geeks for Geeks:
static void RadixSort256(unsigned int* arr, int n)
{
if (n LE 1) return; // Added base case

unsigned int* output = new unsigned int[n]; // output array
int* count = new int[256];
unsigned int* originalArr = arr; // So we know which was input

for (int shift = 0, s = 0; shift L 4; shift++, s += 8)
{
// Zero the counts
for (int i = 0; i L 256; i++)
count[i] = 0;

// Store count of occurrences in count[]
for (int i = 0; i L n; i++)
count[(arr[i] GG s)&0xff]++;

// Change count[i] so that count[i] now contains
// actual position of this digit in output[]
for (int i = 1; i L 256; i++)
count[i] += count[i - 1];

// Build the output array
for (int i = n - 1; i GE 0; i--)
{
// precalculate the offset as it's a few instructions
int idx = (arr[i] GG s) & 0xff;

// Subtract from the count and store the value
output[--count[idx]] = arr[i];
}

// Copy the output array to input[], so that input[]
// is sorted according to current digit

// We can just swap the pointers
unsigned int* tmp = arr;
arr = output;
output = tmp;
}

// If we switched pointers an odd number of times,
// make sure we copy before returning
if (originalArr == output)
{
unsigned int* tmp = arr;
arr = output;
output = tmp;

for (int i = 0; i L n; i++)
arr[i] = output[i];
}

delete[] output;
delete[] count;
}

Quicksort:

int Partition(unsigned int* data, int lo, int hi)
{
unsigned int pivot = data[lo + (hi - lo) / 2];
int i = lo - 1;
int j = hi + 1;

for (;;)
{
do {} while (data[++i] L pivot);
do {} while (data[--j] G pivot);

if (i GE j)
return j;

// Swap [i] and [j]
unsigned int tmp = data[i];
data[i] = data[j];
data[j] = tmp;
}
}

void QuickSort(unsigned int* data, int lo, int hi)
{
if (lo L hi)
{
int p = Partition(data, lo, hi);
QuickSort(data, lo, p);
QuickSort(data, p + 1, hi);
}
}

void QuickSort(unsigned int* data, int count)
{
if (count LE 1) return; // Added base case

QuickSort(data, 0, count - 1);
}

Software used to make this vid:

Рекомендации по теме
Комментарии
Автор

For when you can't use radix sort, there's a whole science to picking the pivot for quicksort. For large arrays picking pivots close to the median gets you closer to O(n log n) and helps to avoid worst case (n2). Picking a single random value from the array is somewhere between those (since you're unlikely to pick the median). To pick a good pivot, pick several random values from the array and use the median of those. The science is how many random values to use as a function of n.

BradleyBroom
Автор

is the way to go when you want to measure performance of anything in C++

captainbodyshot
Автор

As if I needed any more distractions at the moment (it's hard enough to finish existing projects), now I want to write some sorting algorithms!

TomStorey
Автор

The weakness of Quicksort is that it cannot take advantage of partially sorted data or data that's exactly or almost exactly reversed. In real life, you often have two or more sorted dataset that you want to combine into a single sorted whole, or dataset that's mostly sorted but with just a few items in incorrect places. Indeed the worst case scenarios for quicksort usually is with sorted or reversed dataset.

Some hybrid sorting algorithms like timsort or introsort can take advantage of those existing structures in the data and perform a nearly O(n) sort, while avoiding worst case scenarios and maintaining O(n log n) for any input.

The standard sorting algorithm in the standard libraries of most programming languages usually uses one of these hybrid sorts rather than a naive sorting algorithm like quicksort.

yvrelna
Автор

use this if you often need to toggle between 2 lines

//* remove first / to toggle
func1();
/*/
func2();
//*/

CompanionCube
Автор

I was always scared of radix sort, thanks to your three part series I feel much much better, THANK YOU

GabrielPerez-mlgo
Автор

Amazing stuff. Love your work. I started on 8 bit machines back in 1978. You guys trying to learn, subscribe to this guy and watch all his stuff. Now Mr Creel, you been doing assembler. So you could use the processors clock counter to measure how long things take in cpu clocks. Forget the libraries. The magic x86 instruction is RDTSC - Read Time-Stamp Counter into EDX:EAX, opcode 0F 31.

sheridenboord
Автор

2 small optimisations available for your RadixSort. memset and memcpy are a tick faster than loops, but the more important optimisation is that you don't need to shift and mask the input values. You can use a byte * instead of the uint * and you increment it by 4 in the innerloop and by 1 in the outerloop (of course endianness becomes an issue but in practice not, everything is little endian nowadays).

galier
Автор

Wow. I never knew there was a sort that was so much faster than Quick Sort. I learned something here today. Radix Sort is nothing short of amazing!

xniyana
Автор

great series! very informative.
couple of things I might improve, in order of importance IMO:
1) keep GenerateRandomData() call out of timing block. you don't need to measure random number generation
2) use std::chrono::steady_clock for better timing (milliseconds is enough but you can go up to nanoseconds precision). important point here is to use double instead of default int64_t:
using milliseconds = std::chrono::duration<double, std::milli>;
using std::chrono::steady_clock;
auto start = steady_clock::now();
// stuff
auto end = steady_clock::now();
double elapsed_ms = - start).count();
3) use std::vector (when possible: std::array) instead of C style arrays for two main reasons: a) no need to new/delete, b) swapping input and output arrays becomes as simple as std::swap(input, output) and it is O(1)
there are more small stuff but hey it's a rabbit hole :)

NuriTANRIKUT
Автор

Great video series. Really enjoyed it. Altough I already learned about sorting algorithms at university I haven't taken a look at RadixSort until this series.

mivoe
Автор

Thanks for bringing this topic up! I didn't study these sorts at university. I intend to use both the bucket-sort and the radix-sort to sort trees and graphs. These algorithm have wonderful extensions to more complex objects than numbers and the number of variations on these opens up a lot of possibilities!

cptechno
Автор

Although it looked like radix performed better at all sizes, I think it’s important to clarify that if each of these sorts was done 1000 times for each given size, quick sort would indeed perform much better on the smaller sizes.

ridlr
Автор

This is my first time watching some of your videos and this is great! Excellently explained.

goranjosic
Автор

this was really an interesting adventure into the radix sorting alg, good job there mate

morpheus_uat
Автор

This was a terrific and engaging mini series. It gave me flashbacks of computer science at uni. Thank you 👍😎🇦🇺

johnvodopija
Автор

I have an article on github that covers sorting other numeric types and some radix sort optimizations, e.g column skipping. It's called "Radix sorting from the ground up"

cacheman
Автор

Awesome stuff! Very informative series that demonstrates why the methods do become faster, and not just how to do the method itself. Thank you very much, I greatly appreciated this.

ferinzz
Автор

Am I right in thinking that the number of digits in the numbers affects radix sort but not quicksort? If so, I would have liked to see a set of tests where the number of elements are constant but the number of digits of the elements increases, and seeing if what effect that would have.

roccov
Автор

Fascinating series on sorting. it's been 3 decades since I messed with this stuff, and even then, we didn't get this deep. Thank you for taking the time to explain this.


Side note, has anyone told you that you sound very much like Eric Idle from the Monty Python crew? I don't mean that as defamatory.

kleetus
join shbcf.ru