Find the smallest without comparing(Radix sort!)

preview_player
Показать описание
A simple question that leads you into discovery of a startlingly fast, intuitive, and simple algorithm. What if you had to find the smallest in a list of numbers, but you could never compare any two of them? How would you know the smallest without knowing which is smaller?

This video explains how to solve this problem and gives a gentle introduction of the radix sort algorithm. It's intended for computer science students and newer programmers, but more experienced professional engineers and developers may also find these sorting animations and visualizations enjoyable.

Radix sort is a non-comparison sort and one of the fastest sorting algorithms. This video covers most significant digit radix sort (MSD Radix sort) and also mentions least significant digit radix sort and parallelized radix sort. It could also be used as a short review for anyone doing technical interview prep, or for someone who has taken a data structures and algorithms/DSA course and needs a quick refresher.

Links to other resources:
Proof of time complexity (MIT lecture, see towards the end for the proof):

Handling negatives and some suggestions for optimization:

Research papers on running parallel radix sort in-place (saving memory):

An optimized radix sort in C (400+ lines):

Some research on comparative speed:

(scroll to bottom for graphs and English)

My radix sort in Python code -- see the comment, it won't fit here!
Рекомендации по теме
Комментарии
Автор

Here is some Python code that goes with the video. It wouldn't fit into the show description.

from itertools import chain, product
import random
from math import log, ceil
from collections import deque
from copy import deepcopy

"""
The sorting functions below go with the video, and are meant to provide
accessible examples. They are not optimized for performance. Also, the
simple function breaks for cases with negative numbers, as mentioned
in the video.

Please comment if you have questions or suggestions!
"""

#NOTE: Doesn't work for negatives or shorter numbers. See the video and the completed, 'fixed' version below
def msd_radix_sort_simple(nums: list[int], radix: int = 10):
if not nums:
return nums
num_digits = get_num_digits(nums, radix)
# use a deque/queue to keep buckets in order rather than using recursion
q = deque()
# So we're going to append a 3-tuple of starting index, ending index,
# and which_digit we are currently bucketing on.
# [start, end) for the range of values in nums to process on this pass
# which_digit for which digit to do (0 is ones place, then counting up)
q.append((0, len(nums), num_digits-1))
while q:
start, end, which_digit = q.popleft()
if which_digit < 0 or end - start <= 1:
continue
# Put into buckets, which start as empty lists. Shouldn't
# need more than O(n) additional memory.
# You can use O(1) additional memory by pre-counting and
# then just swapping in the original array, but that's not
# stable.
buckets = [[] for i in range(radix)]
for num in nums[start:end]:
which_bucket = get_digit_at(num, which_digit)

next_round_start = start
# generate indexes to put into queue, so that I know where
# to start and end on next round/digit
for bucket in buckets:
next_round_end = next_round_start + len(bucket)
q.append((next_round_start, next_round_end, which_digit-1))
next_round_start = next_round_end
# flatten buckets and copy back into original space (can combine with above loop?)
for i, num in
nums[start + i] = num
return nums



def list[int], radix: int = 10):
if not nums:
return nums
num_digits = get_num_digits(nums, radix)
q = deque() # using a queue instead of recursion
# Split pos/neg into two buckets. You could also do this step last,
# but that's not really in the spirit of msd_radix_sort.
num_negatives = 0
# Process into positive buckets and negative buckets first.
if min(nums) < 0:
neg_bucket, pos_bucket = [], []
for num in nums:
neg_bucket.append(num) if num < 0 else pos_bucket.append(num)
num_negatives = len(neg_bucket)
for i, num in
nums[i] = num
q.append((0, num_negatives, num_digits-1))
q.append((num_negatives, len(nums), num_digits - 1))
else:
q.append((0, len(nums), num_digits-1))
# Main action of radix sort, similar to simpler example above
while q:
start, end, which_digit = q.popleft()
if which_digit < 0 or end - start <= 1:
continue
buckets = [[] for i in range(radix)]
for num in nums[start:end]:
which_bucket = get_digit_at(num, which_digit)

next_round_start = start
for bucket in buckets:
next_round_end = next_round_start + len(bucket)
q.append((next_round_start, next_round_end, which_digit-1))
next_round_start = next_round_end
for i, num in
nums[start + i] = num
# reverse negative portion if more than 1 negative
if num_negatives > 1:
for i, num in
nums[i] = num
return nums



def lsd_radix_sort(nums, radix: int = 10):
if not nums or len(nums) == 1:
return nums
num_digits = get_num_digits(nums, radix)
for i in range(num_digits):
buckets = [[] for i in range(radix)]
for num in nums:
which_bucket = get_digit_at(num, i, radix)

nums = chain.from_iterable(buckets)
# Handle negatives
pos_bucket, neg_bucket = [], []
for num in nums:
if num < 0:
neg_bucket.append(num)
else:
pos_bucket.append(num)
# reverse negative bucket, append positive bucket, and return
return neg_bucket[::-1]+pos_bucket


# digits 'numbered' starting with 0 on the right/ones place
def get_digit_at(num: int, which_digit: int, radix: int = 10) -> int:
# Next line is necessary due to Python's idiosyncracies with modulus
# and integer division on negative integers.
# You can skip the next line if you're using a different langauge
num = num if num >= 0 else -num
return num//radix**which_digit % radix


def get_num_digits(nums, radix=10):
# the + 1 and - 1 handle cases like a max of 1000,
# which gives 3 for number of digits when it should have 4.
max_num, min_num = max(nums) + 1, min(nums) - 1
if min_num < 0:
min_num = -min_num
return (ceil(log(max(max_num, min_num), radix)))


if __name__ == "__main__":
simple = [128, 371, 941, 212, 419, 882, 141, 194, 832, 753, 555]
negatives_1 = [3, 128, 100, 742, -233, 178, 22, -777, -321, -4, -7, -12, -120, -872, -412]
presorted = [123, 333, 444, 555, 666, 777, 888, 999]
reversed_sorted = list(reversed(presorted))
c_1 = []
c_2 = [-1]
c_3 = [-100]
c_4 = [-323, -324]
c_5 = [2]
c_6 = [782]
c_7 = [731, 2]
c_8 = [889, 890]
c_9 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
c_10 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2]
c_11 = [2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
c_12 = [999, 1000]
c_13 = [1000, 999]
c_14 = [128, 371, 941, 212, 419, 882, 0, -1, 23123123242, 141, 194, 832, 753, 555]
c_15 = [128, 371, 941, 212, 419, 882, 0, -1, -23123123242, 141, 194, 832, 753, 555]
c_16 = [-324, -123, -114, -41824, -1232, -123, -1, -3, 32, -81923]
test_cases = [simple, negatives_1, presorted, reversed_sorted, c_1, c_2, c_3, c_4, c_5, c_6, c_7, c_8, c_9, c_10, c_11, c_12, c_13, c_14, c_15, c_16]
functions = [lsd_radix_sort, msd_radix_sort_simple, msd_radix_sort_negatives]
for case, function in product(test_cases, functions):
result = function(case)
comparison = sorted(deepcopy(case))
if result != comparison:
print(f"{function.__name__}")
print(result)
print("Test cases finished")

CodeSlate
Автор

MSD radix sort requires knowing the maximum length before hand, making it either need comparisons to find the maximum length, or will only work in static cases, like IDs or GUIDs. In professional development we chuck an index on the column and let the database keep a sorted list so we never need to sort on the fly for large enough volumes to make this optimisation worth using over quicksort. Radix sort is cool in theory though

akison
Автор

So that mean miracle algorithm for sorting

Anoobinvalorant
Автор

Nice video! I was able to understand everything easily without any prior algorithms knowledge

middleschoolbathroom
Автор

It's nice to know that you shouldn't just default to quick sort because it's quick. I've never heard of this algorithm but it makes sence why sometimes it would be a better option.
I really appreciate your effort to provide resources for a deeper understanding and keep the video simple

Lukasek_Grubasek
Автор

The alphabet is kinda already in number order.
That’s like comparing the 100’s together

coltenhunter
Автор

MSD Radix Sort is not inherently stable. You have to MAKE IT STABLE by implementing an extra buffer with the same size as the original list, which is extremely costly for large files. LSD Radix Sort is stable right out of the gate, no ifs or buts.

MinecraftMasterNo