Find Median from Data Stream - Heap & Priority Queue - Leetcode 295

preview_player
Показать описание


0:00 - Read the problem
1:39 - Drawing Brute Force
5:02 - Drawing Heap Solution
15:15 - Coding Solution

leetcode 295

#median #heap #python
Рекомендации по теме
Комментарии
Автор

Anyone who can come up with solutions like this, on the fly, without having first studied the problem, is a genius and I salute them. Also thank you Neetcode for this video

richardnorth
Автор

I dealt with a slightly different problem in the 1980s. I was to find the "moving median" of N points in a data sequence that keeps on coming. But I wasn't going to do this with programming, but with dedicated hardware, like gate arrays. At any instant, there were N saved points. The oldest was discarded, the sample just arrived was inserted, and the median of the N points in the list was reported. I used a min-heap and a max heap. The exciting thing was that all the comparisons and data movements in working with a heap can be done in parallel, so the O(log N) time per sample reduces to O(1).

charlesmrader
Автор

This is just incredible... for some reason I either never learned, or don't remember learning, Heaps in my computer science education at university. Throughout the years I have always heard the word Heap but never really drilled down into what it did or where it is useful. This really inspired me to fill the gap!!

MichaelButlerC
Автор

Started grinding seriously as my I have a big tech interview coming up. The more obscure problems have video solutions on youtube which are horribly explained. I really appreciate how great your solutions are now, and I already loved them before. Thank you NeetCode, hopefully everything goes well!

UnknownEntity
Автор

I'm a bit confused on why to add the new numbers directly to small heap by default. Couldn't this be optimized by comparing the incoming number to both of the root nodes and adding the new one to the small/large heap depending on if the new value is less than or greater than the root nodes?

Like let's take 3547:
First add 3 to the small heap - O(1)
Next 5 comes in, is greater than 3, add to large heap - O(1)
Next 4 comes in, is between the values, add to small heap - O (log n)
Next 7 comes in, is larger than 4 and larger than 5, add to large heap - O (log n)

if that last number was a 1, its smaller than 4 and 5, add to small heap. Now the heaps are unbalanced so pop 4 from small heap, push to large heap - O (log n) + O (log n)

This way whenever your trees are imbalanced you can just pop the root of the larger one and push it to the smaller one to keep them always balanced and at most you have to push and pop once rather than twice with the "swap" method in the video.

I may be missing something, but it seems inefficient at first glance.

Other than that, great video, you always explain these so concisely 👍

Grim_tidings
Автор

10:18
But two, that's "two" big of a difference 😁
Thank you NeetCode, your videos have helped me understand hard problems a bunch!

JonathanBatchelder
Автор

Excellent work on explaining this!

I was wondering if you can do a follow up video expounding upon the other approaches to the problem? IMHO, the primary reason this a leetcode hard problem is because of the myriad of approaches to solving it especially given different constraints (i.e. memory bound - how can do we do it with Reservoir Sampling, Segment Trees, etc.) I saw the Leetcode solution to this problem mentions these different approaches but I think you will do a much better job of explaining it :)

dylanwu
Автор

Very nice visual explanation. I would suggest, that instead of saying size is approximately equal, can we say size difference must be at MOST 1 ?

protyaybanerjee
Автор

I had this in an interview and was asked:
- "If all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?"
- "If 99% of all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?"

How would you go about this? :) Thanks!

bernhardhausleitner
Автор

Thanks for the explanation!
I found the below solution a bit cleaner yet still intuitive.

The idea is that for every new number,
1. we first push it to the small heap
2. then pop from small heap and push to big heap
3. check if the small heap is too small(not balanced)

the code can be reduced largely in this case:

class MedianFinder:

def __init__(self):

self.maxHeap = []
self.minHeap = []

def addNum(self, num: int) -> None:

heapq.heappush(self.maxHeap, num * -1)
heapq.heappush(self.minHeap, heapq.heappop(self.maxHeap) * -1)

if len(self.minHeap) > len(self.maxHeap):
heapq.heappush(self.maxHeap, heapq.heappop(self.minHeap) * -1)

def findMedian(self) -> float:
if len(self.maxHeap) == len(self.minHeap):
return (self.minHeap[0] + self.maxHeap[0] * -1) /2
else:
return self.maxHeap[0] * -1

kritmok
Автор

An alternative solution could be to use self-balancing trees (AVL, Red-Black, etc.) where all the operations are log_n. However, we don't need find and instead just need to maintain a separate pointer to the median, then after each insert, depending on whether the inserted number is greater or less the current median, we move the median pointer to the element adjacent to the previous median.

houmankamali
Автор

Man you're actually so goated. These explanations legitimely make me understand the structures and concepts and I can code out the solution from understanding the logic before you go through writing the code. While I'm sure Google was great, I'm glad you're focusing full time on this because you're the absolute best resource I've come across

fruitshopowner
Автор

Is this not the same as using a single heap?

kmichal
Автор

A bit shorter and easier:
import heapq

class MedianFinder:
def __init__(self):
self.min_heap = [] # To store the larger half of numbers
self.max_heap = [] # To store the smaller half of numbers

def addNum(self, num: int) -> None:
# Push the number to the max-heap (negated) to simulate a min-heap
heapq.heappush(self.max_heap, -num)

# Pop the smallest number from the max-heap and push it to the min-heap
heapq.heappush(self.min_heap,

# If the min-heap has more elements than the max-heap, balance the heaps
if len(self.min_heap) > len(self.max_heap):
heapq.heappush(self.max_heap,

def findMedian(self) -> float:
# If the total count of numbers is odd, the median is the top of the max-heap
if len(self.max_heap) > len(self.min_heap):
return -self.max_heap[0]

# If the total count of numbers is even, the median is the average of the tops of both heaps
return (-self.max_heap[0] + self.min_heap[0]) / 2

maniskarki
Автор

Why do you do this thing where you first add to the left by default, and then check & rebalance between the two heaps? Can't you peek at the min/max in either tree and decide where it should go based on that, and based on the current sizes of each tree without having to actually go through an `add` operation?

OmriSama
Автор

question: if you're doing a bunch of logn operations for every add, isn't this a nlogn algorithm? if you're adding n elements, and each time you'll do logn operations, this becomes nlogn. you may as well have just sorted it.

reinforcer
Автор

Thank you for this great explanation! I was wondering why can't we use a self-balancing BST like AVL or Red Black Tree ?

ranaafifi
Автор

Great stuff, thanks mate! I've been thinking off migrating to Python for interviews but the edge case with max heap again made me stop doing that :)

MsSkip
Автор

how do you recognize that you can use 2 heaps instead of an array + binary search

memevideos
Автор

thanks neetcode for amazing explanation
you are doing a great job are a few youtube channels which implement problem code in python...YOU ARE THE BEST OF THEM 👑

piyushupadhyay