31: Distributed Priority Queue | Systems Design Interview Questions With Ex-Google SWE

Personally, I'm a distributed priority jew

Comments
Author

2:40 Requirements (enqueue, modify priority, dequeue, at-least-once delivery), ordered consumption
5:10 Implementation using a heap in memory with the data on disk (memory only: too many shuffles; disk only: slow)
9:40 Modifying data: the data itself is easy to modify, the priority is not
11:20 Replication: single leader. Partitioning: round robin, to avoid hot shards for the top range
15:40 Consumption (client long polls from all three and maintains a local heap); use poll or long poll depending on the time the consumer takes
17:22 Diagram (on-disk index on ID, in-memory partition by score)

knightbird
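
The hybrid layout in the outline above (small heap in memory, full records indexed by ID on disk) can be sketched roughly as follows. This is a minimal illustration, not the design from the video: the class name `HybridPriorityQueue` is hypothetical, a plain dict stands in for the on-disk ID index, lower numbers mean higher priority, and priority changes are handled by lazy invalidation (push a new heap entry, skip stale ones on dequeue), which is one common technique among several.

```python
import heapq
import itertools


class HybridPriorityQueue:
    """Sketch: heap of small tuples in memory, records in an ID-indexed store.

    Assumption: `_store` is a dict standing in for an on-disk index on ID.
    """

    def __init__(self):
        self._heap = []                      # (priority, version, item_id)
        self._store = {}                     # item_id -> (priority, version, payload)
        self._versions = itertools.count()   # unique version per write

    def enqueue(self, item_id, priority, payload):
        version = next(self._versions)
        self._store[item_id] = (priority, version, payload)
        heapq.heappush(self._heap, (priority, version, item_id))

    def modify_priority(self, item_id, new_priority):
        # Look up by ID (cheap), then push a fresh heap entry.
        # The old heap entry stays behind and is skipped on dequeue.
        _, _, payload = self._store[item_id]
        version = next(self._versions)
        self._store[item_id] = (new_priority, version, payload)
        heapq.heappush(self._heap, (new_priority, version, item_id))

    def dequeue(self):
        while self._heap:
            priority, version, item_id = heapq.heappop(self._heap)
            current = self._store.get(item_id)
            # Only the entry whose version matches the store is live.
            if current is not None and current[1] == version:
                del self._store[item_id]
                return item_id, priority, current[2]
        return None


q = HybridPriorityQueue()
q.enqueue("a", 5, "payload A")
q.enqueue("b", 3, "payload B")
q.modify_priority("a", 1)
first = q.dequeue()
second = q.dequeue()
third = q.dequeue()
```

Here `first` is item `a` at its updated priority, `second` is item `b`, and `third` is `None` because the stale heap entry for `a` is discarded.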
Author

Thanks for the great video, Jordan!
Here are some questions to help me understand better:
1) Is the following understanding correct? When a task is consumed by a consumer, the partition server removes the task from the heap and moves it to some pending storage. When it gets the ACK that the task is done, it removes the task. When the task expires or the server gets a failure message from the consumer, it puts the task back into the queue of the same partition.
2) I think we need a not-too-frequent heartbeat between the partition server and the consumer. Say the tasks are time-consuming batch jobs: we cannot always wait for the consumer to report whether the task has succeeded. In such a scenario a "partial failure" is a headache. For example, the consumer is down, but the batch job might still be running. We have to take care of the exactly-once processing aspect too. What is your favorite way to tackle that?

thunderzeus
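
The flow described in question 1 above (move a consumed task to pending storage, delete it on ACK, re-queue it on failure or expiry) can be sketched as below. This is only an illustration of the commenter's description, not something the video confirms; `PartitionQueue`, `visibility_timeout`, and the explicit `now` argument are hypothetical names chosen to keep the example deterministic. Note that a re-queued task may be processed twice, which is what at-least-once delivery allows.

```python
import heapq


class PartitionQueue:
    """Sketch of one partition with a pending set and a visibility timeout."""

    def __init__(self, visibility_timeout=30.0):
        self._heap = []        # (priority, task_id); lower number = higher priority
        self._pending = {}     # task_id -> (priority, deadline)
        self._timeout = visibility_timeout

    def enqueue(self, task_id, priority):
        heapq.heappush(self._heap, (priority, task_id))

    def dequeue(self, now):
        # Tasks whose deadline has passed go back into the heap first.
        self._requeue_expired(now)
        if not self._heap:
            return None
        priority, task_id = heapq.heappop(self._heap)
        self._pending[task_id] = (priority, now + self._timeout)
        return task_id

    def ack(self, task_id):
        # True if the task was pending and is now removed for good.
        return self._pending.pop(task_id, None) is not None

    def nack(self, task_id):
        # Explicit failure report: re-queue immediately.
        entry = self._pending.pop(task_id, None)
        if entry is not None:
            heapq.heappush(self._heap, (entry[0], task_id))

    def _requeue_expired(self, now):
        expired = [t for t, (_, deadline) in self._pending.items() if deadline <= now]
        for task_id in expired:
            priority, _ = self._pending.pop(task_id)
            heapq.heappush(self._heap, (priority, task_id))


pq = PartitionQueue(visibility_timeout=10)
pq.enqueue("t1", 1)
pq.enqueue("t2", 2)
got_first = pq.dequeue(now=0)      # t1 becomes pending until time 10
got_second = pq.dequeue(now=1)     # t2 becomes pending until time 11
acked = pq.ack("t2")               # t2 is done
while_pending = pq.dequeue(now=5)  # nothing available, t1 is still pending
after_expiry = pq.dequeue(now=10)  # t1 timed out and is handed out again
```

A heartbeat, as suggested in question 2, could extend the deadline of a pending task instead of letting it expire; that part is left out of this sketch.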
Author

Hi Jordan. The videos are amazing. Thank you. A suggestion: can you please post a link to the YouTube video with your Substack post if possible? I know I can always open YouTube in a new tab and search, but it will save 3 seconds every time I go through your notes, which I would rather spend trying to search for your feet pics

s
Author

Jordan, would HBase be a better alternative to MySQL here? You get write-ahead-log style indexing, which can then be leveraged to build our heap as well and to expire elements as they are consumed.

guitarMartial
Author

You are doing great work, man. A slight calculation error at 8:25, I presume: isn't 16 bits × 1 billion = 16 billion bits, which is 2 GB (approximately 1.86 GiB)?

monk_learn
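
For reference, the arithmetic raised in the comment above can be checked directly. The inputs (16 bits per entry, 1 billion entries) are taken from the comment itself; whether they match the figures at 8:25 in the video is an assumption.

```python
BITS_PER_ENTRY = 16
ENTRIES = 1_000_000_000

total_bytes = BITS_PER_ENTRY * ENTRIES // 8   # 8 bits per byte
gb = total_bytes / 10**9                      # decimal gigabytes
gib = total_bytes / 2**30                     # binary gibibytes

print(f"{total_bytes:,} bytes = {gb:.2f} GB = {gib:.2f} GiB")
# prints: 2,000,000,000 bytes = 2.00 GB = 1.86 GiB
```

So "2 GB" and "1.86" describe the same amount of memory, measured in decimal and binary units respectively.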
Author

Hey Jordan, how will you modify the priorities of the existing elements with the way that you have partitioned the db? It seems like we will have to search through all the partitions to find the element and then update its priority!

iknowinfinity
Author

Thanks Jordan! I have a couple questions for you.
1. If there are multiple consumers long polling the same partitions, do they need to somehow mark the next highest priority item as “pending” so the other consumer doesn’t also read it? How does long polling help here?

NBetweenStations
Author

Hey Jordan
Your videos are great. Please keep uploading.
Just curious, what is this drawing tool you use?

unkgoku
Author

at 9:16 now, impressed with the design. didn’t consider it. a lesson to self to explore hybrid designs

RS-
Author

Awesome video. I've been binging your playlist as someone who is new to system design, and I'm loving the breadth and depth of knowledge and how it is applied here. I did have one question though, and please correct me if I am wrong. I noticed that your older system design questions playlist seemed to be more in line with Grokking, where the video is broken into more concrete sections such as "API Design", whereas this 2.0 questions playlist is more in depth but more free-flowing. Would an actual interview be more in line with your initial playlist or with this 2.0 playlist?

Also, my goal in going through this playlist is to build up my "instincts" when it comes to system design, at least to a respectable level. In your opinion, would watching your videos while taking notes be enough to handle a new problem not yet covered on the channel, or one that comes up in an interview, or would one have to practice outside of these videos?

Not sure if that was clear enough on my part, but I'm interested to hear your thoughts. Once again, I appreciate the time you take for this, it's a real help!!!

mayezabdul
Author

Hey Jordan, and congrats on the milestone :) Where does the in-memory heap live? Is it in the DB, and is that possible? I just didn't see it in the final design, so I wanted to make sure and ask.

lalasmith
Author

Couldn't you use a Redis sorted set for the priorities and point to the data on disk or somewhere else?

Then, to enqueue, add to the set.
To dequeue, write a Redis key to lock the element (Redis INCR for a distributed lock with a TTL), then process and remove the element from the sorted set.

harman
Author

If you had an only once requirement, how would you prevent 2 consumers from processing the same event at the same time? The replication scheme you suggested doesn't lock rows across multiple records on read, does it?

ameddin
Author

Great video. Could you please elaborate on why using priority as the partition key will end up locking in that one node? Thanks

shuozhang
Author

Keeping an in-memory heap for the priority queue may not be durable when a node crashes and recovers.
What if we create an index on the priority column of the SQL table, so that it is stored on disk itself?

manojgoyal-yk
Author

Great video, Facebook engineering blog is indeed interesting 😅

fallencheeto
Author

Could we build everything on top of Postgres with an additional index on priority? I'd expect a service to then run SELECT * FROM tasks ORDER BY priority ASC LIMIT 1 FOR UPDATE SKIP LOCKED (or LIMIT 10, 20, depending on the client's request). We can update the state of a task to RUNNING and have a separate process that makes RUNNING tasks available again after some timeout. It requires many more DB writes/reads than the in-memory min-heap but doesn't need Paxos. Thanks

ika-os
Author

Can you create a design video on an insurance marketplace system and an online course platform system (teachers can register courses, students can subscribe, how do you get users on the platform, etc.)?

devanshi
Author

Hey, why Ex? I just started watching your earlier videos.

dirty-kebab
Author

I believe it’s 2 GB in memory. 16 bits = 2 bytes. 2 bytes x 1 billion = 2 GB.

samiransari