Twitter Search/ElasticSearch Design Deep Dive with Google SWE! | Systems Design Interview Question 8

As coders we rarely need to search for anything because we never lose it, namely our virginities

00:00 Introduction
01:56 Functional Requirements
02:52 Capacity Estimates
05:08 API Design
05:59 Architectural Overview
Comments

Amazing content as always, dude. Love how in-depth you go in all of your videos! My favorite channel of all by far! Have recommended this to several friends.

Snehilw

Sshhh Man's been hiding the gun show this whole time. Giga Chad on the low

Ms

Hey Jordan, question here:

1. Can we have hierarchical shards? I.e., a sharding strategy that segregates nodes by search term: aa -> shard1, ab -> shard2, abc through abz -> shard3, ac -> shard4, and so on.
2. With this strategy we would be writing document IDs for terms to particular shards.
3. In case of any hotspotting, we could split a shard further.
4. For extremely popular terms like "trump", we could have the term span multiple shards.
5. We could keep track of the shards using a gossip protocol or ZooKeeper.

From what I understood, in the current solution we are doing a scatter-gather, where we query each partition and aggregate the results. If we have hundreds of partitions, then we will have to filter out a lot of results. I understand that we can do it in parallel, but if we are only interested in the top 2 results, we will still end up fetching the top 2 results locally in every partition and then filtering them down.

AAASHIVAM
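The scatter-gather the comment above describes can be sketched as a top-k merge over per-partition results. A minimal sketch, assuming each partition is just a dict of term -> scored postings (the data and scores are made up for illustration):

```python
import heapq

def search_partition(partition, term, k):
    """Return the local top-k (score, doc_id) pairs for one partition."""
    postings = partition.get(term, [])
    return heapq.nlargest(k, postings)

def scatter_gather(partitions, term, k):
    """Query every partition, then merge the local top-k lists into a global top-k."""
    local_results = []
    for p in partitions:
        local_results.extend(search_partition(p, term, k))
    # This is the comment's concern: with n partitions we pull up to
    # n*k candidates but keep only k, discarding most of what we fetched.
    return heapq.nlargest(k, local_results)

partitions = [
    {"trump": [(0.9, "t1"), (0.4, "t2")]},
    {"trump": [(0.8, "t3")], "cats": [(0.7, "t4")]},
    {"trump": [(0.95, "t5"), (0.1, "t6")]},
]
top2 = scatter_gather(partitions, "trump", 2)  # → [(0.95, "t5"), (0.9, "t1")]
```

In practice the per-partition queries run in parallel, but the n*k-candidates-for-k-results overhead remains, which is why term-partitioned ("global") indexes are sometimes preferred for top-k queries.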

Dude, you along with NeetCode are my go-to. Great content and clear explanations.

cc-tojn

16:00 exactly, I was thinking the same thing. Typically you write to the source of truth and use a queue to send it out to the various locations.

RandomShowerThoughts

As always, great videos. 18:00 - I forgot all that stuff about ES's caching, so thanks for the reminder; gonna reread that part in the ES docs. Great job knowing about Lucene - most of my applicants have no clue about ES, let alone that Lucene is not a DB but a search engine (hate the JSON syntax, but what can you do). Again, super fun to listen to your vids and watch this content.

mickeyp

Will the Search Service pull the actual documents from the DB once it receives the document IDs from the cache/search index?

yashagarwal

16:00 - we can also use Debezium (for certain databases), which would write to Kafka, and then listen on that topic.

RandomShowerThoughts
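The CDC pipeline the comment suggests works roughly like this: Debezium tails the database's log, emits change events to Kafka, and an indexer consumes them to keep the search index in sync. A minimal sketch of the consumer side, with a hand-written event in Debezium's envelope shape (no real Kafka client here; the event fields and index structure are illustrative assumptions):

```python
from collections import defaultdict

# Inverted index: term -> set of tweet IDs.
index = defaultdict(set)

def apply_change_event(event):
    """Apply one Debezium-style change event to the inverted index.

    'op' is 'c' for create and 'd' for delete in Debezium's envelope."""
    payload = event["payload"]
    if payload["op"] == "c":
        row = payload["after"]
        for term in row["text"].lower().split():
            index[term].add(row["id"])
    elif payload["op"] == "d":
        row = payload["before"]
        for term in row["text"].lower().split():
            index[term].discard(row["id"])

# A create event, shaped like what Debezium would emit onto the Kafka topic.
event = {"payload": {"op": "c", "after": {"id": 42, "text": "ES is not a DB"}}}
apply_change_event(event)
```

The appeal of this design is that the DB stays the source of truth and the index is rebuilt purely from the change stream, so the indexer can be replayed from a Kafka offset after a failure.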

Great depth of core search design discussion, with a bit of comedy ;)

SwapnilSuhane

If we use a local index (meaning each node stores term -> [doc IDs], and multiple nodes can reference the same term), does this mean we need to query all the nodes to answer a search query? How do we know which nodes have the term we are interested in if we are not partitioning by term?

kamalsmusic
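On the question above: with a document-partitioned ("local") index, yes — any node may hold postings for any term, so a query must fan out to every node and union the results. A minimal sketch (the node contents are made up for illustration):

```python
# Document-partitioned ("local") index: each node indexes only its own
# documents, so postings for a given term can live on any node.
nodes = [
    {"cat": {1}, "dog": {2}},       # node 0 indexes docs 1-2
    {"dog": {3}},                   # node 1 indexes doc 3
    {"cat": {4}, "fish": {5}},      # node 2 indexes docs 4-5
]

def query_local(term):
    """Fan out to every node and union the per-node postings for the term."""
    result = set()
    for node in nodes:
        result |= node.get(term, set())
    return result
```

A term-partitioned ("global") index avoids the fan-out by routing each term to one known shard (e.g. via hashing), at the cost of hot shards for popular terms and multi-shard coordination for multi-term queries.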

Wrote a long comment about how a posting list (the documents containing a term) is implemented as a skip list plus encoding, as per Lucene99PostingsFormat in the apache/lucene GitHub repo. I was wondering why we can't use a similar idea for follower/following list storage in the news feed problem (from System Design 2). But it's only viable if you either store the data in Lucene (I guess no one does that with this purpose in mind) or you have full control over the DB code, so that you can do such advanced customization over a column (also not practical).

nice guns

maxmanzhos
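The skip-pointer idea from the comment above can be illustrated with a classic intersection of two sorted posting lists, where a skip interval lets the lagging cursor jump ahead instead of advancing one entry at a time. A simplified sketch — Lucene's actual Lucene99PostingsFormat layers block encoding and multi-level skip data on top of this:

```python
def intersect(a, b):
    """Intersect two sorted posting lists using a fixed skip interval.

    While one cursor is far behind the other, it jumps `skip` entries at a
    time as long as the jump does not overshoot the other cursor's doc ID."""
    skip = max(1, int(len(a) ** 0.5))  # classic sqrt(n) skip interval
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            while i + skip < len(a) and a[i + skip] <= b[j]:
                i += skip                 # follow skip pointers in a
            if a[i] < b[j]:
                i += 1
        else:
            while j + skip < len(b) and b[j + skip] <= a[i]:
                j += skip                 # follow skip pointers in b
            if b[j] < a[i]:
                j += 1
    return out
```

For example, `intersect([1, 3, 5, 7, 9, 11, 13], [5, 9, 13, 20])` returns `[5, 9, 13]` while skipping over most of the first list.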

Don't we need a parser/lexer service between Kafka and the search index that parses the tweets and hashes them to the correct partitions of the search index?

neethijoe

Is it recommended to partition the search index by term, or by tweet_id/user_id?

lv

Love your content ! Keep up the good work !!

anupamdey

Did you really just "NOPQRS"ed to figure out what comes after P?

FarhanKhan-wufq

I don't understand most of the things, but thanks for the video.

DileepBC-rx

Hey man, qq. I was wondering if you thought it would be important in an interview to mention how we know which machine holds which partition? I was thinking maybe we could have a distributed search/index service that maintains the mappings between the partition -> machine. And that mapping could be made consistent across the “search/index service” nodes via a consensus algo or maybe zk. Does this make sense at all or am I missing something? Maybe it’s the local secondary indexes that take care of the problem I’m describing and I just don’t understand 🤷‍♂️

neek

Grokking the System Design sucks at this question, ngl; searched for a solution right after reading it.

RandomShowerThoughts

Interviewee: The API design is going to be pretty tiny.
Interviewer: How tiny?
Interviewee: You know....

arteigen

00:40 - the "day in my life as a software engineer" videos are cringey af

RandomShowerThoughts