Twitter Search/ElasticSearch Design Deep Dive with Google SWE! | Systems Design Interview Question 8

As coders we rarely need to search for anything because we never lose it, namely our virginities

00:00 Introduction
01:56 Functional Requirements
02:52 Capacity Estimates
05:08 API Design
05:59 Architectural Overview
Comments

Amazing content as always, dude. Love how in-depth you go in all of your videos! My favorite channel of all by far! Have recommended this to several friends.

Snehilw

Sshhh Man's been hiding the gun show this whole time. Giga Chad on the low

Ms

Hey Jordan, question here:

1. Can we have hierarchical shards? I.e., a sharding strategy that segregates nodes by search term: aa -> shard1, ab -> shard2, abc through abz -> shard3, ac -> shard4, and so on.
2. With this strategy we would be writing document IDs for terms to particular shards.
3. In case of any hotspotting, we could split a shard further.
4. For extremely popular terms like "trump", we could have the term span multiple shards.
5. We could keep track of the shards using a gossip protocol or ZooKeeper.

From what I understood, in the current solution we are doing a scatter-gather, where we query each partition and aggregate the results. If we have hundreds of partitions, then we will have to filter out a lot of results. I understand that we can do it in parallel, but if we are only interested in the top 2 results, we will still end up fetching the top 2 results locally in every partition and then filtering them down.

AAASHIVAM
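The scatter-gather the comment above describes can be sketched as a top-k merge over per-partition results. A minimal sketch, assuming each partition is just a dict of term -> scored postings (the data and scores are made up for illustration):

```python
import heapq

def search_partition(partition, term, k):
    """Return the local top-k (score, doc_id) pairs for one partition."""
    postings = partition.get(term, [])
    return heapq.nlargest(k, postings)

def scatter_gather(partitions, term, k):
    """Query every partition, then merge the local top-k lists into a global top-k."""
    local_results = []
    for p in partitions:
        local_results.extend(search_partition(p, term, k))
    # This is the comment's concern: with n partitions we pull up to
    # n*k candidates but keep only k, discarding most of what we fetched.
    return heapq.nlargest(k, local_results)

partitions = [
    {"trump": [(0.9, "t1"), (0.4, "t2")]},
    {"trump": [(0.8, "t3")], "cats": [(0.7, "t4")]},
    {"trump": [(0.95, "t5"), (0.1, "t6")]},
]
top2 = scatter_gather(partitions, "trump", 2)  # → [(0.95, "t5"), (0.9, "t1")]
```

In practice the per-partition queries run in parallel, but the n*k-candidates-for-k-results overhead remains, which is why term-partitioned ("global") indexes are sometimes preferred for top-k queries.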

Dude, you along with NeetCode are my go-to. Great content and clear explanations.

cc-tojn

16:00 exactly, I was thinking the same thing. Typically you write to the source of truth and use a queue to send it out to the various locations.

RandomShowerThoughts

As always, great videos. 18:00 - I forgot all that stuff about ES's caching, so thanks for the reminder; gonna reread that part in the ES docs. Great job knowing about Lucene - most of my applicants have no clue about ES, let alone that Lucene is not a DB but a search engine (hate the JSON syntax, but what can you do). Again, super fun to listen to your vids and watch this content.

mickeyp

Will the Search Service pull the actual documents from the DB once it receives the document IDs from the cache/search index?

yashagarwal

16:00 - we can also use Debezium (for certain databases), which would write to Kafka, and then listen on that topic.

RandomShowerThoughts
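The CDC pipeline the comment suggests works roughly like this: Debezium tails the database's log, emits change events to Kafka, and an indexer consumes them to keep the search index in sync. A minimal sketch of the consumer side, with a hand-written event in Debezium's envelope shape (no real Kafka client here; the event fields and index structure are illustrative assumptions):

```python
from collections import defaultdict

# Inverted index: term -> set of tweet IDs.
index = defaultdict(set)

def apply_change_event(event):
    """Apply one Debezium-style change event to the inverted index.

    'op' is 'c' for create and 'd' for delete in Debezium's envelope."""
    payload = event["payload"]
    if payload["op"] == "c":
        row = payload["after"]
        for term in row["text"].lower().split():
            index[term].add(row["id"])
    elif payload["op"] == "d":
        row = payload["before"]
        for term in row["text"].lower().split():
            index[term].discard(row["id"])

# A create event, shaped like what Debezium would emit onto the Kafka topic.
event = {"payload": {"op": "c", "after": {"id": 42, "text": "ES is not a DB"}}}
apply_change_event(event)
```

The appeal of this design is that the DB stays the source of truth and the index is rebuilt purely from the change stream, so the indexer can be replayed from a Kafka offset after a failure.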

Great depth of core search design discussion, with a bit of comedy ;)

SwapnilSuhane

If we use a local index (meaning each node stores term -> [doc IDs], and multiple nodes can reference the same term), does this mean we need to query all the nodes to answer a search query? How do we know which nodes have the term we are interested in if we are not partitioning by term?

kamalsmusic
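On the question above: with a document-partitioned ("local") index, yes — any node may hold postings for any term, so a query must fan out to every node and union the results. A minimal sketch (the node contents are made up for illustration):

```python
# Document-partitioned ("local") index: each node indexes only its own
# documents, so postings for a given term can live on any node.
nodes = [
    {"cat": {1}, "dog": {2}},       # node 0 indexes docs 1-2
    {"dog": {3}},                   # node 1 indexes doc 3
    {"cat": {4}, "fish": {5}},      # node 2 indexes docs 4-5
]

def query_local(term):
    """Fan out to every node and union the per-node postings for the term."""
    result = set()
    for node in nodes:
        result |= node.get(term, set())
    return result
```

A term-partitioned ("global") index avoids the fan-out by routing each term to one known shard (e.g. via hashing), at the cost of hot shards for popular terms and multi-shard coordination for multi-term queries.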

Wrote a long comment about how a posting list (the documents containing a term) is implemented as a skip list plus encoding, as per Lucene99PostingsFormat in the apache/lucene GitHub repo. I was wondering why we can't use a similar idea for follower/following list storage in the news feed problem (from System Design 2). But it's only viable if you either store the data in Lucene (I guess no one does that with this purpose in mind) or you have full control over the DB code, so that you can do such advanced customization over a column (also not practical).

nice guns

maxmanzhos
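The skip-pointer idea from the comment above can be illustrated with a classic intersection of two sorted posting lists, where a skip interval lets the lagging cursor jump ahead instead of advancing one entry at a time. A simplified sketch — Lucene's actual Lucene99PostingsFormat layers block encoding and multi-level skip data on top of this:

```python
def intersect(a, b):
    """Intersect two sorted posting lists using a fixed skip interval.

    While one cursor is far behind the other, it jumps `skip` entries at a
    time as long as the jump does not overshoot the other cursor's doc ID."""
    skip = max(1, int(len(a) ** 0.5))  # classic sqrt(n) skip interval
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            while i + skip < len(a) and a[i + skip] <= b[j]:
                i += skip                 # follow skip pointers in a
            if a[i] < b[j]:
                i += 1
        else:
            while j + skip < len(b) and b[j + skip] <= a[i]:
                j += skip                 # follow skip pointers in b
            if b[j] < a[i]:
                j += 1
    return out
```

For example, `intersect([1, 3, 5, 7, 9, 11, 13], [5, 9, 13, 20])` returns `[5, 9, 13]` while skipping over most of the first list.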

Don't we need a parser/lexer service between Kafka and the search index that parses the tweets and hashes them to the correct partitions of the search index?

neethijoe

Is it recommended to partition the search index by term, or by tweet_id/user_id?

lv

Love your content ! Keep up the good work !!

anupamdey

Did you really just "NOPQRS"ed to figure out what comes after P?

FarhanKhan-wufq

I don't understand most of the things, but thanks for the video.

DileepBC-rx

Hey man, qq. I was wondering if you thought it would be important in an interview to mention how we know which machine holds which partition? I was thinking maybe we could have a distributed search/index service that maintains the mappings between the partition -> machine. And that mapping could be made consistent across the “search/index service” nodes via a consensus algo or maybe zk. Does this make sense at all or am I missing something? Maybe it’s the local secondary indexes that take care of the problem I’m describing and I just don’t understand 🤷‍♂️

neek

Grokking the System Design sucks at this question, ngl; searched for a solution right after reading it.

RandomShowerThoughts

Interviewee: The API design is going to be pretty tiny.
Interviewer: How tiny?
Interviewee: You know....

arteigen

00:40 - the "day in my life as a software engineer" videos are cringey af

RandomShowerThoughts