System Design Google Autocomplete | Typeahead Suggestion | HLD Auto Suggestion | TRIE Data Structure

System Design | #SystemDesign :
The Google autocomplete system design problem is about architecting a system that offers suggestions as the user types, also known as typeahead suggestions.
The architecture involves a trie data structure, a file system or database to persist the trie, and a distributed cache such as Redis to give fast responses.
The Tech Granth has come up with its own design; this is in no way how Google has implemented its autocomplete feature.
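
To make the trie part of the description concrete, here is a minimal in-memory sketch of prefix lookup ranked by search count. It is not the design from the video: the class names, the `count` field, and the ranking rule (highest count first, ties broken alphabetically) are all assumptions chosen for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # char -> TrieNode
    count: int = 0  # > 0 only where a complete phrase ends (assumed ranking signal)


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, phrase, count=1):
        """Add a phrase, or bump its count if it is already stored."""
        node = self.root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def suggest(self, prefix, k=5):
        """Return up to k stored phrases starting with prefix, most searched first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no stored phrase has this prefix
        # Walk the subtree under the prefix and collect every complete phrase.
        found = []
        stack = [(node, prefix)]
        while stack:
            cur, text = stack.pop()
            if cur.count > 0:
                found.append((cur.count, text))
            for ch, child in cur.children.items():
                stack.append((child, text + ch))
        top = heapq.nsmallest(k, found, key=lambda item: (-item[0], item[1]))
        return [text for _, text in top]


trie = Trie()
for phrase, count in [("yahoo", 10), ("yard", 5), ("yarn", 3), ("zebra", 1)]:
    trie.insert(phrase, count)
print(trie.suggest("ya", k=2))
```

Walking the whole subtree on every request is the simplest option; a production system would more likely precompute the top suggestions per node, which is a design choice this sketch leaves out.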

Comments

Explained in great detail; covers every aspect.

thealgomasters

High-level system design is NOT about telling what you know! It is about problem-solving in a distributed architecture.
If you say "I will use HDFS", many people may not understand what the motive is. So I would suggest starting with a simple solution first. Then describe the drawback of that solution and mention several ways to solve it. Then choose one solution (with the reason for the choice). In this way, the viewer evolves with the problem.

abhishekpal

Missing pieces:
Data storage estimates
Traffic estimates
The cache won't talk to ZooKeeper (that looks wrong to me). Also, a NoSQL store automatically splits the data across different nodes.
Spark Streaming and HDFS are terms that should be used only when needed. Just say "a queue" instead of using technology buzzwords.

What are the bottlenecks in the system?
What if the same phrase is searched again and again? You will run into hot cache issues.
Can it be extended to support more ranking use cases?
How are you actually going to store the trie in the database? What is the actual schema?
Overall: 5/10

ameyjain

You suggested we can use either SQL or NoSQL for storing the trie. Is there a preference for either in industry, and why?

Dhindsa

How come the trie is O(L*N)? You have 26 possible characters in a word, and if the search query has 5 words with 6 characters each, that is 30 characters (not including the spaces). With that, the first level of the trie would contain 26 nodes, the second level 26 * 26, and so on. This would lead to O(n^m), where n is the number of characters we support and m is the number of levels; in my example, 26^30 would be huge to maintain.

okeyD
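
One way to examine okeyD's question: a trie only creates nodes for prefixes of phrases that were actually inserted, so its size is bounded by the total number of characters inserted (roughly N phrases times average length L), not by the 26^30 count of every possible string. The sketch below checks that bound on a tiny example; the function name and the nested-dict node representation are hypothetical, chosen only for illustration.

```python
def count_trie_nodes(phrases):
    """Build a trie from nested dicts and return how many nodes it holds."""
    root = {}
    nodes = 1  # the root itself
    for phrase in phrases:
        node = root
        for ch in phrase:
            if ch not in node:
                node[ch] = {}  # a node is created only for a prefix that was really seen
                nodes += 1
            node = node[ch]
    return nodes


phrases = ["yarn", "yard", "yahoo", "youtube"]
total_chars = sum(len(p) for p in phrases)  # 20, the N * L style upper bound
nodes = count_trie_nodes(phrases)           # 15, because shared prefixes are stored once
print(nodes, "nodes for", total_chars, "characters")
```

Shared prefixes such as "ya" are stored once, so the node count usually sits well below the N * L bound.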

Thanks for simplifying the design for us. I want to know:
1. When the request to be served is not present in the cache, who is responsible for routing that request to ZK? From your diagram, it appears that a distributed cache like Redis/Memcached can do that routing automatically. Isn't it the typeahead service code that routes the request to ZK?
2. When the request is served by the trie, the result also gets persisted in the cache. Who is responsible for storing it in the cache? Is it the typeahead service?

brvamshi
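
A common arrangement for the flow brvamshi asks about is the cache-aside pattern, where the typeahead service itself checks the cache, falls back to the trie on a miss, and writes the result back. Whether the video intends this is not confirmed; the sketch below only illustrates the pattern. `TypeaheadService` and `lookup_in_trie` are hypothetical names, and a plain dict stands in for Redis.

```python
class TypeaheadService:
    """Cache-aside sketch: the service, not the cache, handles misses."""

    def __init__(self, cache, lookup_in_trie):
        self.cache = cache                    # dict standing in for a distributed cache
        self.lookup_in_trie = lookup_in_trie  # hypothetical call to the trie servers

    def suggest(self, prefix):
        cached = self.cache.get(prefix)
        if cached is not None:
            return cached                     # cache hit: the trie is never contacted
        suggestions = self.lookup_in_trie(prefix)
        self.cache[prefix] = suggestions      # the service populates the cache
        return suggestions


trie_calls = []

def fake_trie_lookup(prefix):
    # Stand-in for routing to the trie server that owns this prefix.
    trie_calls.append(prefix)
    return [prefix + "hoo", prefix + "rd"]

service = TypeaheadService(cache={}, lookup_in_trie=fake_trie_lookup)
service.suggest("ya")  # miss: goes to the trie, then fills the cache
service.suggest("ya")  # hit: served from the cache
print(len(trie_calls))
```

Under this assumption the cache stays a passive key-value store, and any lookup of which trie server owns a prefix happens inside the service.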

Thanks for the video. I have a question. In your example, suppose the word "yarn" has been searched only once and no other word starting with "ya" has been searched or updated. Will "yarn" still be stored in the cache, or do you keep a threshold for saving to the cache (for example, storing an element in the cache only once it has been searched more than 1000 times)?
Also, where is the actual frequency of the words stored? Only with that can you find out whether a new word has become popular.

rishikhurana
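
The threshold idea in rishikhurana's question can be sketched as follows. This is one possible admission policy, not the one used in the video: `ThresholdCache`, its in-memory `Counter`, and the threshold value are all assumptions for illustration.

```python
from collections import Counter


class ThresholdCache:
    """Admit a phrase to the cache only after it has been searched often enough."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.frequency = Counter()  # assumed place where per-phrase counts live
        self.cached = set()

    def record_search(self, phrase):
        """Count one search and report whether the phrase is now cached."""
        self.frequency[phrase] += 1
        if self.frequency[phrase] >= self.threshold:
            self.cached.add(phrase)
        return phrase in self.cached


cache = ThresholdCache(threshold=3)
print(cache.record_search("yarn"))  # searched once, still below the threshold
cache.record_search("yarn")
print(cache.record_search("yarn"))  # third search reaches the threshold
```

Keeping the counts separate from the cache is what lets a newly popular phrase be noticed even before it is cached.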

How do you store a trie data structure in a database?

rajeshbhagat
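
One possible answer to rajeshbhagat's question is to flatten the trie into rows keyed by prefix, each holding its precomputed top suggestions. The video does not specify a schema, so the `suggestions` table, its columns, and both function names below are hypothetical; SQLite is used only because it ships with Python.

```python
import json
import sqlite3


def build_prefix_table(conn, phrase_counts, k=3):
    """Store every prefix of every phrase with its top-k phrases by count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS suggestions ("
        "prefix TEXT PRIMARY KEY, top_k TEXT NOT NULL)"
    )
    by_prefix = {}
    for phrase, count in phrase_counts.items():
        for end in range(1, len(phrase) + 1):
            by_prefix.setdefault(phrase[:end], []).append((count, phrase))
    for prefix, candidates in by_prefix.items():
        # Highest count first; ties broken alphabetically.
        candidates.sort(key=lambda item: (-item[0], item[1]))
        top = [phrase for _, phrase in candidates[:k]]
        conn.execute(
            "INSERT OR REPLACE INTO suggestions (prefix, top_k) VALUES (?, ?)",
            (prefix, json.dumps(top)),
        )
    conn.commit()


def lookup(conn, prefix):
    """Return the stored suggestions for a prefix, or an empty list."""
    row = conn.execute(
        "SELECT top_k FROM suggestions WHERE prefix = ?", (prefix,)
    ).fetchone()
    return json.loads(row[0]) if row else []


conn = sqlite3.connect(":memory:")
build_prefix_table(conn, {"yahoo": 10, "yard": 5, "yarn": 3}, k=2)
print(lookup(conn, "ya"))
```

The same prefix-to-list shape maps directly onto a key-value or NoSQL store, at the cost of storing each phrase once per prefix.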

Throw all these buzzwords into an interview and you are setting yourself up for failure unless you really know everything in detail.

KrishnaSharma-vlre