System design mock interview: 'Design WhatsApp or Telegram' (with ex-Google EM)


System design mock interview: "Design a messaging app like WhatsApp or Telegram" with an ex-Google Engineering Manager, Mark.


Chapters:
00:00 Intro
01:08 Question - Design Telegram
01:18 Clarifying questions (non-functional requirements)
05:03 Clarifying questions (metrics)
08:45 Clarifying questions (functional requirements)
12:30 High level design (API)
18:05 High level components
21:30 Database
30:57 Drill down (Architecture diagram)
39:40 Drill down (Message distributor)
44:17 Bottlenecks
47:40 Conclusion
50:56 Interview finishes

About us:
IGotAnOffer is the leading career coaching marketplace ambitious professionals turn to for help at high-stakes moments in their career. Get a job, negotiate your salary, get a promotion, plan your next career steps - we've got you covered whenever you need us.

Comments

I think you can tell that Mark has a lot of experience: he easily navigates even topics he says he hasn't mastered, and he's also an excellent communicator. Those are very valuable soft skills and not many people think about them. Also, I loved his enthusiasm at the end.

However, I imagine that Mark hasn't had a lot of chances to build systems in recent years, as there are some slight technical issues with what he proposed. Since this channel probably attracts many beginners, I'll go over them and try to come up with better alternatives. Hopefully it will be a useful exercise!

First, the API is a bit wrong. Normally you put the version before the resource name, so instead of `/messages/v1`, it would be `v1/messages`. However, what's more interesting here is the way we model our domain: it's not just about users and messages, it's mainly about conversations. When you open the application you do not see messages, you see a list of conversations and the messages of the latest conversation. When you send somebody a message, it is part of a conversation, and only one conversation can exist between two users. Features like blocking, notifications, etc. are also associated with conversations, not individual messages. So you'd work with something like a `POST` to a messages endpoint nested under the conversation, and then you don't need to add information about the recipient, as we already know the conversation ID from the URL (this also scales nicely when thinking about groups). Also, you most likely don't want to mark each message as read individually; instead you'd mark the conversation as read up to a specific message ID (imagine a chat with 500 unread messages: it would be horrible to generate 500 requests just to mark a conversation as read).
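A minimal sketch of the conversation-centric API described above, using only the Python standard library. The route paths, the `last_read_message_id` field, and all function names are assumptions made for illustration; none of them come from the video or from any real messaging API.

```python
import re
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Conversation:
    participants: tuple
    messages: list = field(default_factory=list)
    # user -> highest message ID that user has read (a watermark, not a per-message flag)
    read_up_to: dict = field(default_factory=dict)


# Hypothetical in-memory store with one two-person conversation.
CONVERSATIONS = {"c1": Conversation(participants=("alice", "bob"))}
_next_message_id = count(1)


def post_message(conversation_id, sender, body):
    """The recipient is implied by the conversation, so the body only carries the text."""
    conversation = CONVERSATIONS[conversation_id]
    if sender not in conversation.participants:
        raise PermissionError(f"{sender} is not part of {conversation_id}")
    message = {"id": next(_next_message_id), "sender": sender, "text": body["text"]}
    conversation.messages.append(message)
    return message


def mark_read(conversation_id, reader, body):
    """One request marks everything up to a message ID as read; the watermark never moves back."""
    conversation = CONVERSATIONS[conversation_id]
    previous = conversation.read_up_to.get(reader, 0)
    conversation.read_up_to[reader] = max(previous, body["last_read_message_id"])
    return {"read_up_to": conversation.read_up_to[reader]}


# Assumed routes: the version comes first, then the conversation, then the sub-resource.
ROUTES = [
    ("POST", re.compile(r"^/v1/conversations/(?P<cid>[^/]+)/messages$"), post_message),
    ("PUT", re.compile(r"^/v1/conversations/(?P<cid>[^/]+)/read$"), mark_read),
]


def handle(method, path, user, body):
    """Dispatch a request to the first route whose method and path both match."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if method == route_method and match:
            return handler(match.group("cid"), user, body)
    raise LookupError(f"no route for {method} {path}")
```

With this shape, a client that has 500 unread messages sends a single `PUT` carrying the last message ID it displayed, rather than 500 separate requests.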

As for the database, I think Mark kind of rushed to the NoSQL solutions. You can scale relational databases like MySQL (Facebook, GitHub, and Shopify are all heavy MySQL users) or PostgreSQL, and there are managed offerings for both. The real question is whether you have relational data and can take advantage of those relationships. And I think we do have relational data: a user has many conversations, a conversation belongs to two participants, and a conversation has many messages.
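The relationships listed above can be sketched as a small relational schema. This uses Python's built-in `sqlite3` purely so the example runs anywhere; the table and column names are assumptions, and a production MySQL or PostgreSQL schema would differ in types and indexing.

```python
import sqlite3

# Hypothetical schema: a conversation belongs to exactly two users and has many messages.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE conversations (
    id     INTEGER PRIMARY KEY,
    user_a INTEGER NOT NULL REFERENCES users(id),
    user_b INTEGER NOT NULL REFERENCES users(id),
    CHECK (user_a < user_b),   -- store the pair in one canonical order
    UNIQUE (user_a, user_b)    -- only one conversation per pair of users
);
CREATE TABLE messages (
    id              INTEGER PRIMARY KEY,
    conversation_id INTEGER NOT NULL REFERENCES conversations(id),
    sender_id       INTEGER NOT NULL REFERENCES users(id),
    body            TEXT NOT NULL
);
CREATE INDEX messages_by_conversation ON messages (conversation_id, id);
"""


def open_chat_db():
    """Create an in-memory database with the sketch schema and foreign keys enforced."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript(SCHEMA)
    return db


def conversation_list(db, user_id):
    """Return (conversation_id, other_user_name, latest_message_body) for one user."""
    return db.execute(
        """
        SELECT c.id,
               u.name,
               (SELECT m.body FROM messages m
                 WHERE m.conversation_id = c.id
                 ORDER BY m.id DESC LIMIT 1)
          FROM conversations c
          JOIN users u
            ON u.id = CASE WHEN c.user_a = ? THEN c.user_b ELSE c.user_a END
         WHERE ? IN (c.user_a, c.user_b)
         ORDER BY c.id
        """,
        (user_id, user_id),
    ).fetchall()
```

The `conversation_list` query is the "open the app" screen from the previous paragraph: one join gives the other participant, and the `(conversation_id, id)` index makes the latest-message lookup cheap.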

Speaking of messages, how do we handle the sent/delivered/read status of a message? By default, a message has the status "sent", otherwise it wouldn't be in our database. Whenever the other participant's application pulls conversation data (either in the background or when the user opens the app), it sends back the ID of the last message received, so all the messages up until that point are marked as "delivered". Whenever the user opens the conversation, the app sends back the ID of the last message that appeared in the viewport, so all messages up until that point are marked as "read". This is more efficient because you can do bulk operations on messages and the logic that determines the limit of those bulk operations is offloaded to each individual device instead of a process that loops over billions of messages.
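A minimal sketch of the bulk status update described above, again with `sqlite3` so it is runnable as-is. The `messages` table layout and the `acknowledge` function are hypothetical; the only idea taken from the text is that the client reports one message ID and the server promotes everything up to it.

```python
import sqlite3


def open_status_db():
    """In-memory table where every stored message starts in the 'sent' state."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE messages (
            id              INTEGER PRIMARY KEY,
            conversation_id INTEGER NOT NULL,
            recipient_id    INTEGER NOT NULL,
            status          TEXT NOT NULL DEFAULT 'sent'
        )
        """
    )
    return db


def acknowledge(db, conversation_id, recipient_id, last_message_id, new_status):
    """Promote all messages up to last_message_id in one UPDATE; return rows changed."""
    if new_status == "delivered":
        upgradable = ("sent",)
    elif new_status == "read":
        upgradable = ("sent", "delivered")
    else:
        raise ValueError(f"unsupported status: {new_status}")
    placeholders = ", ".join("?" for _ in upgradable)
    cursor = db.execute(
        f"""
        UPDATE messages
           SET status = ?
         WHERE conversation_id = ?
           AND recipient_id = ?
           AND id <= ?
           AND status IN ({placeholders})  -- never downgrade 'read' back to 'delivered'
        """,
        (new_status, conversation_id, recipient_id, last_message_id, *upgradable),
    )
    db.commit()
    return cursor.rowcount
```

Because the device supplies the cut-off ID, the server never has to scan for messages whose status might need changing; it only touches rows in one conversation below one ID.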

There are some finer points, like how the application knows that a new message has arrived. It could use the push notification system from Apple or Google, but delivery is not guaranteed. It could poll an endpoint, but that would generate massive load. It could also open a persistent connection (like WebSockets) and receive messages that way, but you will need thousands of machines if you're targeting billions of people. And many others...🙂

rockatanescu
This channel is great, but from what I have seen so far it could improve a little by having the interviewer challenge the candidate more; that would be closer to a real-life scenario than just agreeing with everything the candidate does. For example, not all of Mark's decisions are flawless, and there are tradeoffs that should be mentioned. Great work, though!

ricardobedin
I find it sad that there was no talk about WebSockets or any other technology that would enable a real-time chat experience. How are two users going to chat? We designed an API to post messages and an API to get unread messages. Are we supposed to poll the server every second to get the unread messages? How is that going to work for Telegram, which has 1 billion users and 15 billion messages daily? And what was that logic of looping through unread messages about? How is that possible with these numbers? I think the good part of this interview was the beginning, since the estimations were done well, though I find it disappointing that they did not really play a role in the next 40 minutes of the interview. We did not use any of our estimations during the actual design process.
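A back-of-the-envelope check of the polling concern, using the figures quoted in the comment (1 billion users, 15 billion messages per day, one poll per second). The assumption that every user is online and polling all day is deliberately pessimistic and is exposed as `online_fraction`; the function name is made up for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polling_overhead(users, messages_per_day, poll_interval_s, online_fraction=1.0):
    """Compare average message throughput with the request rate that polling would create."""
    messages_per_second = messages_per_day / SECONDS_PER_DAY
    polls_per_second = users * online_fraction / poll_interval_s
    return {
        "messages_per_second": messages_per_second,
        "polls_per_second": polls_per_second,
        # How many poll requests the servers answer for each message actually sent.
        "polls_per_message": polls_per_second / messages_per_second,
    }


estimate = polling_overhead(
    users=1_000_000_000,
    messages_per_day=15_000_000_000,
    poll_interval_s=1,
)
print(estimate)
```

Under these assumptions the system averages roughly 174 thousand messages per second but would have to answer a billion polls per second, so almost every poll returns nothing. Lowering `online_fraction` or lengthening `poll_interval_s` shrinks the gap but does not change its shape.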

jacksmith
Looping over messages in a database with 10B messages a day? Seriously?

iaroslavragel
He did not talk about how users actually chat: nothing about WebSockets, polling, SSE, eventual consistency, etc. Should everything be transferred over HTTP(S), or is there something more efficient? Should all the chatting go through the API servers, or can we implement direct communication between the (two) users?
I would also not suggest specific technologies (like DynamoDB); I would rather say something like "I would pick a database that has this and that property, so that we can address this and that problem".
In general, I expected more, especially from a person who spent so much time at G.

hutofrock
I would really have to say that this was a great interview! I just have one nit on the design. I really think a stream/MQ/queue approach to message distribution would be more appropriate in this case. "Looping over the database" is a huge waste of resources/bandwidth/etc. and a red flag IMO, in addition to the data consistency and state issues with multiple processors. I can only assume that Mark expects to query an index repeatedly, which could also be expensive depending on the required latency. Processing message queues scales quite easily and decreases the latency between receiving and routing the message. (ex-G 10y)
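A minimal sketch of the queue-based distributor suggested above, using the standard library `queue` and `threading` modules as a stand-in for a real message broker. The message shape (`to`, `text`) and the function names are assumptions for illustration only.

```python
import queue
import threading
from collections import defaultdict


def run_distributor(incoming, inboxes, lock):
    """Block on the queue and route each message as it arrives; no database scan involved."""
    while True:
        message = incoming.get()
        try:
            if message is None:  # sentinel: this worker should stop
                return
            with lock:
                inboxes[message["to"]].append(message["text"])
        finally:
            incoming.task_done()


def distribute(messages, workers=2):
    """Push messages through the queue with several workers and return the filled inboxes."""
    incoming = queue.Queue()
    inboxes = defaultdict(list)
    lock = threading.Lock()
    threads = [
        threading.Thread(target=run_distributor, args=(incoming, inboxes, lock))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for message in messages:
        incoming.put(message)
    for _ in threads:
        incoming.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return dict(inboxes)
```

Each message is handed to exactly one worker, which is what avoids the duplicate-processing problem of several processes looping over the same rows. With more than one worker the order within an inbox is not guaranteed here; a real system would typically partition the queue by conversation to keep ordering.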

joshmartin
Mark is so amazing at going through the design in such a simplified but detailed way. I decided to use his approach in an interview and it worked really well for me. Thank you, Mark.

mybarsnos-rcoj
The interviewee has so much clarity in his thoughts; he's amazingly crisp and clear. I love how he sticks to a limited set of details in this video instead of going through endless details, which would only lead to confusion. That really shows his experience too.

anitha
It's funny that an ex-Googler talks more about AWS than about Google solutions 😂

olegnikitindev
Mark's accent is very native and comfortable to listen to; this is good material for practicing my spoken English.

RexHuang-edhz
I know this is for people practicing for interviews, but boy is this content gold for system architects and people who want to build things like apps or startups. Just found out about the channel, 10/10

emanuelturis
Thank you for the content.

It would be great if the interviewer could ask some questions instead of only praising the candidate,
like: where does the message text sit? Is there any caching of messages? What about a pull vs. push strategy for delivering messages and their status, etc.?

kumarc
Why do we need a message distributor? When the API server creates a new message entry, it can make an extra query to update the unread_message_ids of the receiver. Doing so would let us get rid of the message distributor.

namnguyen-kckp
That's really awesome, I learnt a lot. One thing puzzled me, the distributor, and many questions follow from it.
1. Should we really have one massive messages table? If yes, how should sharding/hashing be implemented? How will users and messages be handled if they live on different boxes?
Otherwise, a little elaboration on table structure and sharding would be of great help.
2. What about a Kafka-like storage system that removes all delivered messages and holds undelivered copies up to some count or number of days? This would eliminate the storage requirement.

samirkumarpadhi
It would have been nice to compare this approach to an append-only log structure like Kafka as a store; I suspect that's where we would have ended up if group messages were part of the design. I think not handling groups was a miss here. I would expect a senior/staff/principal candidate to be able to talk about the added scaling needs of group chats.
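A toy sketch of the append-only log idea for a group chat: each message is written once, and every member only tracks an offset into the log. This is an in-memory illustration with made-up names, not Kafka's API. The `trim` step, which drops entries once all members have consumed them, is an assumption inspired by the "remove all delivered" suggestion earlier in the thread; Kafka itself normally retains data by time or size rather than by consumption.

```python
class ConversationLog:
    """Append-only message log shared by all members of one conversation."""

    def __init__(self, members):
        self._entries = []                          # retained (sender, text) pairs
        self._base = 0                              # absolute offset of self._entries[0]
        self._offsets = {m: 0 for m in members}     # next absolute offset each member reads

    def append(self, sender, text):
        """Write one copy of the message, however many members there are; return its offset."""
        self._entries.append((sender, text))
        return self._base + len(self._entries) - 1

    def read_new(self, member):
        """Return everything the member has not seen yet and advance their offset."""
        start = self._offsets[member] - self._base
        new_entries = self._entries[start:]
        self._offsets[member] = self._base + len(self._entries)
        return new_entries

    def trim(self):
        """Drop entries that every member has already consumed; return how many were dropped."""
        lowest = min(self._offsets.values())
        dropped = lowest - self._base
        del self._entries[:dropped]
        self._base = lowest
        return dropped
```

The point for groups is that a message to 1,000 members costs one append plus 1,000 small offsets, instead of 1,000 copies of the message or 1,000 unread-list updates.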

BenoitStPierre
Great interview!
This channel is quite underrated.
Keep up the great work!

👏👏👏

Smplebserver
A good work by IGotAnOffer, detailed explanation on system design questions. All the best!

gulati
Why did we go with HTTP REST over a web socket?

ralphez
I have a senior systems interview tonight.
I've probably watched 100 hours of breakdowns and mock interviews in the last 2 weeks.
Wish me luck.

DavidWoodMusic
His design really reflected his lack of understanding of how front end works

OneMillionDollars-tuur