Design Scalable News Feed System Similar to Instagram, Facebook & Twitter | System Design

Design a scalable news feed system similar to feeds on Instagram, Facebook, and Twitter! We start with a simple working version and then build up to an optimized decoupled architecture while talking about the different tradeoffs that we are making.

00:00 Final Architecture Teaser
01:27 High-Level Requirements
02:20 Data Models
04:55 Creating A Post
06:50 Kafka CDC + Stream Processors
09:25 CDC Streams + Kafka
12:25 Getting User’s Feed
15:30 Problems with Computing Feed at Every Request
17:00 Pre-computing Feed with Redis Cache
20:45 Populating Feed Cache in Realtime
24:25 Populating Feed Cache Offline
25:55 Final Architecture Summary
29:52 Outro & Future Videos

#systemDesign #softwareArchitecture #interview

Comments

One thing missing from the design is what happens for influencers and celebrities, where the push model would not make sense.

amertia

You made me start thinking about a lot of things in my project. Thank you very much!
A question to Irtiza or anyone:
Step 1) I fill the feed cache with the IDs of new posts that should be displayed for a user.
Step 2) Presumably I should remove cached posts at some point... But when? Once the user has seen the post? Or should each cached post have an expiration?

koviroli

Why do we need a scheduled job to update the cache for every user?

melk

Thanks for the amazing content! Rather than using CDC, can we simply write a "post_created" event directly to Kafka from the post service? The post service would then do two jobs: one, write to the database, and two, write an event to Kafka.
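
A minimal sketch of the dual-write approach this question describes, using in-memory stand-ins (`FakeDatabase` and `FakeBroker`, both hypothetical) rather than a real database or Kafka client. It only illustrates one known hazard of dual writes: if the second write fails after the first has committed, the database and the event stream disagree. It is not the video's implementation.

```python
class FakeDatabase:
    """In-memory stand-in for the posts table (hypothetical)."""

    def __init__(self):
        self.posts = {}

    def insert_post(self, post_id, user_id, text):
        self.posts[post_id] = {"user_id": user_id, "text": text}


class FakeBroker:
    """In-memory stand-in for a Kafka topic (hypothetical)."""

    def __init__(self, fail=False):
        self.events = []
        self.fail = fail

    def publish(self, topic, event):
        if self.fail:
            raise ConnectionError("broker unavailable")
        self.events.append((topic, event))


def create_post_dual_write(db, broker, post_id, user_id, text):
    """Write the post to the database, then publish a post_created event."""
    db.insert_post(post_id, user_id, text)  # write 1 commits on its own
    broker.publish("post_created", {"post_id": post_id, "user_id": user_id})  # write 2


# Simulate the broker being down at the moment of the second write.
db, broker = FakeDatabase(), FakeBroker(fail=True)
try:
    create_post_dual_write(db, broker, 1, 42, "hello")
except ConnectionError:
    pass

# The post is in the database, but no event reached the stream.
post_saved = 1 in db.posts
event_published = len(broker.events) > 0
print(post_saved, event_published)
```

With CDC, the event is derived from the committed database change, so this particular mismatch cannot occur; the cost is the extra infrastructure.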

dhiliph

3:05 Post_User does not need an ID; as a weak entity, its primary key is <UserID, PostId>.
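
A small sketch of the composite-key idea, using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`post_user`, `user_id`, `post_id`) are assumptions modeled on the names in the comment, not the video's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No surrogate ID column: the (user_id, post_id) pair is the primary key.
conn.execute(
    """
    CREATE TABLE post_user (
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, post_id)
    )
    """
)
conn.execute("INSERT INTO post_user (user_id, post_id) VALUES (1, 100)")
conn.execute("INSERT INTO post_user (user_id, post_id) VALUES (1, 101)")

# Inserting the same pair again violates the primary key constraint.
try:
    conn.execute("INSERT INTO post_user (user_id, post_id) VALUES (1, 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM post_user").fetchone()[0]
print(duplicate_rejected, row_count)
```

The composite key already guarantees that each row is unique, which is the point the comment makes about the extra ID column.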

mickeyp

I have a question. Let's say A and B are friends. When A creates a post, it is written to the Redis cache on server1 to build the feed for friend B.
However, friend B gets routed to server2, which means B won't have access to this cache.
In other words, if A has 100 friends and A creates a post, how do we update the feed cache for these 100 friends? They are on different servers, and their caches will not be on server1.
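
A sketch of fan-out on write under one assumption that changes the picture in this question: the feed cache is a shared store keyed by user ID (as a standalone Redis deployment would be), not memory local to each application server. The `feed_cache` dict, the `friends` map, and `FEED_LIMIT` are hypothetical stand-ins for illustration only.

```python
from collections import deque

FEED_LIMIT = 100  # assumed cap on cached post IDs per user

# Stand-in for a shared cache: one store that every app server talks to.
feed_cache = {}

# Hypothetical friend graph: A is friends with B and C.
friends = {"A": ["B", "C"]}


def fan_out_post(author_id, post_id):
    """Push a new post ID onto the cached feed of each of the author's friends."""
    for friend_id in friends.get(author_id, []):
        feed = feed_cache.setdefault(friend_id, deque(maxlen=FEED_LIMIT))
        feed.appendleft(post_id)  # newest first; oldest entries fall off the end


def get_feed(user_id, server_name):
    """Read a feed; server_name is unused because the cache is keyed by user."""
    return list(feed_cache.get(user_id, []))


fan_out_post("A", "post-1")
fan_out_post("A", "post-2")

# B's request can land on any server and still see the same cached feed.
print(get_feed("B", "server1"), get_feed("B", "server2"))
```

Under this assumption, which server handles B's request does not matter, because the lookup key is B's user ID rather than the server.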

JH-zden

For the Scheduled Job, you said that you will iterate through all the users in your database and update the Feed Cache. If it updates the feed for every single user in our system (let's say 5M), would you be adding 5M rows to the Feed Cache?

My thought was that the Feed Cache would only store a percentage (let's say 20%) of daily users.

juheelee

There is a potential bottleneck on the path from the post API to the user-post table, before the Kafka CDC. Maybe this part could be partitioned or sharded.

jingjingcoming

I don't think storing age as just an integer makes sense; storing the date of birth and computing the age from it at run time is the better approach.
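
A minimal sketch of deriving age from a stored date of birth, as the comment suggests. The function name `age_on` is hypothetical; the reference date is passed in explicitly so the result is reproducible.

```python
from datetime import date


def age_on(date_of_birth, today):
    """Return the age in whole years on the given day."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


dob = date(1990, 6, 15)
print(age_on(dob, date(2024, 6, 14)))  # day before the birthday
print(age_on(dob, date(2024, 6, 15)))  # on the birthday
```

A stored integer age goes stale every year, while a stored date of birth never needs updating.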

meditationdanny

Great design and clearly articulated! Thanks a lot! I just wonder, why does the stream processor need to talk to the feed service? I thought the feed service now just reads results from the Redis cache. Could you help clarify?

LuluHou

One piece of context is missing: why does the feed stream processor interact with the feed service? You were saying "the feed of users". May I know what that is?

nagarajutammineni

I have one doubt about the data model design. What would happen if we do create a separate POST_USER table and include User_id in the Post table?

kishanprajapati

For pagination:

Let's assume you have 100 posts cached for each user.
Would you consider another service to add more posts to this user's cache once the user reaches the last available post?

JoaoKunha

What happens to posted data that fails moderation but is still being processed by other workers or has already been written to storage?

firezdog

What if the Redis cache has no entry for the user whose feed is being loaded? Does the feed service then need to talk to the post service? Or will you return no feed for them, which is a poor experience?

nochecku

1.) Why does feedStreamProcessor need to talk to Post service?
2.) How does Feed Service fetch the information of a user whose entry isn't present in cache at all? It should be talking to Friend service, Ranking service and then fetch the relevant details and then push it to cache and return the response, right?
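
One common way to handle the cache-miss case raised in point 2 is a cache-aside read: on a miss, compute the feed from the underlying data, store it, and return it. The sketch below assumes this pattern; it is not confirmed as the video's design. The dicts `friends_by_user` and `posts_by_user` are hypothetical stand-ins for the friend and post services, and sorting by timestamp stands in for ranking.

```python
# Stand-ins for the feed cache, friend service, and post service.
feed_cache = {}
friends_by_user = {"u1": ["u2", "u3"]}
posts_by_user = {
    "u2": [(5, "p5"), (1, "p1")],  # (timestamp, post_id) pairs
    "u3": [(3, "p3")],
}


def compute_feed(user_id, limit=10):
    """Build a feed from friends' posts, newest first."""
    candidates = []
    for friend_id in friends_by_user.get(user_id, []):
        candidates.extend(posts_by_user.get(friend_id, []))
    candidates.sort(reverse=True)  # highest timestamp first
    return [post_id for _, post_id in candidates[:limit]]


def get_feed(user_id):
    """Cache-aside read: serve from cache, else compute and populate."""
    feed = feed_cache.get(user_id)
    if feed is None:
        feed = compute_feed(user_id)
        feed_cache[user_id] = feed
    return feed


first_read = get_feed("u1")   # miss: computed and cached
second_read = get_feed("u1")  # hit: served from the cache
print(first_read, second_read is first_read)
```

The first request for an uncached user pays the full computation cost, so the user sees a slower response instead of an empty feed.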

JardaniJovonovich

Why do you need a separate ID column for the Friends and PostUser tables when you can just use a composite key (for PostUser, postID and userID), which uniquely determines a row?

석상주

Awesome videos.
What is the name of the tool that you used for the diagrams?

narisetiuday

If posts get stored in the CDC before they hit the Modification Stream Processor and then the Feed Stream Processor, how is the design going to prevent offending messages from being posted?

dbo

Hi, great content! Why do we need a Post_User table? Couldn't we have a UserId column in the Post table that records the owner's ID?

sergiim