How Reddit designed their metadata store to serve 100k req/sec at p99 of 17ms


Build Your Own Redis / DNS / BitTorrent / SQLite - with CodeCrafters.

# Recommended videos and playlists

If you liked this video, you will find the following videos and playlists helpful.

# Things you will find amusing

# Other socials

I write and share my practical experience and learnings every day, so if that resonates with you, follow along. I keep it no-fluff.

Thank you for watching and supporting! It means a ton.

I am on a mission to bring out the best engineering stories from around the world and make you all fall in love with engineering. If this resonates with you, follow along; I always keep it no-fluff.
# Comments

Successfully ruined my upcoming weekend. Have to view all of your videos now 😢

nextgodlevel

The Kafka CDC can solve the problem of synchronous write inconsistencies, but not the backfill overwriting. I suspect they might do some kind of business-logic or SHA/checksum validation to ensure they are not overwriting the data during backfilling. Correct me if I'm missing something, bro.

richi
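
One way to implement the guard this comment is guessing at is to make the backfill upsert conditional, so an archival row never replaces a row the live write path has already landed. A minimal sketch in Python with psycopg2, assuming a hypothetical `post_metadata` table with a `version` column; none of this is confirmed by Reddit's write-up:

```python
import json
import psycopg2

# Hypothetical schema:
#   post_metadata(post_id BIGINT PRIMARY KEY, data JSONB, version BIGINT)
BACKFILL_UPSERT = """
INSERT INTO post_metadata (post_id, data, version)
VALUES (%(post_id)s, %(data)s, %(version)s)
ON CONFLICT (post_id) DO UPDATE
    SET data = EXCLUDED.data,
        version = EXCLUDED.version
    -- Only overwrite if the backfilled row is newer than whatever the
    -- live write path (dual writes / CDC) has already written.
    WHERE post_metadata.version < EXCLUDED.version;
"""

def backfill_row(conn, post_id: int, data: dict, version: int) -> None:
    """Upsert one archival row without clobbering fresher live writes."""
    with conn.cursor() as cur:
        cur.execute(BACKFILL_UPSERT, {
            "post_id": post_id,
            "data": json.dumps(data),
            "version": version,
        })
    conn.commit()

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=reddit_meta")  # hypothetical DSN
    backfill_row(conn, 42, {"media_type": "image"}, version=1)
```

A checksum comparison would work the same way: store it alongside the row and only update when the incoming row is known to be fresher.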

Large Data Migration -> Event Driven Architecture

Also, it's interesting to learn about Postgres's extensions, which aren't required if you go with a serverless database solution like DynamoDB.

AayushThokchom

It's great that YouTube has such useful videos. Thank you, Minister!

GSTGST-dwrf

Since they are storing data as JSON and also scaling the Postgres DB, why didn't they go with a non-relational DB like MongoDB, which stores data as JSON and provides scaling out of the box?

kunalyadav
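
For context on these "why not a document DB" questions: Postgres stores and indexes JSON natively via the `jsonb` type, so keeping the relational engine doesn't mean giving up JSON. A minimal sketch; the table and field names are illustrative, not taken from the video:

```python
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS media_metadata (
    post_id    BIGINT PRIMARY KEY,
    attributes JSONB NOT NULL
);
-- A GIN index lets Postgres serve key/value lookups inside the JSON document.
CREATE INDEX IF NOT EXISTS media_metadata_attrs_gin
    ON media_metadata USING GIN (attributes);
"""

QUERY = """
SELECT post_id, attributes->>'media_type' AS media_type
FROM media_metadata
WHERE attributes @> %s::jsonb;   -- containment query, served by the GIN index
"""

with psycopg2.connect("dbname=reddit_meta") as conn:   # hypothetical DSN
    with conn.cursor() as cur:
        cur.execute(DDL)
        cur.execute(QUERY, ('{"media_type": "video"}',))
        rows = cur.fetchall()
```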

Why didn't Reddit go for a document DB for this storage, given the structure and access pattern? What do you think about it @arpit?

keshavb

Arpit - using CDC and Kafka still does not solve the problem of data from the old source overriding data in the new Aurora Postgres during the migration, right?
What am I missing?
You would still need a bulk batch job that takes all the archival data from the multiple sources and ingests it into the new Aurora. CDC does not solve that backfill, correct?

pixiedust
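
A common way to resolve this backfill-vs-live-writes concern (not necessarily what Reddit did) is to let the live path own the new database and have the bulk job fill only the gaps, never overwriting. A sketch under that assumption, with hypothetical table names:

```python
import psycopg2
from psycopg2.extras import execute_values

# The backfill inserts archival rows but never overwrites anything the
# live write path (dual writes / CDC) has already put into the new DB.
BACKFILL_SQL = """
INSERT INTO post_metadata (post_id, data)
VALUES %s
ON CONFLICT (post_id) DO NOTHING;
"""

def backfill_batch(conn, rows) -> None:
    """rows: iterable of (post_id, json_string) tuples from the old sources."""
    with conn.cursor() as cur:
        execute_values(cur, BACKFILL_SQL, rows, page_size=1000)
    conn.commit()

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=reddit_meta")  # hypothetical DSN
    backfill_batch(conn, [(1, '{"media_type": "gif"}')])
```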

Why are they using Postgres if they are storing the data as JSON?

suhanijain

I guess we don't need both the CDC setup and dual writes; the CDC setup alone would suffice to insert the data into the new DB, correct?

JardaniJovonovich

Hi Arpit, I think you could have gone a bit more into depth, like they do in their blog - a bit about how they use an incrementing post_id, which allows them to serve most queries from one partition only. Not complaining at all. Thanks for being awesome as always.

TL;DR: 7 minutes seems a bit short.

sachinmalik
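
For readers who want the mechanics behind the single-partition point: with declarative range partitioning on `post_id`, recent (hot) posts all fall into the newest partition, so most queries are pruned down to one partition. A minimal sketch with made-up boundaries:

```python
import psycopg2

PARTITION_DDL = """
CREATE TABLE IF NOT EXISTS post_metadata (
    post_id BIGINT NOT NULL,
    data    JSONB,
    PRIMARY KEY (post_id)
) PARTITION BY RANGE (post_id);

-- Boundaries are illustrative; a real setup would size them from write volume.
CREATE TABLE IF NOT EXISTS post_metadata_p0
    PARTITION OF post_metadata FOR VALUES FROM (0) TO (100000000);
CREATE TABLE IF NOT EXISTS post_metadata_p1
    PARTITION OF post_metadata FOR VALUES FROM (100000000) TO (200000000);
"""

with psycopg2.connect("dbname=reddit_meta") as conn:   # hypothetical DSN
    with conn.cursor() as cur:
        cur.execute(PARTITION_DDL)
```

Queries that filter on `post_id` (which almost all of them do, since the ID is the key) get partition pruning for free.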

How does PgBouncer minimize the cost of creating a new process for each request?
Maybe I am wrong - can you tell me how the cost is reduced here?

nextgodlevel
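
On the PgBouncer question: Postgres forks a dedicated backend process for every new connection, and that fork plus session setup is the cost being avoided. PgBouncer keeps a pool of long-lived server connections and hands them out to clients, so most client connects never reach Postgres at all. A rough way to see the difference from the application side; the ports and DSNs are assumptions about a local setup:

```python
import time
import psycopg2

def time_connect(dsn: str, n: int = 50) -> float:
    """Average seconds to open and close a fresh connection, n times."""
    start = time.perf_counter()
    for _ in range(n):
        conn = psycopg2.connect(dsn)
        conn.close()
    return (time.perf_counter() - start) / n

# Direct to Postgres: every connect forks a new backend process.
direct = time_connect("host=localhost port=5432 dbname=reddit_meta")
# Through PgBouncer: connects are handed an already-open server connection.
pooled = time_connect("host=localhost port=6432 dbname=reddit_meta")

print(f"direct: {direct*1000:.1f} ms/conn, via pgbouncer: {pooled*1000:.1f} ms/conn")
```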

What is the CDC mentioned here? Please suggest some pointers.

atanusikder
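
CDC (change data capture) means tailing the source database's change stream, usually via its write-ahead log or binlog with a tool like Debezium, and publishing every row change as an event, often to Kafka, so another system can replay it. A minimal consumer-side sketch; the topic name and event shape are made up:

```python
import json
import psycopg2
from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "old_db.post_metadata.changes",          # hypothetical CDC topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

conn = psycopg2.connect("dbname=reddit_meta")  # hypothetical DSN

APPLY_SQL = """
INSERT INTO post_metadata (post_id, data)
VALUES (%(post_id)s, %(data)s)
ON CONFLICT (post_id) DO UPDATE SET data = EXCLUDED.data;
"""

# Each event describes one row change captured from the old database;
# replaying them keeps the new database in sync during the migration.
for event in consumer:
    change = event.value
    with conn.cursor() as cur:
        cur.execute(APPLY_SQL, {"post_id": change["post_id"],
                                "data": json.dumps(change["data"])})
    conn.commit()
```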

Good video. Would appreciate it a lot if you could attach any resources you used in the video, like the Reddit blog mentioned in the description. It would be great if the link were attached there as well.

poojanpatel

Hey Arpit, thanks a lot for putting this up. Your writing skills are next level - crisp and crystal clear. Could you please tell us what setup you use for taking these notes?
Thanks in advance.

vinayak_

How did they check whether the reads from the old and new databases were the same?

TechCornerWithAjay
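
A common answer to this verification question is shadow reads: serve traffic from the old store, issue the same read against the new store in the background, and compare. A minimal sketch; `read_old` and `read_new` are placeholders for whatever read paths the service actually has:

```python
import logging

log = logging.getLogger("shadow_read")

def shadow_read(post_id, read_old, read_new):
    """Serve from the old store; compare against the new store on the side."""
    old_row = read_old(post_id)
    try:
        new_row = read_new(post_id)
        if new_row != old_row:
            # Mismatches are logged (or counted as metrics) rather than
            # surfaced to the user, so verification never affects traffic.
            log.warning("mismatch for post_id=%s: old=%r new=%r",
                        post_id, old_row, new_row)
    except Exception:
        log.exception("shadow read failed for post_id=%s", post_id)
    return old_row
```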

Hey Arpit… thanks for the video

I liked the idea of partitioning as a policy that runs on a cron. But wouldn't moving data around between partitions also warrant a change in the backend (reads)?

Or are you saying the backend has been written in a way that takes partitioning into account while reading the data?

dreamerchawla
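
On the cron point: with range partitioning, a row's partition is fixed by its key, so data doesn't move between partitions; the scheduled job typically just creates the next partition ahead of demand, and reads keep going through the parent table unchanged. A sketch of such a maintenance job, with hypothetical sizing:

```python
import psycopg2

# Hypothetical policy: one partition per fixed post_id range, created ahead of need.
CREATE_NEXT = """
CREATE TABLE IF NOT EXISTS post_metadata_p{n}
    PARTITION OF post_metadata
    FOR VALUES FROM ({lo}) TO ({hi});
"""

PARTITION_WIDTH = 100_000_000  # illustrative range size per partition

def ensure_partition_for(conn, max_post_id: int) -> None:
    """Cron-driven: make sure the partition covering max_post_id exists."""
    n = max_post_id // PARTITION_WIDTH
    lo, hi = n * PARTITION_WIDTH, (n + 1) * PARTITION_WIDTH
    with conn.cursor() as cur:
        cur.execute(CREATE_NEXT.format(n=n, lo=lo, hi=hi))
    conn.commit()

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=reddit_meta")  # hypothetical DSN
    ensure_partition_for(conn, max_post_id=250_000_000)
```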

How many shards were used to hold those partitions to achieve that much throughput?

ganeshgottipati

What is used over here to write down the notes?

calvindsouza

How would you handle search, since the relevant data might live in partitions that are several days old? Even if they're using a secondary data store, date/time range-based partitioning or even sharding will not suffice. What do you think?

code-master

Thanks Arpit!! Also, what are your thoughts on using Pandas as a metadata DB? Dropbox had a post about them using Pandas in which they explained in depth why other DBs weren't a better fit for them. (Would like to know your views on it too.)

LeoLeo-nxgi