Implementing Vertical Sharding

preview_player
Показать описание

Build Your Own Redis / DNS / BitTorrent / SQLite - with CodeCrafters.

In the video, I explained the importance of sharding in scaling databases, focusing on vertical sharding where tables are distributed across multiple servers. I discussed the transition from monolithic to microservices architecture and how vertical sharding helps in this shift. I detailed the implementation steps of moving tables between database servers, emphasizing the use of tools like Zookeeper for storing meta information and ensuring reactive updates across API servers. The process involved dumping tables, loading them into new databases, setting up replications, and performing a seamless cutover for data consistency.

# Recommended videos and playlists

If you liked this video, you will find the following videos and playlists helpful

# Things you will find amusing

# Other socials

I keep writing and sharing my practical experience and learnings every day, so if you resonate then follow along. I keep it no fluff.

Thank you for watching and supporting! it means a ton.

I am on a mission to bring out the best engineering stories from around the world and make you all fall in
love with engineering. If you resonate with this then follow along, I always keep it no-fluff.
Рекомендации по теме
Комментарии
Автор

I have hit my head so many times understanding the sharding, but could not get it so well. But this guy never fail to explain the things in so so easy way. I am lucky that I found this channel on YT.

ddevotion
Автор

binge watching after a break ! So much quality content over the last week. Thanks a lot

adianimesh
Автор

Maybe a dumb question but here I go

If the table renaming step ( table to table.bak) is done after firing zookeeper update, can’t this be more helpful in reducing the small database down time ( assuming zookeeper updates happen immediately without consistency issues )

Saying this because, if we do this, the second DB server is anyways up and will take requests and renaming can happen later … this will also ensures that the replication is completely done

vasusharma
Автор

There are actually 2 ways to do Vertical Sharding:
1) The one which you mentioned where there is no downtime, it's real time
2) The one where there is downtime and T2 is dumped on DB2 during downtime so as to avoid replication and further steps

You mentioned in video at last that large tables are not sharded as the replication process for them to get in sync is difficult, for such tables we can use downtime approach. What if you come up to a situation where DB1 shard has 2/3 almost same size tables and all are hot? You can take a downtime and do vertical Sharding.

Also for financial domain companies, this realtime vertical sharding with 10-15ms is not acceptable, they also might be using downtime approach!

jaisamtani
Автор

how is vertical sharding different from partitioning?

cnp
Автор

Really grateful to God that I found this valuable gem like content on time♥️
Never worry about views, because some gem music videos are also hidden and craps are getting 1B views

homestaysandcafes
Автор

Arpit bhai you just rocked it, ek dum top level content with no clutter even for 1 second.
This video filled me up with confidence that I can handle database in production.
Started binge watching your channel.
Keep making content Sir.
One thought that hit me while watching this video : This type of content will catalyse the transition of India from being IT services hub to IT manufacturing hub😄

mukeshmahadev
Автор

Thanks for an amazing video - may i know the difference between federation and vertical sharding?

ramyakrishnan
Автор

Designing cannot be anymore intersting and easy when compared to your videos. Just keep those videos comming everyday !! I've already took out a slot from my day to watch your videos.

Aditya-usgj
Автор

Great content Arpit. Very easy to understand,
small doubt - Why to rename the table first then go for zookeeper config change? Why not first update the config in zookeeper then take back up the table. This will reduce the availability loss.

ashishtewari
Автор

One doubt. While copying bin log to shard 2 we record last time stamp or id till which we copied right? After that copy is done, we start replication right?

CDC or replication service catches up the shard 2 with live updates

What about the updates that are happening to db while we are copying the bin log?

arunrahullakkapragada
Автор

One of the best content on YT for Vertical Sharding, looking forward for Horizontal Sharding video. :). One question related to joining between 2 database tables, do we need to use host level join in case?

vikas
Автор

Arpit, as always amazing video, thanks for sharing. I was actually planning to join your Feb cohort but couldn't enroll before the registrations got closed.

I have some thoughts on this video, maybe I am missing something. I feel migration of table t1 from 1 db server to another with this approach is kind of over-engineering. I have done migration in my previous company, let me explain my approach.

1. We don't need to store the metadata about which db server the table belongs to in zookeeper or any service discovery. Generally in each app server we have our DB configurations file (yaml, xml), we can add and maintain both the DB configs in that. And app server connects to both.
2. The cutover can happen gradually with dual writes to the table in both the DB servers (simple code change). And historic data can be migrated by the db table snapshot.
3. The final cutover can be done by maintaining a config in a remote config, which is basically WIREON/WIREOFF (WOWO) configuration, i.e. turnoff the writes to the previous db server table (example: disable.writes.to.xyz := true)

Let me know your thoughts..

aniruddhkhera
Автор

Appreciate the detailed breakdown! 🧐 I wanted to ask something unrelated: 🤔 I have these words 🤨. (behave today finger ski upon boy assault summer exhaust beauty stereo over). How do I use this? 🤨

GieGarczynski
Автор

Thanks Arpit, Allah bless you. Top notch level content.
Have one query :
If I have a large DB/Table for which I have indexing on some columns as well. While partitioning my data, do my indexing also got partition or I have to do manual indexing on my data partition on it's restored on different DB instance.

Polly
Автор

Great content. Didn't feel like skipping even for a sec. Kudos!! Also, one question - How do we go about migrating huge table from one database server to another? By huge table I am assuming that the table size is big and also there are huge number of writes happening.

ujjwalsaini
Автор

Just connecting the dots, is this the same way how we scale blob storage (S3/Azure) when data node in a bucket gets hot? :)

shivamsrivastava
Автор

What about the failed requests when we renamed the table ie `Table Not Found` part. Read requests are acceptable but those write requests will be lost, wont be that a consistency issue?

chiragrajani
Автор

Hey Arpit, Great Video . Have a doubt here
You mentioned zookeeper watch will inform the API server when there is a change, but where will the API Server store this config information ?
If API server is not storing the config information then with every request we need to hit the zookeeper first to get the config, which will surely add latency.

sayantankundu
Автор

How the watch updates config of API server? Are they connected on websocket? Dont think so generally. Please explain.

rahulsarkar