System Design - Building Distributed Cache like Redis, Memcached | System Design Interview Question

Caching is one of the most important components of any distributed system you build. Only a handful of systems can work without a cache. Almost every system needs to add a cache at some point — if not at the beginning, when the scale is low, then definitely later as it grows.

In this video, I am going to talk about how to build a distributed cache. A distributed cache is a kind of cache which is spread across multiple nodes. It scales horizontally so that it can handle high load.

We are going to fulfil 4 functional requirements in our cache:
1. Putting a key-value pair in the cache.
2. Getting the value for a given key.
3. Cache eviction - What happens when the cache becomes full and you want to store a new value?
4. Key expiry (also called TTL - time to live) - A key is valid only for some time, after which it expires.
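As a rough single-node sketch of these four requirements (the class name, capacity handling, and injectable clock are my assumptions, not the video's exact design), an LRU map with lazy per-key expiry could look like:

```python
import time
from collections import OrderedDict

class LRUCacheWithTTL:
    """Single-node cache: put/get, LRU eviction when full, lazy TTL expiry."""

    def __init__(self, capacity, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock                  # injectable clock, handy for tests
        self.store = OrderedDict()          # key -> (value, expires_at)

    def put(self, key, value, ttl_seconds=None):
        expires_at = self.clock() + ttl_seconds if ttl_seconds else None
        if key in self.store:
            self.store.pop(key)             # re-insert to mark as most recent
        elif len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict least recently used entry
        self.store[key] = (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self.clock() >= expires_at:
            del self.store[key]             # lazy expiry: remove on read
            return None
        self.store.move_to_end(key)         # mark as most recently used
        return value
```

Eviction here rides on the OrderedDict's insertion/access order (oldest entry first), and expiry is checked lazily on `get()` rather than by a background sweep.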

Other non-functional requirements which are also targeted here:
1. Availability: The cache should be available to handle get and put requests.
2. Scalability: The cache should be able to handle higher load.

Links to some other videos I referenced in this video:

You can follow me on:

#systemdesign #system #distributed #distributedsystem #architecture #software #cache #partitioning #distributedcache #redis #memcached #caching #invalidation #writethrough #readthrough #design #programming #developer #sde #lru
Comments

Hey Udit,

I have been following your System Design videos for quite some time; they are marvelous and have so much detail.

Just one request: if you can make a video on a Job Scheduler, it will be great 😁

abhinavrastogi

Very helpful video, kudos to your efforts

ArpitRastogi

Hi Udit, nice and informative video. When a new box is added to the consistent hashing ring, how are the get/put/update queries handled? Suppose key K1 was earlier being mapped (hashed) to box B1, and its value V1 and timestamp T1 were stored there. You now add another box B0 (for scaling purposes), and the hash function now maps K1 to box B0 on the ring. How do you guarantee consistency in this scenario?

mananshah
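The ring behavior the comment above asks about can be illustrated with a minimal consistent-hashing sketch (the node names, vnode count, and MD5-based hash are illustrative assumptions, not the video's exact scheme). With virtual nodes, adding or removing a box remaps only the keys on the affected arcs; those remapped keys — like K1 in the comment — are exactly the ones that need migration or invalidation to stay consistent:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes: topology changes remap
    only the keys whose arc is affected, not the whole keyspace."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self.ring = []                      # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # each physical node gets `vnodes` points on the ring
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get_node(self, key):
        if not self.ring:
            raise KeyError("empty ring")
        # first virtual node clockwise from the key's hash (with wrap-around)
        idx = bisect.bisect(self.ring, (self._hash(key),))
        return self.ring[idx % len(self.ring)][1]
```

Only keys whose nearest clockwise vnode belonged to the added/removed box change owner; every other key keeps its old box, which is why consistent hashing is preferred over plain `hash(key) % N` for this.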

Hey Udit, I was wondering: since the hash function does a mod with the bucket size, and when there is a collision we store the data as a linked list or BST, in what scenario would eviction need to be used? The only time the map will run out of memory is when internal memory runs out due to a lot of items in the buckets.

amitgude

Sir, I had a doubt. As you mentioned, we don't need to remove a key as soon as it's expired; we might remove it when there is a get() call on it, by comparing the current timestamp with that key's expiry time. Completely fine, but consider a situation where there was no get() call on that key. Eventually the map gets filled, and when the next key arrives, we need to remove a key using the LRU strategy. Now, we might end up removing some other key which is not yet expired but is least recently used, while the expired key is still in our map. What's your take on this?

anshumandas
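One common answer to the scenario in the comment above — offered here as a sketch, not as the video's stated approach — is to sweep for an already-expired entry before falling back to plain LRU when the cache is full. A hypothetical `evict_one` helper over a `key -> (value, expires_at)` map ordered least- to most-recently used:

```python
import time
from collections import OrderedDict

def evict_one(store, clock=time.monotonic):
    """Free one slot: prefer reclaiming an expired entry; otherwise evict
    the true LRU entry (the first key in the OrderedDict)."""
    for key, (_, expires_at) in store.items():
        if expires_at is not None and clock() >= expires_at:
            del store[key]                  # expired entry found: reclaim it
            return key
    key, _ = store.popitem(last=False)      # nothing expired: evict plain LRU
    return key
```

This keeps a live-but-cold key in the cache whenever a dead key is still occupying a slot; the trade-off is an O(n) scan on a full-cache put, which real systems (e.g. Redis's active expiry) bound by sampling instead of scanning everything.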

Hi Udit, even if you take a combination of a cron job and eviction on GET, that solves the problem of serving output on GET after the TTL, but how does it solve the problem for PUT? Let's say that at some point between two consecutive cron runs, some keys expired but we haven't evicted them (no cron run and no GET on them). If I then try to do a PUT, it might see the cache as full and evict according to LRU, where it should already have had some space from the TTL eviction which didn't happen in this case.
PS: great video (_/\_)

piyushverma

There is infinite memory in a distributed cache

chessmaster

I guess there is no replication of data?

sugyansahu