Caching Pitfalls Every Developer Should Know


Animation tools: Adobe Illustrator and After Effects.

Check out our bestselling System Design Interview books:

ABOUT US:
Covering topics and trends in large-scale system design, from the authors of the best-selling System Design Interview series.
Comments

Cache invalidation is the elephant in the room

IceQub

1. Cache Stampede 🐘: many concurrent requests miss the cache for the same piece of data and all hit the DB at once
a. lock: after a cache miss, only one request fetches the data directly from the DB while the others wait (see the sketch after this list)
b. external process: proactively or reactively refreshes expiring/expired data in the cache
c. probabilistic early expiration: each request may trigger an early refresh of the data in the cache before it expires
2. Cache Penetration 🏹: the requested data exists in neither the cache nor the DB
a. cache a placeholder for non-existent data
b. bloom filter (it can only tell you that data definitely does not exist; a positive answer means the data merely might exist)
3. Cache Avalanche 🏔: a large amount of data is missing from the cache at the same time (mass expiry or a cache outage)
a. circuit breakers on both the cache and the DB
b. a cache cluster instead of a single cache node, so when one part is down the other parts remain online
c. cache pre-warming: when starting the app, fill the cache before the service begins taking traffic
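
These mitigations translate directly into code. Below is a minimal Python sketch, assuming an in-process dictionary cache and a hypothetical db.fetch_user helper (neither comes from the video), showing 1a (a per-key lock so only one request rebuilds a missing entry) and 2a (caching a placeholder for non-existent data):

```python
import json
import threading

# Minimal in-process sketch; in production the lock and the cache would live
# in a shared store (e.g. Redis SET NX) so every app instance sees them.
CACHE: dict[str, str] = {}           # stand-in for a real cache client
MISSING = "__MISSING__"              # placeholder for non-existent rows (2a)

_locks: dict[str, threading.Lock] = {}
_locks_guard = threading.Lock()

def _lock_for(key: str) -> threading.Lock:
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())

def get_user(user_id: str, db):
    key = f"user:{user_id}"
    cached = CACHE.get(key)
    if cached == MISSING:            # negative-cache hit: stops penetration
        return None
    if cached is not None:
        return json.loads(cached)

    # 1a: only one request per key rebuilds the entry; the others wait on the lock.
    with _lock_for(key):
        cached = CACHE.get(key)      # re-check after acquiring the lock
        if cached is not None:
            return None if cached == MISSING else json.loads(cached)
        row = db.fetch_user(user_id)  # hypothetical DB helper
        CACHE[key] = MISSING if row is None else json.dumps(row)
        return row
```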

yuxueyuan

All I know is there are 2 hard things about programming:
- naming things
- cache invalidation
- off-by-one errors

dave

I really appreciate your videos. They are so high quality, with the best explanations.
Could you please make a video on the best strategy for using a relational database such as Microsoft SQL Server (or similar) together with Elasticsearch?
How do you keep them synchronized? Thank you in advance.

Kingside

Caches work well in tandem with claims; that is, some data should never be cached but rather claimed by a user during the editing process, with the ability for another user to revoke the claim. Saves always check claim status first.
Multi-level cache invalidation is important. For example, products returned for a catalog can use a product cache with a polled cache-invalidation process. Yet for display on a product detail page, a quick check of a timestamp is done, and that timestamp is reset by a trigger whenever any of the various product tables are updated. Thus a quick scalar DB query ensures the cache is valid, which also helps protect the cache itself from becoming stale.
This can be taken even further with a dedicated timestamp table for complex product information. This approach dramatically improves performance, always guaranteeing valid data on a detail page, while keeping catalogs valid within the timespan of the cache manager's polling of a cache-invalidation table for changes.
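
A minimal sketch of that timestamp check, assuming a trigger-maintained product_timestamps table and generic cache/db helpers (all names here are illustrative, not taken from the comment):

```python
def get_product_for_detail_page(product_id: int, cache, db):
    """Validate a cached product against a trigger-maintained timestamp table."""
    # Quick scalar query: the trigger bumps this value on any product-table update.
    last_modified = db.scalar(
        "SELECT last_modified FROM product_timestamps WHERE product_id = %s",
        (product_id,),
    )

    entry = cache.get(f"product:{product_id}")   # {"seen": timestamp, "value": product}
    if entry is not None and entry["seen"] == last_modified:
        return entry["value"]                    # timestamp unchanged, cache is provably fresh

    product = db.fetch_product(product_id)       # hypothetical loader for the full product
    cache.set(f"product:{product_id}", {"seen": last_modified, "value": product})
    return product
```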

michaelkhalsa

Stampede vs. Avalanche: awesome explanation. Kindly use a real-world example to correlate with these caching issues; that would help tremendously.

gorakh
Автор

The most important thing to know about caching is: don't cache unless it's absolutely necessary.
Also, a database in conjunction with read replicas can be much more resilient and performant than your homebrew crappy caching mechanism.
Caching is a last resort, after you've tried everything you can with the database, for example tuning queries, adding indexes, etc.

ANONAAAAAAAAA

Add jitter to the TTL to reduce cache avalanches and many related issues.
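
For example, a small wrapper can randomize each key's TTL. This sketch assumes a Redis-style client with a setex command; the ±20% figure is just an illustration:

```python
import json
import random

def cache_with_jitter(cache, key, value, base_ttl=300, jitter_ratio=0.2):
    """Randomize each key's TTL by +/- 20% so entries don't all expire together."""
    jitter = random.uniform(-jitter_ratio, jitter_ratio) * base_ttl
    cache.setex(key, int(base_ttl + jitter), json.dumps(value))
```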

rogers.

I'm surprised that there is no mention of an easy solution (albeit there might still be an issue when starting from a cold cache) for an avalanche/stampede: just use different caching times. That should somewhat reduce the load when the database would otherwise be hit with many requests at once. But in essence, only cache what is necessary.

BigHalfSteps

Thanks for the insights on cache management. Could you please suggest whether keys in the cache need encryption? And what should the key flush time be?

mailbrn

What if we add some kind of jitter to the key TTL, so we minimize the probability of the keys expiring at the same time?

vladyslavlen

At 0:27, on the left diagram, shouldn't the process order be:
1. Data request
2. Data response (no cached data)
3. Read original
4. Copy cache

simonbernard

Would love to see you tackle cache consistency too: what happens when the database write succeeds but the cache write fails? Or when the database is written concurrently with two different values, and the last write to the database was value A while the last write to the cache was value B? Now the cache is forever inconsistent.
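
One common (though not bulletproof) mitigation for that failure mode is to invalidate the cache on writes rather than update it, and to keep a TTL on every entry as a backstop so a failed invalidation eventually heals. A minimal sketch, assuming generic db and cache clients (all names are illustrative):

```python
def update_user_email(user_id: str, email: str, db, cache) -> None:
    # Write the source of truth first.
    db.execute("UPDATE users SET email = %s WHERE id = %s", (email, user_id))

    # Invalidate instead of updating the cached value: concurrent writers can
    # no longer race to leave a wrong value behind, only a missing one.
    try:
        cache.delete(f"user:{user_id}")
    except Exception:
        # If the delete fails, the entry stays stale until its TTL expires,
        # which is why every cached entry should carry a TTL backstop.
        pass
```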

toxicitysocks

At 3:58, the "find key" arrow has a typo; it should be (3) instead of (4).

devid

How can there be fewer than 1M subscribers to your channel? You have the best explanations.

micahpezdirtz

Using a bloom filter sounds easy, but it can't delete elements, right? Would the bloom filter need to be rebuilt periodically so it stops reporting deleted items?
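
A standard bloom filter indeed has no delete operation, so the usual options are to rebuild it periodically from the source of truth or to use a counting bloom filter, which keeps small counters instead of single bits. A minimal counting-filter sketch (the sizes and hash scheme are illustrative, not from the video):

```python
import hashlib

class CountingBloomFilter:
    """Bloom filter variant that supports deletion by keeping per-slot counters."""

    def __init__(self, size: int = 10_000, num_hashes: int = 4):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _indexes(self, item: str):
        # Derive num_hashes slot indexes from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.counters[idx] += 1

    def remove(self, item: str) -> None:
        # Only call for items that were previously added, or counters can underflow.
        for idx in self._indexes(item):
            if self.counters[idx] > 0:
                self.counters[idx] -= 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.counters[idx] > 0 for idx in self._indexes(item))
```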

eduardokuroda

At 2:33 there's a hidden smiling gem at the bottom right.

NemiroIlia

If something creates traffic, add a traffic signal, i.e. a lock, to regulate it.
If something creates traffic, add multiple systems (web server, cache server, etc.) to handle it.
If something can fail, keep redundant backups of THAT.

This applies to anything.

Also, to know in advance whether a lookup would fail even in the DB because the relevant answer isn't there, note that down beforehand in some way.

parthi

Can't we use request collapsing to prevent a stampede? Since it's mainly caused by an expired cache entry while multiple requests are trying to access the same resource.
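
Request collapsing (often called singleflight) is essentially the per-key-lock idea from the mitigation list above: concurrent callers for the same key all await a single in-flight fetch. A minimal async Python sketch, with all names being illustrative:

```python
import asyncio

class SingleFlight:
    """Collapse concurrent requests for the same key into one fetch."""

    def __init__(self):
        self._inflight: dict[str, asyncio.Task] = {}

    async def do(self, key: str, fetch):
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the fetch; later callers reuse it.
            task = asyncio.ensure_future(fetch())
            self._inflight[key] = task
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        return await task

# Usage: every concurrent caller awaits the same DB fetch, e.g.
# result = await flight.do("user:42", lambda: db.fetch_user(42))
```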

PranaySoniKumar

Question:

What's the point of a cache server? Why isn't the server itself / web server doing the caching?

If caching is supposed to be for fast retrieval, and we store the data on a different server, won't the network call take more time than querying the DB in the first place?

kazama