What are Distributed CACHES and how do they manage DATA CONSISTENCY?

Caching in distributed systems is an important aspect of designing scalable systems. We first discuss what a cache is and why we use it. We then talk about the key features of a cache in a distributed system.

The cache eviction policies of LRU and sliding window are covered here. For high performance, the eviction policy must be chosen carefully. To keep data consistent and the memory footprint low, we must choose an appropriate write policy, such as write-through or write-back.

Cache management is important because of its direct impact on cache hit ratios, and therefore on performance. We walk through various scenarios in a distributed environment.

System Design Video Course:

00:00 Who should watch this video?
00:18 What is a cache?
02:14 Why not store everything in a cache?
03:00 Cache Policies
04:49 Cache Evictions and Thrashing
05:52 Consistency Problems
06:32 Local Caches
07:49 Global Caches
08:56 Where should you place a cache?
09:35 Cache Write Policies
11:38 Hybrid Write Policy?
13:10 Thank you!

A complete course on how systems are designed. Along with video lectures, the course has architecture diagrams, capacity planning, API contracts, and evaluation tests.

#SystemDesign #Caching #DistributedSystems
Comments

Gaurav, nice video. One comment: a write-back cache refers to writing to the cache first, with the update then propagated asynchronously from the cache to the DB. What you're describing as write-back is actually write-through, since in write-through the order of writing (to the DB or the cache first) doesn't matter.

VrajaJivan

Write-through: data is written in cache & DB; I/O completion is confirmed only when data is written in both places
Write-around: data is written in DB only; I/O completion is confirmed when data is written in DB
Write-back: data is written in cache first; I/O completion is confirmed when data is written in cache; data is written to DB asynchronously (background job) and does not block the request from being processed
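A minimal sketch of these three policies, assuming hypothetical in-memory `cache` and `db` dicts standing in for a real cache and database, with a queue draining write-back updates in the background:

```python
import queue
import threading

cache, db = {}, {}           # hypothetical stand-ins for a real cache and DB
write_queue = queue.Queue()  # buffer drained asynchronously for write-back

def write_through(key, value):
    # Write to cache and DB synchronously; confirm only after both succeed.
    cache[key] = value
    db[key] = value

def write_around(key, value):
    # Write to the DB only; drop any stale cached copy so the next
    # read misses and repopulates the cache from the DB.
    db[key] = value
    cache.pop(key, None)

def write_back(key, value):
    # Write to the cache and confirm immediately; a background job
    # persists the update to the DB without blocking the request.
    cache[key] = value
    write_queue.put((key, value))

def persist_worker():
    # Background job: drain buffered writes into the DB.
    while True:
        key, value = write_queue.get()
        db[key] = value

threading.Thread(target=persist_worker, daemon=True).start()
```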

waterislife

Other variants
1. There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.
2. There are only two hard problems in distributed systems: 2. Exactly-once delivery 1. Guaranteed order of messages 2. Exactly-once delivery

GK-rldu

I can already hear the interviewer asking, "With the hybrid solution, what happens when the cache node dies before it flushes to the concrete storage?" You said you'd avoid using that strategy for sensitive writes, but you'd still stand to lose up to the size of the buffer you defined on the cache in the event of failure. You'd have to factor that risk into your trade-off. Great video, as always. Thank you!

mannion

Notes:

In Memory Caching

- Save network calls - for commonly accessed data
- Avoid recomputation - for frequent computations like finding an average age
- Reduce DB load - hit the cache before querying the DB (see the read-path sketch below)
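A sketch of that read path (often called cache-aside), with a hypothetical `query_db` helper standing in for the real database:

```python
cache = {}  # hypothetical in-memory cache

def query_db(key):
    # Hypothetical stand-in for a slow, costly database query.
    return f"value-for-{key}"

def get(key):
    # Hit the cache before querying the DB.
    if key in cache:
        return cache[key]        # cache hit: no DB load
    value = query_db(key)        # cache miss: fall back to the DB
    cache[key] = value           # populate so the next read is a hit
    return value
```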

Drawbacks of Cache

- Cache hardware (RAM/SSD) is much more expensive than DB storage (disk)
- As we store more data in the cache, search time increases (counterproductive)

Design

- Database (Infinite information) vs Cache (Relevant information)

Cache Policy

- Least Recently Used (LRU) - the most recently used entries stay at the top; evict the least recently used entries when the cache is full (see the LRU sketch below)
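A minimal LRU sketch using Python's OrderedDict (the capacity is illustrative):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity=3):       # illustrative capacity
        self.capacity = capacity
        self.entries = OrderedDict()       # ordered oldest -> most recent

    def get(self, key):
        if key not in self.entries:
            return None                    # miss: caller falls back to the DB
        self.entries.move_to_end(key)      # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
```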

Issue with caches

- Extra calls - when we can't find an entry in the cache, we make an extra call and then query the database
- Thrashing - entries are loaded into and evicted from the cache without ever being read
- Consistency - when we update the DB, we must keep the cache and the DB consistent

Where to place the cache

- Close to the server (in memory)
    - Benefit - fast
    - Issue - maintaining consistency between the memories of different servers, especially for sensitive data such as passwords
- Close to the DB (global cache, e.g. Redis)
    - Benefit - accurate, and able to scale independently

Write-through vs Write-back (the video's terms; see the pinned comments for the standard names)

- Write-through - update the cache before updating the DB
    - Not workable when multiple servers each hold a local cache
- Write-back (standard name: write-around) - update the DB before updating the cache
    - Issue: performance - if every DB update also invalidates or refreshes cache entries, much of the cached data is still fine, and invalidating it is expensive
- Hybrid (standard name: write-back) - see the sketch below
    - Any update is first written to the cache
    - After a while, entries are persisted in bulk to the database
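A sketch of that hybrid (bulk write-back) idea, with an illustrative `FLUSH_THRESHOLD`; as a comment above notes, a node crash before the flush loses up to a buffer's worth of writes:

```python
cache = {}             # hypothetical in-memory cache
db = {}                # hypothetical persistent store
dirty = {}             # updates written to the cache but not yet persisted
FLUSH_THRESHOLD = 100  # illustrative buffer size

def write(key, value):
    # Any update is first written to the cache only.
    cache[key] = value
    dirty[key] = value
    if len(dirty) >= FLUSH_THRESHOLD:
        flush()

def flush():
    # Persist dirty entries in bulk: one round-trip instead of many.
    # If the node dies before flush(), up to FLUSH_THRESHOLD writes are
    # lost, so avoid this policy for sensitive data unless the cache
    # is replicated.
    db.update(dirty)
    dirty.clear()
```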

mengyonglee

Dude, you are the reason for my interest in system design. Thanks, and never stop making system design videos!

bhavyeshvyas

If someone explains a concept in an interview with the confidence and clarity you show, he/she can seriously rock it. Heavily inspired by you and love your system design content. Thanks for the effort @Gaurav Sen

rahuljain

I am actually using write-back Redis in our system, but this video really helped me understand what's happening overall. Great video!

SatyadeepRoat

A cache doesn't stop network calls, but it does stop slow, costly database queries. This is still explained well, and I'm being a little pedantic. Good video, great excitement and energy.

Sound_.-Safari

Nice video Gaurav, I really like your way of explaining. Also, the fast-forward when you write on the board is great editing; it keeps the viewer hooked.

neerajmathur

The world needs more people like you. Thank you!

jsf

Great content. Would love to hear more about how to solve cached data inconsistencies in distributed systems.

kabooby

Always watching your videos. Topics straight to the point. Keep uploading, man. Thanks always.

jajasaria

Great video. But I wanted to point out that what you are referring to as 'write-back' is termed 'write-around', as it comes "around" to the cache after writing to the database. Both 'write-around' and 'write-through' are "eager writes" done synchronously. In contrast, "write-back" is a "lazy write" policy done asynchronously: data is written to the cache and propagated to the database in a non-blocking manner. We may choose to be even lazier, play around with the timing, and batch the writes to save network round-trips. This reduces latency at the cost of temporary inconsistency (or permanent inconsistency if the cache server crashes; to avoid that, we replicate the caches).

AnonyoX

A few other reasons not to store absolutely everything in the cache (thereby ditching DBs altogether) are (1) durability, since some caches are in-memory only, and (2) range lookups, which would require searching the whole cache, whereas a DB can at least leverage an index to help with a range query. Once a DB responds to a range query, of course, that response could be cached.

devinsills

I watched each of your videos at least twice, lol. Thank you!! WE ALL LOVE YOU! YOU ARE THE BEST!

oykfwrl

Hi Gaurav, I really like your videos, thank you for sharing! I need to point out something about this video: writing directly to the DB and updating the cache afterwards is called write-around, not write-back. The last option you provided, writing to the cache and updating the DB after a while if necessary, is called write-back.

zehrasubas

Great explanation. You are making my revision so much easier. Thanks!!

legozxx

I watched this video 3 times because of the confusion, but your pinned comment saved my mind. Thank you, sir!

anjurawat
Автор

Hi Gaurav - good video on distributed caching! This expands a bit on what I learned in my computer architecture class - I didn't remember cache thrashing too well, or what distinguished write-through from write-back. I think learning caching in the context of networks is more interesting: it was initially introduced as a way to avoid hitting disk (on a single machine), but it is also a way to reduce network calls from servers to databases.

harisridhar