Distributed Consensus - Raft Writes | Systems Design Interview 0 to 1 with Ex-Google SWE

Typically whenever I propose a bunch of writes (concurrently DMing 10 random instagram models), it's a rarity that I will receive a positive response from a quorum
Comments

Hey Jordan, big fan of this series. I had a couple of queries regarding the Raft algo:

1. At 1:55, you say a follower can be more up to date than the leader in the case where a write did not reach a majority of nodes and that follower was one of the minority that accepted it. But at 7:06, you say that only once a quorum of nodes is reached does the leader commit the write locally and tell the others to commit as well. My question is: when is the entry written to the followers' logs? Is it before the leader confirms the commit, or only after receiving the commit signal? If a follower writes the entry only after receiving the commit from the leader, then how can its log ever be more up to date than the leader's? The fact that the follower has the entry would mean the leader has already committed it. Is there an edge case where the leader crashes while sending commit signals and only some followers (not a majority) receive the signal and write it to their logs?
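
For reference: in the Raft paper, followers append an entry to their logs before anyone commits it. The leader appends locally, replicates via AppendEntries, and advances its commit index once a quorum (itself included) has stored the entry; followers then learn the commit point from the leaderCommit field piggybacked on later AppendEntries/heartbeats. So the edge case is real, and even simpler than a crash mid-commit-signal: a leader can crash after the entry reached only a minority, leaving those followers holding extra uncommitted entries. A minimal Python sketch of the happy path, with toy names like `Node` and a simplified `append_entries`:

```python
# Minimal sketch of the Raft write path (happy case, 5 nodes).
# Followers append the entry to their logs BEFORE anyone commits it;
# the commit decision happens later, once a quorum has stored it.

QUORUM = 3  # majority of 5

class Node:
    def __init__(self, name):
        self.name = name
        self.log = []           # list of (term, command)
        self.commit_index = -1  # highest index known to be committed

def append_entries(f, prev_index, prev_term, entries, leader_commit):
    """Follower side. Simplified: real Raft only truncates on conflict."""
    if prev_index >= 0 and (prev_index >= len(f.log)
                            or f.log[prev_index][0] != prev_term):
        return False                      # prefix mismatch -> reject
    f.log[prev_index + 1:] = entries      # append UNCOMMITTED entries
    f.commit_index = min(leader_commit, len(f.log) - 1)
    return True

leader, followers = Node("L"), [Node("F%d" % i) for i in range(4)]
leader.log.append((1, "x=5"))             # 1. leader appends locally
acks = 1                                  #    and counts itself
for f in followers:                       # 2. followers append pre-commit
    if append_entries(f, -1, 0, [(1, "x=5")], leader.commit_index):
        acks += 1
if acks >= QUORUM:                        # 3. quorum stored it -> commit;
    leader.commit_index = 0               #    leader can now ack the client
for f in followers:                       # 4. commit point rides along on
    append_entries(f, 0, 1, [], leader.commit_index)  # the next heartbeat
print([f.commit_index for f in followers])            # [0, 0, 0, 0]
```
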

2. At 5:10, you mention that the first time the leader sends the write to the top replica, with its prefix and the new write as the suffix, the top replica rejects it because the prefixes don't match. The leader then steps back one index in its log and sends the second-to-last written entry as the prefix, which matches the top replica's prefix, so the write is accepted for the next two suffix entries, backfilling the log as well. So what happens if a follower's log is heavily outdated? Would the leader have to step back one index at a time, making a network call on every step to check whether the prefixes match? Wouldn't it be simpler and less time-consuming for the follower to return its own prefix, so the leader can see up to which point the follower is already up to date?
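
The paper's baseline is indeed one RPC per step: the leader keeps a nextIndex per follower and decrements it on each rejection. But §5.3 also sketches essentially the optimization suggested here: on rejection the follower returns a conflict hint (the first index of its conflicting term, or its log length) so the leader can jump back many entries per round trip. A rough sketch, where `reject_with_hint` and `backfill_start` are simplified illustrations of that idea:

```python
# Leader-side backfill: the naive version steps nextIndex back one entry
# per rejected RPC; the conflict-hint version jumps straight to where the
# follower's log actually diverges (or ends).

def reject_with_hint(follower_terms, prev_index, prev_term):
    """Follower: None if the prefix matches, else a hint for the leader."""
    if prev_index < len(follower_terms) and follower_terms[prev_index] == prev_term:
        return None
    if prev_index >= len(follower_terms):
        return len(follower_terms)        # log too short: retry from its end
    first = prev_index                    # else: first index of the
    while first > 0 and follower_terms[first - 1] == follower_terms[prev_index]:
        first -= 1                        # conflicting term
    return first

def backfill_start(leader_terms, follower_terms):
    next_index, rpcs = len(leader_terms), 0
    while next_index > 0:
        rpcs += 1
        hint = reject_with_hint(follower_terms, next_index - 1,
                                leader_terms[next_index - 1])
        if hint is None:
            break                         # found the matching prefix
        next_index = hint                 # jump (naive: next_index -= 1)
    return next_index, rpcs

# Logs hold just the term of each entry. The follower is heavily outdated:
# it has only the shared term-1 prefix; the leader has 10 term-3 entries.
leader_terms = [1, 1] + [3] * 10
follower_terms = [1, 1]
print(backfill_start(leader_terms, follower_terms))  # (2, 2): 2 RPCs, not ~10
```
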

3. If a follower node is over-updated, its prefix obviously won't match the leader's. What does the leader do then? Does it just step back through its log, trying to match a position in its log against the corresponding index in the follower's log, and once it finds an entry with an equal value and index, overwrite the rest of the mismatched suffix? Just want to confirm my understanding here.
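
That understanding matches the paper: the leader never rewrites its own log. It walks back until prevLogIndex/prevLogTerm match, and it is the follower that deletes its conflicting suffix and adopts the leader's entries. A follower-side sketch, assuming the same toy log-of-(term, command) representation as above:

```python
# Follower-side AppendEntries once the prefix check passes: any entries
# after the match point that conflict with the leader's are deleted, and
# the leader's entries are adopted. The leader's log is authoritative.

def handle_append(log, prev_index, prev_term, entries):
    """log: list of (term, command). False = prefix check failed."""
    if prev_index >= 0 and (prev_index >= len(log)
                            or log[prev_index][0] != prev_term):
        return False                     # leader will back up and retry
    for i, entry in enumerate(entries, start=prev_index + 1):
        if i < len(log) and log[i][0] != entry[0]:
            del log[i:]                  # truncate the conflicting suffix
        if i >= len(log):
            log.append(entry)            # adopt the leader's entry
    return True

# An "over-updated" follower: it got uncommitted term-2 entries from a
# leader that crashed before committing them.
follower = [(1, "a"), (2, "b"), (2, "c")]
# The new term-3 leader shares only (1, "a") and has its own new entry.
print(handle_append(follower, 0, 1, [(3, "d")]), follower)
# True [(1, 'a'), (3, 'd')]  -- the term-2 suffix is gone
```
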

Again, thanks a lot for providing such awesome content. I don't like reading books, so your vids are carrying rn ngl

introvertidiot

Hey Jordan, thank you so much for the amazing content. Had a few questions (apologies if I'm missing something terribly fundamental here :)):


1- What if a follower (let's call it node B) has log entries that don't even exist on the leader? (As I understand from the previous video, that's possible, right? I.e., such a leader could have been elected if a majority of the replicas have logs that are no more up to date than the candidate's, with terms no higher than the candidate's last term.)

(I now see that you already answered this in introvertidiot1502's 3rd question: if I understand correctly, it's possible to have a node like node B, and in that case the later entries that exist only on node B will be overwritten by the leader.

In that case, wouldn't we possibly be losing that data forever?
My questions below are follow-ups under the assumption that we don't want to lose data.)


2- If the above case is possible, how would such writes, which are missing from the leader, ever end up on the leader and on replicas other than node B?

3- And wouldn't this pose the risk of a client never being able to read those later entries (which exist only on node B), e.g. if node B dies?
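
On 1 and 3, the saving grace in Raft is that an entry existing only on node B can never have been committed, since committing requires a quorum. The leader answers the client with success only after the commit point passes the entry, so nothing a client was ever told succeeded gets lost; the client just times out and retries. A tiny sketch of that ack rule, where `propose` is a hypothetical leader API, not anything from the video:

```python
# Why discarding node-B-only entries loses no acknowledged data: the
# leader answers the client with success only AFTER a quorum stores the
# entry. An entry sitting on a lone follower was never acknowledged, so
# the client times out and retries it (typically with a request id so
# the retry isn't applied twice).

QUORUM = 5 // 2 + 1  # majority of a 5-node cluster

def propose(command, acks):
    """Hypothetical leader API. acks = nodes that stored the entry,
    leader included. Returns what the client observes."""
    if acks >= QUORUM:
        return "committed"     # every future leader is guaranteed to have it
    return "no response"       # never committed; safe to discard later

print(propose("x=5", acks=5))  # "committed"
print(propose("y=7", acks=1))  # "no response" -> only node B has it
```
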

berfubuyukoz

I thought 2PC was for guaranteeing atomicity when making transactions in sharded databases? Why do we need 2PC in this case, when we're talking about replicas, not shards? We're just updating stale logs; even if an update fails on one replica, it shouldn't affect the other replicas, right? Shouldn't the leader just asynchronously retry writing/updating the follower's log if it fails?

And another point: is the 2PC mentioned here something like the following?
1st phase: request a commit to update a follower's log; if the prefix is identical, the follower responds YES
2nd phase: once there is a quorum of YESes, commit on all followers/replicas
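
That two-phase reading is roughly right, but one difference from textbook 2PC is worth keeping in mind: 2PC needs a YES from every participant and blocks if a participant or the coordinator hangs at the wrong moment, whereas Raft's commit point needs only a majority, which the leader computes from its per-follower matchIndex bookkeeping. A sketch of the two decision rules side by side; the sorted-median trick and the current-term caveat are from the Raft paper:

```python
# 2PC vs. Raft's commit rule. 2PC needs unanimity: one NO aborts, and a
# silent participant blocks the coordinator. Raft commits as soon as a
# MAJORITY of matchIndex values (leader included) reach the entry.

def two_pc_decision(votes):
    # votes: "yes" / "no" / None (None = participant never answered)
    if all(v == "yes" for v in votes):
        return "commit"
    if None in votes:
        return "blocked"          # coordinator must wait or run recovery
    return "abort"

def raft_commit_index(match_index):
    # matchIndex: highest replicated log index per node. Sorting and
    # taking the median gives the largest index a majority has stored.
    # (Paper caveat: the leader only commits current-term entries this way.)
    s = sorted(match_index, reverse=True)
    return s[len(s) // 2]

print(two_pc_decision(["yes", "yes", None, "yes", "yes"]))  # blocked
print(raft_commit_index([9, 9, 7, 4, 4]))                   # 7
```
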

aloha

The most complicated parts of leader election are the failure cases and the subsequent recovery. I'm no expert, and I've always wanted to see an easy-to-grok video of how Paxos/Raft handle different failure cases and recover. For example, what happens if the leader goes down after sending the commit out to only a handful of nodes? I think a deeper dive in a future video would be really valuable; I'd definitely watch!

zalooooo

Hi Jordan, big fan, love your videos. I have a quick question about "the leader should be the most up-to-date node." Say the old leader replicates a write to half of the followers and then dies. A follower that didn't receive this write notices the leader is dead (which is all the more likely, since it didn't receive the write and so didn't have its heartbeat timeout reset), starts an election, wins (since it noticed first), and becomes the leader. This node will backfill all the followers, and the write, which reached more than half of the followers and was considered successful, is lost. This feels very likely to happen; how does Raft prevent it?
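
For reference, this scenario is blocked by Raft's voting rule (§5.4.1 of the paper): a node grants its vote only if the candidate's log is at least as up to date as its own, comparing last terms first, then log lengths. Any election quorum overlaps the quorum that stored the write, so at least one voter holds the entry and refuses the stale candidate, which therefore can never collect a majority. A toy sketch of that check:

```python
# Raft's vote check: a voter says yes only if the candidate's log is at
# least as up to date as its own (higher last term wins; equal terms ->
# longer log wins). A committed write lives on a majority, and every
# election also needs a majority, so the two sets always overlap.

def at_least_as_up_to_date(cand_last_term, cand_len, my_terms):
    my_last_term = my_terms[-1] if my_terms else 0
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    return cand_len >= len(my_terms)

# 5 nodes; a term-2 write reached 3 of them before the old leader died.
logs = {"A": [1, 2], "B": [1, 2], "C": [1, 2], "D": [1], "E": [1]}
candidate = "E"                   # stale follower E times out first
votes = 1                         # it votes for itself
for node, log in logs.items():
    if node != candidate and at_least_as_up_to_date(
            logs[candidate][-1], len(logs[candidate]), log):
        votes += 1
print(votes, votes >= 3)          # 2 False: A, B, C refuse, E can't win
```
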

潘雪松-fg

Hey Jordan, I'm getting a bit confused where you mention that two-phase commit can deal with heterogeneous writes. Can you help clarify what cross-partition writes are and what they're used for? I previously thought it was the same as maintaining a global secondary index, where we need to send the same write to different partitions; maybe I've got that wrong.
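
For what it's worth, a cross-partition (heterogeneous) write usually means a single logical transaction whose keys live on different shards, e.g. a transfer where the two accounts land on different partitions; 2PC is what makes the debit and the credit commit or abort together. That's orthogonal to Raft, which replicates one shard's log across its replicas. A toy sketch, where the two-shard setup and the `placement` map are made up for illustration:

```python
# A "cross-partition write": one transaction touching keys that live on
# different shards, so no single node can apply it atomically on its own.
# 2PC makes both shards commit or neither. This is orthogonal to Raft,
# which replicates ONE shard's log across that shard's replicas.

shards = [{"alice": 100}, {"bob": 50}]  # two partitions, disjoint keys
placement = {"alice": 0, "bob": 1}      # which shard owns which key

def transfer(src, dst, amount):
    writes = [(placement[src], src, -amount), (placement[dst], dst, +amount)]
    # Phase 1 (prepare): every involved shard votes; here the "vote" is
    # just a funds check, in reality it also durably stages the write.
    for sid, key, delta in writes:
        if shards[sid].get(key, 0) + delta < 0:
            return "abort"               # one NO aborts on every shard
    # Phase 2 (commit): only after a unanimous YES does each shard apply.
    for sid, key, delta in writes:
        shards[sid][key] = shards[sid].get(key, 0) + delta
    return "commit"

print(transfer("alice", "bob", 30), shards)
# commit [{'alice': 70}, {'bob': 80}]
```
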

ariali