Consensus
Also known as:
Distributed Consensus
Agreement Protocol
Definition
Consensus is a way for a group of computers to agree on the same value, even if some of them are slow, crashed, or cut off from the network. It is the basis for things like leader election, replicated logs, and distributed locks. Without consensus, a cluster cannot behave like one reliable system.
Key Takeaways
- Consensus turns a group of unreliable machines into one reliable system.
- A correct algorithm has two jobs. It must never let two nodes decide on different values (safety), and it should eventually reach a decision when the network is healthy (liveness).
- Most systems do not write their own consensus. They rely on a small coordination store that already implements a proven protocol, such as etcd (Raft) or ZooKeeper (Zab).
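- Consensus is comparatively slow because every decision needs at least one network round trip to a quorum. Reserve it for state that truly needs agreement, such as control plane work, rather than using it for every write.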
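The quorum mentioned above is, in most deployments, a strict majority of the cluster. The short sketch below works through that arithmetic. It assumes majority quorums only (some protocols allow other quorum shapes), and the function names `majority` and `tolerated_failures` are illustrative, not taken from any library.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n nodes."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can be down while a majority is still reachable."""
    return n - majority(n)


if __name__ == "__main__":
    # Print cluster size, quorum size, and tolerated failures side by side.
    for n in (3, 4, 5, 7):
        print(f"nodes={n} quorum={majority(n)} tolerates={tolerated_failures(n)}")
```

Any two majorities of the same cluster share at least one node, which is what prevents two disjoint groups from deciding different values. The arithmetic also shows that a 4-node cluster tolerates one failure, the same as a 3-node cluster, which is one reason odd cluster sizes are common.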
How It Works
- A node proposes a value, like the next entry in a replicated log or the name of the new leader.
- Other nodes vote on the proposal using the rules of Paxos, Raft, or a similar protocol.
- Once a majority quorum agrees, the value is locked in and cannot change.
- Slow or failed nodes catch up later by replaying the log of agreed decisions.
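The four steps above can be sketched as a toy, single-process simulation. This is not Paxos or Raft: it assumes a single proposer, has no terms or ballots, no leader election, and no real networking, so it only illustrates the propose, vote, commit, and catch-up flow. The `Node` and `Cluster` classes and all of their methods are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True                            # False simulates a crashed or partitioned node
    log: list = field(default_factory=list)    # entries this node has accepted

    def accept(self, index: int, value: str) -> bool:
        """Vote yes by storing the entry, if reachable and the log has no gap."""
        if not self.up or index != len(self.log):
            return False
        self.log.append(value)
        return True


class Cluster:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.committed = []                    # decided entries, in order

    def propose(self, value: str) -> bool:
        """Offer value as the next log entry; commit it if a majority accepts."""
        index = len(self.committed)
        votes = sum(node.accept(index, value) for node in self.nodes)
        if votes >= len(self.nodes) // 2 + 1:
            self.committed.append(value)       # locked in from here on
            return True
        # Simplification: discard the failed entry everywhere. Real protocols
        # keep it as uncommitted and resolve it later.
        for node in self.nodes:
            del node.log[index:]
        return False

    def catch_up(self, node: Node) -> None:
        """Replay committed entries that a recovered node missed."""
        if node.up:
            node.log.extend(self.committed[len(node.log):])


cluster = Cluster(["a", "b", "c"])
a, b, c = cluster.nodes

c.up = False
first = cluster.propose("leader=a")    # a and b accept: 2 of 3 is a majority

b.up = False
second = cluster.propose("x=1")        # only a accepts: no majority, not committed

c.up = True
cluster.catch_up(c)                    # c replays the entry it missed
third = cluster.propose("x=1")         # a and c accept: committed
```

In this run the first proposal commits with one node down, the second fails because only one of three nodes is reachable, and the third succeeds once the recovered node has replayed the log. The rollback on failure is a shortcut for the toy; it is one of the places where a production protocol does considerably more work.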
Where It Is Used
- etcd, Consul, and CockroachDB use Raft to keep their replicas in sync.
- Google Chubby and Spanner use Paxos for locking and global transactions.
- ZooKeeper uses Zab, an atomic broadcast protocol that resembles Paxos but is a distinct design, to back HBase, Kafka deployments that predate KRaft mode, and many other systems.