Raft Consensus Simulator
Five nodes. One leader. Click any node to partition it from the network and watch the cluster react — a follower will time out, become a candidate, request votes, and (if it reaches majority) become the new leader. The log replicates from leader to followers every few ticks.
Five clickable cards, one per node, ticking on their own every 700 ms. Each card shows its role, its term, the draining bar that is its election timer, and the replicated log at the bottom. The leader card is tinted in the accent colour; a candidate flashes the highlight colour mid-election. The counters up top track the global tick, the current term, and which node holds leadership. Clicking a node partitions it; clicking again heals it.
Hit "Kill the leader" and watch what follows: with no heartbeats arriving, one follower's timer drains to zero, it jumps to candidate, bumps its term, and collects votes until it crosses three of five. A new leader appears and the logs resync. The number that should surprise you is the term: it only ever climbs, never resets, and every node snaps to the highest term it hears. That monotonic counter is what stops a healed old leader from fighting the new one. Now partition three of five at once and the cluster freezes; no candidate can reach majority until you heal one.
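The freeze at three partitioned nodes is plain quorum arithmetic. Here is a minimal sketch, assuming each partitioned node is cut off on its own (as in the simulator) so the remaining nodes can all still talk to each other; `majority` and `can_elect_leader` are illustrative names, not part of any Raft library.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority of the cluster."""
    return cluster_size // 2 + 1


def can_elect_leader(cluster_size: int, partitioned: int) -> bool:
    """True if the nodes that can still reach each other form a majority.

    Assumption: every partitioned node is isolated individually, so the
    remaining nodes make up one connected group.
    """
    reachable = cluster_size - partitioned
    return reachable >= majority(cluster_size)


if __name__ == "__main__":
    for cut_off in range(6):
        print(f"{cut_off} partitioned -> election possible: {can_elect_leader(5, cut_off)}")
```

With five nodes the majority is three, so the cluster tolerates two isolated nodes and stalls on the third.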
Why consensus is hard
Five computers, one decision.
Distributed consensus is the problem of getting a set of computers to agree on a value when (1) any subset of them can fail at any moment, (2) messages between them can be lost, reordered, or duplicated, and (3) network partitions can isolate one half from the other. The "value" they're agreeing on is usually a log entry — "user 42 paid $100", "this row was deleted", "this key is now mapped to that value". If different nodes apply different log entries in different orders, they diverge, and the system is no longer a single logical system.
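To see why order matters, here is a small sketch of a key-value state machine replaying a log. The `(op, key, value)` entry format and the `apply_log` helper are assumptions made for illustration only.

```python
def apply_log(entries):
    """Replay (op, key, value) entries, in order, into a fresh key-value state."""
    state = {}
    for op, key, value in entries:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)  # deleting a missing key is a no-op
    return state


log = [
    ("set", "balance:42", 100),
    ("set", "row:7", "present"),
    ("delete", "row:7", None),
]

node_a = apply_log(log)                        # same entries, same order
node_b = apply_log(log)                        # same entries, same order
node_c = apply_log([log[0], log[2], log[1]])   # same entries, different order

print(node_a == node_b)  # the two in-order replicas agree
print(node_a == node_c)  # the reordered replica has diverged
```

Nodes A and B end up identical, while node C still holds the row the others deleted. Consensus exists to guarantee that every node sees the first situation and never the second.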
Lamport's Paxos (1989, published 1998) was among the first provably correct consensus algorithms. It was also so notoriously hard to understand and implement correctly that for fifteen years almost every production system that needed consensus either rolled its own (badly) or wrapped a battle-tested library at a great distance from the actual protocol.
Raft (Diego Ongaro and John Ousterhout, 2014) was designed explicitly for understandability. It is not faster than Paxos and not measurably more reliable in production, but a working engineer can read the paper, build an implementation, and reason about edge cases in a week. The result: by 2020, much of the modern distributed-systems stack that needs consensus had adopted Raft or a close derivative: etcd (and with it Kubernetes), Consul, CockroachDB, TiKV, RethinkDB, and MongoDB (replica-set elections since 3.2).
Follower, candidate, leader
One state machine, deterministically run by every node.
Every Raft node is always in exactly one of three states. The transitions between them, and the messages that drive them, are all of Raft.
- Follower. Passive. Responds to AppendEntries from the leader (which both replicates log entries and serves as a heartbeat) and to RequestVote from candidates. If the election timer, randomised per node (the paper suggests 150-300 ms), expires without a heartbeat, the follower becomes a candidate.
- Candidate. Increments its term, votes for itself, and broadcasts RequestVote to every peer. If it receives votes from a majority of nodes — including itself — it becomes leader. If it receives an AppendEntries from a leader with a term ≥ its own, it reverts to follower. If the election times out (no majority, no new leader detected), it starts a new election with a higher term.
- Leader. Sends AppendEntries to every follower at a regular interval (the heartbeat). When clients submit writes, the leader appends them to its log and replicates them to followers. Once a majority of the cluster has stored an entry (the leader counts, so three of five), the entry is "committed" and applied to the state machine.
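The leader's commit decision can be sketched as picking the highest log index that a majority of the cluster, leader included, has stored. This is a simplified illustration: `commit_index` and its parameters are hypothetical names, and it leaves out the extra rule in real Raft that a leader only commits this way for entries from its own current term.

```python
from __future__ import annotations


def commit_index(leader_last_index: int, follower_match: list[int]) -> int:
    """Highest log index stored on a majority of the cluster (leader included).

    follower_match holds, per follower, the highest index that follower has
    acknowledged. Simplification: the current-term check is omitted.
    """
    indexes = sorted([leader_last_index, *follower_match], reverse=True)
    majority = len(indexes) // 2 + 1
    # The majority-th largest value is held by at least `majority` nodes.
    return indexes[majority - 1]


if __name__ == "__main__":
    # Leader is at index 7; four followers have acknowledged 7, 5, 3 and 2.
    print(commit_index(7, [7, 5, 3, 2]))
```

In the example, three nodes (the leader and two followers) hold everything up to index 5, so entries through index 5 are committed even though two followers lag behind.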
Terms only ever increase. Any node that sees a higher term in a request or reply adopts it immediately, so after an election every node that hears from the winner moves to the new term, even nodes that did not vote in it. This monotonic term counter is how Raft prevents "split brain": a leader from term 3 that comes back from a partition will see term 5 in the heartbeats from the new leader and step down immediately.
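The three roles and the term rule fit in a small state machine. What follows is a rough sketch and not a full Raft node: the `Node` class and its method names are made up for illustration, and it omits the log, the one-vote-per-term bookkeeping, and the up-to-date-log check a real voter performs.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


@dataclass
class Node:
    node_id: int
    cluster_size: int = 5
    role: Role = Role.FOLLOWER
    term: int = 0
    votes: set = field(default_factory=set)

    def observe_term(self, other_term: int) -> None:
        """Adopt any higher term seen in a message and fall back to follower."""
        if other_term > self.term:
            self.term = other_term
            self.role = Role.FOLLOWER
            self.votes.clear()

    def on_election_timeout(self) -> None:
        """No heartbeat arrived in time: start (or restart) an election."""
        if self.role is Role.LEADER:
            return  # leaders send heartbeats, they do not wait for them
        self.term += 1
        self.role = Role.CANDIDATE
        self.votes = {self.node_id}  # a candidate votes for itself

    def on_vote_granted(self, voter_id: int, voter_term: int) -> None:
        """Count a granted vote; become leader on reaching a majority."""
        self.observe_term(voter_term)
        if self.role is not Role.CANDIDATE or voter_term != self.term:
            return  # stale vote, or the election is already over
        self.votes.add(voter_id)
        if len(self.votes) >= self.cluster_size // 2 + 1:
            self.role = Role.LEADER

    def on_append_entries(self, leader_term: int) -> bool:
        """Handle a heartbeat; return False if it came from a stale leader."""
        if leader_term < self.term:
            return False  # reject: the sender is behind
        self.observe_term(leader_term)
        if self.role is Role.CANDIDATE:
            self.role = Role.FOLLOWER  # equal term: a leader already exists
        return True


if __name__ == "__main__":
    healed = Node(node_id=1, role=Role.LEADER, term=3)
    healed.on_append_entries(leader_term=5)
    print(healed.role, healed.term)
```

The usage at the bottom replays the scenario above: a term-3 leader that hears a term-5 heartbeat adopts term 5 and becomes a follower, and it would reject any heartbeat carrying a term lower than its own.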
Production Raft, by name
Most of the consensus-backed systems you've heard of.
| System | Where Raft sits |
|---|---|
| etcd | A replicated key-value store built directly on its own Raft library. Kubernetes keeps its cluster state in an etcd Raft group, typically three or five etcd nodes per cluster, deployed at every cloud-native shop. |
| Consul | HashiCorp's service-mesh control plane. Raft for service registration + KV store. |
| CockroachDB | Every "range" of the database (512 MiB by default since v20.1, 64 MiB before that) is its own Raft group. A 100-node cluster runs tens of thousands of Raft groups in parallel; the leadership is distributed across nodes for load balancing. |
| TiKV / TiDB | Same architecture as CockroachDB: one Raft group per Region (TiKV's name for a range). Foundation of PingCAP's open-source distributed SQL. |
| MongoDB | Replica set elections (since 3.2, 2015) use a Raft-derived protocol. Earlier versions used a bespoke election that had pathological edge cases. |
| Apache Kafka (KRaft mode) | Kafka's replacement for the ZooKeeper-based control plane (KIP-500, production-ready in 3.3 / 2022). A quorum of controller nodes runs a Raft group for cluster metadata; brokers follow that metadata log as observers. |
| RabbitMQ (quorum queues) | Raft-replicated message queues for strong delivery guarantees, added in 3.8 (2019). Replaces the older mirrored-queue mechanism. |
| Vault | HashiCorp's integrated storage backend uses Raft for HA without the ops cost of running Consul alongside. |
| Nomad | Raft for the scheduler's state. Every Nomad server in a region is part of the same Raft group. |
A widely used implementation is etcd-io/raft (Go). It has been in production for over a decade, has eaten every imaginable edge case, and is the library etcd and CockroachDB build on. TiKV uses raft-rs, a Rust port of it, while Vault builds on HashiCorp's separate hashicorp/raft library. Writing your own Raft from scratch is a great learning exercise; using it in production is rarely a good idea.
References
- Diego Ongaro & John Ousterhout — In Search of an Understandable Consensus Algorithm (2014). The original Raft paper. 18 pages; the most readable consensus paper in the literature.
- Diego Ongaro — Consensus: Bridging Theory and Practice (2014 PhD thesis). The long-form treatment, including correctness proofs and snapshot / membership-change edge cases.
- raft.github.io — the original visualisation. Frozen in 2014 but still useful as a reference.