Search tools and guides ⌘K

Multi-page · for platform engineers

Distributed systems topics

Distributed systems topics

Replication, consensus, time, CAP, consistency models, delivery semantics, quorum, leases, failure detection, gossip, CRDTs, sharding, service discovery, snapshots, streaming. Twenty topics that every distributed system has to take a position on. Each sub-page below works through one: the algorithm, the proof when there is one, and what it means for code that runs on more than one machine.

Twenty sub-pages. Each is a long-form walkthrough with papers, code, and the references the field actually uses: Lamport, Lynch, Brewer, Abadi, Shapiro, Kleppmann.

Deep dives

The twenty topics

Replication

Single-leader, multi-leader, leaderless. Synchronous vs asynchronous, log shipping vs row-based, the read-your-writes guarantee. The shape every replicated database has to choose.

leader/follower ·sync vs async ·replication lag ·split-brain ·read-your-writes

Consensus — Paxos & Raft

Why agreement is hard, why FLP says it's impossible, and why we ship Paxos and Raft anyway. Single-decree Paxos, Multi-Paxos, the Raft state machine, joint consensus.

FLP ·Paxos ·Raft ·leader election ·log replication

Time & clocks

Wall clocks lie. Lamport clocks order events, vector clocks detect concurrency, hybrid logical clocks combine both, TrueTime bounds uncertainty. The arithmetic of "happens-before".

wall clock ·Lamport clocks ·vector clocks ·HLC ·TrueTime

CAP & PACELC

Brewer's theorem in plain words, Abadi's PACELC refinement, and why every database vendor's marketing copy gets this wrong. What you actually trade away — and when.

CAP ·PACELC ·partition tolerance ·consistency ·latency tradeoff

Consistency models

The hierarchy readers actually care about. Linearizability, sequential consistency, causal consistency, eventual consistency — what each one promises, what it forbids, and why "Serializable" in your SQL database is a different axis entirely.

linearizability ·sequential ·causal ·eventual ·isolation vs consistency

Idempotence at scale

At-least-once delivery is the rule, exactly-once is a marketing term. How idempotency keys, deduplication windows, and the outbox pattern give you exactly-once semantics on top.

at-least-once ·idempotency keys ·outbox pattern ·dedup window ·effectively-once

Delivery semantics

At-most-once, at-least-once, exactly-once. Why true exactly-once delivery is impossible over a lossy network, and how producers, brokers, and consumers combine dedup, idempotent writes, and transactions to fake it convincingly.

at-most-once ·at-least-once ·exactly-once ·dedup ·transactional outbox

Quorums & majorities

Why N/2+1, why Dynamo-style W+R>N, why flexible Paxos lets you pick. The math behind how many nodes have to say yes before you can make progress.

majority quorum ·N/W/R ·Dynamo ·flexible Paxos ·witness replicas

Leases & fencing

Locks across the network are a lie unless you fence them. Why leases beat distributed locks, how monotonic fencing tokens save you from pause-the-world GCs.

leases ·fencing tokens ·distributed locks ·GC pauses ·lease renewal

Leader election

How a cluster picks one node to make decisions. Bully, Raft elections, ZooKeeper sequential nodes, etcd leases. Split-brain, fencing tokens, and the subtleties of "I think I am still the leader".

Raft election ·ZooKeeper ·leases ·fencing ·split brain

Failure detectors

Heartbeats are the cheapest answer and rarely the right one. Chandra-Toueg classes, phi-accrual (Cassandra), SWIM (Serf, Consul), and the difference between "I think you are dead" and "the system has agreed you are dead".

heartbeats ·phi-accrual ·SWIM ·Chandra-Toueg ·adaptive timeouts

Gossip protocols

Membership, failure detection, anti-entropy, SWIM. How clusters of thousands learn who is alive without a coordinator and without bringing the network down.

SWIM ·phi accrual ·anti-entropy ·epidemic broadcast ·membership

Anti-entropy & read repair

How leaderless stores heal divergent replicas. Read repair on the request path, hinted handoff while a node is down, and Merkle-tree anti-entropy that compares whole key ranges without shipping all the data.

read repair ·hinted handoff ·Merkle trees ·anti-entropy ·convergence

CRDTs

Conflict-free replicated data types. The algebra of merge — join-semilattices where the merge is associative, commutative, and idempotent — so replicas converge without coordination. State-based vs op-based, counters, sets, sequences, and what collaborative editors run.

semilattice ·CvRDT vs CmRDT ·G-Counter / PN-Counter ·OR-Set ·sequence CRDTs

Sharding & partitioning

Range, hash, consistent-hash, rendezvous. The four ways to split data across machines, the trade-offs each makes, and why resharding is the hardest operations problem in this list.

range partitioning ·consistent hash ·rendezvous ·hot shards ·resharding

Service discovery

How a service finds the live instances of another. Client-side vs server-side discovery, the registry (Consul, etcd, ZooKeeper, Eureka), health checks, DNS vs a control plane, and why stale registries cause more outages than dead nodes.

registry ·health checks ·client vs server-side ·DNS / control plane ·TTL & staleness

Two-phase commit & sagas

Atomic commit across services. The blocking problem in 2PC, the compensation problem in sagas, and why every microservices system eventually picks one. XA, the outbox pattern, choreography vs orchestration.

2PC ·XA ·sagas ·compensation ·outbox

Distributed snapshots

A consistent cut across a running distributed system. Chandy-Lamport with marker tokens, Lai-Yang, Mattern, and the asynchronous-barrier variant Flink uses to checkpoint stateful streams without stopping the world.

Chandy-Lamport ·markers ·Flink ABS ·consistent cut ·global state

Streaming systems

Why streaming is not just fast batch. Event time vs processing time, watermarks, the four windows, exactly-once as state semantics, and what Flink, Kafka Streams, and Spark Structured Streaming each chose when they sat down to ship.

event time ·watermarks ·windows ·exactly-once ·Flink / Kafka Streams / Spark

Back-pressure, retries, hedging, deadlines

The four reliability primitives as a unit. Back-pressure for capacity, retries for transient failure, hedging for tail latency, deadlines for budget. The single most common L5 system-design losing topic — usually because candidates handle one of the four and forget the other three.

back-pressure ·retry budget ·hedging ·deadline propagation