04 / 05
Principle / 04

Consistency patterns

"Consistency" isn't one thing. It's a spectrum that runs from strong (every reader sees the latest write, always) down to eventual (readers converge given enough quiet time). Every database picks a default somewhere on it, and most hand you a knob to shift per-query. Knowing where Cassandra sits relative to Spanner, what DynamoDB's default reads actually promise, and how a quorum dials the trade is the basic vocabulary you need to make a storage decision.


The five bands

Weakest to strongest. Each band gives you everything the band below it does, plus one more guarantee.

  1. 01
    Weak consistency

    Reads can return any version that ever existed, including stale ones. The only promise is that writes eventually propagate. No promise about when, or in what order. Fine for caches with short TTLs, where serving stale beats serving nothing.

    Memcached, Redis (default), L2 caches
  2. 02
    Eventual consistency

    If writes stop, every replica eventually converges to the same value. Reads can return stale data for an unbounded period during a partition. In practice the useful refinement is bounded staleness ("eventual within 100 ms" or "within 1 s"). This is the default shape for most production NoSQL.

    DynamoDB (default), Cassandra (LOCAL_ONE), Riak
  3. 03
    Session / read-your-writes consistency

    A client always sees its own writes immediately. Other clients can still see stale data. Built using read pins (route reads to the replica you wrote to) or version-stamped cookies. This is the right default for most web apps — "I posted a comment and now I don't see it" is the single most common bug eventual consistency causes.

    MongoDB (default), Cassandra w/ sticky routing, Postgres w/ read-pinning
  4. 04
    Strong consistency (per key)

    A read always returns the latest committed write for that key. No ordering guarantee across different keys. Built with quorum (R + W > N), a single primary with synchronous replication, or 2PC at the storage layer. When most candidates say "strong consistency," this is what they mean.

    Cassandra (QUORUM), DynamoDB (consistent read), Postgres (sync replica)
  5. 05
    Linearisable / external consistency

    All operations, across all keys, see a single global order that matches real-time wall-clock ordering. The strongest guarantee on offer — it behaves like a single machine, just at planet scale. Every write goes through consensus (Paxos or Raft). The latency cost is real.

    Spanner (TrueTime + Paxos), CockroachDB (Raft), etcd, ZooKeeper

The spectrum, drawn out

The five bands above read as a ladder, but it helps to see them as one continuous line with a cost axis running underneath. Strong consistency lives at the left: every read sees the latest write, and the price is coordination on the write path. Eventual lives at the right: writes return as soon as one replica accepts them, and the price is that readers can lag. The named models in the middle are not arbitrary stops. Each one rules out a specific class of bug while giving back some of the availability and latency that strong consistency spends.

STRONGERWEAKERlinearizablesequentialcausalsessioneventualglobal real-time orderone total orderrelated ops orderedper-client onlyconverge eventuallywrite-path costeach step left rules out one more anomaly and costs more coordination
The same line, two ways: what each model guarantees on top, what it costs on the bottom.

Two endpoints, and a band of named middle. Most production systems live in that middle on purpose. They want to spend coordination only where a bug would actually hurt, and they want the cheaper guarantee everywhere else. The full formal hierarchy of these models, and the exact ordering relationships between them, is its own subject; see consistency models for the rigorous version.

The useful middle: session and causal guarantees

The endpoints are easy to describe and rarely what you want. Strong consistency makes every operation wait for agreement, which is more than a feed or a profile page needs. Pure eventual consistency makes no promise a user can feel, which is less than a logged-in session needs. The interesting work happens in the middle, where a handful of named guarantees each kill one specific anomaly.

Read-your-writes. A client always sees the effect of its own writes. This is the guarantee a user actually demands without knowing the name for it. "I posted the comment, the page reloaded, the comment is gone" is a read-your-writes violation, and it is the single most common complaint eventual consistency produces. You buy the guarantee cheaply: pin the client's reads to the replica it wrote to for a short window, or carry a write timestamp in a cookie and refuse to read from a replica that hasn't caught up to it.

Monotonic reads. Once a client has seen a value, it never sees an older one. Without this, a user can refresh and watch a comment appear, vanish, and reappear as successive reads bounce between replicas at different lag. Monotonic reads pin a client to a single replica, or track the highest version it has seen and serve only that version or newer. Pair it with read-your-writes and you have covered most of what a person perceives as "the app is consistent," without paying for global ordering.

Monotonic writes. A client's writes are applied in the order it issued them. Without it, an update and a later correction can land on a replica out of order, leaving the correction overwritten by the older value. This matters whenever a client issues a sequence of dependent writes and assumes they stick in order.

Causal consistency. Operations that are causally related — B was written after the client read A, or B was issued after A in the same session — are seen by everyone in that order. Unrelated operations may be reordered freely. This is the strongest guarantee you can keep while staying available during a partition, which is why it is the ceiling for systems that refuse to reject writes. It fixes the cross-user version of the reload bug: if Alice comments and Bob replies, nobody ever sees Bob's reply before Alice's comment, even though the two writes came from different clients.

Bounded staleness. Reads may lag, but never by more than a stated bound — "at most 100 ms behind" or "at most five versions stale." This is eventual consistency with a service-level promise stapled on, and it is often the honest description of what a read replica gives you. A bound turns "could be arbitrarily old" into a number you can reason about and put in a doc.

Why the middle exists. Read-your-writes, monotonic reads, and causal consistency are not weak versions of strong consistency. They are precise tools, each removing one anomaly a user can actually observe, while leaving the rest of the system free to run cheap and available. Most "we need strong consistency" requirements are really "we need read-your-writes."

Client-centric vs data-centric guarantees

The bands split into two families, and confusing them is a common interview slip. Data-centric guarantees describe the state of the data itself across all replicas: linearizable, sequential, and causal are statements about the global history every observer would agree on. Client-centric guarantees describe what a single client experiences across its own session: read-your-writes, monotonic reads, and monotonic writes say nothing about other clients, only about the consistency of one client's own view over time.

The distinction is practical. Client-centric guarantees are cheap because they need no agreement between replicas — they only need to route or version one client's requests correctly. A system can offer read-your-writes to every user with no cross-replica coordination at all, by keeping each user's reads pinned to a replica that holds their writes. Data-centric guarantees are expensive because they constrain what every replica may show every observer, which forces the replicas to agree. When a database advertises "session consistency" it is promising the cheap, client-centric kind; when it advertises "strong consistency" it is promising the expensive, data-centric kind. The gap between those two prices is the whole reason the middle of the spectrum is crowded.

Where the real systems sit

A single database can land in different bands depending on how you configure it. The default is what matters in most operational conversations, so that's what's shown here.

System Shape Default pattern Strength
Redis (default) cache weak / best-effort
weak
Async replication, no read repair. A reader on a replica may not see a write that has acked on the master.
DynamoDB (default reads) kv eventual
eventual
Reads can be stale by tens of milliseconds. Strongly-consistent reads available on request, at higher cost.
Cassandra (LOCAL_ONE) wide-col eventual
eventual
Tunable per query (ONE / QUORUM / ALL). LOCAL_ONE is fastest, eventually consistent.
MongoDB (default) document read-your-writes
session
A client always sees its own writes; other clients may briefly see stale data on secondaries.
Cassandra (QUORUM) wide-col strong (R + W > N)
strong (linearisable-per-key)
Quorum read on a quorum-written key sees the latest value. Higher latency, no global ordering.
Postgres (sync replica) rdbms strong
strong (linearisable-per-key)
Sync replication blocks the commit until the replica acks. Reads from any node see the same data.
CockroachDB rdbms linearisable
linearisable / external
Serialisable isolation via Raft consensus on every write. Strongest guarantees of any open-source SQL.
Spanner rdbms external (TrueTime)
linearisable / external
TrueTime + Paxos give "external consistency" — globally linearisable. The strongest commercial offering.

Tunable consistency: the R + W > N knob

The reason a single database can sit in two different bands is that quorum systems expose the trade as a knob. Data is replicated to N nodes. A write is acknowledged once W of them confirm it. A read collects answers from R of them and returns the newest. The whole behaviour turns on one inequality: when R + W > N, the set of nodes a read touches must overlap the set of nodes the last write touched by at least one node. That one shared node holds the latest value, so the read cannot miss it.

N = 5 W = 3 R = 3 → R + W = 6 > 5n1n2n3overlapn4n5W = {n1, n2, n3}R = {n3, n4, n5}the read quorum and the write quorum must share a node — n3 carries the latest value
R + W > N forces the read and write sets to overlap. The shared node guarantees the read sees the most recent write.

The same N gives you a dial, not a switch. Set W = N and R = 1 and writes are slow and fragile — every node must ack — but reads are fast and any single node can serve them. Flip it to W = 1 and R = N and writes fly while reads are heavy. The balanced choice, W = R = quorum (a majority), keeps both at majority cost and still satisfies the overlap. Drop below the line — say W = 1, R = 1 on N = 3 — and you have opted into eventual consistency: a read can hit two nodes that both missed the last write. That is exactly the difference between Cassandra at ONE and Cassandra at QUORUM, chosen per query.

Two cautions the inequality hides. First, R + W > N gives you strong reads per key, not linearizability across keys and not protection from concurrent writers — two clients writing the same key at the same time still need conflict resolution. Second, the overlap argument assumes the quorums are honest majorities; sloppy quorums and hinted handoff, which Dynamo-style systems use to stay available during a partition, can break the overlap and quietly return stale data. The mechanics of quorum sizing, sloppy quorums, and read repair are their own topic; see quorum reads & writes for the full treatment.

Three rules of thumb

  1. Pick the weakest band your product can survive. Every step up costs latency (extra coordination), availability (rejecting writes during partitions), or both. Running strong consistency on a workload that doesn't need it is paying twice — once on the read path, once on the write path.
  2. Pick consistency at the right surface. Most systems want strong consistency for some operations (commit a payment, decrement inventory) and eventual for others (load a profile picture, render a feed). Place the guarantee where it's needed, not at the storage default.
  3. Make the band visible to the application. "This endpoint returns eventually consistent reads" belongs in the API doc. Stripe explicitly marks which fields are real-time and which are eventually consistent. A lot of "consistency bugs" are really documentation bugs.

What each step actually rules out

Every step up the ladder closes off one more class of bug a user could see:

  • Weak → eventual. Permanent divergence is gone. Replicas eventually converge.
  • Eventual → session. "I wrote it but I don't see it" is gone. A client always sees its own writes.
  • Session → strong-per-key. Other clients seeing stale data is gone. Two clients reading the same key after a write get the same answer.
  • Strong-per-key → linearisable. Cross-key reordering is gone. Operations on different keys land in one global order. Matters when one operation depends on another (commit, then notify).

The bands nest. Linearisable also gives you session, eventual, and weak — it's just expensive to maintain. Pick the weakest band that closes off the bugs your product actually cares about.

Replication lag is where the anomalies come from

Every consistency anomaly has the same root cause: a write reaches one replica before another, and a read lands on the replica that hasn't caught up yet. The window between "write acked on the leader" and "write visible on this follower" is replication lag, and each named guarantee is a rule for what a read may do during that window. Watch one timeline and the bugs become concrete.

leaderfollowerwrite x=2x=2 arrivesreplication laglag window: a read here returns x=1 (stale)read at this instant hits the follower before x=2 lands
The lag window is the whole problem. Each consistency model is a different rule for what a read is allowed to return inside it.

Map the anomalies onto that window. A stale read is any read that lands in the window and returns the old value — harmless for a like count, a support ticket for a balance. The "my own write vanished" bug is the same read, but issued by the client that did the write; read-your-writes fixes it by keeping that client off the lagging follower. The flickering value — new, then old, then new — happens when consecutive reads bounce between a caught-up replica and a lagging one; monotonic reads fixes it by pinning the client to one replica. The cross-user ordering bug, where a reply shows up before the comment it answers, happens when two related writes replicate at different speeds; causal consistency fixes it by shipping the dependency along with the write. Every pattern is a targeted answer to one shape of lag.

Choosing per operation, not per database

The mistake is treating consistency as a database-wide setting. It is a per-operation decision, and the right question is always the same: what is the worst thing that happens if this particular read is stale? Run two operations through it and the answer falls out.

A like count. If the count reads 1,402 when the true value is 1,404, nothing breaks. Nobody notices, nobody is harmed, and the number self-corrects on the next refresh. Pay nothing for consistency here: serve it from a cache or a lagging replica, eventual is correct. Spending coordination on a like count is spending latency to prevent a non-problem.

A bank balance feeding a withdrawal. If the balance reads $100 when $90 has already been spent, the system authorizes an overdraft. The cost of a stale read is real money and a reconciliation headache. This operation needs strong consistency on the read that gates the decision — read from the leader, or use a quorum read, and accept the latency. The balance shown on a dashboard for information, by contrast, can be eventual; only the read that authorizes a state change needs the strong guarantee.

That split — strong where a read drives an irreversible decision, eventual everywhere a human is just looking — covers most real systems. It is also the deeper reason the availability versus consistency trade is not a single global choice. You spend consistency, and the availability and latency it costs, only on the handful of operations that can't survive a stale read, and you keep the cheap, available guarantee on everything else.

Reading the docs honestly

Database marketing copy is unreliable here. Three rules that help:

  • "Strongly consistent" without qualification usually means strong per key, no cross-key ordering. Cassandra QUORUM, DynamoDB consistent reads, Postgres sync replicas all sit here. None of them give you Spanner-style linearisability across keys.
  • ACID is about isolation inside a single database, not consistency across replicas. Single-node Postgres is ACID and gives no cross-replica guarantees because there are no replicas. Add async replication and it's still ACID, with weak cross-replica consistency.
  • Jepsen reports are ground truth. Kyle Kingsbury's Jepsen analyses stress-test what each database actually guarantees under partitions and load. Several headline-grabbing "strongly consistent" databases have failed Jepsen badly.

The consistency models, formally

The casual ladder above hides a richer hierarchy. The textbook order, strongest to weakest:

Strict serializability. Linearisability + serializability. Every transaction looks atomic and instantaneous, and the global order respects real time. Spanner gives you this with TrueTime. CockroachDB and YugabyteDB approximate it with HLCs. The strongest guarantee anyone ships in production.

Linearisability. Single-object operations appear to occur in a global real-time order. There exists some point in time between the operation's invocation and its response when the effect happened, and once it happens it is visible everywhere. The canonical definition is Herlihy & Wing (1990); the canonical implementation is Raft (read after a leader-confirmed commit).

Sequential consistency. All operations appear to occur in some total order, but that order does not have to respect real time. Cheaper than linearisability because the system has more freedom to reorder; surprisingly close to "what most programmers expect from a database" — most consistency bugs come from systems that are not even sequentially consistent.

Causal consistency. Operations that are causally related (B happened after A in the same client session, or B read a value written by A) appear in the right order. Unrelated operations may be reordered. Strong enough for most real applications, strictly weaker than sequential. Implemented by COPS, MongoDB's causal-consistency mode, some CRDT systems.

Read-your-writes (a session model). A client always sees its own writes. Sticky sessions to a primary or a "read-after-write" timestamp track. Eliminates the specific anomaly "I clicked Save and refreshing shows old data" — the single most user-visible consistency bug.

Monotonic reads (a session model). If a client reads a value, subsequent reads return that value or a later one — never an older one. Together with read-your-writes, gets you most of what users perceive as "consistent" without paying for linearisability.

Eventual. No order guarantees; replicas converge given enough time and no further writes. The default for most NoSQL stores. Surprisingly hard to build correctness on top of without one of the stronger session models stapled on.

The big picture. Sequential and linearisable differ in whether real-time order matters. Causal weakens that to "only related operations". Session models drop cross-client ordering entirely. Eventual drops ordering. Each step down is cheaper to implement; each step up rules out more user-visible bugs.

Consensus is how strong consistency gets built

Anything stronger than "session" requires consensus among replicas. Some node has to decide what the global order is, and every replica has to agree. Three families ship in production:

Paxos (and Multi-Paxos). Original consensus algorithm, hard to implement correctly, used by Google (Chubby, Spanner), Microsoft (Cosmos DB), and many internal systems. Variants like EPaxos (egalitarian — no leader) and Flexible Paxos (configurable quorum sizes) are research-flavoured but show up in real systems.

Raft. Designed to be understandable. Strong-leader; followers replicate a log of operations the leader has committed. Used by etcd, Consul, CockroachDB, TiKV, MongoDB (newer versions), and dozens of others. The de-facto default for new implementations.

Viewstamped Replication, ZAB, and the rest. Several alternatives with small variations. ZAB powers ZooKeeper; Viewstamped Replication predates Paxos but is rarely used directly. Most are protocol-level equivalents to Raft or Paxos with different optimisations.

All three pay the same fundamental cost: every committed operation requires a majority of replicas to acknowledge it before the client gets a response. At least 2 RTTs on commit, in the typical case, sometimes more on failover. That is the latency floor for linearisable systems.

CRDTs — the eventual-consistency escape hatch

Conflict-free Replicated Data Types are data structures designed so that concurrent updates to different replicas can be merged automatically without conflict. The classic example: a grow-only set (G-Set). Two replicas independently add elements; merge takes the union; convergence is automatic.

Richer types: PN-counters (increment and decrement), OR-sets (with remove that handles concurrent add+remove correctly), LWW-Registers (last-writer-wins with timestamps), RGAs (Replicated Growable Arrays — for sequence types like collaborative text editing), Yjs/Automerge (production CRDT libraries for collaborative editing).

What CRDTs buy you: eventual consistency with mathematically guaranteed convergence, no coordination required, works across partitions and offline. Figma's multiplayer cursors, Linear's offline-first sync, every modern collaborative editor uses some form of CRDT.

What CRDTs cost: limited data model (must be expressible as a CRDT), metadata overhead (some CRDTs require keeping per-key version vectors that grow over time), and the intuition gap — programmers reason about transactional updates, not about commutative merge functions. The Operational Transformation crowd (Google Docs) and the CRDT crowd are still arguing about which is the better answer for collaborative text.

Mixing bands in one system — the common shape

Real systems almost never use a single consistency model end-to-end. A typical consumer-scale architecture mixes:

Strong consistency on the write path for money-shaped operations. Payments, balance updates, inventory decrement. Spanner / CockroachDB / Postgres-with- synchronous-replication. The latency cost is paid only on these operations.

Session consistency for user-owned state. User profile, settings, cart. Read-your-writes is the user-perceived requirement; sticky sessions or read-after-write timestamps deliver it cheaply.

Eventual consistency for shared/cached state. Feed timelines, search indexes, recommendation outputs, leaderboards. Async replication, CDN caches, materialized views. Latency is unbounded by consistency cost; the application tolerates staleness explicitly.

CRDT for collaborative state. Real-time collaboration, multiplayer, multi-device sync. Yjs or Automerge in the application layer; CRDT semantics on the wire.

The architecture diagram has different boxes labelled with different consistency guarantees. The interview signal is being able to name which guarantee applies where — and why each one is the right pick for that surface.

The anomalies eventual consistency lets through

A short bestiary of the bugs that eventual consistency permits and that working systems occasionally hit:

Stale reads. Client reads a value, server replies with an old version because the read hit a lagging replica. Mitigation: read-from-leader, read-your-writes session token, or accept and design the UI around staleness.

Lost updates. Two clients read the same value, both modify, both write back. Without conflict resolution, the second write silently overwrites the first. Mitigation: optimistic concurrency control with version numbers, CRDT-shaped updates, or strong consistency.

Write skew. Two transactions read overlapping data, both verify a constraint, both write changes that together violate the constraint. The on-call doctor example: both check "at least one other doctor is on duty", both confirm yes, both go off-duty, nobody is on call. Snapshot isolation permits this; serializable does not. Mitigation: SSI (Serializable Snapshot Isolation) or pessimistic locking on the constraint.

Concurrent updates that diverge. Multi-master systems can produce states that are inconsistent across replicas. Cassandra's "last-write-wins" with wall-clock timestamps will sometimes lose writes when clocks drift; Riak's sibling- resolution surfaces conflicts to the application. Mitigation: pick a database whose conflict resolution matches your data, or use CRDTs.

Cross-key inconsistency. Two related keys updated together, one update arrives at the replica, the other does not yet — the database reflects a state that never existed in the application's logic. Foreign-key relationships hit this regularly in NoSQL stores. Mitigation: model atomic operations as single-key updates (one row per logical entity), or use a database with multi-key transactions.

The pattern that catches almost every consistency bug

Ask, for every read your application performs:

1. What is the worst stale value the user could see here? If the answer is "anything they wrote in the last 100ms might not show up", that is a different bug from "they might see data from 30 seconds ago". Quantify the stale window.

2. What is the worst inconsistency between two reads? If a user reads their bank balance, then reads their transaction list, can those two views disagree? By how much, for how long?

3. What happens to the user if the staleness shows? A blurred profile photo loading a beat late is invisible. A "balance: $0" loading a beat late produces a support ticket. The user-visible cost of staleness is the budget you have for relaxing consistency.

Answering these three questions per surface usually settles the consistency choice faster than any abstract argument about CAP or PACELC. The right answer is the one whose worst case the user can survive.

Related on Semicolony

Found this useful?