Paper · 2011
Paper · Distributed systems · Replication

CRDTs, conflict-free by construction.

Shapiro, Preguica, Baquero, and Zawirski's 2011 technical report unified a decade of work on conflict-free replicated data types — data structures designed so that concurrent updates always merge cleanly, without coordination. The math behind every collaborative editor, multiplayer game, and offline-first app shipped since.

Shapiro, Preguica, Baquero, Zawirski · 2011 · INRIA TR 7506 · hal.inria.fr


TL;DR

You cannot have strong consistency, full availability, and partition tolerance at the same time — that is the CAP result, and every replicated system pays its cost somewhere. Shapiro and co-authors ask a narrower question: if we restrict the operations a replica can perform, are there data types whose concurrent updates merge deterministically with no coordination at all? The answer is yes, and the paper writes the theory down.

Two families fall out. State-based CRDTs (CvRDTs) ship their full state between replicas and merge using a function that is commutative, associative, and idempotent — formally, the states form a join-semilattice and the merge is the lattice's least upper bound. Operation-based CRDTs (CmRDTs) ship operations through a causally-ordered broadcast, and require the concurrent operations to commute. Either way the property you get is Strong Eventual Consistency: any two replicas that have observed the same set of updates are byte-for-byte identical, with no rollback, no conflict resolution, no leader.

The paper gives a unified vocabulary, a catalogue of types — G-Counter, PN-Counter, G-Set, 2P-Set, OR-Set, MV-Register, LWW-Register, sequences like RGA and Treedoc — and a proof technique for each. It is the document every CRDT library since cites in its first paragraph.

The problem

Strong consistency requires coordination, and coordination requires a majority of replicas to be reachable. That is fine when your replicas live in the same rack; it is painful when they live on three continents and a transatlantic cable just got cut by an anchor, and it is impossible when one of them is a phone in airplane mode that the user expects to keep working. Eventual consistency relaxes the coordination requirement, but it leaves a hard question dangling: when the partitions heal and updates flow again, how do you reconcile two replicas that disagree?

Dynamo's answer, four years earlier, was to keep all conflicting versions and hand them back to the application at read time, expecting the application to merge them. That works for a shopping cart — union the items — and is exhausting for almost everything else. The question Shapiro et al pose is sharper. Forget general reconciliation. Are there data types whose update operations are designed up front so that the merge is deterministic, automatic, and requires no coordination? If yes, replicas can accept writes locally, gossip them whenever they happen to be online, and converge.

The paper's claim is that such types exist, they cover a useful fraction of the data structures real systems care about (counters, sets, registers, maps, sequences), and they all satisfy a single formal property the authors name Strong Eventual Consistency. That property is what the paper is really selling.

The key idea

Two complementary models. In the state-based model, a replica holds some state s, applies updates locally, and periodically sends its whole state to another replica. The receiving replica computes s' = merge(s_local, s_received) and adopts s'. For this to converge regardless of network ordering, the merge function has to be commutative (it doesn't matter who sends to whom), associative (you can batch merges), and idempotent (re-receiving the same state is harmless). The mathematical name for a set with such an operation is a join-semilattice, and the convergent state is the least upper bound of all the states that have ever existed.

A G-Counter (grow-only counter) is the simplest example. Each replica keeps a vector of per-replica counts, increments only its own slot, and the value of the counter is the sum of the vector. Merge is element-wise max. Try it on paper: every concurrent increment sequence converges to the same vector, regardless of order, regardless of how many times a state is re-received. PN-Counter (positive-negative) is two G-Counters, one for increments and one for decrements.

In the operation-based model, each replica broadcasts its operations through a layer that guarantees causal delivery — if operation B was issued after the replica saw A, then every replica applies A before B. Concurrent operations (neither happened-before the other) can arrive in any order, so they must commute. The trade is bandwidth for complexity: ops are small, but you need a causal broadcast primitive underneath.

The bargain. CRDTs give you commutative merges without a coordinator — replicas converge no matter what order updates arrive in. The price: limited operations (no general delete, no general transaction), some metadata overhead (tombstones for delete, vector clocks for ordering), and certain semantics that feel weird to humans — an item you removed can reappear if a concurrent update from another replica added it back, because there is no global "now" against which to say the remove came after.

Strong Eventual Consistency is the formal payoff. It says: any two replicas that have received the same set of updates have the same state, full stop. Plain eventual consistency only promises that replicas converge given enough time and no more updates; SEC is a per-state guarantee that holds at every point. That is the property the paper proves for each type in its catalogue.

Contributions

The field existed before this paper — Wuu and Bernstein's log merging from the 1980s, Johnson and Thomas's last-writer-wins from 1976, the Treedoc and Logoot sequence types from the late 2000s. What the report does is unify them. It gives a single vocabulary (state-based vs operation-based, the CvRDT and CmRDT acronyms, the join-semilattice framing), a single property to aim for (SEC), and a catalogue of types proved correct against it.

The catalogue is the part practitioners cite most. G-Counter and PN-Counter for numeric aggregates. G-Set, 2P-Set, OR-Set (observed-remove), LWW-Element-Set for collections. LWW-Register and MV-Register for single-valued cells. And the sequence CRDTs — Replicated Growable Array, Treedoc, Logoot — for the collaborative-editing use case. Each comes with its semantics, its merge function or operation set, and a proof sketch that the type satisfies SEC.

Two more contributions are worth calling out. The proof technique itself — show that state-based types' merges form a join-semilattice, show that op-based types' concurrent operations commute — became the template every subsequent CRDT design followed. And the explicit separation of the two models clarified a debate that had been mostly implicit: state-based is simpler to reason about but heavier on the wire; op-based is lighter but needs causal broadcast infrastructure underneath.

Criticisms and limitations

The most cited weakness is tombstones. Removing an element from an OR-Set doesn't delete it; it adds a tombstone marking the unique-id of the element as removed. Tombstones accumulate forever in the basic design, and any real system has to garbage collect them — Riak, Cassandra, and every production CRDT store has a tombstone-reaping pass, usually backed by some global epoch or by waiting for a stability threshold that itself requires coordination. The paper acknowledges the issue but doesn't solve it.

Sequence CRDTs (the kind a collaborative text editor needs) are notoriously hard. The ones in the paper — RGA, Treedoc, Logoot — work, but each has surprising performance edges: Logoot positions can grow unboundedly under adversarial editing, Treedoc's tree can become deeply unbalanced, RGA needs careful tombstone handling. Modern editors (Yjs, Automerge) use newer constructions like RGA-Split and FugueMax that this paper doesn't cover, designed to address exactly the practical problems the 2011 designs ran into in production.

The headline "no coordination" claim is also conditional. Applying an op-based CRDT update needs causal delivery, and causal delivery is itself a form of coordination — it requires vector clocks or version vectors that grow with the number of replicas, and it requires a broadcast layer that knows the membership. The paper is careful about this in the body, but readers who skim the abstract often miss it.

Finally, the state-based model burns a lot of bandwidth: every sync ships the whole state, even when only one increment happened. Delta-CRDTs (Almeida, Shapiro, and Baquero, 2018) ship just the changes since the last sync and recover the bandwidth profile of op-based CRDTs while keeping the state-based simplicity. Most modern CRDT implementations are delta-based; the 2011 paper precedes that fix.

Where it shows up today

Riak (Basho, then post-Basho) shipped a first-class CRDT library — counters, sets, maps, registers — and made them the recommended way to store mutable data on a multi-master cluster. Redis Enterprise ships a CRDT layer (Redis CRDB) that does the same for the Redis data types, letting active-active geo-replicated deployments accept writes everywhere. Cassandra and ScyllaDB don't expose CRDTs as a public type system, but their internal counter columns and lightweight transactions borrow heavily from the same designs.

Collaborative-editing libraries are where the ideas have travelled furthest. Automerge (Ink and Switch, Martin Kleppmann's group) is a JavaScript CRDT library specifically for offline-first apps; Yjs is the production-grade alternative that ships inside Quill, ProseMirror, JupyterLab's collaboration mode, and a long tail of editor integrations. Apple Notes uses CRDTs under the hood for cross-device sync, and Figma's multiplayer model is CRDT-inspired (though Figma's published descriptions emphasise that they use an OT-style approach for the document tree and CRDT-style approaches for certain sub-structures, so it is a hybrid rather than a textbook CRDT).

The contrasting design is Operational Transformation, which Google Docs uses. OT achieves the same goal — concurrent edits converge without a coordinator — but through a different mechanism: each replica transforms incoming operations against its local history before applying them. The CRDT-vs-OT comparison is the single most-discussed topic in the collaborative editing space, and Kleppmann's blog series is the canonical place to learn the trade-offs. The short version: OT needs a central server to linearise operations in practice; CRDTs don't, but they pay in metadata.

Follow-up reading

Found this useful?