Distributed systems · 1985
Paper · Distributed systems · Theory

FLP, the impossibility result.

Fischer, Lynch, and Paterson proved in 1985 that no deterministic consensus protocol can guarantee termination in an asynchronous network with even one faulty node. It's the boundary every real consensus algorithm has to negotiate — and explains why every production system relies on timeouts, randomness, or partial synchrony.

AuthorsMichael J. Fischer, Nancy A. Lynch, Michael S. Paterson
Year1985
VenueJournal of the ACM, Vol. 32, No. 2

TL;DR

In a purely asynchronous message-passing system — no clocks, no bounds on message delay — no deterministic protocol can solve consensus if even a single process is allowed to crash. There always exists an execution in which the algorithm runs forever without deciding.

The proof is short (about ten pages) and constructive: it builds an adversary scheduler that keeps the system stuck in a configuration where the outcome is undetermined. The result is not "consensus is impossible". It is "deterministic termination in pure asynchrony with crash failure is impossible". Every real consensus algorithm — Paxos, Raft, PBFT, Tendermint, Nakamoto — works by weakening one of those three words.

Why this paper matters. FLP redrew the field. Before it, people argued about whether asynchronous consensus was hard; after it, every new protocol had to state, explicitly, which assumption it relied on to escape the result. It turned consensus from an engineering problem into a problem of identifying minimum sufficient assumptions.

The problem

The setting is an asynchronous distributed system. n processes communicate by sending messages over a reliable channel. Messages eventually arrive, but with no upper bound on delay. There are no clocks, no timers, no way to tell elapsed time. Processes are deterministic — given the same state and the same incoming message, they make the same step.

Each process starts with an input bit (0 or 1). At most one process may crash — it stops sending and receiving for the rest of the execution, with no warning. A consensus protocol must satisfy three properties:

  • Agreement. No two non-faulty processes decide differently.
  • Validity. If a process decides v, then v was the input of some process. Rules out the trivial "always decide 0" protocol.
  • Termination. Every non-faulty process eventually decides.

The question is whether a deterministic protocol exists that satisfies all three. FLP says: no. You can have agreement and validity, but in the worst case you cannot guarantee termination.

The model is the point. "Asynchronous" here is mathematically pure: not "slow", not "occasionally delayed", but with literally no bound on delay. The model is the strongest adversary you can write down. The result tells you what no protocol — however cleverly designed — can do against that adversary.

The key idea

The proof centres on a notion called bivalence. A configuration of the system (the joint state of every process plus all in-flight messages) is:

  • 0-valent if every execution from it eventually decides 0.
  • 1-valent if every execution from it eventually decides 1.
  • Bivalent if both outcomes are still reachable.

Assume a protocol exists that always terminates. The authors show two things:

  • Lemma 1 — a bivalent initial configuration exists. Line up initial configurations by flipping one input bit at a time, from all-0 to all-1. The all-0 config must be 0-valent (by validity); all-1 must be 1-valent. Somewhere along that line two adjacent configurations must straddle the decision. Crashing the one process whose bit differs between them makes them indistinguishable to every other process — so they must lead to the same decision. Contradiction unless at least one of them was bivalent.
  • Lemma 3 — from any bivalent configuration, there's always a step to another bivalent configuration. Suppose not: every step from some bivalent C goes to a univalent configuration. By a careful case analysis on the order in which the next message is delivered, the authors construct two reachable configurations that differ only by the action of a single process — and then crashing that process collapses them to the same view, forcing the same decision. Contradiction again.

Put the two together: there's an initial bivalent configuration, and from any bivalent configuration the adversary can pick a next step that stays bivalent. The adversary repeats this forever. The protocol never terminates.

Why this isn't doomsday. FLP rules out deterministic protocols in purely asynchronous networks with even one crash. Real systems use one of three escape hatches: timeouts (partial synchrony — Dwork, Lynch, Stockmeyer 1988, the assumption that after some unknown "global stabilisation time" message delays are bounded), randomness (Ben-Or 1983, where each process flips a coin to escape the bivalent configuration with probability 1), or failure detectors (Chandra, Toueg 1996, where you assume the network gives you an oracle — possibly unreliable — that eventually identifies crashed processes). Paxos and Raft live in the partial-synchrony world. Bitcoin lives in the randomness world (proof-of-work is the coin flip). PBFT-style protocols mix partial synchrony with view changes that re-bootstrap after periods of asynchrony.

The adversary's only real power is delaying messages. There is no clock to distinguish "this node is slow" from "this node has crashed". One slow node looks identical to one crashed node from every other process's perspective, and the deterministic algorithm has no way to tell them apart. That single indistinguishability is the whole proof.

Contributions

  • The formal model. A clean, minimal model of asynchronous message-passing with crash failures. The model itself — processes, configurations, schedules, events — became the standard vocabulary for the next four decades of distributed-systems theory.
  • The bivalence proof technique. The valency argument has been reused to prove many other impossibility results: k-set agreement (Borowsky-Gafni, Herlihy-Shavit), wait-free shared-memory consensus, renaming, the asynchronous computability theorem. If you read distributed-systems theory, you keep meeting this proof structure.
  • It drew the boundary. Subsequent work had to state, explicitly, what assumption it added on top of pure asynchrony. "We assume partial synchrony" or "we assume an eventually perfect failure detector" or "the algorithm is randomised" became required preamble for any consensus paper.
  • It reframed the question. Before FLP: "can we do consensus?" After FLP: "what is the minimum extra assumption that lets us do consensus?" This shift produced the partial-synchrony hierarchy (Dwork-Lynch-Stockmeyer), the failure-detector hierarchy (Chandra-Toueg), and the randomised-consensus line of work.

Criticisms and limitations

The result is sharp and well-bounded, but it is also often misread. The honest limitations are:

  • The model is pure asynchrony. Real networks don't look like this. Datacentre networks have median round trips under a millisecond and rarely exceed a few seconds even under pathological conditions. They behave like partial synchrony: bounded most of the time, occasionally not. So FLP is a worst-case statement, and real systems live nowhere near the worst case.
  • The result is binary. "Impossible" doesn't quantify how impossible. Later work (Aspnes's 2003 survey of randomised consensus, the probabilistic-termination literature) showed that with randomness you can get expected-O(log n) rounds. The gap between "impossible deterministically" and "easy probabilistically" is enormous, and FLP doesn't characterise it.
  • Frequently misread. "Consensus is impossible" is the wrong takeaway; the right one is "deterministic termination is impossible in pure asynchrony with crash failure". Agreement and validity are perfectly achievable. It is liveness — and only liveness — that the result targets.
  • Crash failure only. The paper addresses crash-stop processes, not Byzantine ones. The Byzantine generalisation came later (Lamport, Shostak, Pease 1982 had already shown the lower bound on the number of processes; the FLP-style impossibility for Byzantine asynchrony follows by reduction).

Where it shows up today

Every production consensus protocol can be read as a specific answer to "which FLP assumption did you weaken?".

SystemHow it escapes FLP
PaxosPartial synchrony. Liveness depends on a stable leader; if the network is too asynchronous, leader election thrashes and the protocol stalls (safe but not live). Lamport says so directly in "Paxos Made Simple".
RaftSame — randomised election timeouts plus partial synchrony. The randomisation is to avoid split votes, not to escape FLP; the partial-synchrony assumption is what guarantees eventual progress.
PBFTPartial synchrony plus explicit view changes. The view-change mechanism is the way PBFT recovers from periods of FLP-style asynchrony by rotating to a new primary.
Tendermint, HotStuffPartial synchrony with leader rotation. Both cite FLP directly and design the protocol around the partial-synchrony assumption. HotStuff's chained pipelining makes the cost of view change linear.
Bitcoin (Nakamoto)Randomness. Proof-of-work is the coin flip; the longest-chain rule plus a probabilistic finality model sidesteps deterministic agreement entirely. Probability of disagreement decays exponentially in confirmation depth.
Ben-Or, Bracha (randomised BFT)Explicit randomness via local coin flips. Expected termination in O(2n) rounds for Ben-Or, polynomial for later variants. The probabilistic line of FLP escape hatches.
etcd, Consul, ZooKeeperProduction Paxos/Raft variants. The partial-synchrony assumption is what their election-timeout configuration is encoding. If you set timeouts too aggressively for the network, you get availability problems; too loosely and failover is slow.

A useful exercise: when reading any consensus protocol, find the place where it assumes a timeout, a coin flip, or a failure detector. That is the line where the protocol pays its FLP tax. There is no protocol — and there will never be a protocol — that does not have such a line.

Follow-up reading

Found this useful?