The Byzantine Generals, malice as a model.

Lamport, Shostak and Pease give a name to the kind of failure no protocol of the 1970s knew how to handle: a node that not only stops, but actively lies. The paper proves it takes 3m+1 generals to tolerate m traitors — and frames the model that every blockchain consensus algorithm still negotiates with today.

Authors Leslie Lamport, Robert Shostak, Marshall Pease

Year 1982

Venue ACM TOPLAS

PDF →

TL;DR

A small force of imperial generals surrounds a city. They will win if all attack together, and lose if any retreat alone. They can only communicate by messenger — and some of them are traitors who will say anything to make the others lose. The paper proves that a deterministic agreement protocol exists if and only if more than two-thirds of the generals are loyal. The framing — that failures aren't just crashes but can be adversarial — became the dominant threat model for half of modern distributed systems.

The problem

By 1980 the distributed-systems community had a workable theory of crash failures: nodes stop sending messages, but they don't lie. Two-phase commit, the original Paxos sketches, viewstamped replication — all assume a node either responds correctly or doesn't respond at all. That assumption fails the moment you imagine a node that has been compromised, a memory bit-flip in a router, or a faulty CPU that produces wrong arithmetic.

Lamport had been working on aviation flight-control systems where this mattered concretely: a triplex computer voted on outputs, but what if one of the three computers wasn't just dead, but was producing plausible-but-wrong answers? The paper gives the failure mode a name (Byzantine, after the historical metaphor) and asks the precise question: how many honest participants do you need to overcome the worst possible adversary?

The key idea

The paper proves two complementary results. First, in a synchronous system using oral messages (unforgeable but unsigned), no protocol can tolerate m Byzantine generals unless there are at least 3m+1 in total. Second, with signed messages — cryptographic authentication — any number of traitors can be tolerated as long as a majority of the loyal generals can be reached.

The proof of the first result is one of the most-cited adversarial arguments in distributed computing. Consider three generals where one is a traitor. The traitor can tell two different stories to the two loyal generals, and from each loyal general's vantage point, the world is internally consistent — they have no way to distinguish the traitor's lie from a different loyal general's honest report. Generalise to m traitors and the same indistinguishability argument forces 3m+1.

The recursive protocol they describe — Oral Messages, OM(m) — handles m traitors in m+1 rounds, with each general relaying what they heard from every other. The communication cost is exponential in m, which is why practical systems use signatures or restricted threat models.

The geometric intuition. With 3m+1 nodes and m traitors, any two majorities of size 2m+1 overlap by at least m+1 loyal nodes. That overlap is the part of the system the adversary can't corrupt — it carries decisions from one majority to the next. The same overlap argument shows up in Paxos (with crash failures, m+1 out of 2m+1 suffices) and in every quorum-based protocol since.

Contributions

Named the threat model. Before this paper, "fault tolerance" tacitly meant "crash tolerance". By the late 1980s, "Byzantine fault tolerance" was the standard name for any protocol that survives malicious actors, and the failure model was clearly distinguished from omission, timing, and crash failures.

The 3m+1 lower bound. The bound holds for any deterministic synchronous protocol using oral messages. This is the result that forces blockchain protocols to require ⅔-supermajorities and is the reason Ethereum's Casper-FFG needs validators to stake against ⅓ of stake being malicious.

Signed messages change the bound. With cryptographic signatures, the recursion collapses and any majority of loyal nodes can agree. This is exactly the trade-off blockchain protocols exploit: trade off the cost of cryptography (signature verification, larger messages) against the cost of running more validators.

Interactive Consistency. The paper introduces the term as a generalisation of Byzantine Agreement — every honest node must agree on a vector of input values, one per general, even though some generals are traitors. This formulation makes the bound easier to prove and turns out to be the right primitive for state-machine replication.

Criticisms and limitations

The synchronous assumption is the paper's biggest limitation in practice. Synchronous means there's a known bound on message delay; if the bound is missed the protocol can be made unsafe by an adversary. Real networks are partially synchronous, which is why PBFT (Castro and Liskov, 1999) and later protocols spend most of their pages reasoning about timing.

The exponential message complexity of the oral-message protocol makes it impractical for more than a handful of nodes. The signed-message variant is much cheaper but assumes a working PKI — itself a non-trivial Byzantine subproblem.

Where it shows up today

Every blockchain consensus protocol — PBFT-style (Tendermint, Cosmos, Diem), Nakamoto-style (Bitcoin, Ethereum PoW), and BFT proof-of-stake (Ethereum 2, Solana, Aptos) — explicitly cites the 3m+1 bound when arguing that ⅓ of voting power must remain honest.

In non-blockchain settings, Byzantine fault tolerance shows up in safety-critical systems: aircraft flight controllers (Airbus, Boeing), submarine systems, the Voyager probes, and modern hyperscale storage that has to defend against single-event upsets in DRAM and switch fabrics. The Maelstrom paper (Hu et al, NSDI 2022) describes how Meta's storage layer uses BFT-style checksums to detect silent data corruption between datacentres.

Follow-up reading

Practical Byzantine Fault Tolerance — Castro & Liskov · 1999 · OSDI. The first BFT protocol that worked under partial synchrony with reasonable message complexity. Read after this one.
Bitcoin: A Peer-to-Peer Electronic Cash System — Nakamoto · 2008. A probabilistic Byzantine agreement protocol that doesn't use the explicit 3m+1 structure but obeys the same bound on adversarial fraction.
HotStuff: BFT Consensus in the Lens of Blockchain — Yin et al · 2019 · PODC. A modern PBFT-derived protocol with linear message complexity per round. Used in Diem.
The Part-Time Parliament — Lamport · 1998. The (non-Byzantine) Paxos paper that came after. Crash failures only, but the quorum-overlap intuition is the same.
Paxos Made Simple — Lamport · 2001. Annotated on Semicolony.

More annotated papers

Back to the papers index

Foundational distributed-systems and database papers, read and annotated.

← All papers

Found this useful?