05 / 14

Internals / 05

Isolation levels

Transactions never run alone. They interleave, and the order they interleave in can produce surprises: rows that vanish, totals that don't balance, two doctors going off-call at the same time. An isolation level is a contract about which surprises the database is allowed to show you. Here is the ANSI ladder, what it actually means, and what every major vendor really ships under each name.

Transactions don't run alone —
they interleave.

A database that ran one transaction at a time would be easy to reason about and far too slow. So engines run many transactions at once, interleaving their reads and writes against shared rows. The trouble is that some interleavings produce results no single-threaded run ever could: a row that exists in one query and not the next, an update that silently disappears, a global invariant that quietly breaks.

The textbook ideal is serializable: every concurrent schedule must produce the same result as some serial order of the same transactions. Serializable is expensive, so SQL gives you a knob, the isolation level, to trade correctness for throughput. Picking a level means deciding, ahead of time, which classes of anomaly you're willing to let your application see.

The anomalies, with concrete examples.

There are five anomalies worth knowing by name. Four come from the ANSI standard. The fifth, write skew, was named by Berenson & Gray in 1995 because the original list missed it.

Dirty read. T1 reads a row T2 has written but not yet committed. If T2 rolls back, T1 used a value that never existed. Example: a reporting job reads a balance of $0 because a transfer is mid-flight, the transfer aborts, and the report is wrong.
Non-repeatable read. T1 reads row R, T2 updates R and commits, T1 reads R again and sees a different value. The two reads inside one transaction disagree.
Phantom read. T1 reads a range (e.g. WHERE age > 30), T2 inserts a new row inside that range and commits, T1 re-runs the same query and sees a row that wasn't there before. A non-repeatable read on a set instead of a single row.
Lost update. T1 and T2 both read x = 10, both compute x + 1, both write 11. One of the two writes is silently overwritten, so a +1 evaporates. Classic on counter columns updated with read-modify-write.
Write skew. Two transactions each read a set of rows, each writes to a disjoint row, and the writes together break a constraint that neither write would break on its own. The canonical example below uses on-call doctors.

The ANSI SQL-92 ladder.

SQL-92 defines four isolation levels, each one preventing one more class of anomaly than the level below it. You set the level per transaction:

-- ANSI SQL-92 syntax. Most engines support some variant.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Postgres / MySQL: also accepted at session scope
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... your statements ...
COMMIT;

The intended ladder, as the standard reads it:

Level	Dirty read	Non-repeatable	Phantom
Read Uncommitted	Possible	Possible	Possible
Read Committed	Prevented	Possible	Possible
Repeatable Read	Prevented	Prevented	Possible
Serializable	Prevented	Prevented	Prevented

What ANSI actually means is ambiguous.

Berenson, Bernstein, Gray, Melton, O'Neil & O'Neil's 1995 paper A Critique of ANSI SQL Isolation Levels showed that the ANSI definitions, taken literally, permit anomalies the standard's authors clearly didn't intend. The phenomena are defined as forbidden read/write sequences, not as the schedules they actually rule out.

Two consequences. First, the same level name on two vendors can prevent different anomalies. Second, snapshot isolation, the dominant implementation technique in MVCC engines, isn't on the ladder at all. By the ANSI text, SI looks like Serializable (it prevents the three named anomalies). By any reasonable definition, it isn't: it permits write skew.

Read the critique. Twenty-five pages, written for working engineers, and it reframes the whole topic. The "Snapshot Isolation" section is the part most people haven't read and most need to. See the paper walkthrough.

What every major DB actually delivers.

The names on the tin don't map cleanly to the standard. The table below shows what each engine actually does when you ask for a level by name.

Engine	Default level	What it really is
Postgres	Read Committed	RC via MVCC. "Repeatable Read" is snapshot isolation. "Serializable" is SSI (Cahill 2008).
MySQL / InnoDB	Repeatable Read	Closer to SI than to ANSI RR. Phantoms appear unless you take next-key locks with `SELECT ... FOR UPDATE`.
Oracle	Read Committed	RC via MVCC. "Serializable" is snapshot isolation — has been for 30 years. No true serializable level.
SQL Server	Read Committed (lock-based)	Full ANSI ladder via locks, plus optional Snapshot and Read Committed Snapshot.
CockroachDB	Serializable	Serializable by default, via its own MVCC + timestamp scheme (not Postgres-style SSI). READ COMMITTED added in v23.2.
Spanner	Serializable	Strict serializable — uses TrueTime to give external consistency.
FaunaDB	Serializable	Strict serializable by default.

Three engines, three different things behind the word "Serializable." Worth knowing before your next migration.

Snapshot isolation, in depth.

Snapshot isolation gives each transaction a consistent view of the database as it was the instant the transaction began. Reads never block, never see uncommitted data, and never see writes that happened after the start timestamp. Writes use first-committer-wins: if two transactions modify the same row, the second to commit is aborted. Built on MVCC; cheap, popular, and almost-but-not-quite serializable.

The classic counterexample is the on-call doctors. Constraint: at least one doctor must stay on call. Two doctors, Alice and Bob, are both on call. Both decide to go off call at the same time:

-- Invariant: COUNT(*) WHERE on_call = true MUST be >= 1.
-- Initial state: Alice and Bob are both on_call = true.

-- Transaction A (Alice)              -- Transaction B (Bob)
BEGIN;                                 BEGIN;
SELECT COUNT(*) FROM doctors           SELECT COUNT(*) FROM doctors
  WHERE on_call = true;                  WHERE on_call = true;
-- returns 2                           -- returns 2
UPDATE doctors SET on_call = false     UPDATE doctors SET on_call = false
  WHERE name = 'Alice';                  WHERE name = 'Bob';
COMMIT;                                COMMIT;

-- Both commit under SI. Both writes touched disjoint rows, so
-- first-committer-wins does not fire. Invariant: violated. No doctor
-- is on call. Under true serializable, one transaction would abort.

SI permits write skew. Any time your business invariant spans multiple rows and your transactions update disjoint rows after reading them, SI will eventually let both through. SELECT … FOR UPDATE on the read set, or escalate to Serializable.

Serializable Snapshot Isolation (SSI).

Cahill, Röhm & Fekete's 2008 paper "Serializable Isolation for Snapshot Databases" showed how to make SI actually serializable at modest cost. The insight: serializability under SI breaks only when a transaction graph contains two consecutive rw-antidependencies in a cycle: a read by T1 that conflicts with a write by T2, followed by a read by T2 that conflicts with a write by T3.

SSI tracks read and write sets per transaction at commit time, flags transactions that form a dangerous pattern, and aborts one of them. Postgres shipped this in 9.1 (2011) as the SERIALIZABLE level. It costs roughly a hash-table of read-locks per active transaction and an extra commit-time check — no lock contention, no row locks for readers.

The trade-off is that aborts are now part of the contract. Code that uses Serializable has to be ready to retry on a 40001 serialization_failure. Most ORMs don't retry by default; you write the loop yourself.

The performance cost.

On Postgres, switching a write-heavy OLTP workload from Read Committed to Serializable typically costs 10–30% throughput, depending on contention. Cahill's original paper measured ~10% on TPC-C. Real workloads with hot rows can see 2–10× degradation because the abort-and-retry loop eats CPU when contention is high.

Lock-based serializable (SQL Server's default SERIALIZABLE) is usually worse, since range locks block both readers and writers. MVCC-based SSI is the right baseline to compare against if you're considering escalation.

Read Committed is good enough surprisingly often. Most CRUD endpoints touch one or two rows, then commit. Lost updates are caught by a unique constraint or an optimistic WHERE version = ?. Save the strong level for the cases where you really do encode an invariant across multiple rows.

Choosing a level.

A practical rule that holds up:

Default to Read Committed. It's what Postgres, Oracle, and SQL Server default to, it's fast, and it's enough for most application code.
Use SELECT … FOR UPDATE on hot rows where two transactions might race on the same row (counters, inventory, balances). This puts a single row on pessimistic locking without changing the level globally.
Escalate to Serializable when business logic encodes invariants across multiple rows — bank-account-style transfers, on-call rotations, scheduling, double-entry bookkeeping. Wrap the call site in a retry loop for 40001 errors.
Snapshot isolation is fine for analytics: long-running read transactions that need a consistent point-in-time view of the database.

Common production mistakes.

Assuming the level name means the same thing across vendors. Migrating from MySQL Repeatable Read to Postgres Repeatable Read changes the semantics. Migrating to Postgres Serializable changes them again.
Forgetting that SI permits write skew. If your code reads a set of rows and updates a row based on that read, SI can let two such transactions both commit. Check whether the invariant survives concurrent runs, not whether it survives in isolation.
Long-running transactions under MVCC. A transaction that stays open for hours holds back vacuum / cleanup, so old row versions pile up. The Postgres bloat and the InnoDB undo-log explosion are the same disease: a forgotten BEGIN in a worker.
Counting on SERIALIZABLE in Oracle. Oracle's Serializable is snapshot isolation, not real serializability. Code that depends on serializability for correctness will silently break.
No retry loop around Serializable. Aborts are part of the contract; if your app surfaces 40001 to the user, you've shipped a bug, not a feature.

Isolation levels

Transactions don't run alone —
they interleave.

The anomalies, with concrete examples.

The ANSI SQL-92 ladder.

What ANSI actually means is ambiguous.

What every major DB actually delivers.

Snapshot isolation, in depth.

Serializable Snapshot Isolation (SSI).

The performance cost.

Choosing a level.

Common production mistakes.

Further reading.

06 — The query planner

Isolation levels

Transactions don't run alone —they interleave.

The anomalies, with concrete examples.

The ANSI SQL-92 ladder.

What ANSI actually means is ambiguous.

What every major DB actually delivers.

Snapshot isolation, in depth.

Serializable Snapshot Isolation (SSI).

The performance cost.

Choosing a level.

Common production mistakes.

Further reading.

06 — The query planner

Transactions don't run alone —
they interleave.