ARIES, how every database recovers.

ARIES is the paper that explains how a database survives a crash. It's long, dense, and reads like a working specification rather than a research paper, which it effectively was, for IBM's DB2. Many relational databases since (MySQL InnoDB, SQL Server, and DB2 among them) implement something close to ARIES, and others such as Postgres adopt its write-ahead logging and LSN machinery while diverging elsewhere.

Authors C. Mohan, Don Haderle, Bruce Lindsay, Hamid Pirahesh, Peter Schwarz

Year 1992

Venue ACM TODS

TL;DR

ARIES is a write-ahead logging (WAL) and crash-recovery algorithm. Every change to a data page is described by a log record with a Log Sequence Number (LSN), and that record must reach stable storage before the page itself is written to disk. After a crash, recovery runs in three passes: analysis reconstructs the in-flight transactions and dirty pages from the log; redo replays logged changes, starting from the oldest change that might not have reached disk, to restore the pre-crash state; undo rolls back the changes of transactions that were in-flight at the crash. The algorithm handles fine-granularity locking, partial rollbacks via savepoints, and online recovery with concurrent transactions. Once you've understood ARIES, the recovery manager of most production databases becomes much easier to read.

The problem

Before ARIES, every database implemented recovery differently and most got it subtly wrong. System R's shadow paging was elegant but expensive. INGRES used immediate updates with no-force, no-steal — fast normal operation, but recovery required scanning the whole database. CICS had its own recovery scheme. The result: every recovery algorithm worked under some workload and broke under others, and the literature didn't have a clean reference point.

IBM had been running DB2 (and, before it, IMS) in production at customer sites for years. Those customers needed crash recovery to be correct (no lost committed transactions, no surviving aborted ones), fast (proportional to recent activity, not the dataset size), online (concurrent transactions during recovery), and granular (page-level redo, record-level undo, partial rollback). No published algorithm did all of those.

The key idea

Three principles drive ARIES. First, write-ahead logging: a log record describing the change must reach stable storage before the modified page is flushed. This means every change found on a disk page is covered by a durable log record, identified by its LSN. Second, repeating history during redo: recovery replays logged changes regardless of whether the transaction later committed or aborted, starting from the oldest change that might not have reached disk (which can predate the last checkpoint). This makes the database state identical to what it was just before the crash. Third, logging during undo: undo actions are themselves logged as Compensation Log Records (CLRs), so a crash during undo doesn't lose progress.
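The write-ahead rule can be sketched as a buffer manager that refuses to write a page until the log is durable up to that page's LSN. This is a minimal in-memory sketch, not the paper's design: `Log`, `BufferPool`, `flushed_lsn`, and the dict standing in for disk are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    records: list = field(default_factory=list)  # in-memory log tail
    flushed_lsn: int = 0                         # highest LSN on stable storage

    def append(self, payload):
        self.records.append(payload)
        return len(self.records)                 # LSNs are 1, 2, 3, ...

    def flush(self, up_to_lsn):
        # Stand-in for forcing the log to stable storage up to a given LSN.
        self.flushed_lsn = max(self.flushed_lsn, min(up_to_lsn, len(self.records)))


@dataclass
class Page:
    data: dict = field(default_factory=dict)
    page_lsn: int = 0                            # LSN of the last update applied


class BufferPool:
    def __init__(self, log):
        self.log = log
        self.pages = {}                          # page_id -> Page (in memory)
        self.disk = {}                           # page_id -> (data, page_lsn)

    def update(self, page_id, key, value):
        page = self.pages.setdefault(page_id, Page())
        lsn = self.log.append((page_id, key, value))  # log the change first
        page.data[key] = value                        # then modify the page
        page.page_lsn = lsn
        return lsn

    def write_page(self, page_id):
        page = self.pages[page_id]
        # WAL rule: the log must be durable up to page_lsn before the page is.
        if self.log.flushed_lsn < page.page_lsn:
            self.log.flush(page.page_lsn)
        self.disk[page_id] = (dict(page.data), page.page_lsn)


log = Log()
pool = BufferPool(log)
pool.update("P1", "a", 1)
pool.update("P1", "b", 2)
pool.write_page("P1")
```

Note that updates themselves never force the log; only writing the page out does, which is what lets normal operation stay cheap.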

Each page on disk carries the LSN of the last log record that updated it. During redo, the algorithm reads each log record; if the page's on-disk LSN is lower than the log record's LSN, the change must be re-applied; otherwise it's already on disk and can be skipped. This page-LSN comparison keeps redo work proportional to the amount of log written since the oldest unflushed change, not to the size of the database.
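The page-LSN comparison can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `UpdateRecord` type, the dict-based pages, and the `redo` function are hypothetical, and the sketch omits the dirty-page-table check that the full algorithm uses to avoid reading pages at all.

```python
from dataclasses import dataclass


@dataclass
class UpdateRecord:
    lsn: int
    page_id: str
    key: str
    after: int          # redo information: the new value


def redo(log, disk_pages, redo_start_lsn):
    """Replay updates whose effects are missing from the page.

    disk_pages maps page_id -> {"page_lsn": int, "data": dict}.
    Returns the LSNs that were actually re-applied.
    """
    applied = []
    for rec in log:
        if rec.lsn < redo_start_lsn:
            continue
        page = disk_pages[rec.page_id]
        if page["page_lsn"] >= rec.lsn:
            continue                      # change already on the page: skip
        page["data"][rec.key] = rec.after
        page["page_lsn"] = rec.lsn        # stamp the page so a rerun skips it
        applied.append(rec.lsn)
    return applied


log = [
    UpdateRecord(1, "P1", "a", 10),
    UpdateRecord(2, "P2", "x", 20),
    UpdateRecord(3, "P1", "a", 30),
]
# P1 was flushed after LSN 1; P2 was never flushed.
pages = {
    "P1": {"page_lsn": 1, "data": {"a": 10}},
    "P2": {"page_lsn": 0, "data": {}},
}
first = redo(log, pages, redo_start_lsn=1)
second = redo(log, pages, redo_start_lsn=1)   # a second run changes nothing
```

The first run re-applies only LSNs 2 and 3; the second run applies nothing, because every page LSN has caught up with the log.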

For undo, each log record carries a prevLSN pointing to the previous log record of the same transaction. Undo walks this chain backward, undoing each operation. A CLR is written for each undone operation, with its own undoNextLSN pointing to the next operation to undo (the prevLSN of the record just undone). CLRs are redo-only: they are never undone themselves. If a crash happens during undo, the next recovery follows the last CLR's undoNextLSN and picks up exactly where the previous one left off.
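A toy version of the undo chain shows why an interrupted rollback resumes cleanly. Everything here is an assumption made for illustration: records are plain dicts, `db` is a key-value dict standing in for pages, `max_steps` simulates a crash, and `last_lsn` survives in memory where a real system would rebuild it during analysis.

```python
def append(log, last_lsn, txn, **fields):
    """Append a record, chaining it to the transaction's previous record."""
    rec = {"lsn": len(log) + 1, "txn": txn,
           "prev_lsn": last_lsn.get(txn, 0), **fields}
    log.append(rec)
    last_lsn[txn] = rec["lsn"]
    return rec


def update(log, last_lsn, db, txn, key, value):
    """Log an update with its before-image, then apply it."""
    append(log, last_lsn, txn, type="update", key=key,
           before=db[key], after=value)
    db[key] = value


def undo(log, last_lsn, db, txn, max_steps=None):
    """Roll back txn; stop after max_steps undos to simulate a crash."""
    rec_lsn = last_lsn.get(txn, 0)
    steps = 0
    while rec_lsn != 0:
        if max_steps is not None and steps == max_steps:
            return False                          # "crash" mid-rollback
        rec = log[rec_lsn - 1]
        if rec["type"] == "clr":
            rec_lsn = rec["undo_next_lsn"]        # skip work already undone
            continue
        db[rec["key"]] = rec["before"]            # undo the update
        append(log, last_lsn, txn, type="clr", key=rec["key"],
               value=rec["before"], undo_next_lsn=rec["prev_lsn"])
        rec_lsn = rec["prev_lsn"]
        steps += 1
    return True


db = {"a": 0, "b": 0}
log, last_lsn = [], {}
update(log, last_lsn, db, "T1", "a", 1)   # LSN 1
update(log, last_lsn, db, "T1", "b", 2)   # LSN 2
update(log, last_lsn, db, "T1", "a", 3)   # LSN 3

finished_first = undo(log, last_lsn, db, "T1", max_steps=1)  # interrupted
finished_second = undo(log, last_lsn, db, "T1")              # resumes via CLR
clr_pointers = [r["undo_next_lsn"] for r in log if r["type"] == "clr"]
```

The second call starts at the CLR written by the first, jumps straight to LSN 2, and never touches LSN 3 again, so each update is undone exactly once.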

The fuzzy checkpoint trick. A naive checkpoint would stop the world, force every dirty page to disk, and write a marker. ARIES uses a fuzzy checkpoint: take a snapshot of the in-memory transaction table and the dirty-page table, log them, but don't force pages. The dirty-page table records, for each page, the LSN of the first change that dirtied it (its recLSN). The cost of a checkpoint is then bounded by the size of those two tables: a few log writes and no data-page I/O. After a crash, analysis starts from the most recent checkpoint, and redo starts from the oldest recLSN in the reconstructed dirty-page table, reading pages as needed. This is why production databases can checkpoint frequently with little impact on running transactions.
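A fuzzy checkpoint can be sketched as nothing more than two log appends. The record layout and the names `fuzzy_checkpoint` and `redo_start_lsn` below are hypothetical; the point of the sketch is that no page is flushed, and that redo may have to start at a recLSN older than the checkpoint itself, while analysis starts at the checkpoint.

```python
import copy


def fuzzy_checkpoint(log, txn_table, dirty_page_table):
    """Log a snapshot of both tables without flushing any data page."""
    log.append({"type": "begin_checkpoint"})
    log.append({
        "type": "end_checkpoint",
        "txn_table": copy.deepcopy(txn_table),
        "dirty_page_table": copy.deepcopy(dirty_page_table),
    })
    return len(log) - 1      # LSN of begin_checkpoint (LSNs are 1-based)


def redo_start_lsn(dirty_page_table, default):
    """Redo begins at the oldest recLSN, which may precede the checkpoint."""
    return min(dirty_page_table.values(), default=default)


# Five earlier log records (contents irrelevant here) occupy LSNs 1..5.
log = [{"type": "update"} for _ in range(5)]
txn_table = {"T1": {"status": "active", "last_lsn": 5}}
dirty_page_table = {"P1": 2, "P7": 4}     # page_id -> recLSN

begin_lsn = fuzzy_checkpoint(log, txn_table, dirty_page_table)
dirty_page_table["P9"] = 8                # later activity; snapshot unaffected

snapshot = log[-1]["dirty_page_table"]
start = redo_start_lsn(snapshot, default=begin_lsn)
```

Here the checkpoint begins at LSN 6, but redo must start at LSN 2 because page P1 was dirtied there and has not necessarily reached disk.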

Contributions

The three-pass recovery algorithm. Analysis finds the dirty pages and in-flight transactions, redo replays all changes, and undo rolls back the losers (transactions still in flight at the crash). Most database recovery managers since have followed this structure.
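The analysis pass is the least intuitive of the three, so here is a minimal sketch of it. The record format and the `analysis` function are assumptions made for illustration, and the sketch simplifies by treating a commit record as the end of a transaction.

```python
def analysis(log, checkpoint=None):
    """Rebuild the transaction table and dirty-page table from the log.

    log is a list of dict records with 1-based "lsn" fields. Returns
    (in-flight txns -> last LSN, page -> recLSN, LSN where redo starts).
    """
    if checkpoint is None:
        checkpoint = {"lsn": 0, "txn_table": {}, "dirty_page_table": {}}
    txn_table = dict(checkpoint["txn_table"])
    dirty_pages = dict(checkpoint["dirty_page_table"])
    for rec in log:
        lsn = rec["lsn"]
        if lsn <= checkpoint["lsn"]:
            continue                               # covered by the checkpoint
        if rec["type"] == "update":
            txn_table[rec["txn"]] = lsn            # track last LSN per txn
            dirty_pages.setdefault(rec["page"], lsn)  # first dirtier wins
        elif rec["type"] == "commit":
            txn_table.pop(rec["txn"], None)        # finished: not a loser
    redo_start = min(dirty_pages.values(), default=None)
    return txn_table, dirty_pages, redo_start


log = [
    {"lsn": 1, "type": "update", "txn": "T1", "page": "P1"},
    {"lsn": 2, "type": "update", "txn": "T2", "page": "P2"},
    {"lsn": 3, "type": "commit", "txn": "T1"},
    {"lsn": 4, "type": "update", "txn": "T2", "page": "P1"},
]
losers, dirty_pages, redo_start = analysis(log)
```

T2 never committed, so it is the only loser, and undo would begin at its last LSN (4). The dirty-page table is conservative: a listed page may already be on disk, which the page-LSN check in redo detects.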

Log Sequence Numbers. Every log record gets a monotonically increasing LSN; every page records the LSN of its last update. The LSN-on-page comparison is the foundation of the page-granularity skip-if-already-applied optimisation.

Compensation Log Records. Logging the undo operations themselves so that recovery is idempotent. Without CLRs, a crash during undo would re-undo already-undone operations on the next recovery.

Fine-grained locking. ARIES supports page-level redo with record-level undo, so locks can be record-level even though writes are page-granular. This was a major operational improvement — earlier algorithms forced page-granularity locking, which throttled hot-spot tables.

Operational completeness. The paper handles partial rollbacks (savepoints), nested top actions, multiple log destinations, group commit, and more. It reads as a working specification for engineers; almost every subsequent commercial database implements a variant.

Criticisms and limitations

The paper is dense and operational rather than theoretical. There's no formal correctness proof in the published version; correctness is argued through case analysis. Subsequent papers (notably Lomet's "Recovery for Shared-Disk Systems") provided more formal treatments.

ARIES assumes a single-node database with shared memory for the buffer pool and lock manager. Adapting it for distributed transactions, shared-disk clusters, or replicated systems requires significant additional machinery (two-phase commit logs, log shipping, parallel recovery). Many modern distributed databases keep a local write-ahead log on each node, use Paxos or Raft to replicate each shard's log, and add a commit protocol such as two-phase commit for transactions that span shards.

The algorithm doesn't prescribe a buffer-management policy or a concurrency-control scheme — those are deliberately orthogonal. Real systems combine ARIES with specific policies (LRU buffer, 2PL or MVCC for concurrency) that the paper doesn't cover.

Where it shows up today

IBM DB2 — the original implementation. Still ships with ARIES recovery.

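PostgreSQL uses WAL with LSNs and a redo pass that follows the ARIES pattern, but it has no undo pass: MVCC leaves the row versions written by aborted transactions in place and treats them as invisible. The Postgres docs reference the paper explicitly.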

MySQL InnoDB — uses redo + undo logs with LSNs. The MySQL recovery flow is recognisably ARIES.

SQL Server, Oracle, SAP HANA, SAP ASE — all implement ARIES variants.

Distributed SQL databases such as CockroachDB, YugabyteDB, and TiDB pair a local write-ahead log on each node with Raft replication of each shard's log. MariaDB inherits InnoDB's redo and undo logs instead.

Even durable embedded engines like SQLite offer a WAL mode (since version 3.7.0), although SQLite validates WAL frames with checksums and salt values rather than comparing per-page LSNs.

Follow-up reading

The Log-Structured Merge-Tree — O'Neil et al · 1996 · Acta Informatica. A different storage organisation with the same WAL-first principle. Annotated.
A Critique of ANSI SQL Isolation Levels — Berenson et al · 1995 · SIGMOD. Concurrency control that pairs with ARIES recovery. Annotated.
Architecture of a Database System — Hellerstein, Stonebraker, Hamilton · 2007 · FnT Databases. The textbook chapter on database internals. Read this alongside ARIES for the operational context.
Aether: A Scalable Approach to Logging — Johnson et al · 2010 · VLDB. Scaling ARIES-style logging to many cores. Production-grade.
Aurora: Design Considerations for High-Throughput Cloud-Native Relational Databases — Verbitski et al · 2017 · SIGMOD. How AWS moved the WAL out of the database and into a distributed storage layer. ARIES, redistributed.
