WAL Recovery Simulator

Write to the database. Kill the process. Restart. Watch the write-ahead log replay restore everything that committed — and silently discard everything that didn't. This is the mechanism that every modern database uses to survive crashes.

wal

durable

state

live

What you're looking at

Two panels. The page cache is the database's in-memory state — fast, and gone the moment the process dies. The WAL is the append-only log on disk, where every change is recorded before it counts as real; entries marked fsynced have actually reached the platter. The buttons drive one transaction at a time: BEGIN, a couple of WRITEs, then COMMIT or ABORT. CHECKPOINT flushes memory to the data files so the log can be trimmed.

Do one clean run first: begin, write alice and bob, commit, then hit Crash and Restart. The replay walks the log and rebuilds exactly what committed. Then crash halfway through a transaction, before COMMIT — on restart those writes vanish, on purpose. What should surprise you is what the database never does: it never writes your row to its data files before acknowledging your commit. Durability is one sequential append plus fsync, and the checkpoint exists to keep recovery short, not to keep data safe. This is the same machinery under Postgres, MySQL, and SQLite.

Database started. Empty WAL, empty page cache.

Page cache (memory)

— empty —

WAL (disk if fsynced)

— empty —

Try this

BEGIN → WRITE alice → COMMIT. Now click Crash. Click Restart. Alice's value survives — the COMMIT made it durable.
BEGIN → WRITE bob → don't commit. Click Crash. Click Restart. Bob's write is discarded — uncommitted work doesn't survive.
BEGIN → WRITE → COMMIT → CHECKPOINT → BEGIN → WRITE → COMMIT → Crash → Restart. Notice the recovery log only walks from after the checkpoint.

The three guarantees

Atomicity. Crash mid-transaction; restart rolls back. The transaction is all-or-nothing.

Durability. Once COMMIT returns, the writes survive a crash. fsync() is the mechanism.

Bounded recovery. Without checkpoints, recovery walks every WAL entry ever written. With them, it skips to the last checkpoint and replays only what came after.

Adjacent

Write to the log first. Always.

The single sentence that makes everything else work.

A write-ahead log enforces one rule: every change to the database is first appended to a log, and that log is flushed to disk, before the change is reflected in the actual data pages. The reverse — page first, log second — is the bug that destroys data on crashes. If the database wrote a page and crashed before logging the change, the change is in storage but the system doesn't know why. If it logs first and crashes before writing the page, the log knows what to redo.

The WAL turns crash recovery from a guessing game into a replay. On restart, walk the log from the last checkpoint forward; for each committed transaction, re-apply its writes to the data pages; for each uncommitted one, do nothing (or roll back any partial state). The result is a database identical to what would have existed at the moment of the last COMMIT before the crash.

Every database you've heard of uses one

Different names, same idea.

System	Log name	Notes
PostgreSQL	WAL	`pg_wal/` directory. 16 MB segments. Checkpoint every 5 min or 1 GB of WAL by default.
MySQL (InnoDB)	redo log	`ib_logfile0`, `ib_logfile1`. Plus a separate undo log for MVCC rollback.
SQLite	WAL (since 3.7)	`-wal` file alongside the main db. Replaces older rollback-journal mode.
RocksDB / LevelDB	WAL + MANIFEST	WAL for in-progress writes; MANIFEST tracks SSTable lifecycle.
etcd	WAL	Raft uses the same WAL primitive — log entries become committed when a majority of nodes have fsynced them.
Kafka	segment log	Kafka IS a log. The entire data model is "append to a per-partition file".
Filesystems (ext4, XFS, NTFS)	journal	Same idea applied to filesystem metadata. `jbd2` on ext4 is a generic journal layer.

The ARIES paper (Mohan et al, 1992) is the canonical treatment. Read it once; the redo + undo two-pass recovery model it describes is what every database above implements with minor variations. Annotated copy on the site at /papers/aries/.

Found this useful?