WAL Recovery Simulator

Write to the database. Kill the process. Restart. Watch the write-ahead log replay restore everything that committed — and silently discard everything that didn't. This is the mechanism that every modern database uses to survive crashes.

wal
0
durable
0
state
live

What you're looking at

Two panels. The page cache is the database's in-memory state — fast, and gone the moment the process dies. The WAL is the append-only log on disk, where every change is recorded before it counts as real; entries marked fsynced have actually reached the platter. The buttons drive one transaction at a time: BEGIN, a couple of WRITEs, then COMMIT or ABORT. CHECKPOINT flushes memory to the data files so the log can be trimmed.

Do one clean run first: begin, write alice and bob, commit, then hit Crash and Restart. The replay walks the log and rebuilds exactly what committed. Then crash halfway through a transaction, before COMMIT — on restart those writes vanish, on purpose. What should surprise you is what the database never does: it never writes your row to its data files before acknowledging your commit. Durability is one sequential append plus fsync, and the checkpoint exists to keep recovery short, not to keep data safe. This is the same machinery under Postgres, MySQL, and SQLite.

Database started. Empty WAL, empty page cache.
Page cache (memory)
— empty —
WAL (disk if fsynced)
— empty —

Write to the log first. Always.

The single sentence that makes everything else work.

A write-ahead log enforces one rule: every change to the database is first appended to a log, and that log is flushed to disk, before the change is reflected in the actual data pages. The reverse — page first, log second — is the bug that destroys data on crashes. If the database wrote a page and crashed before logging the change, the change is in storage but the system doesn't know why. If it logs first and crashes before writing the page, the log knows what to redo.

The WAL turns crash recovery from a guessing game into a replay. On restart, walk the log from the last checkpoint forward; for each committed transaction, re-apply its writes to the data pages; for each uncommitted one, do nothing (or roll back any partial state). The result is a database identical to what would have existed at the moment of the last COMMIT before the crash.


Every database you've heard of uses one

Different names, same idea.

SystemLog nameNotes
PostgreSQLWALpg_wal/ directory. 16 MB segments. Checkpoint every 5 min or 1 GB of WAL by default.
MySQL (InnoDB)redo logib_logfile0, ib_logfile1. Plus a separate undo log for MVCC rollback.
SQLiteWAL (since 3.7)-wal file alongside the main db. Replaces older rollback-journal mode.
RocksDB / LevelDBWAL + MANIFESTWAL for in-progress writes; MANIFEST tracks SSTable lifecycle.
etcdWALRaft uses the same WAL primitive — log entries become committed when a majority of nodes have fsynced them.
Kafkasegment logKafka IS a log. The entire data model is "append to a per-partition file".
Filesystems (ext4, XFS, NTFS)journalSame idea applied to filesystem metadata. jbd2 on ext4 is a generic journal layer.
The ARIES paper (Mohan et al, 1992) is the canonical treatment. Read it once; the redo + undo two-pass recovery model it describes is what every database above implements with minor variations. Annotated copy on the site at /papers/aries/.
Found this useful?