B-trees, LSM, MVCC —
one structure at a time.
B-trees, LSM, WAL, MVCC, isolation levels, query planning, replication, sharding, distributed SQL, the buffer pool. The structures that every storage engine assembles differently. Each one explained at the level of "what's the data structure", "what's the algorithm", and "what does this mean for the SQL you write".
All fourteen sub-pages are live. Each is a long-form walk-through with engine source links, papers, and references to the canon (Hellerstein, Stonebraker, Petrov, the Postgres source).
Start here.
B-trees
The on-disk index every relational database has shipped since 1972. Page layout, splits, fan-out, latching, copy-on-write variants. Why Postgres, MySQL, and SQLite all picked the same shape.
LSM-trees
RocksDB, LevelDB, Cassandra, ScyllaDB — all the same shape. Memtable, immutable SSTables, leveled compaction, tombstones, bloom filters. Why writes are sequential and reads are amplified.
Write-ahead log
The first thing every durable database does. fsync, group commit, log shipping, ARIES recovery. Why the WAL is the truth and the data files are the cache.
MVCC
Multi-version concurrency control — how Postgres, Oracle, and MySQL/InnoDB give every transaction its own snapshot of the database without locking readers behind writers.
Isolation levels
Read uncommitted to serializable, plus the anomalies they each forbid. Dirty reads, non-repeatable reads, phantoms, write skew. What your database actually gives you when you say SERIALIZABLE.
Query planner
Parser, rewriter, planner, executor. Cost estimation, statistics, join orders, hash vs merge vs nested loop, EXPLAIN ANALYZE. Why your query is slow and what the planner thought it was doing.
Replication
Statement vs row vs logical, sync vs async, leader-follower vs multi-leader. The Postgres streaming replication, MySQL binlog, MongoDB oplog, Aurora's "log is the database".
Sharding
Range, hash, geo, directory. Vitess, Citus, MongoDB sharding. The hot-shard problem, the resharding problem, the cross-shard transaction problem.
Distributed SQL
Spanner, CockroachDB, YugabyteDB, TiDB. How you keep ACID semantics when the data is on a hundred machines — TrueTime, hybrid logical clocks, Raft groups per range.
Buffer pool & page cache
Where rows actually live in RAM. The OS page cache vs the database's own buffer pool, dirty page write-back, the double-write buffer, why O_DIRECT changes the math.
Vacuum & compaction
What MVCC and LSM systems do with the garbage they accumulate. Postgres autovacuum, transaction-ID wraparound, RocksDB leveled vs universal compaction, write-amplification trade-offs.
Time-series databases
Append-only, monotonically timestamped, mostly numeric. Time-partitioned chunks, Gorilla compression, downsampling, retention. How Prometheus, InfluxDB, TimescaleDB, Druid, and ClickHouse compare.
Choosing a database
The decision tree. OLTP, OLAP, KV, document, time-series, vector, graph, search — each access pattern with the named systems and the rewrite cost when the choice is wrong. Postgres is the right default until you can name the constraint that forces you off it.
Index types
Every index type a working database ships — B-tree, hash, GIN, GiST, SP-GiST, BRIN, inverted, vector, bitmap. What each stores, what queries it answers, and when it makes the table slower than it was before.