Day-0 → Month-3 · curriculum
Study path · Databases

Databases,
learned properly.

The relational model is fifty years old and still wins. Every database solves two problems: durability and concurrency. Everything else, indexes, planners, replication, sharding, is a careful trade in service of those two. One pass through the mental models, one pass through the paper canon, one pass through the labs.


Why databases exist.

Two problems, repeated forever. Durability: when the power fails, the bytes you said you saved are still there. Concurrency: when ten thousand clients hit the same data at once, none of them sees a contradiction. Files can't do this on their own, and neither can the operating system. The database is the layer that buys both, with a surprisingly small set of primitives: pages, a write-ahead log, locks (or MVCC), and an index.

The relational model, Codd 1970, added a third gift: a logical schema independent of the physical layout. SQL queries describe what you want; the planner decides how to fetch it. Fifty years of competing models, hierarchical, network, object, document, key-value, graph, have all been absorbed back into relational engines (JSONB, GIN indexes, foreign tables, materialised views) without dethroning the core. Postgres in 2026 is the most capable database humans have ever built, and it ships under the same license it had in 1996.

When not to reach for "something webscale". Modern hardware: 1 TB RAM, 24 TB NVMe, 128 cores. A single Postgres instance handles tens of thousands of writes per second and terabytes of working set. The operational tax of distributed SQL (Spanner, CockroachDB, Vitess) is real. Use one node until it actually hurts, then add a read replica, then partition.

The twelve mental models you must build.

Twelve concepts cover ~95% of database surface. Get these in your bones in the first month. Every engine you meet (Postgres, MySQL, Cassandra, RocksDB, Spanner, ClickHouse) is a recombination of them.

01 B-tree Day-zero

Sorted, balanced, height ≈ log_B(N). The default index for read-heavy workloads — Postgres, MySQL/InnoDB, SQL Server. Range scans are cheap; random writes are not.

02 LSM-tree Practitioner

Log-structured merge-tree. Buffer writes in memory, flush sorted runs to disk, merge them in the background. Powers RocksDB, Cassandra, ScyllaDB, LevelDB — write-optimised by design.

03 Page / block Day-zero

The unit of disk I/O — typically 8 KB or 16 KB. Buffer pools, WAL records, indexes, heaps — every storage abstraction sits on top of pages.

04 Write-ahead log Practitioner

Append-only durability primitive. Write the log first, fsync, then update pages. Crash recovery replays the log. Every database, every replication protocol has one underneath.

05 MVCC Practitioner

Multi-version concurrency control. Each row has multiple visible versions tagged by transaction id. Readers don't block writers; writers don't block readers. The reason Postgres has VACUUM.

06 Isolation levels Practitioner

Read Uncommitted → Read Committed → Repeatable Read → Snapshot → Serializable. Each prevents one more concurrency anomaly. The default in most engines is Read Committed.

07 Locking Operator

Shared (S) and exclusive (X) locks at row, page, and table grain. Deadlock detection via wait-for graphs. Lock escalation when fine-grained locking exhausts memory.

08 Indexes Day-zero

B-tree (the default), hash (equality only), GIN (full-text, JSONB), GiST (geometric, range), bitmap (low-cardinality, analytics), partial (WHERE clause), expression (function output).

09 Query planner Practitioner

Cost-based optimiser. Reads statistics from pg_stats / information_schema, enumerates join orders, picks the lowest estimated cost. EXPLAIN ANALYZE shows what it actually did.

10 Replication Operator

Async (lossy on failover), sync (slow), semi-sync (one replica acks). Leader-follower (Postgres, MySQL) versus multi-master (Galera, BDR). Trade durability for latency.

11 Sharding & partitioning Operator

Split the dataset by hash, range, or geography. Range scans favour range partitioning; uniform load favours hash. Cross-shard joins and global indexes are the recurring pain.

12 Distributed consensus in DBs Engineer

Spanner uses Paxos under TrueTime. CockroachDB and TiDB use Raft per range. The reason a "distributed SQL" database can claim serialisable transactions across continents.

Day-zero — your first hour.

One hour. Open Designing Data-Intensive Applications and read chapters 3 through 7. Then run pgbench and read the EXPLAIN ANALYZE output of one of your own queries. The bar is muscle: you have read about storage and you have measured a real database under load.

# 1. Read DDIA chapters 3–7 (≈ 90 minutes)
#    — Storage and retrieval (B-trees, LSM-trees)
#    — Encoding and evolution
#    — Replication
#    — Partitioning
#    — Transactions

# 2. Spin up a local Postgres
docker run --name pg -e POSTGRES_PASSWORD=pw -p 5432:5432 -d postgres:16
docker exec -it pg psql -U postgres

# 3. Run pgbench — synthetic TPC-B-like workload
docker exec -it pg pgbench -i -s 10 postgres
docker exec -it pg pgbench -c 8 -T 30 postgres

# 4. EXPLAIN ANALYZE one query you actually run
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 42;
#    — Read every line. Note Seq Scan vs Index Scan vs Bitmap Heap Scan.
#    — Note "actual time" vs "rows" — large mis-estimates mean stale stats.

# 5. (Optional, satisfying) read pg_stat_statements for the top-10 queries
CREATE EXTENSION pg_stat_statements;
SELECT query, calls, mean_exec_time
  FROM pg_stat_statements
  ORDER BY mean_exec_time DESC LIMIT 10;

Done. You have read the right five chapters, run a real benchmark, and looked at a real query plan. Everything below extends from this beachhead.

Week-1 to Month-3 — pick a track.

After the first hour you can read database writing without bouncing off it. Spend the next three months one track at a time, depth-first. Don't try to learn storage internals and distributed SQL in the same fortnight. Pick the track that maps to your job and finish it.

Storage internals

Pages, heaps, B-trees, LSM-trees, Bloom filters. Petrov's Database Internals is the right book. Open the InnoDB or Postgres source and follow a single SELECT through the buffer pool.

→ Reference
WAL & recovery

Write-ahead logging, checkpoints, ARIES (analysis / redo / undo). Read the ARIES paper twice. Crash a Postgres in a VM, watch it recover; read pg_waldump output line by line.

→ Reference
Concurrency control

MVCC, snapshot isolation, serializable snapshot isolation, two-phase locking. Read the Berenson / Adya papers. Reproduce the write-skew anomaly on snapshot isolation in your own database.

→ Reference
Query optimisation

System R's join enumeration. Cost models. Statistics. Plan stability and plan caching. EXPLAIN ANALYZE on real workloads. Reading Postgres' planner source is the deepest single jump you can make.

→ Reference
Replication & HA

Async, sync, semi-sync. Logical decoding. Failover, fencing, split-brain. Read pg_basebackup, set up streaming replication, kill the leader, observe the lag — then read the replication slot internals.

→ Reference
Distributed SQL

CockroachDB, TiDB, YugabyteDB, Spanner. Range-based sharding with per-range Raft. Distributed transactions via 2PC layered on consensus. Read the Spanner and CockroachDB papers in that order.

→ Reference
Analytics & columnar

C-Store / Vertica, ClickHouse, DuckDB, Snowflake. Columnar layouts, vectorised execution, compression. The OLAP world is a different planet — pick one engine and read it deeply.

→ Reference

The books that matter.

2017 · O'Reilly
Martin Kleppmann — Designing Data-Intensive Applications

DDIA. The narrative thread connecting storage, indexing, replication, partitioning, transactions, isolation. Chapters 3 and 7 are the ones that matter most for understanding internals.

2019 · O'Reilly
Alex Petrov — Database Internals

The best modern internals walkthrough. B-trees, LSM-trees, recovery, distributed-DB algorithms — the storage layer DDIA gestures at; Petrov works through it line by line.

2018 · MIT Press · free online
Stonebraker / Hellerstein — Readings in Database Systems (the "Red Book")

A curated, opinionated tour of the literature. Each chapter is a paper list with editorial commentary by Hellerstein and Stonebraker. Free online — the canonical syllabus.

2019 · McGraw-Hill
Silberschatz, Korth, Sudarshan — Database System Concepts

The foundational textbook. Drier than DDIA, broader than Petrov — relational algebra through to query optimisation, recovery, and distributed systems. The reference, not the page-turner.

1992 · Morgan Kaufmann
Gray & Reuter — Transaction Processing: Concepts and Techniques

The bible for transactions, recovery, and locking. Decades old, still not surpassed. If you implement an engine, you re-read this every year.

2012 · O'Reilly
Schwartz, Zaitsev, Tkachenko — High Performance MySQL (4th ed.)

The operator's book for MySQL. Storage engines, replication, query optimisation, observability. Even if you run Postgres, the operational instincts transfer.

2017 · self-published
Dimitri Fontaine — The Art of PostgreSQL

SQL as a programming language. Window functions, lateral joins, recursive CTEs, JSONB. The book that turns "I write Postgres queries" into "I think in Postgres".

Honourable mentions: PostgreSQL Internals (Egor Rogov, free online); SQL Performance Explained (Markus Winand — the print edition of Use-the-Index-Luke); Streaming Systems (Akidau et al — the data-pipeline complement to DDIA).

Courses, lectures, and where to actually learn.

Free
Paid (worth it)

The paper canon.

Fourteen papers, roughly 1970 → 2017. Read them in order. The later ones cite the earlier ones constantly. Most are 8–20 pages.

  1. 01
    1970 · Codd
    A Relational Model of Data for Large Shared Data Banks

    The founding paper. Relations, tuples, normal forms; the case for separating logical model from physical layout. Twelve pages that started the field.

  2. 02
    1976 · Astrahan et al
    System R: A Relational Approach to Database Management

    IBM's prototype — the first relational DBMS. SQL, the cost-based optimiser, the storage manager. Almost every modern relational engine traces here.

  3. 03
    1976 · Eswaran, Gray, Lorie, Traiger
    The Notions of Consistency and Predicate Locks in a Database System

    The original definition of degrees of consistency and predicate locking. Read it before reading anything else about isolation.

  4. 04
    1976 · Gray et al
    Granularity of Locks and Degrees of Consistency in a Shared Database

    Multi-granularity locking and the four classic isolation degrees. The vocabulary every modern transaction manager still uses.

  5. 05
    1992 · Mohan et al
    ARIES: A Transaction Recovery Method

    How WAL recovery actually works. Analysis, redo, undo. The thing every database does on crash recovery; the algorithm by which the WAL keeps its promise.

  6. 06
    1995 · Berenson et al
    A Critique of ANSI SQL Isolation Levels

    The famous critique. ANSI's phenomena are ambiguous; snapshot isolation is missing; serialisable is the only honest level. Read this and you stop trusting "Repeatable Read".

  7. 07
    1996 · O'Neil et al
    The Log-Structured Merge-Tree (LSM-Tree)

    The LSM paper. Buffer in memory, flush sorted runs, merge in the background. The architectural ancestor of LevelDB, RocksDB, Cassandra, ScyllaDB.

  8. 08
    2000 · Adya
    Generalized Isolation Level Definitions (PhD thesis chapter)

    Adya's redefinition of isolation in terms of forbidden histories — implementation-independent, mathematically clean. The right vocabulary for arguing about isolation.

  9. 09
    2005 · Stonebraker et al
    C-Store: A Column-oriented DBMS

    The columnar paper. Read columns, not rows; compress aggressively; vectorise execution. The architectural seed of Vertica, MonetDB, ClickHouse, every modern OLAP engine.

  10. 10
    2006 · Chang et al
    Bigtable: A Distributed Storage System for Structured Data

    Wide-column store on GFS + Chubby. Tablets, SSTables, the LSM-style write path at planet scale. The grandparent of HBase, Cassandra, DynamoDB.

  11. 11
    2007 · DeCandia et al
    Dynamo: Amazon's Highly Available Key-value Store

    Consistent hashing, vector clocks, sloppy quorums, hinted handoff. The paper every NoSQL system reaches back to; Cassandra, Riak, and DynamoDB are its descendants.

  12. 12
    2012 · Thomson & Abadi
    Calvin: Fast Distributed Transactions for Partitioned Database Systems

    Deterministic transaction ordering as a primitive. Decide the order first, then run; the contrarian alternative to 2PC + consensus. FaunaDB descends directly from this.

  13. 13
    2012 · Corbett et al
    Spanner: Google's Globally-Distributed Database

    Externally consistent transactions over a planet-scale database via TrueTime — bounded clock uncertainty backed by GPS + atomic clocks. The paper distributed-SQL is judged against.

  14. 14
    2017 · Verbitski et al
    Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases

    Push the redo log into the storage layer; let storage do the page reconstruction. The cloud-native rewrite of MySQL/Postgres — six-way replicated, quorum-write logs.

Going further: FoundationDB: A Distributed Unbundled Transactional Key Value Store (Zhou et al, 2021); The Design and Implementation of MonetDB; the Snowflake SIGMOD 2016 paper (cloud OLAP, separation of storage and compute); the ClickHouse design paper. Pair every paper with at least one open-source engine that implements it.

Talks worth your evening.

Hands-on environments — where to actually run things.

Theory without something you can run is fragile. Each of these is a manageable way to make a database push back when you make a mistake.

EnvironmentCostBest for
CMU BusTubFreeAndy Pavlo's teaching DB. Implement the buffer pool, B+tree index, query executor, and concurrency control in C++. Graded by a hidden test suite — the closest thing to a real engine you can build solo.
pgbench / sysbenchFreeSynthetic OLTP benchmarks. Pgbench ships with Postgres; sysbench supports MySQL, Postgres, and arbitrary Lua workloads. Use to characterise your hardware before you tune anything.
pg_stat_statements + EXPLAINFreeOperate on a real Postgres. Enable pg_stat_statements; rank queries by total time; EXPLAIN ANALYZE the top ten. The single most productive operational habit you can form.
HammerDBFree, open-sourceTPC-C and TPC-H workloads against Postgres, MySQL, Oracle, SQL Server. Heavier than pgbench; approximates a realistic OLTP load. Useful for comparing engines or hardware honestly.
JepsenFree, open-sourceAphyr's framework. Spin up a real cluster (Cassandra, Cockroach, Mongo, etc.), hammer it under partition / latency / message loss, check for consistency violations. Steeper than the others; closer to what production failure looks like.

Isolation levels — cheat sheet.

Five levels, five anomalies. Print this and tape it next to the monitor. The famous footnote is that Snapshot Isolation prevents phantoms but allows write skew. That's why "real" Serializable matters for some workloads, and why Postgres' Serializable Snapshot Isolation (SSI) is the pragmatic compromise: snapshot semantics, with dependency tracking to detect (and abort) the dangerous histories.

Level Dirty read Non-repeatable read Phantom Write skew Lost update
Read Uncommitted Possible Possible Possible Possible Possible
Read Committed Prevented Possible Possible Possible Possible
Repeatable Read Prevented Prevented Possible (spec) Possible Prevented (some)
Snapshot Isolation Prevented Prevented Prevented Possible Prevented
Serializable Prevented Prevented Prevented Prevented Prevented

Defaults: Postgres → Read Committed; MySQL/InnoDB → Repeatable Read (with phantom-prevention via gap locks); Oracle → Read Committed; SQL Server → Read Committed; Spanner / CockroachDB → Serializable. Most apps run at Read Committed and assume Serializable.

Common mistakes that ship to production.

Patterns every team writes at least once. Read these now so you recognise the shape later, when something on-call is misbehaving and the dashboard is unhelpful.

No indexes on join columns / FKs
Every join over a non-indexed column is a sequential scan. The DB will not warn you. EXPLAIN ANALYZE will show "Hash Join" with a child Seq Scan; that is your sign.
N+1 queries
Load 100 users, then issue 100 separate queries for their profiles. ORMs make this invisible by default. Eager-load joins or batch the second query — log per-request query counts and alert on the tail.
VARCHAR(MAX) / TEXT when you wanted a constraint
Storing email as TEXT means the DB will accept a paragraph. A length cap, a CHECK constraint, or a domain type enforces the contract at the layer that survives bugs in every other layer.
Wrong isolation level by accident
Postgres, MySQL, Oracle all default to Read Committed. Devs often write code that assumes serialisable. Half the "weird race condition in production" tickets are anomalies the dev didn't know could happen at RC.
Long transactions blocking VACUUM
Postgres can't reclaim row versions older than the oldest open transaction. A connection leaked open for a day → table bloats indefinitely → query plans degrade. Monitor pg_stat_activity for long idle-in-transaction sessions.
Connection pool exhaustion
App pool size × replicas easily exceeds DB max_connections. The DB doesn't scale with connections — each one is RAM and a backend process. Use PgBouncer; size the pool to (cores × 2) plus a little headroom.
Leading-wildcard LIKE queries
"WHERE name LIKE %smith%" cannot use a B-tree index; it is a sequential scan. Reach for a trigram index (pg_trgm) or full-text search; redesign the query before scaling the hardware.
Storing JSON when you should store columns
JSONB is for shape that varies row to row. If every row has the same keys, columns are faster, smaller, and check-constrainable. The inverse mistake — modelling a tag-list as 50 nullable columns — is also catastrophic.
ORM lazy-loading triggering unexpected queries
Iterating over a relation collection issues a query per element. The fix is per-ORM (eager-loading, prefetch, includes), but the symptom is universal: a request that should be one query becomes a thousand.
No WAL archive / PITR on production
Streaming replication is not a backup — it replicates DROP TABLE just as fast. Configure WAL archiving + base backups; rehearse point-in-time recovery on a real schedule. The first time you restore should not be the day you need to.

Practice deck.

Ten cards: the questions interviewers ask, the things that bite operators in production, and the trivia that separates "I use a database" from "I understand one".

Card 1 of 10
What does a B-tree give you that a hash index does not?
Suggested sequences

Reading progressions

Three ordered paths through this material — pick the one that matches where you are.

Path 01 · Storage
Storage internals

The structures that make databases fast: B-trees, WAL, and LSM trees from first principles.

  1. Database Indexing — B-trees & hash indexes
  2. B-Tree Simulator ↗
  3. Storage Engine Simulator ↗
  4. Write-Ahead Logging
  5. Bloom Filter Simulator ↗
Path 02 · Transactions
Transactions & isolation

ACID, the isolation levels, and why the same code behaves differently under different databases.

  1. ACID Transactions — isolation levels
  2. Isolation Levels Simulator ↗
  3. CAP Theorem Simulator ↗
  4. Read/Write Quorum Simulator ↗
Path 03 · Distribution
Sharding & replication

How databases scale horizontally: sharding strategies, replication lag, and the CAP trade-offs.

  1. Database Sharding Simulator ↗
  2. Consistent Hashing Simulator ↗
  3. Distributed IDs — shard-key design
  4. Distributed Systems Study Path

Keep going.

Databases reward reading and re-reading. The same ARIES paper, read on day 30 and again on day 300, gives you different things. So will DDIA. So will every Pavlo lecture. The field is not large. It is dense, and it has been compounding for fifty years.

Pick one real engine and read its source for an afternoon. Postgres, SQLite, RocksDB, DuckDB, CockroachDB are all open. Pair what you read with the paper that inspired it. Then come back to your own queries, your own schemas, your own indexes. You will rewrite some of them.