Handbook · Vol. IV · 2026 Track I · The data layer · piece 1 of 4 Primer

Track I · The data layer

Database indexing.

B-tree vs hash vs LSM — where the extra disk goes, why queries fly, when to add a covering index, and when to delete one. The chapter to re-read before any schema review.

Track I · The data layer
How data lives, scales, and recovers.
  1. Primer
    Database indexing
  2. Deep dive
    Database scaling
  3. Primer
    NoSQL databases
  4. Decision rule
    When to shard

An index turns an O(n) scan into an O(log n) lookup. The wrong one slows every write and silently doubles your storage. The right one is invisible.

Even a perfectly sharded database is slow if every query has to scan every row. Indexes are the data structures that prevent that scan, at the cost of disk space, write amplification, and a planner that can be tricked into ignoring them. This page covers the index types, the trade-offs each imposes, and the queries you can predict will hit them. By the end: you can read an EXPLAIN plan, justify each index in a schema, and recognise the cases where adding one will make things worse.

B+ TREE · order 4 · 3 levels [ 25 · 50 · 75 ] [ 10 · 18 ] [ 30 · 40 ] [ 55 · 65 ] [ 80 · 90 ] 8→row12→row18→row 30→row42→row 55→row70→row 80→row95→row internal nodes route · leaves point to row tuples · siblings linked for range scans
A B+ tree. Internal nodes are pure routers; only leaves carry payload pointers. Leaf siblings are linked, so once you find the start of a range you can walk forward without re-traversing the tree.

How an index actually works

A database index is the same idea as the index at the back of a textbook: instead of reading every page to find "B-Tree," you look up the term in a sorted side-list, jump straight to page 142, and read just that page. Without an index a query for WHERE email = 'x@y.com' against a 10M-row table reads every row — a sequential scan. With a B-tree index the engine descends three or four levels, hits a leaf, and follows a single pointer. Roughly: 10,000,000 page reads becomes 4.

Every index has a cost. It typically uses 10-30% of the table's storage for itself. It must be updated on every INSERT, UPDATE, and DELETE — write throughput drops as you add indexes. And the planner now has to choose between them, which is sometimes worse than having only one. Indexes are not free; they are bought.

What you buy
Logarithmic point lookups, ordered range scans, fast sorts on indexed columns, fast joins on indexed keys, primary-key uniqueness enforcement.
What you pay
Storage (often 20% per index), write amplification (each write touches all indexes on the table), planner complexity (the optimiser sometimes picks the wrong one), maintenance windows (large index builds block writes on some engines).
Decision rule
Add an index when a query that runs frequently has a selective predicate on a column that does not already lead an existing index. Read EXPLAIN before and after — the plan must change to confirm the index is being used.

The index types — and where each one wins

"Add an index" usually means "add a B-tree index" because B-trees are the default in Postgres, MySQL, SQL Server and Oracle. But there are at least five distinct index structures in mainstream databases, and each maps to a different access pattern. Pick by the query, not by habit.

B-tree (B+ tree)

Default. Sorted, balanced, supports equality and range. Best for =, <, >, BETWEEN, IN, ORDER BY, prefix LIKE 'foo%'. Scales to billions of rows at 4-5 levels deep.

Hash

Constant-time equality only. No range, no sort, no ordering. Postgres has them but rarely worth it; the in-memory hash join is more important than the on-disk hash index.

Bitmap

One bit per row per distinct value. Brilliant for low-cardinality columns (status, country, gender) on read-mostly tables. Lethal for write-heavy tables — every update locks a chunk of the bitmap.

GIN / inverted

Maps each token to the rows that contain it. Used for full-text search, JSONB key lookup, array containment. Slower to build, faster to search than scanning JSON.

GiST / R-tree / spatial

Generalised tree for geometry, nearest-neighbour, and overlap queries. Underlies PostGIS, MySQL spatial, and "find the nearest store" queries.

BRIN

Block-range index — stores min/max per page range. Tiny (a few kB) and useful when the table is naturally ordered, e.g. time-series append-only. Useless on randomly-ordered data.

Composite indexes and the leftmost-prefix rule

A single-column index on email answers WHERE email = ?. A composite index on (country, city, last_name) answers more — but only certain shapes. The rule is leftmost-prefix: the planner can use the index for any query that uses a prefix of the columns in declared order.

QueryUses index on (country, city, last_name)?
WHERE country = 'US'Yes — leftmost column.
WHERE country = 'US' AND city = 'Austin'Yes — full prefix.
WHERE country = 'US' AND last_name = 'Lee'Partial — country uses index, last_name does not.
WHERE city = 'Austin'No — skips the leftmost column.
WHERE last_name = 'Lee'No — skips the first two columns.
ORDER BY country, cityYes — index is already in this order.

Order the columns by selectivity and predicate shape: equality columns first, then the column you range on, then the column you sort by. (user_id, created_at) is the canonical pattern for "give me this user's recent activity" — equality on user, range on time, leaf walk to the most recent rows.

Covering indexes — when the index is the table

A query that needs columns not in the index has to do a second lookup: index → row pointer → heap fetch → return columns. A covering index includes every column the query reads, so the engine never touches the heap. This is the standard latency optimisation for hot read paths.

-- Postgres: INCLUDE adds non-key columns to the leaf.
CREATE INDEX idx_orders_user_recent
  ON orders (user_id, created_at DESC)
  INCLUDE (status, total_cents);

-- This query is now satisfied entirely from the index:
SELECT status, total_cents
FROM orders
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 20;

Covering an index roughly halves the I/O for a hot query. The cost is leaf bloat — each leaf page holds fewer rows because each entry is bigger, so the tree gets taller. Worth it for the top three or four queries; not worth it for everything.

Partial and functional indexes

If most of your rows are uninteresting, index only the interesting ones. A partial index covers a slice defined by a WHERE clause; it's smaller, faster to update, and only used when the query's predicate matches.

-- Only 3% of orders are pending; the index covers only those.
CREATE INDEX idx_orders_pending
  ON orders (created_at)
  WHERE status = 'pending';

-- Functional: index lower(email) so case-insensitive lookups hit it.
CREATE INDEX idx_users_email_lower
  ON users (lower(email));

-- Then the query must match the function shape:
SELECT * FROM users WHERE lower(email) = lower('Alice@example.com');

Functional indexes are how case-insensitive search, computed columns, and JSON path lookups become fast. The expression in the query must match the indexed expression exactly — lower(email) is indexed, plain email is not.

Reading EXPLAIN — the only way to verify

The planner picks an access path based on table statistics. Adding an index does not guarantee the planner will use it; EXPLAIN ANALYZE is the only way to know. Look for these access paths and roughly what they mean:

Seq Scan
Reads every row. Acceptable on small tables (a few thousand rows) or when the predicate matches more than ~20% of rows. A red flag on a 10M-row table with a selective predicate.
Index Scan
Walks the B-tree, fetches matching rows from the heap. The good case for selective queries.
Index Only Scan
The index covers all required columns; no heap fetches at all. The fastest path. Aim for this on hot reads.
Bitmap Heap Scan
Builds a bitmap of matching pages, then visits them in physical order. Wins on medium-selectivity queries that match many rows scattered across the heap.
Nested Loop / Hash / Merge Join
The three join algorithms. Nested loop is fast for small driving sides with an indexed inner; hash is the workhorse; merge requires both inputs sorted.
Look at rows × loops, not just total time. An EXPLAIN line saying actual rows=10 loops=10000 means 100,000 rows of work. A plan with low total time but high loop count is one parameter change away from disaster. Always read both columns.

The hard cases

The cases below are the ones that send queries from "OK" to "page during peak."

Index bloat after heavy updates. B-trees do not shrink when rows are deleted; they leave dead tuples in leaves. After a few months of churn, a 1 GB index can be carrying 600 MB of dead space. Mitigation: REINDEX CONCURRENTLY on Postgres 12+, scheduled monthly on hot tables.
The planner ignores your new index. Almost always because table statistics are stale. Run ANALYZE after a bulk load. If it still does not pick the index, check selectivity: if 30% of rows match, a sequential scan really is cheaper.
Implicit casts kill indexes. If user_id is BIGINT and the query is WHERE user_id = '42', some drivers cast every row to text instead of casting the literal once. The index becomes invisible. Match types exactly.

Practical defaults to start with

  1. Index every foreign key. Joins use them; deletes need them to avoid full scans on the child table.
  2. Index (tenant_id, ...) as the leftmost column on multi-tenant tables. Almost every query has it.
  3. For "user's recent X" patterns, use (user_id, created_at DESC) with INCLUDE for hot read columns.
  4. Use partial indexes for "only X% of rows are interesting" — pending jobs, unread messages, active subscriptions.
  5. Don't index columns with two or three distinct values unless the table is read-mostly and a bitmap index is available.
  6. Set up pg_stat_statements (or equivalent) and review the top 10 queries by total time monthly. That list is your indexing backlog.
Found this useful?