13 min read · Guide · Storage
How it works · Storage

How a database index finds a row without scanning.

A database without indexes scans every row of every table. With them, it lands on the right row in O(log n) hops. The trick is one tree.

Parts01 – 08 InteractiveLive B-tree PrereqSQL / pages

Why a table without an index is slow

Full scan, page by page.

A database index is an auxiliary data structure that lets the database find rows without scanning the whole table. The dominant index type is the B-tree (or B+ tree); equality-only indexes can use hashes. Rudolf Bayer and Edward McCreight introduced B-trees in 1972; they remain the primary index in PostgreSQL, MySQL, SQLite, SQL Server, Oracle.

Tables are stored as a sequence of fixed-size pages (typically 8 KB or 16 KB) on disk, kept warm by the database's buffer-pool cache. Without an index, finding any row means reading every page in turn — sequential, which is fast per byte, but every byte of every row, which is slow per query. A 100-million-row table is hundreds of megabytes of sequential read per lookup.

The cost grows linearly with table size. Find one user out of ten million? You're reading the entire users table. The query planner shows this as Seq Scan in EXPLAIN — the operation that should always raise the question "is there an index that could replace this?"


B-tree indexes, and why they are the default

B-tree — balanced, page-shaped.

A B-tree is a search tree in which each node holds many keys (not just one or two like a binary tree) and tree height stays low even for very large datasets. Each node fits in one disk page. A few key comparisons inside a node, one disk read to descend a level. A trillion-row B-tree is roughly 6 levels deep — meaning six page reads to find anything.

Inserts walk the tree to a leaf, add the key in sorted order, and split the leaf if it overflows — the middle key is promoted to the parent, the rest stays in two new leaves. Splits cascade upward when parents overflow too, ultimately growing the tree by one level. The simulator below shows this in real time.


Watch a B-tree split as you insert

A live B-tree.

Below: a B-tree of order 4 (max 4 keys per node). Type a number and add or look up. Inserts split nodes when they overflow; lookups highlight the path the engine takes.

1 node · depth 1 · 0 keys

Hash indexes: instant, but only for exact matches

Why they can’t help with ranges or sorting.

A hash index applies a hash function to the key, lands in a bucket, and finds the row pointer in (effectively) constant time. Faster than a B-tree on equality lookups, especially at very large scale. Postgres supports them; MySQL's memory engine does too.

The trade is sharp: hash indexes can't answer range queries (WHERE age > 30), can't scan in order, can't support ORDER BY. Their bucket layout is also random-access on disk, making them hostile to sequential reads. Use them only when every query is exact-equality and the table is very large.

B-tree

The default. Always.

Equality, range, ordered scans, prefix matches, ORDER BY. The right answer for almost every index you'll ever create.

Hash

Equality only. Rare.

Single-key lookups by hash code. Faster on huge tables, but useless for ranges. Reach for it when profiling justifies the trade.


Covering indexes: answering a query from the index alone

When the index already has every column the query needs.

A normal index lookup finds a row pointer, then visits the table page (the "heap") to read the rest of the columns. A covering index includes every column the query needs in the index itself — the engine can answer the query without touching the heap at all.

In Postgres: CREATE INDEX ... INCLUDE (col_a, col_b). The included columns are stored in the leaf nodes; not part of the search path, but available without a separate fetch. EXPLAIN shows Index Only Scan when this works. Often a 2–10× win on narrow read-heavy queries.


Multi-column indexes and the leftmost-prefix rule

Composite indexes, leftmost prefix.

An index on (country, city, age) can be used for queries filtering by country, by country + city, or by all three — but not by city alone, and not by age alone. This is the leftmost-prefix rule. The index is sorted in the order of its columns; you can chop off the right side of that order, but not the left.

This forces a real design decision: column order matters. Put the most-selective column first when most queries filter on a single column. Put the column that's used in every query first when queries vary. Get it wrong and you have an index that costs storage but never gets used.

-- Index
CREATE INDEX users_loc_age ON users (country, city, age);

-- Used by:
SELECT * FROM users WHERE country = 'NL';
SELECT * FROM users WHERE country = 'NL' AND city = 'Amsterdam';
SELECT * FROM users WHERE country = 'NL' AND city = 'Amsterdam' AND age = 30;

-- NOT used by (no leftmost):
SELECT * FROM users WHERE city = 'Amsterdam';
SELECT * FROM users WHERE age = 30;

When not to add an index

When NOT to index.

Every index is also a write tax. Every INSERT / UPDATE / DELETE on the table must update every index too — and each of those updates also goes through the write-ahead log. Five indexes on a write-heavy table mean every write does six writes. The right number of indexes minimises read cost given the write-cost budget.

low cardinality

Don't index boolean columns

An index on is_active picks half the rows; the planner will scan instead. Reach for partial indexes (WHERE is_active = true) or skip altogether.

tiny tables

Let small tables scan

Tables that fit in a few pages are faster to read sequentially. The B-tree descent would visit similar bytes anyway. Don't index a 200-row lookup table.

write-heavy

Cut indexes you don't need

On a logging table receiving thousands of inserts per second, every index is a contender for removal. Profile to find which indexes are actually used; drop the ones that aren't.


Always read the query plan first

Read the plan, first.

The query planner picks the index. If it picks wrong, no amount of new indexes will help — and adding indexes blindly is how production tables end up with twenty of them, half unused. EXPLAIN ANALYZE on the slow query is always the first move.

What to look for: Seq Scan on a large table — likely wants an index. Index Scan with a wildly inaccurate row estimate — statistics are stale (ANALYZE the table). Bitmap Heap Scan visiting most of the table — the index isn't selective enough; query may be better served by a composite or covering index.

Index hygiene, weekly

Postgres exposes pg_stat_user_indexes with idx_scan counts. Indexes with zero scans over weeks are deletion candidates — though dropping one inside an active transaction takes a heavy lock, so plan the window. Most production databases accumulate indexes over years; a periodic prune is one of the cheapest performance wins available.

The other index types, and when each one fits

Beyond the B-tree default.

B-tree · default
The right answer for ~95% of cases. Equality, range, ORDER BY, group-by all benefit.
GIN (Generalized Inverted Index) · Postgres
For arrays, JSONB, full-text search. Each value points back to all rows containing it. Slower writes, dramatically faster contains-style queries.
GiST (Generalized Search Tree) · Postgres
Geometric, range, full-text alternatives. The PostGIS extension uses GiST for spatial queries.
BRIN (Block Range Index) · Postgres 9.5+
Stores min/max per block range. Tiny — ~100KB for a 100GB table — at the cost of slower lookup. Right answer for time-series tables where rows are physically clustered by time.
Partial index
Index only rows matching a predicate: CREATE INDEX ON orders (id) WHERE status = 'pending'. Smaller index, faster scans on the matching subset.
Expression index
Index a function of a column: CREATE INDEX ON users (lower(email)). The query must use the same expression to hit the index.
Hash index · Postgres
Equality only, never range. Was unsafe before Postgres 10 (not WAL-logged); since 10 it's WAL-logged. Rarely beats B-tree in practice; almost never used.

The expression-index gotcha. A query that looks like it should hit the index often doesn't because the planner sees a different expression. WHERE email = 'foo' won't use an index on lower(email); you'd need WHERE lower(email) = 'foo'. Always verify with EXPLAIN ANALYZE.



A closing note

Indexes are not magic — they are one of two data structures (a B-tree, occasionally a hash; LSM-tree variants exist too — see the storage-engine simulator) selected to match the query shape. Read the plan, match the index to the query, pay attention to write cost, drop what isn't used. Most production performance problems live inside this loop. None of it is hard once you know which tree you're climbing.

Related Database indexing deep dive B-tree Bloom filter
Found this useful?