Volume VIII — opening pages

A cabinet of shapes.

Every piece of software you have ever used is a small zoo of these structures, in different combinations. Here are the ones that matter — with the stories of who invented them, the mechanism in plain language, the costs, and where they live in the wild.

The linear primitives.

A row of cells, in different costumes.

Array

A row of equal-sized boxes.

1957 · John Backus & the FORTRAN team

Open full page →

Story

The first thing FORTRAN gave the world. Backus's team needed a way to address a "table" mathematically, by index, in constant time. Every modern memory system since has been shaped by that one decision: contiguous, fixed-stride, addressable by arithmetic.

Mechanism

Allocate n × sizeof(T) bytes. Element i lives at base + i × sizeof(T). The CPU can prefetch the next cache line because access is predictable.

Cost

access[i]·O(1)

append·O(1)*

insert·O(n)

search·O(n)

Lives in

Every language. Every database row. Every image, audio buffer, GPU texture, network packet.

Key insight

Cache-friendliness beats algorithmic complexity for n < ~10,000. Linear scan of an array often beats a binary search of a tree because the former never misses cache.

Dynamic array (vector)

An array that grows itself.

1972 · Implicit; popularised by C++ vector (1990)

Open full page →

Story

How do you have an array but not know its size in advance? Allocate too small, double when full, copy. Amortised O(1) append. The growth factor is folklore (Java uses 1.5×, Go and Python use ~1.25×, C++ commonly 2×), and each pick costs differently in memory vs copies.

Mechanism

Maintain (data, len, cap). On append if len = cap, allocate 2 × cap, memcpy old data, free old block. The amortisation: total copies ≤ 2n across n appends, so per-append average is O(1).

Cost

append·O(1) amortised

access[i]·O(1)

pop·O(1)

insert·O(n)

Lives in

C++ std::vector, Rust Vec, Go slice, Python list, Java ArrayList, JavaScript Array. Almost every "list" you've ever used.

Key insight

Growth factor is a memory-vs-time tradeoff. 2× wastes up to 50% memory but minimises copies. 1.5× wastes only 33% but copies more often. There is no right answer — only profiling.

Linked list

A chain you can splice.

1955–58 · Newell, Shaw & Simon · RAND Corporation

Open full page →

Story

Built into the IPL language for symbolic AI research. The trio wanted to manipulate structures of structures (symbol manipulation, theorem proving), and arrays were too rigid. The cons cell of LISP descends directly from this work two years later.

Mechanism

Each node holds a payload and a pointer to the next. Insert/delete at a known position is O(1): just rewire pointers. Random access is O(n) because there is no arithmetic shortcut.

Cost

insert (known node)·O(1)

delete (known node)·O(1)

access[i]·O(n)

search·O(n)

Lives in

Linux kernel's list_head. Java's LinkedList. Functional languages' "cons" lists. Most concurrent queues. The free list inside any allocator.

Key insight

Almost always slower than an array in practice. Pointer chases miss cache. Useful when (a) you need O(1) splicing, (b) you don't know capacity, or (c) the language gives you no choice.

Stack

Last in, first out.

1946 / 1955 · Alan Turing implicit; named by Klaus Samelson & Friedrich Bauer

Open full page →

Story

Turing used a "reserve" of return addresses in his 1946 ACE design. Samelson and Bauer named it Keller ("cellar") in 1955, to manage operator precedence in compiler design, what every modern parser still does.

Mechanism

Push and pop at the same end. Implementable as a dynamic array (cheap) or linked list (rare). The OS thread stack is just a hardware-managed instance.

Cost

push·O(1)

pop·O(1)

peek·O(1)

search·O(n)

Lives in

Every function call you've ever made. JVM operand stack. Undo history. DFS. Expression parsing. Backtracking solvers.

Key insight

A program with deep recursion is a stack overflow waiting to happen. Tail-call optimisation eliminates the stack frame for tail-recursive calls. Required by Scheme, optional in most languages.

Queue

First in, first out.

Pre-history · Implicit since punch-card machines

Open full page →

Story

Older than computer science. Every batch system since the 1940s queued jobs. The interesting question is the implementation: circular buffer, linked list, lock-free ring? Each has a different concurrency story.

Mechanism

Enqueue at tail, dequeue at head. Two pointers chase each other around a fixed buffer (ring) or grow as needed (linked).

Cost

enqueue·O(1)

dequeue·O(1)

peek·O(1)

Lives in

Every web server's pending-request buffer. BFS. Print spoolers. OS scheduler runqueues. Kafka, RabbitMQ, NATS: at the bottom of every message broker.

Key insight

Lock-free queues (Michael-Scott 1996, LMAX Disruptor) are an entire research field. Adding concurrency turns "trivial" into a PhD topic.

Deque

Both ends, equally.

1972 · Don Knuth · TAOCP Vol. 1

Open full page →

Story

A queue that allows insert and delete at both ends: a "double-ended queue". Knuth catalogued it as the unifying abstraction over stacks and queues. C++ std::deque made it ubiquitous; Python collections.deque is the same idea in a single line.

Mechanism

Most efficient implementation is a sequence of fixed-size blocks linked in a list, with separate front and back pointers. Random access is O(1); both-end ops are O(1) amortised.

Cost

push_front·O(1)

push_back·O(1)

pop_front·O(1)

pop_back·O(1)

Lives in

C++ STL deque. Sliding-window algorithms. Browser history. Work-stealing schedulers (each worker has a deque; thieves steal from the back, owner from the front).

Key insight

The work-stealing deque is the magic behind Java's ForkJoinPool, Go's scheduler, Rust's Rayon, .NET's TPL. One data structure, one paper, every modern parallel runtime.

Ring buffer

A fixed-size queue that never allocates.

Pre-1960s · Implicit in hardware FIFOs

Open full page →

Story

Older than the field. Present in every UART, every DMA controller, every audio buffer. Computer science formalised what hardware engineers already had: a queue that wraps around in a fixed array, no allocation, ever.

Mechanism

Two indices (head, tail) modulo the array size. Empty when head == tail; full when (tail + 1) mod n == head. Wastes one slot to disambiguate; some variants use a separate counter.

Cost

enqueue·O(1)

dequeue·O(1)

capacity·fixed

Lives in

Linux kernel's kfifo. Audio buffers (always). Network packet rings. The LMAX Disruptor handles 25M ops/sec on a single core because it's "just" a ring buffer with magic.

Key insight

No allocation = no GC pauses, no malloc, no fragmentation. The right choice when you cannot afford to call the allocator inside a hot path — which is most hot paths.

Trees, with shape.

Hierarchies that stay balanced under load.

Binary search tree

Sorted, but with shortcuts.

1960 · Conway Berners-Lee · while at Ferranti

Open full page →

Story

Conway Berners-Lee (yes, Tim's father) described the first BST in 1960 while building software for the Ferranti Pegasus. The shape is so natural that several people invented it independently around the same time. Naive BSTs are also famous for degenerating to a linked list when keys arrive sorted.

Mechanism

Each node has a key and two children: smaller keys go left, larger right. Lookup descends the tree comparing at each step.

Cost

lookup·O(log n) avg, O(n) worst

insert·O(log n) avg

delete·O(log n) avg

inorder traversal·O(n)

Lives in

Pedagogy and not much else. Every "real" BST in production is balanced (AVL, red-black, B-tree). Naive BSTs survive in toy interpreters and CS courses.

Key insight

Sorted insertion order produces a height-n tree — exactly the disaster a tree was meant to avoid. Self-balancing variants are the only sane production choice.

AVL tree

Always height-balanced, to within one.

1962 · Adelson-Velsky & Landis · USSR

Open full page →

Story

Two Soviet computer scientists (the V in AVL is for Velsky) published the first self-balancing BST. The constraint: at every node, the heights of the two subtrees differ by at most one. The result: log-base-φ depth, the tightest balance bound of any common tree.

Mechanism

After every insert/delete, rotate to restore the height-difference invariant. There are four rotation cases (LL, LR, RL, RR); all are local, touching at most three nodes.

Cost

lookup·O(log n)

insert·O(log n)

delete·O(log n)

Lives in

Less common than red-black trees in production. AVL is read-optimal but write-heavy. Used in Windows NT's VirtualAlloc, some database engines, in-memory geometric searches.

Key insight

AVL trees are "more balanced" than red-black trees: better for read-heavy workloads. Red-black wins when writes dominate, because rebalances are cheaper.

Red-black tree

Almost balanced, much faster to maintain.

1972 · 1978 · Rudolf Bayer; reformulated by Guibas & Sedgewick

Open full page →

Story

Bayer's "symmetric binary B-tree" of 1972 is the same idea. But Guibas and Sedgewick's 1978 reformulation, with red and black node colours, made the invariant memorable. The colour rules let the tree be 2× as deep as a perfectly balanced one in the worst case, but every operation is O(log n) and rebalancing is cheaper than AVL.

Mechanism

Every node is red or black. Five rules: root black; every leaf (NIL) black; red nodes have black children; every root-to-leaf path has the same number of black nodes; new nodes are red. Inserts and deletes recolour and rotate locally to restore the invariants.

Cost

lookup·O(log n)

insert·O(log n)

delete·O(log n)

Lives in

Linux kernel's CFS process scheduler, virtual memory areas, epoll. C++ std::map, std::set. Java's TreeMap, TreeSet. The most-deployed balanced tree.

Key insight

When you read kernel source code and find a "rb_tree", you are reading code that runs on every Linux machine in the world. It's probably the most-executed data structure that isn't a hash table.

B-tree / B+ tree

The shape of a database index.

1971 · Rudolf Bayer & Edward McCreight · Boeing

Open full page →

Story

Designed at Boeing Scientific Research Labs to keep an index page-resident on disk. The "B" stands for nothing definitive: Bayer, balanced, Boeing, broad. Bayer himself never said. Every relational database since uses a B+tree variant. SQLite's author Richard Hipp once said: "B-trees are the only data structure that matters."

Mechanism

Each node holds many keys (typically 100–500), all leaves at the same depth. The fan-out keeps the tree shallow — billions of records in just 4 levels. B+trees push all data to leaves and link them, so range scans are sequential disk reads.

Cost

lookup·O(log_b n)

insert·O(log_b n)

range scan·O(log_b n + k)

Lives in

PostgreSQL, MySQL InnoDB, SQLite, Oracle, SQL Server, every B-tree-based KV store. Filesystems: ext4 HTree, NTFS, HFS+, BTRFS, ReFS, ZFS.

Key insight

When your index doesn't fit in RAM, only B-trees stay sane. A 100-million-row index is 20 levels deep in a binary tree; 4 levels in a B+tree with fanout 200. That's the difference between 20 disk seeks and 4.

Heap (binary heap)

A priority queue you can grep.

1964 · J. W. J. Williams

Open full page →

Story

Williams introduced the heap in his 1964 heapsort paper — and accidentally invented the priority queue. The structure is a complete binary tree where each parent is ≤ its children (min-heap) or ≥ (max-heap), stored in an array using arithmetic indexing.

Mechanism

For node at index i: parent at (i−1)/2, children at 2i+1 and 2i+2. Insert appends and "sifts up"; extract-min replaces root with last leaf and "sifts down". Both are O(log n).

Cost

insert·O(log n)

extract-min·O(log n)

peek·O(1)

build heap·O(n)

Lives in

Dijkstra's algorithm. A* pathfinding. Job schedulers. Linux kernel's timerfd. heapq in Python's standard library. Top-k aggregations in Spark, Flink, ClickHouse.

Key insight

The "build heap in O(n)" result is one of the prettiest in algorithm analysis. Bottom-up heapify costs n × Σ(height/2^level) = O(n) — not O(n log n) as you'd expect.

Skip list

Probabilistic balance, beautifully.

1989 · William Pugh · University of Maryland

Open full page →

Story

Pugh's paper is a masterpiece of plain language: "Skip Lists: A Probabilistic Alternative to Balanced Trees". Instead of complex rotations, every node flips a coin for how high to be. Tall nodes form an "express lane" past short ones. Average-case all-O(log n), worst-case O(n). But the worst case has probability 2^(−n).

Mechanism

A linked list with extra forward pointers at exponentially decreasing densities. Lookup walks the highest level until overshooting, drops down, repeats. Insert: flip coins to determine height, splice in at all relevant levels.

Cost

lookup·O(log n) avg

insert·O(log n) avg

range scan·O(log n + k)

Lives in

Redis sorted sets (ZSET). LevelDB and RocksDB's in-memory MemTable. Concurrent maps in Java's ConcurrentSkipListMap. Apache Cassandra.

Key insight

Lock-free skip lists are easier to write than lock-free balanced trees. That single property is why they appear inside high-concurrency systems where red-black trees would need a global lock.

Trie (prefix tree)

A tree you walk one character at a time.

1959 · 1960 · René de la Briandais · Edward Fredkin

Open full page →

Story

Briandais introduced the structure for retrieval; Fredkin gave it the name "trie", pronounced "tree" originally, now usually "try" to disambiguate. Each edge is labelled with one character; each node is a prefix. Searching for a string of length k takes O(k), not O(k log n).

Mechanism

Root is empty string. Each child edge is one character; each node optionally marks "this prefix is a complete word". Lookup walks the input character by character. Insert allocates nodes for any new prefixes.

Cost

lookup (length k)·O(k)

insert·O(k)

autocomplete·O(k + matches)

Lives in

Search autocomplete. Spell-checkers. IP routing tables (a radix trie variant). Firewall ACLs. Browser URL bar. Virtually every text-input system has a trie hiding in it.

Key insight

Trie memory cost can dominate. Radix tries (compressed tries) merge single-child chains into edge labels: same lookup, dramatically less memory. The Linux kernel's routing table is a level-compressed trie.

Segment tree

Range queries in logarithmic time.

1977 · Jon Bentley

Open full page →

Story

Bentley introduced segment trees in his thesis on computational geometry. Pre-aggregate the array at every power-of-two granularity; answer any range query as the union of O(log n) covering nodes. Generalises to any associative aggregate: sum, min, max, GCD, matrix multiplication.

Mechanism

A balanced binary tree where each leaf is one array element and each internal node holds the aggregate of its leaves' range. Range query walks two pointers up the tree picking off the largest fully-contained nodes. Lazy propagation extends it to range updates in O(log n) amortised.

Cost

build·O(n)

range query·O(log n)

point update·O(log n)

range update + lazy·O(log n)

Lives in

Editor minimum-indentation queries. Database query-optimiser cost arithmetic. Time-series percentile rollups. Every "competitive programming" interview round above easy.

Key insight

A segment tree handles any associative aggregate; a Fenwick tree (BIT) is faster but only handles invertible ones (sum, XOR). Pick segment for min/max/GCD; pick Fenwick for sum.

Fenwick tree (BIT)

Prefix sums, with constant tweaks.

1994 · Peter Fenwick · University of Auckland

Open full page →

Story

Fenwick noticed that the binary representation of an index encodes a clean partition of [1..n]. The result: a one-array structure that handles point updates and prefix-sum queries in O(log n) using a single bit-twiddle (i &= -i) at each step. Smaller and faster than a segment tree, simpler to implement.

Mechanism

A flat array of n + 1 entries. Each slot covers a range determined by the lowest set bit of its index. Update walks "up" by adding the lowest-bit index; prefix-query walks "down" by subtracting it. Twelve lines of code total.

Cost

point update·O(log n)

prefix sum·O(log n)

range sum·O(log n)

Lives in

Online analytics, leaderboards, count-of-events-in-window. Anywhere prefix-sum-with-updates is needed. Notably absent from min/max workloads. Those need a segment tree.

Key insight

Only invertible aggregates work (the range-sum identity sum(l..r) = prefix(r) − prefix(l-1) requires subtraction). For sum, count, XOR: Fenwick wins on speed, memory, and code length.

Treap

BST and heap, secretly the same tree.

1989 · Aragon & Seidel

Open full page →

Story

A treap (tree + heap) gives every node a random priority. The keys form a binary search tree; the priorities form a heap. Because priorities are random, the resulting tree shape mimics a random BST: depth O(log n) with high probability. Aragon and Seidel proved this; the construction is shockingly compact.

Mechanism

Insert: BST-insert by key, then rotate up while the priority is greater than the parent's. Delete: rotate down to a leaf, then remove. Split and merge by key fall out cleanly. The structure shines for "give me the order statistics of [a, b)" workloads.

Cost

lookup·O(log n) expected

insert·O(log n) expected

split / merge by key·O(log n) expected

Lives in

Competitive programming (where the simplicity wins over AVL/RB). Some functional language libraries. Used as the underlying ordered map in some persistent data-structure libraries.

Key insight

Worst-case O(n) (if the random source is unlucky). The expected O(log n) bound holds across all inputs but the bound is over the random priority choice, not the input distribution.

Splay tree

No metadata — and surprising amortised guarantees.

1985 · Sleator & Tarjan

Open full page →

Story

A splay tree carries no balance metadata at all. Every operation rotates the accessed node to the root via a sequence of "splay" steps. Sleator and Tarjan proved that any sequence of m operations on a tree of size n takes O((m + n) log n): same amortised bound as red-black, with much simpler code.

Mechanism

Search, insert, and delete each splay the touched node to the root. The splay operation uses three rotation patterns (zig, zig-zig, zig-zag). Cache locality is excellent because frequently-accessed nodes drift toward the root, and the tree adapts to skewed access patterns automatically.

Cost

lookup·O(log n) amortised

insert·O(log n) amortised

delete·O(log n) amortised

Lives in

Compilers (symbol tables for variable lookups in long-lived sessions). Some kernel structures. Linux's VFS dentry cache uses a splay-tree-like adaptive structure.

Key insight

Worst-case single operation can be O(n). Don't use splay trees in latency-bounded contexts; use them when access patterns are skewed and amortised cost is what matters.

III

Hashing & lookup.

Constant time, with caveats.

Hash table

O(1), almost always.

1953 · Hans Peter Luhn · IBM internal memo

Open full page →

Story

Luhn's 1953 memo at IBM was the first written description of hashing — chained collision lists, modulo-prime hashing. He never published it; the idea spread through internal IBM technical reports until Donald Knuth catalogued it in TAOCP Vol. 3 (1973). Luhn also invented the now-eponymous credit-card checksum.

Mechanism

Hash function maps key → bucket index. Two collision strategies: chaining (each bucket is a list) and open addressing (probe to next slot). Load factor (entries / buckets) controls collision rate; resize when above ~0.75.

Cost

lookup·O(1) avg, O(n) worst

insert·O(1) amortised

delete·O(1) avg

Lives in

Every language's dictionary/map type. CPython dict (open addressing with perturbation). Java HashMap. Go map. Rust HashMap. Database hash indexes. Compiler symbol tables. Caches everywhere.

Key insight

A bad hash function turns a hash table into a linked list. Hash flooding attacks (exploiting predictable collisions) are why every modern hashmap uses a randomised hash seed.

Cuckoo hashing

Two slots, no chains.

2001 · Rasmus Pagh & Flemming Rodler · Aarhus

Open full page →

Story

Two Danish researchers solved the worst-case-O(1) lookup problem. Each key has two possible homes; on collision, the new key kicks the old one out (the "cuckoo") to its other home. Looks impossible until you read the analysis — when load factor is below 50%, infinite chains are vanishingly unlikely.

Mechanism

Two hash tables, two hash functions. Lookup checks both. Insert: place at h1; if occupied, evict that key to its h2; if that's occupied, evict it to its h1; continue until empty slot or rehash.

Cost

lookup·O(1) worst case

insert·O(1) amortised expected

Lives in

Cuckoo filter (a Bloom filter alternative). Some routers' MAC tables. Memcached extensions. CockroachDB transaction status records.

Key insight

Cuckoo hashing's worst-case-O(1) lookup is rare and valuable — most hash tables only guarantee O(1) on average. Used where adversarial inputs make average-case bounds insufficient.

Robin Hood hashing

Steal from the rich, give to the poor.

1985 · Pedro Celis · University of Waterloo

Open full page →

Story

In standard linear probing, lookup time varies wildly — some keys live at their hash, others probe 30 slots. Celis's insight: when inserting, if the slot you want belongs to a key that's "richer" (closer to its home) than you, evict it. The variance in probe distance collapses; max probe length stays under 10 even at 95% load.

Mechanism

Open addressing with linear probing. Every entry tracks its probe distance. On insert collision, compare distances; the entry with shorter distance "yields" and is reinserted further on. Lookups can stop early when probe distance exceeds the target's.

Cost

lookup·O(1) tight variance

insert·O(1) expected

Lives in

Rust's default HashMap (since 2016, replaced in 2018 by hashbrown which uses SwissTable). Several JIT compiler hash structures. Mozilla's memory profiler.

Key insight

Robin Hood's low variance lets you run at higher load factors safely — ~90% load with ~6-slot max probe distance. Saves memory in cache-pressured workloads.

Consistent hashing

Hashing that survives node changes.

1997 · David Karger et al · MIT

Open full page →

Story

Akamai's founders (Karger and his MIT classmates) needed to map web requests to a fluid pool of servers — without rehashing the entire keyspace when one came or went. The answer: hash both keys and servers onto a circle; each key goes to the next clockwise server. Adding or removing a server only moves K/N keys.

Mechanism

Hash each node to one or more positions on a [0, 2^32) circle. Hash each key the same way. Each key is owned by the first node clockwise from its position. Virtual nodes (each physical node placed at many circle positions) smooth the distribution.

Cost

lookup·O(log N) (binary search of nodes)

add/remove node·O(K/N) keys move

Lives in

Every distributed cache: Memcached cluster sharding, Redis Cluster, Cassandra, DynamoDB, Riak. Most CDNs. Akamai's original use case.

Key insight

The whole thing was Karger's 1996 thesis. A formerly minor academic problem, and now half of distributed systems run on it.

Perfect hashing

Zero collisions, by design.

1984 · Fredman, Komlós & Szemerédi

Open full page →

Story

For static sets — keys known in advance — you can do better than "average O(1)" — you can guarantee worst-case O(1). Fredman et al's "FKS" scheme uses a two-level hash; later constructions (CHD, BBHash, FCH) use only O(n) bits beyond the keys themselves, with O(1) guaranteed lookup.

Mechanism

Top-level hash function maps n keys to n buckets, with O(1) expected collisions per bucket. Each non-empty bucket gets its own perfect hash function for its 2–3 keys. Construction is randomised; expected O(n) time.

Cost

lookup·O(1) worst case

insert·rebuild

Lives in

Compiled language keyword lookup (CPython, V8). gperf-generated tables. Static dictionaries in scientific software. Python's "frozen" dicts in some contexts.

Key insight

Perfect hashing trades update flexibility for worst-case guarantees. Used wherever a set is built once and queried billions of times — the keyword tables in every compiler, for instance.

Probabilistic structures.

Wrong sometimes; small always.

Bloom filter

A set, in a thousandth of the space.

1970 · Burton Howard Bloom

Open full page →

Story

Bloom's paper "Space/Time Trade-offs in Hash Coding with Allowable Errors" proposed a tiny structure that says, with certainty, "definitely not in the set" or "probably in the set". The trick: use a few hash functions to set bits in an array; check by checking if all those bits are set. False positives are tunable; false negatives are impossible.

Mechanism

Bit array of size m; k hash functions. Insert: set the k bits given by hashing the key. Lookup: check if all k bits are set. False-positive rate ≈ (1 − e^(−kn/m))^k. Optimal k = (m/n) ln 2.

Cost

insert·O(k)

lookup·O(k)

space·~10 bits per element for 1% FP

Lives in

Cassandra row caches. Bitcoin SPV nodes. Chrome's URL safe-browsing list. Bigtable / LevelDB / RocksDB SSTable filters — checking if a key might be in a sorted file before opening it. Every modern LSM-based KV store.

Key insight

Bloom filters live in front of expensive lookups. The 1% false-positive cost is paid in vain disk reads; the 99% true-negative saves them. The structure exists because disk is slow.

HyperLogLog

Count distinct, in 12 KB.

2007 · Flajolet, Fusy, Gandouet, Meunier · INRIA

Open full page →

Story

Counting unique elements in a stream of billions with sub-percent error in 12 KB of memory. The trick: hash each element; for each, count the leading zeros in the hash. The maximum leading-zero count seen is a stochastic estimator of the cardinality.

Mechanism

Partition the hash output into m buckets. For each bucket, track the maximum leading-zero count of all hashes that landed in it. The harmonic mean of 2^count across buckets, with bias correction, estimates the cardinality. Standard error ≈ 1.04/√m.

Cost

add·O(1)

cardinality·O(m)

merge·O(m)

Lives in

Redis (PFCOUNT, PFADD). BigQuery APPROX_COUNT_DISTINCT. Druid uniques. Snowflake APPROX_COUNT_DISTINCT. Every "unique users" metric at internet scale.

Key insight

Mergeable: HLL of region A union HLL of region B = HLL of A∪B, with no recomputation. That property is why HLLs power dashboards across sharded data — each shard returns its sketch, the gateway merges.

Count-Min sketch

Approximate frequencies, in tiny memory.

2003 · Graham Cormode & S. Muthukrishnan

Open full page →

Story

Bloom filter's cousin for counting: how many times has key X been seen? Cormode and Muthukrishnan gave a small grid of counters — query the minimum across rows. Overestimates only; never undercounts. Tunable error ε with probability 1−δ in O((1/ε) log(1/δ)) space.

Mechanism

A 2D array, d × w. Each row uses a different hash function; each cell is a counter. Insert: for each row, hash the key, increment that cell. Query: take the minimum count across all rows.

Cost

increment·O(d)

estimate·O(d)

space·O((1/ε) log(1/δ))

Lives in

Network anomaly detection (top talker IPs). Twitter's trending topics. Spark's frequency aggregations. Distributed top-K analytics.

Key insight

Count-Min sketch is used wherever exact counts are unaffordable. The "elephant flow" detection in carrier networks — finding heavy hitters out of billions of packets — runs on Count-Min.

MinHash

Set similarity, in a few hashes.

1997 · Andrei Broder · AltaVista

Open full page →

Story

Broder needed to detect near-duplicate web pages at AltaVista. Computing Jaccard similarity (|A∩B|/|A∪B|) for billions of pages was infeasible. His insight: hash each set's elements with k different functions; keep the minimum hash from each. The fraction of matching minimums is an unbiased estimator of Jaccard similarity.

Mechanism

Pick k hash functions. For each set, compute h(x) for every element x and keep the minimum: that's the set's "signature" under that function. Across k functions, you get a k-dimensional signature. Comparing two signatures: count matching positions, divide by k.

Cost

signature·O(k|A|)

similarity·O(k)

Lives in

Web deduplication (Google, Bing). Netflix similarity recommendations. datasketches Apache library. Locality-Sensitive Hashing pipelines.

Key insight

MinHash + Locality-Sensitive Hashing turns "find all pairs with Jaccard > 0.7" from O(n²) into O(n) — feasible at web scale, infeasible without.

t-digest

Quantiles, fairly accurate, fully streaming.

2013 · Ted Dunning · MapR

Open full page →

Story

Computing exact percentiles requires sorting all the data. Approximate methods (GK summary, Q-digest) work but degrade at the tails — exactly where p99 and p99.9 live. Dunning's t-digest cleverly allocates more resolution to the tails by clustering points with non-uniform weights.

Mechanism

Maintain a list of "centroids" (mean, weight) compressed by a scale function that gives more resolution near 0 and 1 quantiles. Inserts merge into nearest centroid or create a new one. Quantile queries interpolate between centroids.

Cost

add·O(log k)

quantile·O(log k)

space·O(k) ~ tens of KB

Lives in

Elasticsearch percentiles. Cassandra latency aggregates. Apache Druid. Every time-series database that ships pre-aggregated p99s — including Prometheus's histogram_quantile.

Key insight

t-digests are mergeable. Each shard or pod can compute its own; the central server merges them for the cluster-wide p99. Without merge, you cannot do approximate percentiles across distributed systems.

Strings & sequences.

Substrings, in interesting time.

Suffix array

All suffixes, sorted.

1990 · Udi Manber & Gene Myers · University of Arizona

Open full page →

Story

Manber and Myers proposed an alternative to suffix trees that used 4× less memory while supporting most of the same queries. Sort all suffixes of the string lexicographically; store just the starting indices. Find any pattern with binary search in O(m log n).

Mechanism

Build the array of starting positions of every suffix, sorted by the suffix string. Pattern search: binary search for the lexicographic position of the pattern. SA-IS (2009) constructs in linear time.

Cost

build·O(n) (SA-IS)

pattern search·O(m log n)

longest common prefix·O(1) with LCP array

Lives in

BWA / Bowtie genome alignment (combined with FM-index). bzip2's Burrows-Wheeler step. Full-text search engines' index merging.

Key insight

A suffix array + LCP array gives every operation a suffix tree gives, in less memory. The "suffix tree" you read about in textbooks has been suffix arrays in production for two decades.

Suffix tree

Every substring, in linear space.

1973 · Peter Weiner; Ukkonen's online construction (1995)

Open full page →

Story

Weiner's 1973 thesis introduced "position tree" — a tree where every leaf is a suffix and each path is a unique substring. Ukkonen's 1995 algorithm builds it online, in O(n), as the string streams in. The bioinformatics revolution of the 1990s ran on it.

Mechanism

A compressed trie of all suffixes. Internal nodes are labelled with substrings; edges are merged when there's a single-child chain. Every distinct substring of S corresponds to a path from root to some position in the tree.

Cost

build·O(n)

pattern search·O(m)

longest repeat·O(n)

longest common substring (k strings)·O(total length)

Lives in

Computational biology (sequence assembly, mutation finding). Plagiarism detection. Some compression algorithms.

Key insight

Suffix trees were hot in the 1990s; today suffix arrays + LCP win on memory. But the ideas — particularly Ukkonen's online construction — are foundational and worth understanding.

Rope

A tree of small strings.

1995 · Boehm, Atkinson & Plass · Xerox PARC

Open full page →

Story

Editing a 100 MB document represented as a single string is O(n) per insert. Boehm et al at PARC proposed a balanced binary tree where leaves are short strings ("ropes") and internal nodes carry total-length annotations. Insert in the middle becomes O(log n).

Mechanism

Each leaf is a fixed-size character buffer (e.g. 64 bytes). Internal nodes have left/right children plus the total length of the left subtree. Concatenation is just a new internal node; substring is a recursive walk; insert is a split + concat.

Cost

concat·O(1) (new node)

index·O(log n)

insert·O(log n)

substring·O(log n + len)

Lives in

Atom and VS Code (text buffer). Cedar / Plan 9 sam editor. Helix editor. The data structure inside every modern code editor for huge files.

Key insight

Without ropes, opening a 1 GB log file in a text editor on most systems would take a minute and freeze on every keystroke. The data structure makes the experience instant.

Piece table

Edit a file without copying it.

1971 · Implicit; popularised by Microsoft Word

Open full page →

Story

A 1971 paper by Charles Crowley described representing a document as a sequence of references into immutable buffers — the original file, plus an "add" buffer of new edits. Microsoft Word built its document model on this; Visual Studio Code's editor still uses it (with some clever buffer-list compression).

Mechanism

Two buffers: the original file (read-only) and an append-only edit buffer. The document is a linked list (or tree) of pieces (start, length, which-buffer). Insert: append to edit buffer, splice a new piece into the list. Delete: split or trim pieces. Undo: pop from a stack of operations.

Cost

insert·O(log n)

delete·O(log n)

undo·O(1)

Lives in

Microsoft Word. AbiWord. The text buffer in Visual Studio Code (since 2018, after a famous performance overhaul).

Key insight

Piece tables make undo nearly free — the original is never modified, so undoing is just reverting the piece list. Other text buffers must replay forward to reconstruct old states.

Gap buffer

A buffer with a hole at the cursor.

~1969 · Implicit; canonical in Emacs

Open full page →

Story

A surprisingly simple structure that powers Emacs and several other editors. Allocate a buffer larger than needed; keep a "gap" of empty slots near where the cursor is. Inserts are O(1) at the cursor. Moving the cursor is O(distance moved) — but you only do that occasionally.

Mechanism

A linear buffer split into three parts: the bytes before the cursor, an empty gap, and the bytes after. Inserts fill the gap. Cursor movement memmoves bytes across the gap. When the gap fills, allocate a bigger buffer and reset the gap.

Cost

insert at cursor·O(1) amortised

delete at cursor·O(1)

move cursor·O(distance)

Lives in

Emacs (since the 1970s). Notepad-like simple editors. The "single cursor" model maps perfectly.

Key insight

Gap buffers are one O(distance) memmove per cursor jump — fine for typing, painful for "find & replace" across a 100 MB file. Ropes and piece tables are why modern editors stay fast on big files.

Graphs & sets.

Connections, with structure.

Adjacency list

Each node, its neighbours.

Pre-1960s · Implicit; Knuth catalogued

Open full page →

Story

For sparse graphs (most real graphs), a list of edges per vertex beats a |V|² matrix. Storage is O(V+E) rather than O(V²); iteration over neighbours is O(degree). Used everywhere except dense academic problem sets.

Mechanism

An array indexed by vertex. Each entry is a list (vector or linked list) of (neighbour, weight) pairs. For directed graphs, each edge appears once; undirected, twice.

Cost

add edge·O(1)

check edge (u,v)·O(deg(u))

neighbours·O(deg(u))

space·O(V + E)

Lives in

Every BFS and DFS. Web crawlers. Social network graphs. Linux's scheduler dependency tracker. NetworkX, igraph, Boost Graph Library — everything.

Key insight

For graphs with E ≪ V², adjacency list is always better. The exception: dense graphs (E ≈ V²) where the matrix's O(1) edge check dominates list scans.

Disjoint Set / Union-Find

Track connected components, in a forest.

1964 · Bernard Galler & Michael Fischer · UMich

Open full page →

Story

A breathtakingly simple structure: each element points to a "parent"; root elements represent their set. Galler and Fischer's 1964 paper introduced it. Tarjan's 1975 analysis with path compression and union-by-rank proved the amortised cost is O(α(n)) — the inverse Ackermann function, which is < 5 for any imaginable n.

Mechanism

Each element has a parent pointer; the root's parent is itself. Find: walk up to root, optionally compress path on the way down. Union: find both roots, link the smaller-rank tree under the larger.

Cost

find·O(α(n)) ≈ O(1)

union·O(α(n)) ≈ O(1)

Lives in

Kruskal's minimum spanning tree. Equivalence-class problems. Dynamic graph connectivity. Image segmentation. Hadoop's job DAG planner.

Key insight

Tarjan's analysis is mathematically beautiful — the amortised cost is bounded by α(n), which grows so slowly it never exceeds 4 for any practical input. It's O(1) for all practical purposes, with a proof.

Merkle tree

Hash of hashes; tamper-evident.

1979 · Ralph Merkle · Stanford

Open full page →

Story

Merkle's 1979 thesis introduced a tree of hashes — leaves are data, every internal node is the hash of its children. Verifying any leaf belongs to the tree requires only the path of sibling hashes up to the root. Cryptographically tamper-evident; sub-linear proofs.

Mechanism

Hash each data block (leaves). Each internal node's value is the hash of its children's values. The root hash summarises the entire tree. Inclusion proof: log₂(n) hashes (the siblings on the path).

Cost

build·O(n)

verify inclusion·O(log n)

root·O(1) summary

Lives in

Bitcoin and Ethereum (transaction trees). Git (commit/tree objects). IPFS. ZFS, Btrfs (block checksums). Certificate Transparency logs. Database replication checksums.

Key insight

A Merkle root commits to gigabytes of data in 32 bytes. Bitcoin uses this so that "light" SPV wallets can verify a transaction's inclusion in a block by downloading only ~32 hashes — the full block can be tens of megabytes.

LSM-tree

Append-only, merged in the background.

1996 · O'Neil, Cheng, Gawlick, O'Neil

Open full page →

Story

Patrick O'Neil et al noticed that disk is fast at sequential writes and slow at random ones. Their LSM-tree (Log-Structured Merge) writes everything sequentially to a small in-memory buffer, flushes it sorted to disk, and merges those sorted runs in the background. The pattern powers every modern key-value store.

Mechanism

In-memory memtable buffers writes. When full, it's flushed to a sorted SSTable file. SSTables are merged in tiered or levelled compactions to limit the number of files a read must check. Bloom filters in front of each SSTable cut the read amplification.

Cost

write·O(1) amortised (memtable)

read·O(log n × levels)

space amp·~1.1× post-compaction

Lives in

RocksDB (and everyone using it: MyRocks, CockroachDB, TiKV, Yugabyte). LevelDB. Cassandra. ScyllaDB. HBase. BigTable. ClickHouse. Pebble.

Key insight

LSM-trees trade write amplification (10–30× the logical write) for sequential I/O — the trade-off SSDs love. Read amplification (checking many levels) is the cost; bloom filters and compactions are how you manage it.

DAG · directed acyclic graph

Order without cycles.

Foundational · Implicit since graph theory itself

Open full page →

Story

Not a single inventor — but a single, recurring shape. Build systems, package managers, version control, distributed schedulers, dataflow engines: anything where "X must happen before Y" is acyclic, and the natural representation is a DAG.

Mechanism

Vertices = tasks/objects; edges = dependencies. Topological sort produces an execution order in O(V + E). Cycles signal an error.

Cost

topological sort·O(V + E)

critical path·O(V + E)

cycle detection·O(V + E)

Lives in

make, Bazel, Buck, Pants. Git's commit graph. Spark / Flink / Beam DAGs. Airflow / Prefect / Dagster. Kubernetes admission webhooks (validation order). Every dataflow engine ever shipped.

Key insight

When you find yourself with a cycle in something that should be a DAG, the answer is almost always: extract the cyclic part into a separate component and break it. Tools that allow cycles eventually wish they hadn't.

VII

Spatial structures.

Indexes on geometry.

KD-tree

Binary search, in higher dimensions.

1975 · Jon Bentley · Stanford

Open full page →

Story

Bentley's structure splits points by a different coordinate at each level — first by x, then by y, then x again. Nearest-neighbour search uses bound-pruning to skip vast regions. Works beautifully up to ~10 dimensions; degrades to brute-force above that ("curse of dimensionality").

Mechanism

Recursively partition the point set: pick a splitting dimension (round-robin or max-variance), find the median, put smaller-coord points left, larger right. Search prunes branches whose minimum bound is farther than the current best.

Cost

build·O(n log n)

kNN·O(log n) avg low dim, O(n) high dim

range·O(√n + k) (2D)

Lives in

Photo sphere finders. Point-cloud processing (LiDAR). 2D collision detection. Astronomy queries. scipy.spatial.KDTree. PostGIS spatial indexes.

Key insight

KD-trees are excellent up to ~10 dimensions, then collapse. For higher-dim nearest-neighbour (text/image embeddings), HNSW and IVF-PQ are the modern answer.

R-tree

A tree of bounding boxes.

1984 · Antonin Guttman · UC Berkeley

Open full page →

Story

Guttman's R-tree organises rectangles (or higher-dim hyperrectangles) by their bounding boxes. Internal nodes hold the union bounding box of their children. Spatial queries skip whole subtrees whose bounding boxes don't intersect the query. Standard in every spatial database.

Mechanism

Each node holds up to M entries; each entry is (bounding box, child or data). Insert chooses the path with smallest area increase; splits when full. Variants: R*-tree (better split heuristic), R+-tree (no overlap), Hilbert R-tree (space-filling curve order).

Cost

range query·O(log n + k)

kNN·O(log n + k)

insert·O(log n)

Lives in

PostGIS (default index). MySQL spatial types. Oracle Spatial. SpatiaLite. Almost every "find nearby" feature on a mapping product.

Key insight

R-trees handle extents (boxes, polygons, lines), not just points. That's the difference between "find restaurants within 5km" and "find every road that intersects this neighbourhood".

Quad-tree / Octree

Recursive 4-way / 8-way subdivision.

1974 · Raphael Finkel & Jon Bentley

Open full page →

Story

Recursively split 2D space (or 3D, for octree) into four (eight) equal quadrants. Each level halves resolution; deeper levels resolve dense regions. Used everywhere in graphics, physics simulation, and image processing.

Mechanism

Each node either stores the points in its quadrant directly (leaf), or has 4 children covering 4 sub-quadrants. Splits when point count exceeds a threshold. Search and insert recurse only into the relevant quadrants.

Cost

build·O(n log n) avg

point query·O(log n) avg

range query·O(log n + k)

Lives in

Image quadtree compression. Game-engine spatial partitioning. Procedural generation. Adaptive mesh refinement in physics simulations. Minecraft's chunk lookup uses something like an octree.

Key insight

Quadtrees adapt to data density — sparse regions stay shallow, dense regions get deeper. That's the property that makes them dominant in graphics and simulation.

BVH · bounding volume hierarchy

A tree of boxes for ray tracing.

1976 · James H. Clark · UC Santa Cruz

Open full page →

Story

Clark proposed wrapping every object in a bounding box, then organising those boxes in a tree. Rays test the root box; if hit, recurse to children; if not, prune. The whole structure is rebuilt or refit per frame in dynamic scenes. Without BVHs, modern ray tracing would not exist.

Mechanism

Each leaf wraps an object; each internal node's box bounds its children's boxes. Ray traversal: front-to-back recursive descent, pruning subtrees whose box the ray misses. Quality of the tree matters enormously — SAH (Surface Area Heuristic) is the standard build metric.

Cost

build (SAH)·O(n log n)

ray cast·O(log n) avg

Lives in

Every modern ray tracer. NVIDIA RTX hardware. Embree. PBRT. Unreal's Lumen. Unity's ray-traced ambient occlusion.

Key insight

A BVH built poorly can be 10× slower than one built well. SAH-built BVHs are why hardware-accelerated ray tracing on a 1080p frame is possible in real time.

Z-order / Hilbert curve

Linearise space, preserve locality.

1966 / 1891 · G. M. Morton · 1966 (Z-order); David Hilbert · 1891 (curve)

Open full page →

Story

Morton encoded 2D coordinates by interleaving the bits of x and y — points near each other in 2D land near each other in 1D, with discontinuities at quadrant boundaries. The Hilbert curve does the same thing without the jumps. Both let you index multi-dimensional data with a single B-tree.

Mechanism

Z-order: interleave the bits of (x, y) → one integer. Sort by that integer; nearby points cluster. Hilbert: a continuous curve that visits every grid cell with no jumps; more expensive to compute but better locality.

Cost

encode·O(bits)

range query·O(log n + k)

Lives in

Spatial keys in DynamoDB, Bigtable. Geohashes (a Z-order variant). NUMA-aware allocators. Database multi-dim index. PostGIS GiST when SP-GIST is faster.

Key insight

A space-filling curve turns multi-dimensional indexing into a single-key sort — every existing B-tree becomes a spatial index. Geohashes (used by every "nearby" query in social apps) are exactly this.

Adjacent

Read the papers, push the specimens.

Every structure here has a story; many have a paper. The Papers volume catalogs them; the Simulators let you push them around with a cursor. And if you'd rather have this cabinet sequenced into a larger route, the apprenticeship's data-structures task takes it in three passes — read the idea, push on it in a simulator, build it from a blank file.

Open Papers → Push the specimens

Found this useful?