Trie Autocomplete Visualizer: a prefix tree, live.

A trie is a tree where each edge is one character, so words sharing a prefix share a path from the root. That makes prefix lookup O(length of the prefix), not O(number of words). Type a few letters and the suggestions are just the subtree hanging below that prefix.

Words
18
Suggestions
0

Type prefix
Insert
Suggestions
type something
Tree
root c a t e g o r y r d e f u l r y t g o d o g n e - d e a l o r - k n o b l l a r t - n o t
Recent ops
insert "cargo"
insert "category"
insert "done-deal"
insert "do-not"
insert "do"
insert "dot"
insert "dollar"
insert "door-knob"

What you're looking at

The diagram is the live prefix tree built from the seeded word list. Each circle is a node, each edge one character; follow any path from the black root and you spell a prefix. Nodes filled in the accent colour mark where a complete word ends. Type into "Type prefix" and two things update at once: the suggestion list fills with every word sitting below your prefix, with the matched characters tinted, and the nodes along your path pick up a heavier outline. Insert a word and watch it either branch off an existing path or extend one — common prefixes are shared, never duplicated.

Type "ca" first. The list returns car, card, care, cargo and the rest — exactly the subtree hanging below the c-a path, nothing scanned elsewhere. Now insert "cab": it reuses the existing c-a edges and adds a single b node. What should surprise you is that the work to answer a prefix query depends only on how many letters you typed, not on how many words the trie holds. A dictionary ten times larger answers the same query in the same number of hops.


What is a trie?

Why a search bar can't just scan a list.

A trie (pronounced "try" or "tree") is a tree where each path from the root to a leaf spells a key, sharing common prefixes between keys. Edward Fredkin coined the term in 1960 from "re-trieval." Tries give linear-in-the-key lookup independent of dictionary size — the structure behind autocomplete, IP routing tables (longest-prefix match), spell checkers, and the FST in Lucene's index.

Imagine you're building the search bar at the top of a website. The user has typed three letters — "car" — and you have to show ten suggestions before they finish typing the next character. Your dictionary holds three hundred thousand product names. The naive approach is to walk the whole list, keep the entries that start with "car", sort them by popularity, and return the top ten. That works for a thousand words. It does not work for three hundred thousand, and it absolutely does not work for the billion-key dictionaries that real search engines run on. Walking the list takes time proportional to the dictionary size, and you are paying that cost on every keystroke.

The first instinct is to reach for a hash table. Hashing answers the question "is car in the dictionary?" in expected constant time. But it answers the wrong question. The user did not type a complete word; they typed a prefix. A hash table has no notion of a prefix. There is no way, given "car", to ask the hash table "show me everything that starts with these three letters" without scanning every key it stores. Hashing scrambles the input on purpose; that is exactly what makes lookups fast for exact matches and useless for prefix queries.

The structure that solves this problem is the trie, a tree in which each edge is labelled with one character and each path from the root spells out a key. To check whether the dictionary holds anything starting with "car", you walk three edges — c, then a, then r — and read off the entire subtree below. The lookup time depends only on the length of the prefix, not on how big the dictionary has grown. A trie holding ten thousand words and a trie holding a billion words answer the same prefix query in the same three pointer hops.

That property — query latency that does not grow with the corpus — is why every autocomplete you have ever used is built on a trie or one of its close cousins. Google's "did you mean?" pipeline, the in-editor identifier completer in VS Code, the kernel's IP forwarding table, and the spell checker in your browser all reach for the same structure for the same reason. The simulator above is the simplest possible version: type a prefix, watch the descent light up, see the subtree of suggestions enumerate.


Trie structure — a path from root to leaf spells a word

A path from root to leaf spells a word.

Picture the dictionary as a tree. The root is empty. Each child of the root is a single letter — one branch for words that start with a, another for words that start with b, and so on through the alphabet. Each of those branches itself has children, one per letter that can follow. Walk from the root toward a leaf and you read off a word a character at a time. Stop early and you have a prefix; the subtree below you is every word the dictionary contains that begins with that prefix.

The diagram below shows what happens when the dictionary grows. With ten thousand words the trie is dense at the top and shallow on most paths — most English words are under fifteen characters, and the deepest leaf is the longest word. Push the dictionary to a billion entries and the tree fans wider but it does not get meaningfully deeper. The descent the user pays for, three pointer hops to reach "car", is exactly the same in both. That constant depth is the trick.

The cost of this trick is paid in memory: a trie spends a node on every distinct character of every key, even when most of those nodes have only one child. Production tries spend half their engineering budget on shrinking that overhead. We will get to those tricks. First, the basics — how the structure achieves its time bound, what it costs in space, and what changes when you ask the harder questions.

DEPTH STAYS CONSTANT AS THE DICTIONARY GROWS10⁴ wordsdepth ≈ 153 hops to find "car"10⁷ wordsdepth ≈ 223 hops to find "car"10⁹ wordsdepth ≈ 303 hops to find "car"Linear scan: 10⁴ → 10⁷ → 10⁹ steps. Trie: 3 → 3 → 3.

Origins — Edward Fredkin and the word "re-trieval"

Fredkin coined the word, re-trieval.

Edward Fredkin published Trie Memory in Communications of the ACM in 1960. He took the middle four letters of the word retrieval and pronounced his new structure to rhyme with tree. The data structure had appeared informally in Briandais' work on file searching a year earlier, but Fredkin gave it the name and the formal treatment that stuck. The paper described a memory organisation in which a key was decomposed character-by-character; each character indexed into a table of pointers; the path from root to leaf spelled the key. Lookup time depended on key length, not on dictionary size — a property that turned out to matter more as dictionaries grew.

The motivating problem was concrete. Computers in 1960 had kilobytes of core memory and a few megabytes of magnetic-drum or tape storage. Sorted-array binary search needed log₂ n comparisons but each comparison was on a whole string, and string comparison cost time proportional to the shared prefix. Hashing was known but its worst case looked terrifying on a constrained machine. The trie's appeal was that it traded memory — generous in the worst case — for a guarantee on time. Pages of pointers, but the question "is this prefix in the dictionary?" returned in time linear in the prefix length.

Fredkin's framing also anticipated the autocomplete use case sixty years before it became common. He observed that the structure naturally answered prefix queries: stop the descent at any depth and the subtree below enumerates the keys that share that prefix. Modern search-suggestion engines, code-completion popups, command-line tab expansion, and library-search autocompletes all use this property. The data structure outlived its original hardware by half a century because the operation it cheapened — prefix expansion — kept reappearing in new contexts.

The 1960 paper sat alongside two other prefix-tree precursors. De la Briandais in 1959 had described the same idea in a paper called File Searching Using Variable Length Keys; he used linked lists of children rather than tables. Morrison in 1968 published Practical Algorithm to Retrieve Information Coded in Alphanumeric — Patricia — the first systematic treatment of path compression. The lineage runs Briandais → Fredkin → Morrison, and every modern variant traces back through one of those three.

Knuth treated the structure in the third volume of The Art of Computer Programming (Sorting and Searching, first published 1973), placing it alongside binary search trees and hash tables as one of the three canonical answers to the dictionary problem. He also pointed out that the trie's worst-case behaviour is, in some sense, its best behaviour: the cost of a query is fully determined by the key length, so there are no pathological inputs and no rebalancing operations to worry about. That predictability is part of why tries kept reappearing in operating-system kernels and database internals long after fancier structures had been published.


Trie complexity — linear in the key, independent of the dictionary

Linear in the key, independent of the dictionary.

The headline result is that lookup, insert, and delete all run in O(L) where L is the length of the key. Crucially the cost does not depend on n, the number of keys stored. A trie holding ten thousand entries and a trie holding a billion entries answer "is the prefix car here?" in the same three pointer hops. That property is what makes the trie attractive for autocomplete: the latency the user perceives stays constant as the corpus grows.

The cost shows up in space. A naive trie with a fixed alphabet of size σ stores one node per character, and each node carries an array of σ child pointers. With ASCII that is 128 pointers per node; with byte-indexed nodes, 256; with Unicode codepoints, more than a hundred thousand if you take the alphabet at face value. Most slots are empty. A dictionary of a hundred thousand English words can balloon to hundreds of megabytes if implemented as a flat 26-way array — far worse than the keys themselves stored as plain strings.

The fix is to choose a child representation that matches the alphabet's density at each depth. Near the root the alphabet is full — every letter starts some word — so a flat array is appropriate. Deep in the tree most internal nodes have one or two children, and a hash map or a small sorted vector wins. The Linux kernel's FIB trie uses a hybrid representation that adapts the per-node fan-out from a single bit to a wide stride based on observed density; the result is a tree whose memory footprint scales with the actual entries, not with the worst-case alphabet.

Cache behaviour matters as much as theoretical bounds. A descent through eight nodes is eight pointer dereferences; if those nodes live in unrelated cache lines, each dereference is potentially a memory stall. Realistic traversal latency on a modern CPU is dominated not by the work the trie does but by the line-fill cost of nodes that have been evicted from L1 and L2. Implementations that pack nodes contiguously — double-array tries, succinct tries, or arena-allocated layouts — outperform pointer-chasing implementations by an order of magnitude on cold workloads, even though both have the same asymptotic complexity.

PURE TRIE · one node per characterRADIX TREE · single-child chains collapsedrootcardtcar · card · cart5 nodes wasted on linear chainroot"car""d""t"edge labels = strings3 nodes total · same set of keys

Radix tries — when a chain of single children collapses

When a chain of single children collapses into one edge.

A pure trie spends a node on every character of every key, regardless of whether the path branches. The word internationalization takes twenty nodes if no other key shares its tail. The wasted memory is not subtle: long words plus a sparse dictionary plus a fanout-26 layout multiplies into something unworkable. The first optimisation every production trie makes is to compress single-child chains into multi-character edge labels. The result is variously called a radix tree, a Patricia trie, or simply a compressed trie.

The original Patricia paper described the technique for binary keys: each internal node held a bit index and the algorithm traversed by examining that single bit of the search key, branching left or right. Donald Morrison's 1968 paper used Patricia tries for variable-length alphanumeric keys; the structure became canonical in IBM mainframe filesystems. The modern variant generalises the bit-by-bit branching to any alphabet, and the edge labels can be arbitrary character strings rather than single bits or characters.

The kernel's IPv4 forwarding information base — the FIB — is a radix tree on the bits of the destination address. Longest-prefix match becomes a descent that consumes one or several bits at a time, with edge labels packed densely. The trie returns the most specific routing rule whose prefix matches the destination. Linux uses a variant called the LC-trie (level-compressed trie) that further trades memory for shallower depth; lookups complete in a handful of cache-friendly probes for the typical Internet routing table of about a million entries.

Adaptive variants exist. The ART structure (Adaptive Radix Tree, Leis et al., 2013) chooses the per-node child layout dynamically: nodes with one to four children use a small sorted array, nodes with five to sixteen use a slightly larger one, sixteen to forty-eight use an indirection table, and dense nodes use a full 256-entry array. The result performs within a small factor of a hash table on point lookups while preserving ordered traversal — the property that makes radix structures useful for range queries and longest-prefix match. ART is now the index of choice in HyPer and several research column stores.

/* Patricia trie node — bit-index branching, classic Sedgewick layout */
struct patricia_node {
  int      bit;          /* index of bit to test (0 = MSB)  */
  void    *key;          /* full key stored at leaves only  */
  void    *value;
  struct patricia_node *left;   /* bit == 0 */
  struct patricia_node *right;  /* bit == 1 */
};
/* Internal nodes have bit > parent->bit; leaves are detected
   when child->bit <= parent->bit (the back-edge convention). */
Why Patricia is named that

PATRICIA expands to Practical Algorithm to Retrieve Information Coded in Alphanumeric, Donald Morrison's 1968 acronym. The structure is just the radix-tree principle applied bit-by-bit — but the name stuck because the contemporaneous IBM mainframe filesystem documentation cited the paper directly, and a generation of systems programmers learned the data structure under that label.


Suffix automata, transducers, and other prefix-tree cousins

Suffix automata, transducers, and other prefix-tree cousins.

A trie answers prefix questions about a dictionary of keys. A suffix tree answers substring questions about a single text by storing all suffixes of that text in a compressed trie. Ukkonen's online construction, published in 1995, builds the structure in time and space linear in the input length — the algorithm is famously subtle, with suffix links, active points, and remainder counters that take several careful readings to internalise. Suffix trees power exact substring matching, longest common substring, and longest repeated substring queries; they form the algorithmic foundation behind grep -F's multi-pattern matching and behind sequence-alignment tools in bioinformatics.

For very large texts the suffix tree's pointer overhead becomes prohibitive — roughly fifteen to twenty bytes per character of input is typical. The suffix array, introduced by Manber and Myers in 1990, stores the same information as a sorted array of starting offsets. It uses about four bytes per character with a longest-common-prefix table; the cost is that some queries become a binary search rather than a tree descent. Suffix arrays underlie most modern full-text indexing and the BWT-based compressors (bzip2 and a generation of read-aligners for DNA).

The finite-state transducer generalises the trie further. A trie shares prefixes; an FST shares both prefixes and suffixes by minimising the underlying automaton. The result is a directed acyclic graph rather than a tree, sometimes much smaller than the equivalent trie. Lucene's term dictionary uses an FST to map every indexed term to its posting-list pointer; a corpus with a billion distinct terms compresses to tens of megabytes of FST. The construction is an online algorithm that processes the input in sorted order, freezing finished suffixes and reusing them whenever a new key produces an identical tail.

Adjacent in spirit are DAWGs (directed acyclic word graphs) and MARISA tries, both of which exploit suffix sharing for static dictionaries. Ternary search trees, popularised by Bentley and Sedgewick in the late 1990s, use three-way branching at each node — less than, equal, greater — and effectively interleave a binary search tree with a trie. They use less memory than a flat-array trie at the cost of slightly more comparisons, and they handle non-ASCII alphabets gracefully.

FST · prefix sharing on the left, suffix sharing on the rightq0mstaaayyyshared"y"FSUFFIX SHARINGPREFIX FAN-OUT · {may, say, tay}a trie shares prefixes;an FST shares prefixesand suffixes — a DAG, not a tree.Lucene's term dictionary:10⁹ terms → tens of MB.

Trie implementations — six different physical shapes

The same algorithm, six different physical shapes.

The choice of how to lay out a trie's children in memory is what separates a textbook implementation from one that ships. The simplest representation is a hash map per node — each child is keyed by its character. Insertions and lookups translate directly. The cost is per-node overhead: sixty bytes or more per node on a typical hash-map implementation, plus a heap allocation. For dictionaries of a million keys this is the difference between fitting in L3 cache and not.

The opposite extreme is a flat array: each node is a contiguous block of σ pointers, indexed by character. Lookup is a single offset and a load. The downside is that most slots are empty for most nodes, which wastes both memory and cache lines. Sparse-array tricks help — store only non-empty slots in a sorted vector, paying a small linear scan per descent in exchange for far better cache behaviour.

The double-array trie, introduced by Aoe in 1989, packs an entire trie into two parallel integer arrays called base and check. To traverse from state s on character c, the next state is base[s] + c; the check array verifies that the destination really did come from s. The encoding is dense, branchless, and extremely cache-friendly. The trade-off is that insertions can require expensive rebalancing of the base array when a slot collision occurs. Double-array tries are the structure of choice in Japanese and Chinese morphological analysers (MeCab, Kuromoji), where the alphabet is huge and the dictionary is mostly static.

Succinct tries — LOUDS, BP, and friends — encode the structure of the tree itself in a bitvector and use rank/select operations to navigate. The space approaches the information-theoretic minimum (about two bits per node) while keeping operations within a logarithmic factor. Succinct tries are how Google's spell-checker and search query auto-complete fit billion-key dictionaries into memory on resource-constrained devices, and how RocksDB and Lucene store their on-disk index headers.

For Unicode-aware workloads, the alphabet is not 26 letters but more than 150,000 codepoints. Naive child arrays are unworkable; even hash maps become heavy. The pragmatic choice is to operate on the UTF-8 byte sequence rather than on logical codepoints — a 256-way trie on bytes is well-behaved, and Unicode normalisation (NFC or NFKC) at insertion time ensures that the same logical character does not produce two distinct paths.

Structure Lookup Prefix Memory Ordered
Trie (flat array)O(L)nativehigh (σ ptrs/node)yes
Radix / PatriciaO(L)nativemoderateyes
Hash tableO(L) hashnolowno
Sorted array + bsearchO(L log n)range scan onlyvery lowyes
FSTO(L)nativevery low (immutable)yes

Where tries live in production — routing tables, search bars, spell checkers

Routing tables, search bars, spell checkers, packet filters.

The most consequential trie in everyday computing is probably the kernel's IP forwarding table. Linux's net/ipv4/fib_trie.c implements a level-compressed radix tree over the bits of the destination address. Every packet a Linux box forwards consults this structure; the trie's worst-case depth determines the per-packet routing-decision latency. The implementation collapses chains, merges sparse subtrees, and re-encodes runs of consecutive routes; on a typical full-table router, longest-prefix match completes in a handful of cache lines.

Search-suggestion systems live on tries. Typing into a query bar walks the prefix into the trie, collects the subtree of matching keys, ranks them by query frequency or click-through, and returns the top few. The dictionary is rebuilt periodically from query logs; the live structure can answer thousands of suggestion requests per second per CPU core because the descent is short and cache-resident. When the corpus exceeds memory, the production technique is to host the trie as an FST on disk and rely on the OS page cache to keep hot prefixes hot.

Spell checkers and morphological analysers rely on tries too. aspell stores its dictionary as a compressed trie and uses Damerau-Levenshtein distance to enumerate corrections; hunspell, the spell-checker shipped with most browsers, uses a similar scheme. Japanese tokenisers MeCab and Kuromoji store their grammar dictionaries in double-array tries because the alphabet is enormous and the lookup must be fast enough to tokenise text in real time. Code-completion systems in modern editors store identifier dictionaries the same way.

Packet classifiers in firewalls and software load balancers use tries on the bits of multiple header fields. Cloudflare's NEL pipeline, AWS's network ACLs, and the Linux kernel's nft_set_rbtree all use a tree of bit-tests as their primary classification structure. The reason is the same as for routing: rules have prefix structure, and a prefix structure means a trie. When the rule set is small the engineering choice is a linear scan; when it grows past a few thousand the trie becomes the only structure that scales.

Compressed full-text indexes are another quiet home for tries. The header of a Lucene segment is essentially a finite-state transducer over the dictionary of indexed terms; Bleve, Tantivy, and other Lucene-inspired engines reuse the same structural idea. Postgres' GIN indexes for tsvector columns store a B-tree over individual terms, but the per-term posting lists themselves are byte-encoded prefix structures. Even file-system tools like ripgrep and ag use compiled tries internally to handle multi-pattern Aho-Corasick matching, where a single pass over the input simultaneously checks many literal patterns by walking a trie augmented with failure links.


When NOT to use a trie — the cases hash tables and sorted arrays win

Hash tables, sorted arrays, and the cases tries lose.

A trie is a lookup structure. Where it shines is when the question is about a prefix, not about an exact match. If your workload is purely key → value with no prefix queries, a hash table is almost always faster, smaller, and simpler. Hash tables answer in expected constant time independent of key length; tries take time proportional to the key. For long keys — URLs, file paths, identifiers — the difference can be a factor of ten on point lookups, and the engineering cost of running a custom trie versus a builtin HashMap is not small.

For ordered enumeration over short integer keys, a sorted array with binary search outperforms any trie up to remarkably large sizes. The cache behaviour of a contiguous array beats the pointer-chasing of a trie at every level until the array no longer fits in L3. Even then, a B-tree typically wins on disk-backed workloads because its per-node fan-out matches a page boundary, while a fanout-26 trie wastes most of every page on empty slots.

For approximate string matching — fuzzy search, typo tolerance — pure tries struggle. The state of the art uses a BK-tree (Burkhard-Keller, 1973) on edit distance, or the more recent Levenshtein automaton approach where the trie is intersected with a non-deterministic automaton that accepts strings within edit distance k of the query. The intersection prunes the search dramatically; this is the technique used by Lucene's fuzzy queries and by Google's "did you mean?" pipeline.

For very large static dictionaries that change rarely, the engineering choice is increasingly an FST over a regular trie. The FST is offline-built, immutable, and shares suffixes as well as prefixes; the on-disk size is often a tenth of the equivalent trie. The cost is that updates are expensive — you rebuild from scratch — so the technique only fits read-mostly corpora. Trie-vs-FST is a trade between insertion latency and resident size; if the corpus turns over once a day the FST wins, if it turns over every few seconds the trie does.

Finally, the structure is a poor fit when keys have no shared structure. Random hashes, UUIDs, opaque tokens — anything where the bits are designed to be uncorrelated — make terrible trie keys, because every insertion produces a fresh root-to-leaf path with nothing shared. The memory overhead becomes ruinous and the depth equals the key length. For workloads of this shape the right answer is the boring one: a hash table with a good hash function, sized to keep the load factor under three-quarters. Reach for the trie when the keys actually share prefixes and you need to ask prefix questions about them.


Further reading on tries

Primary sources, in order.

Found this useful?