Binary Heap Simulator: smallest always on top.
A balanced tree backed by a flat array, with the smallest (or largest) element always on top. The hidden engine of every priority queue, every Dijkstra, every k-way merge, and every Top-K dashboard. This page lets you push the buttons.
Two views of one structure. The tree on top is what textbooks draw: every parent smaller than its children in min mode, root marked in amber. The strip of cells below is what your computer stores: the same nodes laid flat, level by level, no pointers, just the index arithmetic printed above it. The [i] labels on the tree nodes match the cell numbers, so when an operation swaps a pair you see the same pair light up in both views at once. The counters up top track every comparison and swap.
Type 0 into the insert box and press push. It lands in the last cell, then sift-up swaps it with each parent until it sits at the root. Watch the highlight climb and check the swap counter: three or four swaps, not nine, because the path from leaf to root is only log n long. Then press extract min and watch the reverse: the last element gets dropped onto the root and sinks back down. The thing worth keeping is that the array never has a gap; the tree is just a way of reading it.
What is a binary heap?
Give me the smallest of a million items, repeatedly.
A binary heap is a complete binary tree where every parent is ordered relative to its children — min-heap (parent ≤ children) or max-heap (parent ≥ children). It supports insert and extract-min in O(log n), and surprisingly build-heap in O(n). J. W. J. Williams introduced it in 1964 as the data structure behind heapsort. Today: priority queues, Dijkstra's algorithm, the OS process scheduler, the network packet scheduler.
Imagine you are scheduling jobs in an operating system kernel. There are tens of thousands of runnable threads at any moment, each with a priority. You need to answer one question, very quickly, very often: which thread should run next? The answer is whichever has the highest priority. New threads keep arriving, finished threads keep leaving, and priorities can change. The naive answer — keep an unsorted list and scan it every time you need the highest priority — is O(n) per scheduling decision. With ten thousand threads scheduling a thousand times a second, that is ten million comparisons per second devoted purely to picking who runs next. The kernel is now spending more time deciding than executing.
Sorting the list helps reads but ruins writes. A sorted array can answer "give me the smallest" in O(1), but inserting a new thread costs O(n) because half the elements may have to shift. A balanced binary search tree gives both reads and writes in O(log n), but each node carries two child pointers and the structure rebalances on insert and delete. For a problem this hot, the constant factor matters as much as the asymptotic.
The binary heap is the structure designed for exactly this question. It promises three things: peek at the minimum (or maximum) in O(1); insert in O(log n); remove the minimum in O(log n). It does this with a single contiguous array, no pointers, no rebalancing — the simplest data structure on this perf curve. The cost of skipping pointers is that random search and arbitrary delete are not supported in O(log n); a heap promises only repeated access to the most extreme element, which is the question priority queues actually ask.
The applications cascade from there. Dijkstra's shortest-path algorithm needs the next-smallest tentative distance on every iteration; without a heap it is O(V²), with one it is O((V + E) log V) — the difference between practical and impractical on a million-node road graph. K-way merge sort fuses sorted streams with a min-heap of stream heads. A "top 100 trending hashtags this minute" widget runs a min-heap of size 100 over the firehose; for each tweet, if its count exceeds the heap's root, replace the root and sift down. Memory stays at 100 entries regardless of input size; time is O(n log k) where k is the result count.
Concretely: a min-heap of one million 64-bit integers takes 8 MB of contiguous memory and answers extract-min in roughly 20 comparisons — about 40 ns on a modern CPU, well inside L1 cache. The same operation on an unsorted array of a million entries takes a million comparisons, around 1 ms. Five orders of magnitude. The simulator above lets you push, extract, and bulk-heapify on small examples; the dynamics on a million items are identical, just scaled.
Heap structure — a complete tree with a parent-child order rule
A complete tree with an order rule.
A binary heap is two things at once: a complete binary tree (every level filled left-to-right, with no gaps) and a partial order on parent–child pairs. In a min-heap, every parent is smaller than its children. In a max-heap, every parent is larger. Crucially, no rule is imposed between siblings — only along the parent-to-child path.
Because the tree is always complete, you can store it in a flat array with no pointers. The root sits at index 0; the children of the node at i live at 2i + 1 and 2i + 2; the parent of i sits at ⌊(i − 1) / 2⌋. That arithmetic alone is the entire navigation system — no left / right fields, no allocator overhead, no pointer chases. The cache locality is excellent: walking a path from root to leaf reads a handful of array slots that are likely already in L1.
The order rule is weaker than a binary search tree's. In a BST every left descendant is less than the node and every right descendant is greater — full sorted order. In a heap, the only thing you know cheaply is the root: it's the global min (or max). Asking "where is the value 7?" still costs a full O(n) scan because nothing constrains the rest of the structure beyond the parent-child invariant.
That seemingly limited contract turns out to be the right one for an enormous class of problems. Whenever you need repeated access to the most extreme element from a changing collection — the next pixel to render, the next packet to schedule, the lowest-cost path to extend, the closest neighbour to consider — a heap solves it in O(log n) per operation. The applications fan out from there.
Origins — Williams 1964, heapsort, Floyd's linear build
Williams 1964, heapsort, Floyd's linear build.
Williams 1964, heapsort, Floyd's linear build.
Williams 1964, heapsort, Floyd's linear build.
The structure was named and formalised by J. W. J. Williams in 1964 in Algorithm 232: Heapsort (Communications of the ACM, June 1964). Williams's paper was a single page that introduced two ideas at once — the binary heap as an array-backed complete binary tree with the parent-order invariant, and an in-place sorting algorithm built on top of it. His insight was that you could sort an array of n elements in O(n log n) worst case using only the input array itself, no auxiliary memory. At the time this was a significant improvement over mergesort's O(n) extra space and quicksort's O(n²) worst case.
Williams's original heapify was top-down: insert each element one at a time into a growing heap, costing O(n log n). Robert Floyd noticed within months that bottom-up was strictly better, in Algorithm 245: Treesort 3 (CACM, December 1964) — by starting from the bottom of the tree and running sift-down on each non-leaf, the work telescopes to O(n). The proof is a small geometric series, but the practical effect is dramatic: building a heap of a million items costs about two million comparisons under Floyd, versus twenty million under Williams. Every modern stdlib's make_heap uses Floyd.
The structure went on to power Dijkstra's shortest-path algorithm (Edsger W. Dijkstra, Numerische Mathematik, 1959) — though Dijkstra didn't have a name for the priority queue when he wrote it; the heap formalisation came five years later. Donald Knuth's The Art of Computer Programming, Volume 3: Sorting and Searching (Addison-Wesley, 1973) gave heaps and heapsort their rigorous treatment in section 5.2.3, including the bound proofs and variants like the d-ary heap. Subsequent developments — pairing heaps (Fredman, Sedgewick, Sleator, Tarjan, 1986), Fibonacci heaps (Fredman & Tarjan, 1987) — added asymptotically faster operations at a constant-factor cost that mostly keeps them out of production. The 1964 binary heap is still the workhorse.
The name “heap” is unrelated to the C/Unix “heap memory.” Williams chose the word for its everyday meaning of an unordered pile with a single most-extreme item visible on top — the right metaphor for a structure that orders only the parent-child relation, not siblings. The clash with operating-system terminology came later and the data-structure use predates it; it is the OS that should have picked a different word.
Insert — sift up, one swap at a time
Insertion bubbles up one swap at a time.
An insert costs O(log n). The new value is appended to the end of the array (the next empty slot in the bottom level of the tree, preserving completeness). Then it walks toward the root, comparing itself to each parent. If it's smaller (in a min-heap), the two are swapped and the walk continues. If it's not, the walk stops — the heap invariant is locally satisfied, and locality means it's globally satisfied because the path above was already a heap. This walk is called sift-up, percolate-up, or sometimes up-heapify.
function insert(value) {
arr.push(value);
siftUp(arr.length - 1);
}
function siftUp(i) {
while (i > 0) {
const parent = (i - 1) >> 1;
if (arr[i] >= arr[parent]) break; // min-heap invariant holds
[arr[i], arr[parent]] = [arr[parent], arr[i]];
i = parent;
}
} The maximum number of swaps is the height of the tree, ⌊log₂ n⌋. For a heap of a billion items the height is just 30 — that's the worst-case path length. The expected number of swaps is even smaller: most inserts settle within a level or two because the random newcomer is unlikely to be smaller than every node on the way up. In practice, sift-up on uniform random data averages around 1.6 swaps regardless of n.
Push the simulator's Insert button repeatedly with descending numbers (e.g. 12, 10, 8, 6, 4, 2). Watch each new value climb all the way to the root because it's smaller than every parent it meets. Then push ascending numbers and notice they barely move — they sit near the leaves where they belong.
Extract-min — put the wrong element on top, then fix it
Removal puts the wrong element on top, then fixes it.
Extract-min is the operation that justifies the whole structure: given a changing collection, return the smallest element and remove it, all in O(log n). The mechanism is a small bit of clever bookkeeping. The minimum is at index 0, so reading it is free. To remove it without leaving a hole, we move the last element of the array (the rightmost leaf of the bottom level) into position 0. The completeness invariant is preserved — the tree shrunk by one and the bottom-right slot vanished, exactly as a complete tree should.
But the new root is almost certainly out of place: a leaf that suddenly has no nodes above it. So we walk it down. At each level, we compare it to its two children, find the smaller, and swap if the smaller child is smaller than the parent. The walk continues until the parent is at most equal to both children, or we reach a leaf. This is sift-down.
function extractMin() {
if (arr.length === 0) return null;
const root = arr[0];
const last = arr.pop(); // shrink first
if (arr.length > 0) {
arr[0] = last; // move last into root
siftDown(0);
}
return root;
}
function siftDown(i) {
const n = arr.length;
while (true) {
const l = 2 * i + 1, r = 2 * i + 2;
let best = i;
if (l < n && arr[l] < arr[best]) best = l;
if (r < n && arr[r] < arr[best]) best = r;
if (best === i) break;
[arr[i], arr[best]] = [arr[best], arr[i]];
i = best;
}
} Sift-down is also bounded by the tree's height, so extract is O(log n). The constants are slightly worse than sift-up: each level requires up to two comparisons (left vs right child, then winner vs parent) instead of one. In practice extract spends most of its time at the bottom because the value moved to the root started life as a leaf and is likely to belong back near a leaf.
Use the simulator: insert a handful of values, then click extract min repeatedly. Notice how each extracted value comes out in sorted order. That's heap sort in slow motion: build a heap, pull from the root until empty, you have a sorted sequence. The total cost is O(n log n) — provably optimal for comparison-based sorting.
Build-heap — building from scratch is O(n), not O(n log n)
Building from scratch is linear, not log-linear.
A common interview-grade misconception: building a heap from n items by repeatedly inserting them costs O(n log n), so heapify "must be" O(n log n) too. Wrong by a factor of log n. The bottom-up procedure due to Robert Floyd (1964) builds the heap in O(n) — provably linear, with a small constant.
The trick is to start from the bottom and run sift-down on each non-leaf, working upward. The leaves are already trivially valid one-node heaps; the second-to-last level needs at most one swap each; the next level up at most two; and so on. Sum the work and you get a series that converges:
function heapify(arr) {
// Floyd's algorithm — O(n).
for (let i = (arr.length >> 1) - 1; i >= 0; i--) {
siftDown(i);
}
} The intuition: the work at each level is the height of the subtree (at most), and a complete tree has half its nodes at the leaves (height 0), a quarter one above (height 1), an eighth at height 2, and so on. The total work is bounded by ∑ k · n / 2^(k+1), a geometric series that sums to less than n. This is the canonical example in algorithm courses of how worst-case-per-operation reasoning can give a misleading aggregate bound.
| Method | Comparisons | Big-O |
|---|---|---|
| Repeated insert (n times) | ~n log₂ n | O(n log n) |
| Floyd heapify (bottom-up) | ~2n | O(n) |
| Heap sort (heapify + n extracts) | ~2n + 2n log n | O(n log n) |
If you need a heap from a known set of values, always heapify rather than insert one at a time. The simulator's random ×8 and ×15 buttons run heapify and you can watch the bottom-up sweep happen if you set animation speed slow enough.
The hidden engine of priority.
Heaps are everywhere, often invisibly, because they are the textbook implementation of a priority queue and priority queues are the textbook tool for "give me the next most important thing." A non-exhaustive list of where you'll find a heap powering production code:
- Dijkstra's shortest-path algorithm uses a min-heap to repeatedly pick the unvisited node with the smallest tentative distance. Without it the algorithm is O(V²); with it, O((V + E) log V). Every routing engine, every grid pathfinder, every navigation app.
- Operating system schedulers use heaps (or close variants like timing wheels) for run queues and timer wheels: the kernel's "what wakes up next" structure.
- Network packet schedulers use weighted-fair-queueing variants built on heaps. The "next packet to forward" decision is a heap pop.
- K-way merge in a database query plan or external-sort algorithm: feed K sorted streams into a min-heap of (value, source-id), pop the smallest and refill from that source. Memory-efficient even when the streams are gigabyte-scale.
- Top-K problems: "give me the top 100 trending hashtags this minute." Maintain a min-heap of size 100; for each item, if it's larger than the heap's root, replace the root and sift down. O(n log k) total instead of O(n log n).
- Event-driven simulation: every discrete-event simulator (queueing, biology, finance, games) keeps the future-event list as a heap.
- Huffman coding: building the optimal prefix tree requires repeated extract-min from the forest of sub-trees.
- Median-finding: a min-heap and a max-heap balanced against each other gives O(log n) median updates over a stream.
Two important variants. d-ary heaps let each node have d children instead of two. A 4-ary heap has half the height of a binary heap, so sift-down does fewer levels — but each level compares against four children instead of two. The break-even depends on the cache hierarchy; benchmarks generally favour 4-ary or 8-ary heaps on modern hardware. The Linux kernel's perf-event code uses a 4-ary heap.
Pairing heaps and Fibonacci heaps are pointer-based heaps that achieve O(1) amortised insert and decrease-key, at the cost of much worse constants and cache behaviour. They look beautiful in algorithm books and almost never appear in production. The standard advice: start with a binary heap; only reach for fancier variants if a profile says decrease-key is the bottleneck.
Heapsort — a sort hidden inside the heap
A sort hidden inside the heap.
The simplest application of a heap is also the most underused: it gives you an O(n log n) in-place comparison sort, often called heapsort. The recipe takes two passes over the array.
function heapsort(arr) {
// Pass 1 — Floyd heapify in place. O(n).
for (let i = (arr.length >> 1) - 1; i >= 0; i--) siftDown(arr, i, arr.length);
// Pass 2 — extract the max one at a time, parking it at the end.
for (let end = arr.length - 1; end > 0; end--) {
[arr[0], arr[end]] = [arr[end], arr[0]]; // swap root with last
siftDown(arr, 0, end); // re-heapify shrinking range
}
} Heapsort uses zero extra memory beyond the input array — every operation is in-place. It is also non-recursive (sift-down is a loop) so it doesn't blow the call stack on large inputs. And it's deterministic: same input, same swaps, same number of comparisons every time, which makes it easy to certify and reason about. Yet in practice almost no production library uses heapsort as the default sort. Why?
The answer is cache behaviour. Heapsort's sift-down jumps from index i to 2i + 1 repeatedly — a series of long, growing jumps through the array. Each jump is likely to miss L1 and often L2 too. Quicksort, by contrast, sweeps the array in two contiguous passes per partition, with near-perfect prefetcher behaviour. On modern hardware, quicksort's constant factor is 2-3× better than heapsort despite the same Big-O.
So heapsort lives in three niches. Embedded and real-time systems where its O(n log n) worst-case guarantee matters — quicksort's worst case is O(n²) on adversarial input, unacceptable when latency is bounded. Introsort hybrids (used by C++ std::sort and Java's Arrays.sort for primitives) start with quicksort but switch to heapsort when recursion depth exceeds a threshold, getting quicksort's average-case speed and heapsort's worst-case bound. Memory-constrained environments where you can't afford the O(log n) auxiliary stack quicksort uses. Anywhere else, quicksort or its variants (Tim Sort for stability, Pattern-defeating Quicksort for adversarial input) win.
| Sort | Average | Worst case | Stable | In-place | Used in |
|---|---|---|---|---|---|
| Quicksort | O(n log n) | O(n²) | no | O(log n) stack | C qsort, Go sort |
| Heapsort | O(n log n) | O(n log n) | no | yes | worst-case fallback |
| Mergesort | O(n log n) | O(n log n) | yes | O(n) extra | Java objects |
| Tim Sort | O(n log n) | O(n log n) | yes | O(n) extra | Python, Java |
| Introsort | O(n log n) | O(n log n) | no | O(log n) | C++ std::sort |
Heap variants — Fibonacci, pairing, leftist, binomial
Variants for specific workloads.
The binary heap is the default but not the only player. Three variants come up often enough to know by name.
d-ary heaps are the simplest generalisation: each node has d children instead of two. The tree is shorter (height = logd n instead of log2 n) so sift-up does fewer levels — at the price of comparing against d siblings instead of 2 at each sift-down level. The break-even depends on the hardware: a 4-ary heap is typically faster than a binary heap for large workloads because four child pointers fit in one cache line and the reduced height pays for the extra comparisons. The Linux kernel's perf-event subsystem uses a 4-ary heap; many real-time schedulers use 8-ary. Above d = 16, the per-level comparison cost dominates and performance degrades.
Pairing heaps are pointer-based heaps that use a clever "merge two heaps by making the smaller root the parent of the larger" rule. They achieve O(1) amortised insert, decrease-key, and merge — all of which are O(log n) in a binary heap. The catch: they're not implementable as a flat array (every node has a parent and sibling pointer), so cache behaviour is much worse. Pairing heaps tend to win when decrease-key is on the hot path (Dijkstra on dense graphs) and lose otherwise. Most production code uses binary heaps despite the worse asymptotic, because the constant factors swamp the asymptotic difference until n is enormous.
Fibonacci heaps are the theorist's darling: O(1) amortised insert, decrease-key, and merge; O(log n) amortised extract-min. They appear in CLRS and look beautiful in algorithm textbooks. In practice, almost no production code uses them because the constant factors are catastrophically large — the bookkeeping for amortisation involves a forest of trees with sibling/marker pointers and a "consolidate" step that runs every extract-min. A binary heap is 5-10× faster on real workloads at any reasonable n. Knuth's quip applies: "premature optimisation is the root of all evil," and Fibonacci heaps are pre-emptive optimisation against an asymptotic that never arrives.
A cleaner taxonomy: pick a binary heap for almost everything. Pick a d-ary heap (d = 4 or 8) when sift-down is the hot path and you've measured. Reach for a pairing heap when decrease-key dominates and a careful benchmark says it helps. Don't reach for Fibonacci heaps unless you're writing a competition entry that needs the asymptotic.
A short curriculum for heap fluency
A short curriculum for heap fluency.
If you want heaps in your hands rather than your head, run this sequence. Each exercise builds on the last and surfaces a different facet of the structure.
- Implement a min-heap from scratch in your favourite language using only an array and the parent/child arithmetic. Insert, extract-min, peek. No imports. Verify against 10,000 random inserts and 10,000 random extracts that the output is sorted.
- Implement
siftUpandsiftDownas recursive functions. Then rewrite them iteratively. The iterative form is what production code uses (the recursive form costs stack frames you don't need). - Implement Floyd's heapify. Time it against repeated-insert on n = 1M random integers. Confirm the linear-vs-log-linear gap empirically.
- Heap sort. Build the heap in place (use the input array as the heap), then extract-min one at a time. Print sorted output. Compare runtime to
Array.prototype.sort()— they're in the same league. - Top-K stream. Read a million words from a file, output the 100 most frequent. Use a min-heap of size 100 keyed by frequency. Constant memory regardless of input size.
- Median maintenance. Read a stream of integers; after each, print the running median. Two heaps: a max-heap of the lower half, a min-heap of the upper half, kept within size 1 of each other.
- K-smallest pairs from two sorted arrays. Use a heap of (i, j) pairs ordered by sum. Classic interview problem; revealing exercise in heap state design.
- Dijkstra on a sparse graph. Implement with a binary heap and benchmark against a naive O(V²) version on a 10,000-node random graph. Watch the speedup.
Two interview gotchas worth memorising. First: heaps don't sort — they extract in sorted order. The array after a heapify is not sorted; it's heap-shaped. Don't read array indices in a heap and expect any meaningful order beyond the parent-child rule. Second: changing a value already in the heap (decrease-key, increase-key) requires either knowing the index of that value (an external map) or doing a full O(n) scan to find it. The standard binary heap doesn't support O(log n) decrease-key out of the box; if your problem needs it (like Dijkstra), bring an index-tracking layer.
Number of leaves in a heap of size n: ⌈n / 2⌉. Number of internal nodes: ⌊n / 2⌋. The bottom half of the array is leaves; the top half is the rest. Floyd's heapify only needs to run sift-down on the top half.
What goes silently wrong with heaps in production
What goes silently wrong.
Min-heap is the default in Python. heapq is min-only. To get a max-heap, push the negation of each value (heapq.heappush(h, -x)) or wrap items in a comparator class. JavaScript and Go have no built-in heap — both standard practice is a hand-rolled implementation or a small library. C++ has std::priority_queue (default max-heap) and std::make_heap/push_heap/pop_heap on iterators (max-heap). Java has PriorityQueue (min-heap by default).
Comparator stability is not guaranteed. If two items compare equal, the order between them is arbitrary in most heap implementations. If you need stable ordering, encode a tiebreaker (insertion sequence number) into the key.
Heaps don't shrink memory automatically. If you push a million items and pop them all, the underlying array is still sized for a million. If you re-use the heap across phases, that's fine; if you don't, replace the array.
Threading. Heaps are not thread-safe in any standard library. For concurrent use, either synchronise externally or use a concurrent priority queue (Java's PriorityBlockingQueue, .NET's PriorityQueue with locking, lock-free skiplist-based variants).
The pathological case is calling decrease-key in a system that doesn't support it efficiently. Dijkstra implemented with a stdlib heap and "just push the smaller value, ignore stale ones on pop" is correct but does O(E log V) extra pops in the worst case — fine for sparse graphs, painful for dense ones. If you need true decrease-key, look at indexed heap implementations or pairing heaps.
Further reading on binary heaps
Primary sources, in order.
- Williams · 1964Algorithm 232: HeapsortCACM, June 1964. The single-page paper that introduced the binary heap and heapsort. Foundational.
- Floyd · 1964Algorithm 245: Treesort 3CACM, December 1964. The bottom-up O(n) heapify that every modern stdlib's
make_heapuses. - Knuth · TAOCP Vol. 3Sorting and Searching, Section 5.2.3The rigorous treatment of heaps and heapsort, including the bound proofs and d-ary variants.
- CLRS · 4th ed., 2022Introduction to Algorithms — Chapter 6The canonical textbook treatment with a careful proof that Floyd's heapify is O(n).
- Sedgewick & Wayne · 2011Algorithms, 4th ed. — Chapter 2.4Friendly explanation of indexed priority queues with full Java code on the book site.
- Fredman, Sedgewick, Sleator, Tarjan · 1986The Pairing Heap: A New Form of Self-Adjusting HeapAlgorithmica. The simplest pointer-based heap; surprisingly fast in practice once decrease-key dominates.
- Python docsheapq — Heap queue algorithmThe reference implementation everyone reaches for in interview questions and quick scripts. Worth reading the implementation notes for the canonical idioms.
- Sedgewick · CourseraAlgorithms, Part I — PrincetonFree MOOC. Week 4 covers priority queues and heapsort; the clearest pedagogical path with companion Java code.
- MIT 6.006 · OCWIntroduction to Algorithms — Lectures 4–5Erik Demaine on heaps and heap sort, with the linear-time analysis. Free MIT lectures.
- LWN · 2018A new high-resolution timer wheelLinux kernel timer-wheel rework. Heaps and timer wheels solve overlapping problems; the kernel team explains why timer wheels win for kernel scheduling.
- Semicolony simulatorDijkstra pathfindingWhere the heap pulls its weight: shortest-path search expanding from a frontier of (distance, node) pairs.
- Semicolony simulatorSorting visualizerHeap sort side-by-side with quicksort, mergesort, and Tim sort — the comparison made visible.