Binary Search Simulator: a million keys, twenty comparisons.

Binary search finds a key in a sorted array in O(log n) by halving the search window every comparison. Twenty halvings get you from one million entries to one — the magic of log2.

Steps
0
Window
18
Result

Target
Array
Sorted array
3 0
8 1
12 2
17 3
23 4
28 5
34 6
41 7
47 8
52 9
58 10
63 11
69 12
74 13
81 14
88 15
92 16
97 17
Trace
— click step —

What you're looking at

A sorted 18-element array, each cell showing its value and index. Pick a target, then press Step to advance one comparison or Run to finish it. The two ends of the live search window are marked lo and hi; the cell being checked is the mid; cells already probed stay tinted and everything outside the current window dims out. The trace below logs each step's lo, hi, mid, and how arr[mid] compared to the target, while the counters up top track steps taken and the shrinking window size.

Search for 47 and step through: the window halves every comparison, so even an 18-cell array resolves in four or five steps. Try a target that isn't in the array, like 100, and watch lo cross past hi — that's how the search reports NOT_FOUND. The thing worth noticing is the step count barely moves as the array grows: doubling the data adds just one comparison. Regenerate the array a few times and the worst case never climbs above five steps for eighteen elements.


What is binary search?

A billion records, thirty comparisons.

Binary search finds an item in a sorted array in O(log n) comparisons by halving the search range each step. The technique predates programmable hardware — it was used on punched-card decks in the 1940s. Today it is the foundation of std::lower_bound, Java's Arrays.binarySearch, Python's bisect, and the index lookup inside every B-tree page.

Imagine you are looking for a phone number in a printed directory. You know the name; you want the number. The directory holds two hundred thousand entries in alphabetical order. The naive approach is to start at page one and read every name until you find the match. On average you read a hundred thousand names; in the worst case, all two hundred thousand. This is linear search, and at any larger size it is impossibly slow.

The trick is to use the order. Open the directory in the middle. Look at the name on that page. If the name you want comes alphabetically before, you can throw away the entire second half of the book; if after, you can throw away the entire first half. Either way, one comparison just eliminated a hundred thousand candidates. Open the middle of the remaining half, compare again, throw away another half. After about eighteen openings — eighteen halvings of two hundred thousand — you have one candidate left, and that is your answer.

This is binary search. It works whenever the data is sorted, and the cost grows as the logarithm of the dataset size rather than linearly. The math: each step halves the candidate window, so after k steps the window has size n / 2k. When that hits 1, you are done; solving for k gives k = log₂(n). For a million records, twenty steps; for a billion, thirty; for the eighteen quintillion entries that fit in a 64-bit address space, sixty-four. The number of comparisons grows so slowly with the dataset that at any practical scale the answer arrives in the time it takes to blink.

This single fact is why every database, every JIT compiler, every B-tree index, every spell-checker, every routing table, and every search bar you have ever used reaches for binary search at some level. It is the workhorse algorithm of computing — older than computers themselves; the bisection method for finding roots was used by 17th-century actuaries and by Babylonian astronomers before that. The simulator above lets you watch it run on a small array; in the next sections we trace the numbers up to a billion records and walk through the famously hard implementation details that bit production code in 2006.

EACH COMPARISON HALVES THE WINDOW · 30 STEPS FROM 1B → 1 step 0 · n = 10⁹ step 1 · n = 5×10⁸ step 2 · n = 2.5×10⁸ step 8 · n = ... step 25 · n = 32 step 28 · n = 4 step 30 · n = 1at any practical scale, the answer arrives in the time it takes to blink.

Origins — binary search predates programmable hardware

Binary search predates programmable hardware.

The earliest documented appearance in computing is John Mauchly's lecture How to Use the ORDVAC, given at the Moore School in August 1946 — the same Mauchly who had co-designed ENIAC two years earlier. Mauchly described a process he called “bisection” for looking up a function value stored on punched-card tables, halving the search interval at each comparison. The technique itself is far older: actuarial life-expectancy tables from the 17th century, library card-catalogue searches, and even the binary chops used in 4th-century Babylonian astronomy all employ the same halving strategy. Computing inherited a method that human computers had been using for centuries.

The first published correct generalised treatment came from D. H. Lehmer in 1960, whose ALGOL implementation handled the off-by-one and termination conditions that earlier programs had got wrong. Donald Knuth's The Art of Computer Programming, Volume 3: Sorting and Searching (Addison-Wesley, 1973) gave binary search the canonical treatment in section 6.2.1, including the recurrence C(n) = ⌊log2 n⌋ + 1 for the worst-case comparison count and the surprisingly subtle invariant proof for the standard implementation. Knuth attributes the algorithm to Mauchly's 1946 lecture and traces the off-by-one debate back to Berkeley UCD-CS programmers in the late 1940s.

Jon Bentley, in column 4 of Programming Pearls (Addison-Wesley, 1986), made the famous claim that “although the first binary search was published in 1946, the first binary search that works correctly for all values of n did not appear until 1962.” Bentley further reported that when he asked professional programmers in his classes to write a correct binary search from scratch, only about ten percent succeeded on the first attempt — a result that has been replicated informally many times since. The off-by-one cases are simple in retrospect (does hi point at the last valid index or one past it? does the loop terminate when lo > hi or lo ≥ hi?), but they trip up every working programmer at some point.

The most embarrassing modern episode came in 2006. Joshua Bloch, then at Google, published Extra, Extra — Read All About It: Nearly All Binary Searches and Mergesorts Are Broken on the official Google research blog. The bug had lurked in Java's java.util.Arrays.binarySearch since at least JDK 1.2 (December 1998): the line int mid = (low + high) / 2 overflows for arrays larger than 230 elements because low + high exceeds Integer.MAX_VALUE and becomes negative. The corrected form is int mid = low + (high - low) / 2 or, in Java 7+, Integer.parseUnsignedInt with an unsigned shift (low + high) >>> 1. The bug shipped in production for nine years before anyone noticed; Bloch's piece is now standard reading for anyone who claims to know binary search.

The same overflow bug was independently rediscovered in OpenJDK's Arrays.sort, the GNU C++ standard library, several Python implementations of merge-sort, and at least three internal Google libraries that Bloch named in talks at OOPSLA. The pattern is recurrent because the textbook formula is the wrong default; the corrected form ought to be in every introductory algorithms book, but most still print the buggy version. Edsger W. Dijkstra argued in his 1972 essay The Humble Programmer that the difficulty of correct binary search reflects a deeper truth about programmer fallibility — small index arithmetic, off-by-one boundaries, and termination conditions are precisely where intuition slips and formal reasoning becomes load-bearing.


The log₂ n lower bound — a hard floor

Why log₂ n is a hard floor.

The textbook complexity is O(log n) comparisons, but the constant matters. Binary search performs ⌊log2 n⌋ + 1 comparisons in the worst case — for n = 1 million, that is 20; for 1 billion, 30; for the 18-quintillion entries that fit in a 64-bit address space, 64. The information-theoretic lower bound for finding one element among n equiprobable candidates by yes/no questions is ⌈log2 n⌉. Binary search hits that bound to within one comparison, so it is essentially optimal among comparison-based algorithms on sorted data.

Wall-clock numbers translate that bound through the memory hierarchy. On a modern Intel x86 CPU at 3.5 GHz with 4-cycle L1 latency, 12-cycle L2, 40-cycle L3, and 80–100 ns DRAM, a binary search over a 1 MB array fitting in L2 takes roughly 50 ns; over a 1 GB array spilling into DRAM, the same 30 comparisons cost 30 × 80 ns = 2.4 µs because each step is a cache miss. The number of operations does not change; the number of cache misses does. This single fact dominates real-world binary-search performance and motivates every cache-aware variant.

The cache-oblivious bound, due to Frigo, Leiserson, Prokop and Ramachandran in their 1999 FOCS paper, gives a layout-independent O(logB n) read cost where B is the cache-line size in elements. The B-tree is the canonical cache-aware analogue: by storing many keys per node it shifts the effective base of the logarithm from 2 to several hundred, cutting cache misses by a factor of log B. The B-tree's order, typically 200–500 for a 16 KB Postgres or InnoDB page, is precisely tuned so each comparison step touches one fresh cache line and resolves many comparisons within it.

The Eytzinger layout, named after the 16th-century genealogist Michel Eytzinger, lays a sorted array out in the same order as a complete binary heap: index i's children at 2i and 2i+1. The layout puts every step of a binary search on consecutive cache lines, eliminating the address-multiplication-and-jump that scatters traditional arr[mid] accesses. Khuong and Morin's 2017 Algorithmica paper Array Layouts for Comparison-Based Searching reports 2–3× speed-ups over the standard layout on cache-bound binary searches, and the Eytzinger trick has been quietly adopted in DPDK, ClickHouse, and the inner loops of several JIT compilers.

Dijkstra had a related point about correctness, made in his EWD-notes throughout the 1970s and pinned down in his 1976 book A Discipline of Programming: a binary search loop's invariant should be expressed as “the answer, if it exists, lies in [lo, hi]” with care taken about whether hi is inclusive or exclusive. Whichever convention you pick, you must use it consistently, and the loop termination condition follows directly. The recurrent off-by-one bug is almost always a half-open interval that the programmer treated as closed, or vice versa. Dijkstra's smoothing argument — pick the half-open convention because it composes — is what the C++ STL eventually settled on, and what makes std::lower_bound compose cleanly with iterators.

SORTED ARRAY vs EYTZINGER LAYOUT · same 7 keys, different ordersorted 1 [0] 3 [1] 5 [2] 7 [3] 9 [4] 11 [5] 13 [6]probes scatter: 7 → 3 → 5Eytzinger 7 [1] 3 [2] 11 [3] 1 [4] 5 [5] 9 [6] 13 [7]child(i) = 2i, 2i+1 — one cache line per probesame comparisons, fewer cache misses · Khuong & Morin (2017): 2–3× speedup on cache-bound searches.
The (low + high) / 2 overflow bug

In June 2006 Joshua Bloch posted on the Google Research blog that java.util.Arrays.binarySearch had carried this bug since 1997: int mid = (low + high) / 2 overflows when low + high exceeds 2³¹ on arrays larger than 1 GB. The fix is one character: int mid = low + (high - low) / 2. The same bug had been present in the original Bentley and McIlroy 1986 paper that taught a generation how to write binary search.


Four binary-search variants — find any, leftmost, rightmost, insertion point

Find any, leftmost, rightmost, insertion point.

The textbook “find any matching element” version is a starter; real APIs distinguish four variants and the off-by-one bug-surface lives entirely in their boundaries. lower_bound: the smallest index i where arr[i] ≥ target — the insertion point that preserves sortedness when ties are placed leftmost. upper_bound: the smallest i where arr[i] > target. leftmost match: lower_bound, then verify arr[i] == target. rightmost match: upper_bound − 1, then verify.

C++ has shipped all four in <algorithm> since the original 1998 standard: std::lower_bound, std::upper_bound, std::equal_range, std::binary_search. Python's bisect module (in the standard library since 1.4, January 1996) provides bisect_left and bisect_right. Rust's slice exposes binary_search and the more general partition_point. Java's Arrays.binarySearch picks one of the equal elements arbitrarily — documented and widely seen as a wart. Go added sort.Search in Go 1.0 (2012) which is the lower_bound flavour with a custom predicate.

Picking the wrong variant produces a class of bug that is uniquely insidious because it does not crash. A query that should return rows for created_at = '2024-05-01' instead returns rows for created_at ≤ '2024-05-01'; a percentile calculation includes one too few or one too many samples; a deduplication check declares the first item already-seen. The runtime is correct, the answer is wrong, the test that catches it has to involve duplicates and edge values.

The galloping or exponential search of Bentley and Yao (Information Processing Letters, 1976) is a useful relative for unbounded or large-but-mostly-empty intervals. Probe at indices 1, 2, 4, 8, 16, … until you exceed the target, then binary-search the last interval. The cost is O(log p) where p is the position of the answer — cheaper than log n when the target is near the start. Python's Timsort uses galloping in its merge step (since the original Tim Peters implementation in 2002) to skip past long runs cheaply. Java's List.indexOf for sorted lists, and the merge step in Apache Commons Collections, both reach for the same trick.

# bisect_left vs bisect_right on a sorted list with duplicates
arr = [1, 2, 2, 2, 3, 5]

bisect_left(arr, 2)  # → 1 (first index where 2 could go)
bisect_right(arr, 2) # → 4 (after the last 2)
bisect_left(arr, 4)  # → 5 (between 3 and 5)
bisect_right(arr, 4) # → 5 (same — no 4s present)

# Find a range of equal elements:
lo = bisect_left(arr, 2)
hi = bisect_right(arr, 2)
arr[lo:hi]           # → [2, 2, 2]

Branch prediction and SIMD — turning twenty cycles per step into five

Twenty cycles per step, made into five.

The textbook binary search has a data-dependent branch at the comparison — if (arr[mid] < target) — that defeats the CPU's branch predictor on random queries. Modern Intel and AMD predictors achieve roughly 95% accuracy on most code; on uniformly random binary-search comparisons, accuracy collapses toward 50% because each step's comparison is statistically independent of the last. Each mispredicted branch costs 15–20 cycles of pipeline flush; over 30 iterations on a billion-element array, that is several hundred wasted cycles per query.

The branchless formulation, popularised by Paul Khuong and others in the 2010s, replaces the conditional with a conditional move: compute both potential next indices, pick one with a CMOV-style instruction. The result has predictable instruction flow at the cost of always touching both halves of the comparison. Khuong's 2012 blog post Binary Search × Eytzinger Layout reports 1.5–2× speed-ups on random workloads; the technique now appears in ClickHouse's vectorised aggregations, Apache Arrow's filter kernels, and Rust's std::slice::binary_search_by hot path.

The cache-aware Bender–Farach-Colton framing, from their 2015 paper Cache-Oblivious B-Trees, generalises this further: for a given cache-line size B and array size n, the optimal layout interleaves elements from multiple levels of the implicit search tree to minimise cache-miss count. Empirical comparisons on Intel Skylake-X (Khuong, 2017 SODA) show the cache-aware layout outperforming naive sorted-array binary search by 3× on 1 GB inputs, and by an additional 2× when combined with a SIMD compare-and-mask step.

For very small arrays — under 64 elements — the right answer is often not binary search at all but a SIMD-parallel linear scan. AVX-512 can compare 16 32-bit integers in one cycle; an array of 64 elements fits in four AVX-512 vectors and resolves in roughly 6 cycles total, where binary search would take 6 iterations × 4 cycles per L1 hit = 24 cycles, plus branch-misprediction overhead. Postgres' B-tree node search has used SIMD-accelerated linear scan inside individual nodes since version 13 (September 2020); the gain over the standard _bt_binsrch is reported at 10–15% for OLTP workloads.

BRANCHY vs BRANCHLESS · same comparisons, different control flowbranchya[mid] < t ?lo=mid+1hi=mid~50% branch mispredict15–20 cycle penalty / stepbranchless · cmovcmp = (a[mid] < t)base = cmov(cmp, mid+1, base)no branch · always 4–5 cyclestouches both halves but predictableon random queries the predictor collapses to 50%; cmov pays for both branches up front and wins.
Algorithm Avg / worst Best for
Linear scann / ntiny arrays (< 64), SIMD
Binary searchlog n / log narbitrary sorted arrays
Interpolationlog log n / nuniformly-distributed numeric keys
Exponential / gallopinglog p / log nunbounded arrays, target near start
Eytzinger / cache-awarelog n / log nlarge arrays, prefetch-friendly layout
Learned indexlog err / log npredictable distributions, read-mostly

Binary search the answer — a more general framing

"Search the answer," a more general framing.

The structural prerequisite for binary search is not sortedness; it is monotonicity of a predicate. Given any function P(x) that is true on some prefix of an interval and false on the suffix, binary search finds the boundary in O(log range) evaluations. The framing “search the answer space, not the input” turns dozens of optimisation problems into binary searches on a numeric parameter.

Concrete production examples. Capacity planning: find the smallest cluster size that keeps p99 latency under the SLA on a load-test rig — binary-search on machine count, evaluate by running the test. Build bisection: git bisect is binary search on commit history, finding the regression commit in log2(commits) rebuilds. The Linux kernel project adopted this in 2007 and Linus Torvalds has cited it as one of git's most-used features. Floating-point square root on hardware that lacks FSQRT: bisect on x × x against the target until convergence within a tolerance. Database query planning: optimisers binary-search on the cardinality estimate cutoff that switches between hash and merge join.

The interpolation search variant, due to Peterson (1957) and analysed by Yao & Yao (1976), uses the value distribution to guess the next probe rather than always picking the midpoint: probe at lo + (target − arr[lo]) / (arr[hi] − arr[lo]) · (hi − lo). For uniformly distributed keys, the expected complexity is O(log log n) — for a billion uniform keys, roughly five probes instead of thirty. The catch: on adversarial or skewed distributions, performance degrades to O(n). SQL Server's columnstore index uses a hybrid that starts with interpolation and falls back to binary search if the first guess overshoots badly, capturing the win on uniform data without the worst-case risk.

The fractional cascading trick of Chazelle and Guibas (1986) speeds up sequences of binary searches across a hierarchy of sorted lists from O(k log n) to O(log n + k) by threading sentinel pointers between levels. It rarely shows up in line-of-business code but is the reason geometric range queries in computational-geometry libraries (CGAL, Boost.Geometry) can hit the bounds they advertise.


Where binary search hides in production

It is hiding in everything you call.

Binary search rarely shows up at the top of a profile because it sits inside primitives that are themselves the named cost. Inside every B-tree node: locating the right child to descend into means binary-searching the keys in the node. Postgres' _bt_binsrch, MySQL InnoDB's page_cur_search_with_match, SQLite's moveToChild, Oracle's index probe — all are binary searches over the 200–500 keys per page. With four levels and 300 keys per node, a 1-billion-row table costs four cache-friendly binary searches per lookup.

Inside hash table chains: when a hash bucket overflows, modern implementations (Java's HashMap since JDK 8, Rust's HashMap, Go's map) escalate to a binary tree on the chain; the lookup within the chain is then a binary search of the tree-ordered keys. Inside JIT compilers: LLVM's switch-statement lowering reaches for binary-search dispatch when a jump table would be too sparse and a linear scan too slow. Inside operating-system schedulers: the Linux Completely Fair Scheduler (CFS) used a red-black tree of runnable tasks ordered by virtual runtime; finding the leftmost task is an O(log n) traversal that is morally a binary search.

Inside Timsort (Tim Peters, Python 2.3, 2002): the merge step uses galloping binary search to skip past long runs of one input. CPython's listobject.c implements gallop_left and gallop_right as exponential probes terminating in a binary search, exactly Bentley–Yao 1976. The same algorithm shipped in OpenJDK 7 (2011) for Arrays.sort(Object[]) and was the source of the famous 2015 Stijn de Gouw, Jurriaan Rot et al. correctness bug — the team formally verified Timsort with KeY and discovered an integer-overflow case in the merge-state stack-size argument. Timsort had been considered correct for thirteen years; it took theorem-proving to find the corner case.

Inside numerical analysis: Brent's method, the bisection method for root-finding, golden-section search for univariate optimisation — all reduce to binary search on a continuous interval with a tolerance instead of an exact target. SciPy's scipy.optimize.brentq, GSL's gsl_root_fsolver_bisection, MATLAB's fzero all ship hardened versions. Inside coding theory: error-correcting code decoders use binary search on the syndrome polynomial. Anywhere data is sorted — or can be made monotonic — binary search is what runs.

Even non-obvious code paths reduce to binary search. Linux kernel scheduler bookkeeping, Postgres planner's selectivity estimation, the Lucene segment readers behind Elasticsearch, V8's bytecode handler dispatch — all keep sorted lookup tables that they probe with the same logarithmic procedure that Mauchly described in 1946. The algorithm is foundational; on most production servers it runs millions of times per second, almost always under the name of some other primitive.


Adjacent techniques — hashing, tries, linear scan, learned indexes

Hashing, tries, linear scan, learned indexes.

Binary search wins when the data is sorted, the access pattern is point-or-range lookup, and the working set is too large for either a hash table (which costs more memory and gives up ordering) or a linear scan (which costs n instead of log n). When any of those conditions slips, a different primitive fits better.

For exact-match-only lookups with a fixed key set, a hash table is faster — O(1) amortised vs. O(log n) — and the constant is smaller because there is no comparison loop. C++ unordered_map, Java HashMap, Rust HashMap, Go map, Python dict all settle here. The cost is no ordering: range queries, predecessor/successor, and ordered iteration are O(n).

For string keys with prefix structure, a trie (Fredkin, CACM 1960) gives lookups in time proportional to key length rather than dataset size, and naturally supports prefix queries. Production tries appear in IP-route lookup tables (Linux kernel's FIB trie), spelling correctors, and the suffix-array structures inside grep and ripgrep. Compressed variants like the radix tree (Patricia trie) cut memory; PostgreSQL 15+ uses a radix tree internally for the RUM extension's posting lists.

For tiny arrays, a SIMD-friendly linear scan beats binary search. The crossover, on Intel x86 with AVX2, is roughly 32–64 elements; below that, binary search's branch overhead dominates the cost of touching every element. Hash-map implementations exploit this by switching from open-addressing probe to linear scan within a small bucket.

For predictable distributions, the 2018 paper The Case for Learned Index Structures (Kraska, Beutel, Chi, Dean, Polyzotis, SIGMOD 2018) argues a small neural network or piecewise-linear regression can predict the position of a key directly, often beating binary search on uniform real-world data. Google's RecSplit, MIT's RMI, and the literature on FITing-Tree have refined the idea. The catch is the same as interpolation search: pathological distributions degrade gracefully only if the implementation falls back to binary search; learned indexes are a probabilistic accelerant, not a replacement, for the canonical algorithm.

For massive read-mostly key-value stores, a sorted-string-table-with-Bloom-filter front (the standard LSM-tree pattern in RocksDB, Cassandra, ScyllaDB) elides binary search on the negative path entirely: the Bloom filter answers “definitely not present” in tens of nanoseconds, and only the positive path triggers an actual binary search inside an SSTable index block. The composite of Bloom filter plus binary search is what powers most modern OLTP-on-LSM systems and is faster than naive binary search on disk by orders of magnitude on read-skew workloads.


Further reading on binary search

Primary sources, in order.

Found this useful?