What is Big-O notation?

Big-O notation describes the asymptotic upper bound of a function — how its growth scales as input size grows. f(n) = O(g(n)) means f grows at most as fast as g for large n, ignoring constants. Common rates: O(1) constant, O(log n) logarithmic, O(n) linear, O(n log n) quasilinear (the floor for comparison sorting), O(n²) quadratic, O(2ⁿ) exponential. Used to compare algorithms independently of hardware and constant factors.

Algorithm Visualizer, step by step.

Run a classic algorithm and read its trace one operation at a time. BFS expands by layers, DFS dives deep, Fibonacci unfolds, Kadane tracks a running maximum. Pick one and watch how the work actually happens.

Algorithm

bfs

Trace

— click Run —

What you're looking at

Four small algorithms behind four buttons. Pick one, press Run, and the panel prints the exact trace the code produced: the order BFS or DFS visits nodes A through G on the same seven-node graph, the first twelve Fibonacci numbers, or Kadane's running cur/max values over the array [-2, 1, -3, 4, -1, 2, 1, -5, 4]. The card on the right names the complexity class and what each algorithm is actually used for.

Run bfs, then dfs, and compare the two traces. Same graph, same start node, same seven visits — but C, a direct neighbour of A, is third in the BFS trace and dead last in the DFS one, because DFS commits to B's entire subtree before backtracking. Then try kadane and watch the line at i=3: cur jumps from -2 to 4. The algorithm just threw away everything it had accumulated, and that single "extend or start fresh" decision is the whole trick behind the O(n) maximum-subarray solution.

Why algorithm choice matters — same problem, very different fates

Two algorithms, very different fates.

An algorithm is a finite sequence of steps that solves a class of problems; choosing the right one is the dominant lever for performance, dwarfing micro-optimisation. Big-O notation (Bachmann 1894, Landau 1909) gives the language for comparing them. The simulator above lets you race classical algorithms — BFS vs DFS, Fibonacci recursive vs memoised, Kadane's algorithm — and feel the curves yourself.

Imagine a programmer with a list of one million records to sort. They write the simplest version they can think of — bubble sort, six lines, easy to read. They run it on their laptop. The program starts and never finishes. After ten minutes, they cancel it. Bubble sort is O(n²), so the work for one million elements is roughly one trillion comparisons; even at a billion comparisons per second, that is sixteen minutes of pure CPU. Frustrated, they replace it with the standard library sort, which is O(n log n). The same input finishes in 0.4 seconds. Same machine, same data, same correct answer — three thousand times faster, because the dominant operation count went from n² to n log n.

This is the problem big-O notation exists to make legible. Two algorithms that look similar on small inputs can have wildly different fates as the input grows. Big-O captures the rate of that growth — how fast the work scales with the size of the problem — and lets you predict, before writing the code, whether your idea will finish in milliseconds or never. The constants don't matter to the asymptotic; the ratio between curves does, and it grows without bound. An O(n) algorithm beats an O(n log n) by a factor of log n, which is small. An O(n log n) beats an O(n²) by a factor of n / log n, which gets vast fast. An O(n²) beats an O(2ⁿ) by a factor that nobody bothers to measure because the latter is already infeasible.

Some anchoring numbers. With n = 1,000, those curves count roughly 1, 10, 1,000, 10,000, 1,000,000, and 10³⁰¹ operations respectively. With n = 1,000,000, they count 1, 20, 1,000,000, 20,000,000, 1,000,000,000,000, and a number with no useful name. Pick the wrong curve and your program turns into a heat-source for the universe; pick the right one and the same problem fits comfortably on a phone. The simulator above runs the same algorithm at adjustable input sizes; switch to a quadratic version and the bar chart for steps grows visibly faster than the linear version next to it.

Big-O is also a vocabulary, not just a number. When two engineers say "this is the bottleneck", they're often pointing at different operations on different curves; agreeing on which curve dominates is half of resolving the disagreement. The same word also covers space — an algorithm can be O(n) in time and O(n²) in memory, or vice versa, and you can sometimes trade one for the other. The rest of this article walks through the canonical curves, the families of algorithm techniques (divide-and-conquer, dynamic programming, greedy, graph search, backtracking, randomised), and the practitioners' rules for when the asymptotic stops mattering and the constants take over.

Big-O notation — fifty years to settle

A notation that took fifty years to settle.

The asymptotic notation modern programmers take for granted — O(n log n), Θ(n), Ω(n²) — arrived in steps over almost a century. Paul Bachmann's 1894 textbook Die Analytische Zahlentheorie introduced the upper-bound symbol O for analytic number theory; Edmund Landau's 1909 Handbuch der Lehre von der Verteilung der Primzahlen popularised it among mathematicians, which is why "Landau notation" persists as a synonym in European literature. Both men used it to talk about how prime-counting functions misbehave near infinity, not how computer programs run; the migration into computing took another sixty years.

The bridge was Donald Knuth. His 1976 SIGACT News article “Big Omicron and Big Omega and Big Theta” made a public, quiet plea that algorithms researchers stop misusing O — the original was a strict upper bound, but practitioners had been using it loosely to mean “tight bound” — and adopt the lower bound Ω and the tight bound Θ as separate symbols. The article reads like a piece of soft-spoken etiquette enforcement; it worked, and within a decade the textbook conventions of The Art of Computer Programming had become universal. Knuth's TAOCP volumes themselves — Volume 1 (Fundamental Algorithms, 1968), Volume 2 (Seminumerical, 1969), Volume 3 (Sorting and Searching, 1973), Volume 4A (Combinatorial, 2011) — remain the most thorough reference for low-level algorithm analysis ever written.

The other foundational moment came in 1971. Stephen Cook's “The complexity of theorem-proving procedures” introduced what we now call NP-completeness via the Boolean satisfiability problem; Leonid Levin, working independently in the Soviet Union, published a parallel result in 1973. Richard Karp's 1972 paper “Reducibility among combinatorial problems” turned the framework into a working tool by reducing 21 well-known problems to SAT, including travelling salesman, maximum clique, and integer programming. The P vs NP question — whether every problem with a polynomial-time verifier also has a polynomial-time solver — remains the central open question of theoretical computer science fifty-three years later, and one of the seven Clay Mathematics Institute Millennium Prize problems.

The framework that ties it all together is the RAM model of computation: an abstract machine with O(1) memory access, fixed-width words, and constant-time arithmetic. Real computers violate this model in ways that matter for performance — cache hierarchies, NUMA, branch prediction, SIMD — which is why measured constants diverge from theoretical big-O bounds. The model is still the right one for asymptotic analysis; it is the wrong one for choosing between two implementations that share a complexity class. Both observations matter, and the discipline that comes from holding them in tension at once is the working programmer's most useful intuition.

Complexity curves — log, linear, n log n, quadratic

Curves you can feel.

Big-O is the rate at which an algorithm's work grows with its input. The notation hides the constants — but the constants matter. Two algorithms that are both O(n log n) can differ by five times in practice; on modern hardware, an O(n) algorithm with bad cache behaviour can lose to an O(n log n) algorithm with friendly access patterns. The curves are the rough shape; the constants are the cost; the cache hierarchy is the tiebreaker.

The table below counts operations, not nanoseconds. To translate, multiply by the cost of one operation in the dominant working set. A pointer chase that misses L3 cache costs roughly 100 ns on a contemporary CPU; a branch-mispredicted comparison costs 5–10 ns; a branch-predicted register-only comparison costs under 1 ns. An O(n) algorithm doing a million pointer chases out of cache spends 100 ms; the same algorithm doing a million register comparisons spends a millisecond.

Curve	n=10	n=1k	n=1M	Where it lives
O(1)	1	1	1	Hashmap lookup, array index.
O(log n)	3	10	20	Binary search, B-tree probe.
O(n)	10	1k	1M	Array scan, linked-list walk.
O(n log n)	33	10k	20M	Comparison sort. The "good enough" curve.
O(n²)	100	1M	10¹²	Bubble/insertion. Fine for n<100; lethal otherwise.
O(2ⁿ)	1k	10³⁰¹	∞	Naive recursion. Memoise or perish.

Constants matter most for small n. Insertion sort beats quicksort for arrays under about sixteen elements — which is why every fast quicksort implementation (introsort in libstdc++, pdqsort in Rust's standard library, the Timsort base case in CPython) switches to insertion sort below a tunable threshold. The CLRS textbook proves the asymptotic preference in chapter 7; the Linux kernel's lib/sort.c proves the constant preference in production by hard-coding the threshold at 8.

Robert Sedgewick and Kevin Wayne's Algorithms (4th edition, Addison-Wesley, 2011) documents constant factors empirically for the standard sorts. On their reference machine quicksort runs at roughly 1.4 ns per compare-swap, mergesort at 1.6 ns, heapsort at 2.1 ns — the difference is cache behaviour, not asymptotics. The same observation underpins Brodal and Moruz's I/O-efficient priority queues (2005), the cache-oblivious B-tree (Bender, Demaine, Farach-Colton, FOCS 2000), and most of what makes contemporary algorithm engineering distinct from classical algorithm design.

The frontier of asymptotic analysis itself is more alive than most working programmers realise. The 2014 result by Francois Le Gall improved matrix multiplication from Strassen's 1969 O(n^2.807) to O(n^2.3729); a 2024 paper by Duan, Wu, and Zhou pushed it to roughly O(n^2.371552), all without ever shipping a faster matrix-multiplication routine because the galactic constants make those algorithms slower than Strassen for any matrix that fits in a data centre. The standard term for this phenomenon, due to Richard Lipton in 2010, is a galactic algorithm: provably better in the limit, useless on any input a person has ever computed. The practical takeaway is the inverse: an O(n^3) blocked, vectorised, GPU-friendly matrix multiply (cuBLAS, OpenBLAS, Intel MKL) beats the theoretically faster algorithms for every input that humans actually care about. The asymptotic improvement is real; the wall-clock improvement is zero.

Common algorithmic moves — divide-and-conquer, two pointers, DP

A short field guide to moves.

Most algorithms fall into one of six families. Knowing the family is half the work — once you place a problem, the canonical solutions and their gotchas are usually one search away. The taxonomy below is the one the CLRS textbook (Cormen, Leiserson, Rivest, Stein; Introduction to Algorithms, 4th edition, MIT Press 2022) organises its chapters around, with one extension for randomised techniques that the Knuth tradition treats as a separate axis.

Branch and bound deserves a mention as a seventh family even when it doesn't get its own chapter. It is the technique behind every working SAT solver, every mixed-integer linear programming solver (CPLEX, Gurobi, COIN-OR's CBC), and most production combinatorial optimisers. Lower bounds prune branches before exploring them; the algorithm is intractable in the worst case but routinely solves instances with millions of variables in practice. Dorit Hochbaum's Approximation Algorithms for NP-Hard Problems (PWS, 1997) is the standard reference when an exact answer is out of reach.

DIVIDE & CONQUER
Split, recurse, combine. Mergesort, quicksort, FFT, closest-pair-of-points. The shape: T(n) = 2·T(n/2) + O(n) → O(n log n).
DYNAMIC PROGRAMMING
Cache subproblem answers; rebuild larger solutions from cached pieces. Knapsack, edit distance, longest common subsequence, Bellman-Ford.
GREEDY
Always pick the locally best option. Works only when "locally best" composes into "globally best". Huffman coding, Dijkstra, Kruskal/Prim, interval scheduling.
GRAPH SEARCH
BFS for shortest unweighted path; DFS for connectivity, topological sort, cycle detection; A* for weighted with heuristic; Dijkstra for non-negative weights.
BACKTRACKING
Try, fail, undo, try the next. N-queens, sudoku, SAT solvers. Constraint propagation cuts the search space by orders of magnitude.
RANDOMISED
Use randomness to dodge worst-case inputs. Skip lists, randomised quicksort pivot, hash functions, MinHash. Average-case rather than worst-case bounds.

Performance analysis — find the dominant operation

Find the dominant operation.

Most code reviews include a “what's the complexity” question. The trick is identifying the dominant operation — the inner loop or recursion that scales with input size — and counting how it grows. The patterns below cover roughly ninety percent of code that ever ships.

The other ten percent are recursive divide-and-conquer with non-trivial recurrence relations. The Master Theorem, in the form most working programmers know it, was articulated by Jon Bentley, Dorothea Haken, and James Saxe in their 1980 paper “A general method for solving divide-and-conquer recurrences” (SIGACT News). The recurrence T(n) = a·T(n/b) + f(n) splits into three cases depending on whether f(n) grows polynomially slower, comparably, or faster than n^(log_b a). For mergesort, a=2, b=2, f(n)=O(n), and the polynomial term wins by a logarithmic factor, giving O(n log n). For binary search, a=1, b=2, f(n)=O(1), giving O(log n). For the naive Strassen-style matrix multiplication, a=7, b=2, f(n)=O(n²), giving O(n^log_2 7) which is roughly O(n^2.807). The theorem is the right tool for fifteen seconds of analysis on any divide-and-conquer body.

Amortised analysis handles the case where occasional expensive operations are paid for by many cheap ones. Robert Tarjan's 1985 paper “Amortized computational complexity” (SIAM Journal on Algebraic and Discrete Methods) introduced the three standard techniques: aggregate analysis (total cost over n operations divided by n), the accounting method (assign virtual cost to each operation, save the surplus in a bank), and the potential-function method (define a non-negative function of the data structure's state and pay the change in potential as part of each operation's amortised cost). The classical example is the dynamic array: each push is O(1) amortised even though occasional resizes are O(n), because the resize doubles capacity and pre-pays for n future O(1) pushes.

Randomised analysis is the fourth technique. Karger's 1993 minimum-cut algorithm runs in O(n² log³ n) expected time on any graph; Mulmuley's randomised geometric algorithms (1992–1994) produce expected-O(n log n) hull and Voronoi diagram computations regardless of input distribution. The trick is that the algorithm flips coins at runtime; the expected cost is taken over the algorithm's coin flips, not over a distribution of inputs. The worst-case input no longer exists because the adversary can't predict the algorithm's choices. Cuckoo hashing, treaps, skip lists, and randomised quicksort all share this DNA.

# Master theorem · pseudocode for the case discrimination
function masterTheorem(a, b, f):
    critical = n ** log(a, b)             # n^log_b a

    if f(n) is O(critical / n^ε) for ε > 0:    # leaves dominate
        return Θ(critical)

    elif f(n) is Θ(critical):                  # balanced
        return Θ(critical · log n)

    elif f(n) is Ω(critical · n^ε)             # root dominates
         and a · f(n/b) ≤ c · f(n) for c < 1:  #   (regularity)
        return Θ(f(n))

    else:
        # falls outside the master theorem; use Akra-Bazzi
        return akraBazzi(a, b, f)

Online, approximate, and parameterised algorithms

Online, approximate, parameterised.

Worst-case analysis is the default frame, but four other frames matter for production work. Online algorithms make decisions on a stream without seeing the future; their quality is measured by competitive ratio, the worst-case ratio between the online algorithm's cost and the optimal offline solution. Daniel Sleator and Robert Tarjan introduced the framework in “Amortized efficiency of list update and paging rules” (CACM, 1985), which proved that LRU is 2-competitive against the optimal offline page-replacement algorithm. Every cache eviction policy in production is implicitly an online algorithm against a future the cache can't see.

Approximation algorithms trade a guarantee of polynomial running time for a bounded loss in solution quality on NP-hard problems. The 2-approximation for vertex cover (any maximal matching), the (1+ε)-approximation schemes for the Euclidean travelling salesman (Sanjeev Arora, 1996), and the LP-rounding 0.878-approximation for max-cut (Goemans and Williamson, 1995) are the canonical examples. Dorit Hochbaum's edited volume Approximation Algorithms for NP-Hard Problems (PWS, 1997) and Vijay Vazirani's Approximation Algorithms (Springer, 2003) are the field references. In practice, modern MIP solvers like Gurobi and CPLEX combine branch-and-bound with cutting-plane methods to solve instances that would have been intractable two decades ago, blurring the line between exact and approximate.

Parameterised complexity (Rod Downey and Michael Fellows, Parameterized Complexity, Springer 1999) classifies problems by an input parameter k separate from the input size n. A problem is fixed-parameter tractable (FPT) if it can be solved in time f(k)·poly(n) for some computable function f. Vertex cover is FPT in k (the size of the desired cover); travelling salesman is FPT in treewidth; many problems that look hopeless under classical analysis become tractable when the relevant parameter is small. The 2010s SAT solvers (CaDiCaL, Glucose, MiniSAT) implicitly exploit FPT structure when they solve instances with millions of clauses but small backdoor sets.

Average-case analysis is the fourth frame, and the one with the messiest reputation. Quicksort's worst case is O(n²); its expected case under uniform random inputs is O(n log n); the practical observation is that real-world inputs are neither uniformly random nor adversarial, and randomised pivoting bridges the gap. The smoothed-analysis framework (Daniel Spielman and Shang-Hua Teng, “Smoothed analysis of algorithms”, JACM 2004; Gödel Prize 2008) measures performance on inputs perturbed by small amounts of noise, capturing what algorithms actually face. The simplex method for linear programming has O(2^n) worst case but smoothed-polynomial running time, which is why it dominates production LP solvers despite the existence of polynomial-time interior-point alternatives.

Choosing an algorithm in production

Choosing in production.

The instinct to reach for “the fastest algorithm” is usually wrong. Real choice depends on input shape, stability requirements, memory budget, and the cost of a bug. The five rules of thumb below hold up across decades of production deployment.

For small input (n < 50), pick whatever is clearest. Insertion sort beats quicksort. Linear search beats binary search if you have to write the comparator. The Linux kernel's lib/sort.c uses heapsort because the constant overhead of recursion is unacceptable in interrupt context, not because heapsort is theoretically better. Clarity at small n is worth more than asymptotic efficiency.

For nearly-sorted input, Timsort wins. Tim Peters wrote it in 2002 for CPython 2.3; Java's Arrays.sort for object arrays adopted it in JDK 7 (2011); V8's Array.prototype.sort switched to TimSort in 2018. It detects existing monotone runs in the input and merges them, achieving O(n) on already-sorted data and never worse than O(n log n) on the worst case. The Rust standard library's slice::sort uses a related algorithm called pattern-defeating quicksort (pdqsort, Orson Peters, 2014) that combines branchless partitioning with run detection.

For stable sorting (preserving the relative order of equal keys), the choices narrow to mergesort, Timsort, insertion sort, and the bucket-style radix variants. Quicksort and heapsort are not stable. The C++ standard library's std::sort is unstable; std::stable_sort is the stable variant and is typically a Timsort or Tukey-style mergesort.

For memory-constrained sorting, heapsort is the only standard in-place O(n log n) sort. Mergesort needs O(n) auxiliary space; quicksort needs O(log n) for recursion in the best case and O(n) in the worst. For external sort — data that exceeds RAM — the standard recipe is multi-way mergesort with k-way merge over disk-resident sorted runs, which has been the textbook approach since Knuth TAOCP Volume 3 in 1973. Modern columnar databases (Snowflake, BigQuery, Spark) use this technique on terabyte sorts.

For adversarial inputs, randomised pivoting is the standard mitigation. Without it, an attacker who controls the input can drive quicksort to O(n²). The 2003 paper by Crosby and Wallach (USENIX Security) on hash-collision attacks against linguistic frameworks (Perl, Python) showed how the same observation applies to hash tables: adversarial keys with colliding hashes turn O(1) lookups into O(n). Modern hash tables (CPython 3.4+, Rust's HashMap, Go's map) randomise the hash seed at process start to prevent this attack.

The most important rule is meta: profile before you optimise. The fastest algorithm in a hot path that runs once a day saves nothing. The slowest in a request handler dominates everything. Brendan Gregg's Systems Performance (Pearson, 2nd edition 2020) and the perf tool on Linux are the right starting points. Algorithm choice is one column in the spreadsheet of optimisation; data layout, branch prediction, and memory hierarchy are the other three.

Galactic algorithms

An algorithm with a better asymptotic bound but constants so large the crossover happens beyond any input you'll ever process. Strassen-style matrix multiply at O(n^2.371) is faster than Strassen's O(n^2.807) — for matrices larger than the observable universe. The asymptotic improvement is real; the wall-clock improvement is zero.

Watching the algorithm think.

Algorithm visualisation as pedagogy has a longer pedigree than people remember. Donald Knuth's Stanford GraphBase (1993) shipped with a suite of programs that drew the algorithms in TAOCP as they ran, on the grounds that watching a sort or a graph traversal in motion changed how students understood it. Robert Sedgewick's filmed Algorithms lectures at Princeton (later available on Coursera as the Algorithms, Part I/II MOOC starting in 2012) animated every algorithm in the textbook with the same instinct. Bret Victor's 2012 talk “Inventing on Principle” argued that programmers should never be working blind, and that explicit visual feedback was the missing ingredient in most pedagogical tools.

The web-era inheritors are VisuAlgo (Steven Halim and the National University of Singapore, started 2011, expanded continuously through 2024), David Galles's animation collections at the University of San Francisco, and a long tail of single-purpose visualisations like Mike Bostock's D3.js notebook on quicksort. The pattern that works is a single algorithm, a single small input, controllable speed, and a way to step backward through history. The pattern that doesn't work is a kitchen-sink visualisation that covers fifty algorithms with surface-level animation and no narrative.

Empirically, comprehension benefits from visualisation are largest when learners are active — constructing inputs, predicting next steps, comparing two algorithms on the same data — and smallest when they passively watch. Christopher Hundhausen, Sarah Douglas, and John Stasko's 2002 meta-analysis “A meta-study of algorithm visualization effectiveness” (Journal of Visual Languages and Computing) summarised twenty-four controlled studies and found a moderate effect size for active engagement and essentially zero for passive watching. The takeaway for tools like this simulator: the visualisation matters less than the prompts to predict, modify, and compare.

Once you've internalised an algorithm visually, the next step is reading the source. CPython's Objects/listobject.c contains the canonical Timsort implementation; the Linux kernel's lib/list_sort.c contains a beautiful merge sort over linked lists; Postgres's src/backend/access/nbtree contains a production B+tree with concurrency control. The leap from textbook pseudocode to production C is where the constants (and the bugs) live.