What is recursion in programming?

Recursion is when a function calls itself on a smaller subproblem until it reaches a base case. The classic example: factorial(n) = n × factorial(n-1), with factorial(1) = 1 as the base case. Recursion expresses naturally self-similar problems (tree traversal, divide-and-conquer, parsers) but can be expensive without memoisation when subproblems overlap.

What is the difference between memoisation and tabulation?

Both eliminate redundant recursive computation. Memoisation is top-down: the recursive function checks a cache before computing and stores its result after — same recursion, different caching. Tabulation is bottom-up: an iterative loop fills a table from the base cases up, no recursion at all. Memoisation is easier to retrofit; tabulation typically wins on memory and constant factors, especially when the table can be reduced to one or two rows.

New to this? · ELI5 · 1 min Read Recursion explained simply, in plain English

Recursion Simulator: step by step.

Watch the call stack grow and shrink. See the recursion tree expand. Switch on memoisation and watch the same tree collapse. Flip to tabulation and the recursion disappears entirely. The same problem, three lenses — that's what dynamic programming is.

Calls

Max stack

Time

O(2ⁿ)

Result

Problem

Strategy

n = 6 (max 8)

event 0/50

Call stack (grows down)

— empty —

depth · 0

Recursion tree

active call returned scroll to pan

TimeO(2ⁿ)

SpaceO(n) stack

NoteExponential blow-up; same subproblem solved many times.

What you're looking at

Three controls drive everything: the problem (Fibonacci, factorial, climbing stairs), the strategy (naive recursion, top-down memo, bottom-up tabulation), and the n slider. The playback row steps an event at a time through the trace. In recursion modes the left column is the live call stack growing downward, the right is the recursion tree filling in, with each node showing its return value once it resolves. In tabulation mode the tree disappears and you watch a DP table fill cell by cell. The counters up top track total calls, max stack depth, and the resulting time complexity.

Run Fibonacci naive at n=8 and count the calls: the tree balloons because fib(5) and smaller get recomputed over and over. Now switch to top-down memo without changing n. The same tree appears, but repeat subproblems light up as cache hits and stop expanding, and the call count collapses from exponential toward linear. The surprise is that the answer never changed, only the shape of the work. Flip to bottom-up and recursion vanishes entirely: the same numbers, built once each, with no stack at all.

What is recursion?

When the easiest answer involves a smaller version of itself.

Recursion is a programming technique where a function solves a problem by calling itself on a smaller subproblem. Alonzo Church's lambda calculus (1930s) and John McCarthy's Lisp (1958) made it foundational to computer science. Naive recursion can be exponentially wasteful; memoisation caches results, and tabulation rewrites it iteratively — both are the heart of dynamic programming.

Imagine you are asked: how many ways can a person climb a staircase of ten steps if they can take either one or two steps at a time? You might begin by trying every combination by hand, and quickly notice the cases multiply uncomfortably. Ten steps yields eighty-nine ways. Twenty steps yields almost eleven thousand. Forty steps gives a hundred and sixty-five million. The brute search is finite, but exhausting any pattern will run out of paper before it runs out of cases.

Now look at the problem differently. The very last step the climber takes is either of size one or of size two. If it was a single step, then to get there they had to climb the first nine steps in some way. If it was a two-step jump, they had to climb the first eight. So the number of ways to climb ten steps equals the number of ways to climb nine plus the number of ways to climb eight. The big problem becomes two smaller copies of itself. Apply the same reasoning to nine steps, and to eight, and so on, until you bottom out at zero or one steps where the answer is obviously one. You have not added any cleverness, just noticed that the problem is self-similar.

A function that calls itself with smaller inputs is called a recursive function. It is the natural notation for self-similar problems. Two pieces are required: a base case that returns directly without recursing, so the chain bottoms out, and a recursive case that breaks the input down and combines the results of the smaller calls. The simulator above lets you switch between three small problems — Fibonacci, factorial, climbing stairs — and watch the call stack grow as the function calls itself, then unwind as each call returns its answer.

The catch is that the unwinding is rarely free. A naive recursive solver for Fibonacci ends up recomputing fib(2) dozens of times because it appears in many subtrees. fib(40) launches roughly 330 million calls; fib(50) launches forty billion. The work isn't hard per call — it is that the same easy work is being repeated by an exponentially growing army of calls. Any time the same subproblem turns up more than once, you have an opportunity to remember the answer the first time you computed it. That is what dynamic programming is, and the second mode of the simulator (top-down memoisation) shows it in action: the redundant subtrees collapse into single cache hits, and an exponential algorithm becomes a linear one. The third mode (bottom-up tabulation) goes further and replaces the recursion entirely with a loop over a small table.

Concrete numbers anchor the wins. Naive Fibonacci on n = 40 takes roughly three seconds on a modern laptop. Memoised Fibonacci on the same n = 40 finishes in microseconds — a six-orders-of-magnitude difference for adding three lines of code. The price is a small dictionary and a tolerance for the recursion's stack depth. The tabulated version drops the stack entirely and computes fib(1000000) in a few milliseconds with no risk of overflow. Same answer, three different costs; the simulator lets you toggle between them on identical inputs.

How recursion works — a function that calls itself

A function that calls itself.

A recursive function answers a question by reducing it to a smaller version of the same question — and answering that. Two pieces are required: a base case that returns directly without recursing (so the chain bottoms out), and a recursive case that reduces the input and combines the results of the smaller calls.

Take Fibonacci. The base case says fib(0) = 0 and fib(1) = 1 — small enough that no further work is needed. The recursive case says fib(n) = fib(n-1) + fib(n-2) — same problem, smaller inputs. Each call sets up a stack frame holding its parameters and local variables; when the function returns, the frame is popped. Stack depth at any moment equals how deep the current call chain has gone, not how many calls have been made in total.

function fib(n) {
  if (n < 2) return n;            // base case
  return fib(n - 1) + fib(n - 2); // recursive case
}

Watch the simulator above with Naive + Fibonacci + n = 6. As you step through, the stack on the left grows down with every call and shrinks up with every return. The tree on the right shows the same calls as a graph: each node is a call, each edge is “this call made that call”. Every leaf is a base case; every interior node sums the values of its two children.

Three structural patterns dominate. Linear recursion calls itself once per invocation — factorial, list traversal, summing an array. Tree recursion calls itself two or more times — Fibonacci, climbing stairs, generating subsets. Mutual recursion involves two or more functions that call each other — the classic even/odd definition, recursive-descent parsers with separate functions per grammar rule. Each pattern produces a different shape in the call tree, and the shape predicts the asymptotic behaviour: linear gives O(n) calls, balanced binary tree recursion gives O(2ⁿ), divide-and-conquer with O(log n) depth gives O(n log n) total work.

A recurrence relation captures the cost in closed form. Mergesort's running time satisfies T(n) = 2T(n/2) + O(n), which the Master Theorem (CLRS chapter 4) resolves to O(n log n). Naive Fibonacci satisfies T(n) = T(n−1) + T(n−2) + O(1), growing as Φⁿ where Φ ≈ 1.618 is the golden ratio. Solving recurrences is a small algebraic art form — the unrolling method, the recursion tree, the Master Theorem, the Akra–Bazzi generalisation — that translates a recursive description into a wall-clock prediction.

Origins — lambda calculus, induction, and Lisp

Lambda calculus, induction, and Lisp.

Recursion is older than the digital computer. Alonzo Church introduced the lambda calculus in a series of papers between 1932 and 1936 (Princeton; The Calculi of Lambda-Conversion, Annals of Mathematics Studies, 1941) as a formal system for expressing computation through anonymous function application. His student Stephen Kleene developed the theory of partial recursive functions in 1936, formalising what it meant for a function on the natural numbers to be computable. Around the same time Kurt Gödel had defined primitive recursive functions in his 1931 incompleteness paper, building all of arithmetic from successor, projection, and a recursion operator. By 1936 Church, Kleene, and Alan Turing had each given equivalent characterisations of computability — Church via lambda calculus, Kleene via recursive functions, Turing via his eponymous machines. The Church–Turing thesis says these three notions coincide, and recursion is intrinsic to all of them.

Wilhelm Ackermann's 1928 function (Zum Hilbertschen Aufbau der reellen Zahlen, Mathematische Annalen) was constructed specifically to be total but not primitive recursive — the first concrete demonstration that the recursive hierarchy is strictly richer than the primitive recursive one. Rózsa Péter's 1935 simplification gave the two-argument form everyone teaches today: A(0,n) = n+1, A(m+1,0) = A(m,1), A(m+1,n+1) = A(m, A(m+1,n)). A(4,2) already exceeds the number of atoms in the observable universe; the function is the canonical witness that not every total computable function is feasible.

The bridge from theory to practice is John McCarthy's Recursive Functions of Symbolic Expressions and Their Computation by Machine, Part I (CACM, April 1960). McCarthy defined LISP as an embodiment of the lambda calculus on a real computer, with first-class functions, recursion as the only iteration construct, and lists as the only data structure. The paper introduced if-then-else as an expression form, the cond conditional, and the explicit construction of an interpreter for LISP in LISP — the eval function whose 14 lines remain a touchstone of computer science. Haskell Curry and Barkley Rosser had earlier given combinatory logic a recursion operator as the Y combinator, the fixed-point construction Y = λf.(λx.f(x x))(λx.f(x x)) that lets anonymous lambdas refer to themselves; Curry's 1930 thesis at Göttingen and the 1958 Curry–Feys book Combinatory Logic are the canonical sources.

Recursion and mathematical induction are duals. To prove a property holds for all natural numbers, you prove a base case and an inductive step; to compute a function on natural numbers, you write a base case and a recursive case. The shape of the proof and the shape of the program are the same. This isomorphism — structural recursion mirrors structural induction — is what makes recursive code on inductive data types (lists, trees, syntax) so much easier to reason about than its iterative counterpart. The Coq proof assistant, Agda, Idris, and Lean all rely on it: their pattern-matching fix constructs are restricted forms of recursion that the type checker proves to terminate.

A small but important caveat: not every recursive definition terminates. Church and Turing both proved in 1936 that the halting problem — deciding whether an arbitrary recursive program eventually returns — is itself undecidable. Total functional languages like Coq and Agda restrict recursion to provably-terminating forms (structural recursion on a well-founded order), at the cost of expressiveness; you can't write a fixpoint iteration that decides the halting problem because the type system rules it out. General-purpose languages allow general recursion and accept that some programs loop forever. This is the same trade-off you see between SQL with recursive CTEs (bounded by the optimiser) and Turing-complete query languages.

The exponential trap — the same subproblem solved over and over

Same subproblem, solved over and over.

Look at the recursion tree for fib(6). fib(4) appears twice — once via fib(5)'s left child, once via fib(6)'s right child. fib(3) appears three times. fib(2) appears five. The number of times fib(k) is recomputed in the tree for fib(n) follows Fibonacci itself.

Total calls grow as ≈ φⁿ / √5 where φ ≈ 1.618. That's roughly 2ⁿ — exponential. fib(40) takes a couple of seconds; fib(50) takes minutes; fib(60) won't finish before the heat death of your laptop. The work isn't hard per call; the work is that the same easy work is being repeated by an exploding number of calls.

n	Naive calls	Time @ 10ns/call
10	177	2 µs
20	21,891	220 µs
30	2,692,537	27 ms
40	331,160,281	3.3 s
50	40,730,022,147	~7 min
60	5,008,944,789,219	~14 hr

This is the diagnostic question dynamic programming answers: am I solving the same subproblem more than once? If yes, cache the answer. If you can't see overlap, recursion isn't going to be faster than the naive approach.

The diagnostic generalises beyond Fibonacci. Edit-distance recursion (Wagner–Fischer, 1974), longest common subsequence (Hirschberg, 1975), 0/1 knapsack, matrix-chain multiplication (Hu and Shing, 1982), and shortest path in DAG all share the property that direct recursion produces an exponential tree but the number of distinct subproblems is polynomial. Richard Bellman coined the term dynamic programming in 1953 at the RAND Corporation precisely for this pattern; his 1957 book Dynamic Programming (Princeton University Press) gave the principle of optimality and the recursive Bellman equation that underlies reinforcement learning to this day.

A simple test detects overlap. Print the arguments of every recursive call in a small example. If the same arguments appear more than once, you have overlap and memoisation will help; if every call has unique arguments, you don't and it won't. Quicksort and mergesort, despite being recursive, have no overlapping subproblems — each call partitions a different region of the array. Memoising them would only waste memory. The presence of overlap, not the presence of recursion, is what makes a problem amenable to dynamic programming.

Memoisation — cache the answer the first time

Cache the answer the first time you compute it.

Memoisation is the smallest possible change to naive recursion: keep a map from input to result. Before computing, check the map. After computing, write the map. The recursion still flows top-down — you start by asking fib(n) — but a subproblem only ever expands into its full subtree once.

const cache = new Map();
function fib(n) {
  if (cache.has(n)) return cache.get(n);
  if (n < 2) return n;
  const v = fib(n - 1) + fib(n - 2);
  cache.set(n, v);
  return v;
}

Switch the simulator to Top-down (memo). The tree now has two kinds of leaf: real base cases (n < 2) and cache hits highlighted in amber. The amber leaves are subproblems that would have expanded into their own subtree in naive mode but were short-circuited because we'd seen them before. For Fibonacci this collapses the work from O(2ⁿ) to O(n): there are exactly n + 1 distinct inputs, each computed once.

The cost has shifted. Time is now linear, but space takes both the cache (O(n) entries) and the recursion's own stack (O(n) frames at peak). On large n, the recursion stack is the limit — most languages cap it at 1,000–10,000 frames before RangeError / StackOverflowError. Memoisation removes time, not stack pressure.

When memo wins

Top-down is right when the natural way to think about the problem is recursive (tree DP, expression evaluation, parser combinators), when not all subproblems are needed (you only fill what's reachable from the root), or when the input space is sparse and a hash table is more compact than a dense array.

Tabulation — flip the dependency arrow, no recursion at all

Flip the dependency arrow. No recursion at all.

Bottom-up DP starts from the base cases and walks forward, filling a table where each entry is a function of earlier entries. The recursion vanishes — there's just an array and a loop. For Fibonacci, you iterate from i = 2 to n, each dp[i] = dp[i-1] + dp[i-2].

function fib(n) {
  if (n < 2) return n;
  const dp = [0, 1];
  for (let i = 2; i <= n; i++) dp[i] = dp[i - 1] + dp[i - 2];
  return dp[n];
}

Switch the simulator to Bottom-up (tabulate). The tree disappears. The visualisation becomes a row of cells, each filled in order. There's no call stack to grow because there are no calls. n can go much higher safely.

And once you have the table, the next observation often follows: do I really need to keep the whole table? For Fibonacci, each dp[i] only looks back two steps. Two rolling variables suffice — space drops from O(n) to O(1). This optimisation is hard to spot when you start from recursion and trivial when you start from a table; that's the case for tabulation as a default in interview-style problems.

Approach	Time	Space	Stack risk
Naive recursion	O(2ⁿ)	O(n)	low (depth = n)
Top-down (memo)	O(n)	O(n) cache + O(n) stack	real on large n
Bottom-up (table)	O(n)	O(n)	none
Bottom-up (rolling)	O(n)	O(1)	none

When bottom-up wins

Tabulation is right when every subproblem is needed (dense DP), when the iteration order is obvious (sequence DP, knapsack, edit distance), or when stack depth would overflow on memo. The downside: if you don't actually need every cell, you'll fill them anyway.

Why your stack overflows — and what tail-call optimisation does

Why your stack overflows.

Each function call allocates a stack frame holding parameters, local variables, and a return address. For naive Fibonacci with n = 50, the maximum stack depth is 50 frames — totally fine. For factorial(100,000) implemented recursively you'd need 100,000 frames, and most runtimes (V8, CPython, OpenJDK) refuse beyond a few thousand. Hence RecursionError: maximum recursion depth exceeded in Python and RangeError: Maximum call stack size exceeded in JavaScript.

Concrete numbers vary by runtime. CPython defaults to 1000 frames (sys.getrecursionlimit()); the underlying C stack is the actual constraint, typically 8 MB on Linux pthreads, controllable via ulimit -s and the RLIMIT_STACK rlimit. V8 allows around 10000–15000 calls depending on frame size. The JVM gives 512 KB per thread by default, controllable via -Xss. Go goroutines start with 8 KB and grow on demand by copying live data into a larger segment, so practical recursion depth is bounded only by available memory.

Tail-call optimisation (TCO) turns a recursive call at the very end of a function into a jump: the current frame is replaced rather than added to. Guy Steele's 1977 paper Debunking the “Expensive Procedure Call” Myth — later rendered as Lambda: The Ultimate GOTO — argued that with TCO, recursion is no more expensive than iteration, and iteration is just a stylised form of recursion. Scheme has required TCO since the R5RS standard (1998) and the requirement carried through R6RS and R7RS. OCaml, F#, Erlang, Elixir, Lua, Kotlin, and Scala (with the @tailrec annotation) all implement it. JavaScript's ECMAScript 2015 mandated proper tail calls, but no major engine ships it — debugger compatibility was the cited reason. Java doesn't. CPython doesn't. Don't rely on TCO unless your runtime documents it.

If you're hitting recursion limits and the runtime won't help, three workarounds apply. First, convert the recursion to iteration with an explicit stack — an array of work items processed in a loop. That's bottom-up DP for overlapping subproblems, or an explicit DFS/BFS queue for tree and graph traversals. Second, use continuation-passing style (CPS) and trampolining: every function returns a thunk — a closure representing the next step — and a top-level loop calls them in sequence, so the C stack never grows. Clojure's trampoline and recur, Scala's scala.util.control.TailCalls, and most Haskell runtimes use this internally. Third, use accumulator-passing style to make the recursion tail-call-shaped, even if your language doesn't optimise it — the rewrite is often clearer in its own right.

A subtle gotcha

A function that “looks” tail-recursive may not be one. return n * fact(n - 1) is not tail-recursive — the multiplication happens after the call returns, so the frame must be kept. The tail-recursive form uses an accumulator: return go(n - 1, acc * n).

Where recursion earns its keep

Where recursion earns its keep.

Several classes of algorithm are most naturally expressed with recursion even when DP is irrelevant. Divide-and-conquer sorts — mergesort (von Neumann, 1945) and quicksort (Hoare, The Computer Journal, 1962) — recursively split, conquer, and combine. The Fast Fourier Transform of Cooley and Tukey (Math. of Computation, 1965) is the canonical divide-and-conquer numerical algorithm; its recursion bottoms out at base-case 1-element FFTs and combines them with twiddle factors all the way back up. Karatsuba multiplication (1962), Strassen's matrix multiplication (1969), and the Schönhage–Strassen algorithm (1971) are all recursive divide-and-conquer constructions that beat the obvious quadratic or cubic bound.

Backtracking search is the other domain where recursion is the natural fit. The classic n-queens problem — place n non-attacking queens on an n×n board — was studied by Gauss in 1850 and given its modern recursive formulation by Niklaus Wirth in Algorithms + Data Structures = Programs (1976). Sudoku solvers, SAT solvers (DPLL, Davis–Putnam–Logemann–Loveland 1962), graph colouring, and constraint propagation all share the same recursive shape: try a choice, recurse, undo if it fails, try the next. The recursion structure mirrors the search tree exactly, and the call stack does the bookkeeping the algorithm needs anyway.

Recursive descent parsers — one function per grammar non-terminal, calling each other to consume input — are how almost every hand-written compiler frontend works. The Go compiler, the V8 parser, GCC's C++ parser, and Roslyn's C# parser are all recursive descent. The recursion is mutual: parseExpression calls parseTerm which calls parseFactor which can recurse back into parseExpression for parenthesised subexpressions. The depth of the parse stack is the depth of the syntactic nesting; for languages with deep nesting (Lisp, JSON), stack overflow on adversarial input is a real concern, addressed by hand-rolled iteration in production parsers and by Postgres' explicit SQL parser nesting limits.

File-system traversal, tree rendering in browsers, JSON and XML parsing, AST walking, regular expression matching with backtracking, type checking in compilers, theorem proving in Coq and Lean, code generation in template engines, recursive descent in protobuf parsers — the list keeps going. Wherever the data is shaped like a tree, recursion is usually the clearest way to walk it.

Default to iteration for everything else. It's faster (no call overhead), bounded (no stack risk), and easier to debug.

Problem shape	Pattern	Why
Linear scan, accumulating state	iteration	simpler and faster
Divide-and-conquer, no overlap	recursion (no DP)	natural; no caching needed
Tree DP (game theory, parse trees)	top-down memo	not all subproblems needed
Sequence DP (LCS, edit distance)	bottom-up table	dense; stack-safe; rolling space
Graph shortest path / connectivity	BFS / DFS with explicit queue/stack	visited set + ordering matter more than recursion
Backtracking search (SAT, n-queens)	recursion + undo	call stack mirrors search tree
Very deep linear recursion	iteration or trampoline	real-world stack limits

Two final notes. First, memoisation isn't free — the cache itself takes space, and hash-map lookups have constant overhead that dominates short subproblems. For tiny inputs, naive recursion is sometimes faster than memo. Profile, don't assume. Second, state shape matters. If your DP state is a tuple of (i, j, k, mask), the cache key has to be deterministic and hashable; bottom-up's table coordinates are always integer indices, no key construction needed. Picking bottom-up when state is multi-dimensional often makes the code dramatically cleaner.