08 / 20 · Tier B
Patterns / 08

Backtracking: build, recurse, undo

DFS with state mutation that gets reversed on return. You build a partial candidate one choice at a time, recurse to extend it, and on the way back out you undo the choice so the next sibling sees a clean slate. It is how you enumerate permutations, subsets, combinations, board placements, and anything else that asks for "all valid configurations" of something small.


1 · Why backtracking matters: the intuition

Backtracking is exhaustive search with pruning. You walk a tree of partial solutions, extending one choice at a time. When a choice fails a constraint, you abandon the subtree without exploring it. When a partial reaches a valid full solution, you record it and back out. The shape of the algorithm is always the same three steps: choose, explore, unchoose.

The pruning is what separates backtracking from naive brute force. A naive N-Queens enumeration would try nn placements; backtracking with a column/diagonal constraint check trims that to ~n! at most, and on average much less. The same pattern applies to Sudoku, Word Search, and every combinatorial problem with a feasibility predicate.

The boundary with DP is sharp. If the recursion tree has overlapping subproblems (the same partial state arises from many different paths), you should memoise (that's DP). If every partial is distinct (every path through the tree reaches a different leaf), there is nothing to cache, and backtracking is the right tool. Permutations: distinct paths, backtracking. Edit distance: overlapping paths to the same (i, j), DP.

The choose / explore / unchoose template. Every backtracking solution is some specialisation of: add a choice to the partial; recurse to extend; remove the choice on return. The undo step is the whole point. It's what lets the next sibling branch start from a clean slate without copying the entire partial.

2 · How to recognise it

Backtracking is the right pattern when any of the following hold. The first two are the strongest signals; the rest support. Each row is a sub-pattern worth recognising on sight.

  • Permutations, combinations, subsets. "Return all permutations of nums", "every subset of nums", "all combinations summing to target". The defining family: distinct paths, no memoisation, full enumeration.
  • N-Queens, Sudoku, constraint satisfaction. Place N pieces with no conflicts; fill a grid satisfying rules. Most candidates fail an O(1) constraint check before recursing, so effective work is much less than worst case.
  • Word search on a grid. LC 79 (single word), LC 212 (many words via Trie). DFS from each starting cell, tracking visited cells in the current path; undo the visit on return so a different start can re-use the cell.
  • String partitioning / parsing. "Split this string into palindrome substrings" (LC 131), "all valid IP addresses" (LC 93), "all ways to insert operators" (LC 282). Each branch consumes a prefix; recurse on the suffix.
  • Path enumeration in graphs. "All paths from src to dst in a DAG" (LC 797). DFS with a path stack; emit a copy when you reach dst; pop on return.
  • "Choose K of N", "place N non-attacking", "fill the grid". Any combinatorial construction where each step makes a choice from a small set.
  • The brute force is O(N!) or O(2^N) and the constraints are tiny. If n ≤ 20 for subsets, or n ≤ 10 for permutations, the problem setter is usually pointing at backtracking.
Backtracking vs plain DFS in one line. Backtracking is DFS where the recursive frame mutates shared state and undoes the mutation on return. If you copy state on every recursive call instead, that is plain DFS: usually slower, sometimes simpler. The undo step is what makes it backtracking.

3 · Sister algorithms

Backtracking is the base. A handful of close relatives extend or specialise it for harder problems. Knowing them by name saves you from reinventing them in an interview.

AlgorithmHow it extends backtrackingWhen to reach for it
Branch and boundBacktracking plus a global bound. Prune any branch whose best-possible result is worse than the current best."Minimise / maximise" problems where you can cheaply estimate the optimum from a partial.
Iterative deepening (IDA*)Repeated depth-limited DFS, deepening the limit each iteration. Memory of DFS, optimality of BFS.Huge state spaces where BFS would OOM. 15-puzzle and similar.
Constraint propagation (AC-3)Before recursing, narrow each variable's domain by enforcing arc consistency. Often eliminates entire branches without trying them.Sudoku, scheduling, real CSP solvers.
Min-conflictsLocal search rather than DFS. Start with a random assignment, repeatedly fix the most-conflicted variable.Very large CSPs where backtracking would be too slow.
DLX (dancing links)Knuth's exact-cover algorithm. Backtracking with O(1) undo via doubly-linked-list updates.Polyomino tiling, exact-cover encodings of Sudoku.
MemoisationIf the recursion tree has overlapping subproblems, you have DP, not backtracking. Cache by state and stop recursing.Word Break, Edit Distance, anything with overlapping subpartials.
The escalation ladder. Plain backtracking → branch and bound (add a global bound) → constraint propagation (narrow domains before recursing) → DLX (constant-time undo). Each step adds either implementation complexity or specialised data structures. For LC-style problems, plain backtracking with one or two constraint checks is almost always enough.

4 · Canonical example: Permutations (LC 46)

Problem. Given an array nums of distinct integers, return all possible permutations. Any order is fine.

Input:  [1, 2, 3]
Output: [[1,2,3], [1,3,2],
         [2,1,3], [2,3,1],
         [3,1,2], [3,2,1]]
Explanation: every ordering of the three elements.

Build a permutation one slot at a time. At each step, pick an element that has not yet been used, mark it used, recurse to fill the next slot, then unmark it before trying the next candidate. The recursion bottoms out when the partial has length n: emit a copy and return. Each leaf of the recursion tree corresponds to one permutation; there are n! of them.

5 · Reference implementation

Two versions of the same algorithm: Go and Python. The Go version is the one you'd write in an interview today; Python is faster to read.

Go: interview-ready

func permute(nums []int) [][]int {
    n := len(nums)
    used := make([]bool, n)
    partial := []int{}
    out := [][]int{}

    var backtrack func()
    backtrack = func() {
        if len(partial) == n {
            // SNAPSHOT — copy the partial; appending it directly would
            // store a reference that gets mutated by the next pop.
            cp := make([]int, n)
            copy(cp, partial)
            out = append(out, cp)
            return
        }
        for i := 0; i < n; i++ {
            if used[i] {
                continue
            }
            // CHOOSE
            used[i] = true
            partial = append(partial, nums[i])
            // RECURSE
            backtrack()
            // UNDO
            partial = partial[:len(partial)-1]
            used[i] = false
        }
    }

    backtrack()
    return out
}

Python: for reading and reasoning

def permute(nums: list[int]) -> list[list[int]]:
    n = len(nums)
    used = [False] * n
    partial: list[int] = []
    out: list[list[int]] = []

    def backtrack():
        if len(partial) == n:
            out.append(partial.copy())     # SNAPSHOT, not the live list
            return
        for i in range(n):
            if used[i]:
                continue
            used[i] = True                 # CHOOSE
            partial.append(nums[i])
            backtrack()                    # RECURSE
            partial.pop()                  # UNDO
            used[i] = False

    backtrack()
    return out
Snapshot the partial when you emit it. The single most common backtracking bug. out.append(partial) stores a reference to the same list every leaf mutates. By the time you return, every entry in out points at an empty list. Always copy: partial.copy(), partial[:], or list(partial).

6 · Variations

The shape of the recursion and what you track changes by problem type. Six common variations:

VariationWhat changesExample
PermutationsTrack which elements are used; at each level any unused element is a candidate. Recursion depth is n; branching factor shrinks.LC 46 Permutations
CombinationsTrack a start index; only consider elements at or after it. No "used" array needed: order is fixed by index.LC 77 Combinations
SubsetsTwo-choice recursion (include or skip) or index-based. Emit the partial at every node, not just leaves.LC 78 Subsets
With duplicatesSort the input. At each level, skip an element if it equals the previous one and the previous was not picked in this slot. Avoids generating the same configuration twice.LC 47 Permutations II, LC 90 Subsets II
Constraint pruningCheck validity before recursing: the queen's column/diagonal, the Sudoku row/box, the word grid bounds. Failed candidates return immediately, never recurse.LC 51 N-Queens, LC 37 Sudoku
Early terminationReturn True the moment any branch succeeds; propagate up. Useful when you only need one solution, not all of them.LC 37 Sudoku, LC 79 Word Search

7 · The recursion tree, drawn out

Backtracking on Permutations of [1, 2, 3]. Each node is a partial; each edge is a "choose" step; each leaf is a complete permutation that gets emitted. The shape is a perfectly balanced n!-leaf tree at depth n: six leaves for n=3, twenty-four for n=4. Counting the leaves IS the complexity proof.

depth 0:depth 1:depth 2:depth 3:leaves:[ ][1][2][3]+1+2+3[1,2][1,3][2,1][2,3][3,1][3,2][1,2,3][1,3,2][2,1,3][2,3,1][3,1,2][3,2,1]3! = 6 leaves, branching factor shrinks: 3, then 2, then 1.Total work = sum of leaves at each depth = O(n × n!). The n! is unavoidable; the extra n is the copy-the-partial cost.
The branching-factor shrink. At depth 0 we have 3 choices. At depth 1, only 2 (one element is used). At depth 2, only 1. The branching factor decreases by one per level. This is why permutations scale as n! rather than n^n. The same shrink happens in combinations (constrained by start-index) and subsets (constrained by two-choice include/skip). Pruning extends this further. Every constraint failure at depth k cuts off the (n−k)! sub-tree that would have followed.

8 · The pruning gold mine: N-Queens

N-Queens is the showcase for why pruning matters. The naive enumeration would try n^n placements (n choices per row). The constraint check (no two queens share a column, diagonal, or anti-diagonal) cuts the search space dramatically. For n=8, the naive count is 16,777,216; with pruning, the actual recursion visits ~16,000 nodes: a 1,000× reduction.

def solve_n_queens(n: int) -> int:
    cols  = set()        # column index taken
    diag1 = set()        # r + c (main diagonal)
    diag2 = set()        # r - c (anti-diagonal)
    count = 0

    def place(row: int):
        nonlocal count
        if row == n:
            count += 1
            return
        for col in range(n):
            # O(1) pruning check — three set lookups
            if col in cols or (row + col) in diag1 or (row - col) in diag2:
                continue
            cols.add(col); diag1.add(row + col); diag2.add(row - col)
            place(row + 1)
            cols.remove(col); diag1.remove(row + col); diag2.remove(row - col)

    place(0)
    return count
Why three sets, not a 2D board. Checking a 2D board takes O(n) per placement (scan column, diagonal, anti-diagonal). The three-set encoding makes the check O(1) using arithmetic invariants: row + col is constant along a main diagonal, row - col is constant along an anti-diagonal. Three sets of integers replace an N×N grid scan. This is the kind of constant- factor improvement that turns N-Queens from a 10-second solve to a millisecond solve at n = 14.

9 · Worked example: Permutations end-to-end (LC 46)

Already covered in section 4, but worth re-reading the trace. The Go scaffold above shows the canonical shape: used-array, mark, recurse, unmark. Two details to emphasise:

  • Snapshot the partial when you emit. cp := make([]int, n); copy(cp, partial). Never append the slice directly, because Go slice headers share the backing array, and the next pop will mutate every "completed" permutation you've stored.
  • The branching shrink. At depth 0 you have n choices. At depth 1, n - 1 (one element used). At depth k, n - k. Total leaves: n!. Total work to enumerate and copy: O(n × n!). The n! for the leaves, the extra n for the copy at each.
The duplicates wrinkle (LC 47). When the input has duplicates, the recursion emits the same permutation twice. The fix: sort the input first; at each level, skip an element that equals the previous one if the previous one was not used in this slot (if i > 0 && nums[i] == nums[i-1] && !used[i-1] { continue }). Subtle but essential: without it, [1, 1, 2] generates 6 permutations instead of 3.

10 · Worked example: Word Search (LC 79)

Problem. Given a 2D board of characters and a target word, return true if the word can be constructed by following 4-directionally adjacent cells, with no cell reused in a single path.

This is backtracking on a grid. From each starting cell that matches word[0], DFS outward; mark cells visited in the current path; on return, unmark so a different start can reuse the cell. Pruning: if the current cell doesn't match the next character, abort immediately.

func exist(board [][]byte, word string) bool {
    rows, cols := len(board), len(board[0])
    dirs := [4][2]int{{-1, 0}, {1, 0}, {0, -1}, {0, 1}}

    var dfs func(r, c, idx int) bool
    dfs = func(r, c, idx int) bool {
        if idx == len(word) {
            return true                   // full word matched
        }
        if r < 0 || r >= rows || c < 0 || c >= cols {
            return false                  // off grid
        }
        if board[r][c] != word[idx] {
            return false                  // PRUNE — character mismatch
        }

        // CHOOSE — mark visited via in-place mutation; saves a visited grid
        save := board[r][c]
        board[r][c] = '#'                 // sentinel — won't match any letter

        for _, d := range dirs {
            if dfs(r+d[0], c+d[1], idx+1) {
                board[r][c] = save        // restore even on success
                return true
            }
        }

        // UNDO
        board[r][c] = save
        return false
    }

    for r := 0; r < rows; r++ {
        for c := 0; c < cols; c++ {
            if dfs(r, c, 0) {
                return true
            }
        }
    }
    return false
}
The "mutate the board" trick. Instead of carrying a parallel visited grid, overwrite the board cell with a sentinel ('#' in our case) during the recursive call and restore it on return. Saves O(rows × cols) auxiliary space and gives O(1) visited-check via the character comparison itself. Mention this in the interview. It's the kind of micro-optimisation that separates candidates.

Complexity. O(rows × cols × 4L) in the worst case, where L is the word length. Each starting cell launches a DFS with branching factor 4 (or less, after revisiting check), bounded by depth L. The character- match pruning makes the average case far better.

11 · Hard worked example: N-Queens with bit sets (LC 51)

Problem. Place n queens on an n × n chessboard so no two attack each other. Return all distinct solutions as a list of board configurations.

The constraint check is the win. A queen attacks along columns, along the main diagonal (row + col constant), and along the anti-diagonal (row - col constant). Track three sets of occupied columns/diagonals; each placement check is O(1). For maximum speed, replace the sets with three integers and use bitwise ops.

func solveNQueens(n int) [][]string {
    out := [][]string{}
    queens := make([]int, n)              // queens[row] = col

    var place func(row int, cols, diag1, diag2 int)
    place = func(row int, cols, diag1, diag2 int) {
        if row == n {
            // Build the board representation from queens[]
            board := make([]string, n)
            for r := 0; r < n; r++ {
                line := make([]byte, n)
                for c := 0; c < n; c++ {
                    if queens[r] == c {
                        line[c] = 'Q'
                    } else {
                        line[c] = '.'
                    }
                }
                board[r] = string(line)
            }
            out = append(out, board)
            return
        }

        // available = positions where no column or diagonal is taken
        available := ((1 << n) - 1) &^ (cols | diag1 | diag2)
        for available != 0 {
            // Lowest set bit — try this column
            bit := available & -available
            available &^= bit
            col := bitsTrailingZeros(bit)

            queens[row] = col
            place(row+1, cols|bit, (diag1|bit)<<1, (diag2|bit)>>1)
            // No explicit undo needed — we passed values, not pointers
        }
    }

    place(0, 0, 0, 0)
    return out
}

// Helper — math/bits.TrailingZeros in production
func bitsTrailingZeros(x int) int {
    if x == 0 {
        return 64
    }
    n := 0
    for x&1 == 0 {
        x >>= 1
        n++
    }
    return n
}
Why bitwise. Each diagonal can be represented as a bitmask of "this column is unsafe at this row". When you place a queen at column c in row r, the main diagonal "shifts left by 1" for row r + 1 (one column to the left becomes unsafe). The anti-diagonal shifts right. Three integer operations per recursive call, no set lookups. For n = 14 this is the difference between a 5-second solve and a 30-millisecond one.
The undo is free. Because we pass cols, diag1, diag2 by value (Go's default for ints), the recursive call's modifications don't affect the caller's copy. No explicit undo step. This is one of the rare cases where "copy state on every call" wins over "mutate and undo". The state is tiny (three ints) and the copy is cheaper than the bookkeeping.
N-Queens, n=4 — backtracking with column / diagonal pruning[ ][0][1][2][3][1,0]✗[1,2]✗[1,3][1,3,0]explored partialpruned (constraint fail)solution reachedbacktrack (undo + retry)Each prune cuts off (n−k)! candidates at depth k. Visited ~20 nodes vs 4⁴ = 256 brute force.

12 · How to solve a hard backtracking problem: step by step

Five questions, in order, that turn a confusing prompt into a straightforward implementation.

  1. What is the choice at each step? Place a queen in a column? Pick the next character of a permutation? Open or close a parenthesis? Choose a digit for an empty Sudoku cell? Naming the choice cleanly is half the battle. The recursion tree's branching factor is exactly the number of choices.
  2. What state needs to be tracked? The partial result (the path being built), plus whatever auxiliary structures encode the constraints. For permutations: a used array. For N-Queens: three sets (cols, diag1, diag2). For Word Search: visited cells (or the board itself with sentinels). The state should be minimal: just enough to validate the next choice and to undo cleanly.
  3. When do you prune? Before each recursive call, check the constraints that the current partial must satisfy. If a constraint is violated, continue to the next choice without recursing. Pruning is where backtracking earns its keep. The entire subtree below the failed candidate is skipped.
  4. When do you record the answer? When the partial has reached a "full" state: length n for permutations, row == n for N-Queens, end-of-string for word search. Emit a copy of the partial (never the live reference) and return.
  5. How do you undo the choice cleanly? Every mutation made before the recursive call must be reversed after it. append followed by pop. used[i] = true followed by used[i] = false. board[r][c] = '#' followed by board[r][c] = save. Missing an undo silently corrupts the next sibling branch.
The pattern is mechanical once the state is right. Most backtracking problems are 80% the same code: choose, recurse, unchoose, snapshot on full. Spend the modelling time on "what is the state" and the implementation falls out.

13 · Five-problem progression

#LCProblemWhy it's here
1LC 46PermutationsThe minimal backtracking. Used-array, mark, recurse, unmark. The cleanest place to learn the skeleton.
2LC 78SubsetsTwo flavours: include/skip and index-based. Emit at every node. Good for internalising "the partial is the answer, not just the leaf".
3LC 39Combination SumBacktracking with a running sum and a start index. Pruning kicks in when the partial sum already exceeds the target.
4LC 51N-QueensThe classic constraint problem. Column, diagonal, and anti-diagonal sets pruned before each placement. Where pruning earns its keep.
5LC 37Sudoku SolverBacktracking with early termination. Find an empty cell, try 1-9, recurse, return on the first success. Closest to a real-world solver.

14 · What breaks: the deep list

  • Slicing path copy vs reference: the classic bug. In Go, out = append(out, partial) stores a pointer to the same backing array. The next partial = partial[:len(partial)-1] mutates that array, so by the time recursion returns, every entry in out points at the latest state. Always copy: cp := make([]int, len(partial)); copy(cp, partial); out = append(out, cp). In Python, partial.copy() or partial[:].
  • Append-then-pop on a shared slice still mutates. Even after the pop, if you've already stored a reference to the longer slice elsewhere, that reference reflects the post-pop state on subsequent appends, because Go reuses the backing array. The only safe pattern is "copy before storing".
  • Forgetting to undo. The recursion still terminates, but the partial is wrong for every sibling that follows. Symptoms usually look like duplicated or contaminated output. Every append needs a matching pop; every used[i] = true needs a matching used[i] = false; every board[r][c] = '#' needs the restore.
  • Not undoing state in the recursion exit. Easy to miss when the function has multiple return statements: each early-return path needs the undo. Best practice: put the undo at the bottom of the function and use a single return, or use a helper that wraps choose/recurse/undo as one unit.
  • Shallow vs deep copy of the partial. If the partial contains mutable children (e.g. a list of lists for N-Queens boards), a shallow copy is not enough. Copy the rows too. Symptoms: every "solution" in the output looks identical because they all share the same row slices.
  • Missing pruning leading to TLE. Pure brute force on n = 9 Sudoku is 981. It never finishes. Cheap constraint checks before each recursive call are usually the difference between "instant" and "infinite". If your backtracking TLEs, the question is "what cheap check could I do before recursing?", not "how do I make the recursion faster?".
  • Wrong "used" tracking for duplicates. The used[i-1] && nums[i] == nums[i-1] check in LC 47 is easy to get backwards. Sort first, then skip the duplicate only when the previous identical element is unused at this level. Otherwise you suppress valid permutations.
  • Deep recursion blowing the stack. Go's default stack grows but is capped (around 1 GB); Python's default limit is ~1000 frames. Most backtracking inputs are tiny, so this rarely bites. But a partition problem on a 105-character string can hit it. Convert to iterative DFS with an explicit stack if so.
  • Emitting at the wrong place. Subsets emits at every node; permutations emits only at leaves; combinations emits when the partial reaches the target size. Putting the emit call in the wrong spot changes the output silently.

15 · What to say in the interview

  1. "Enumeration → backtracking." Name the pattern out loud. "All permutations" or "all valid configurations" tells the interviewer you have recognised the shape.
  2. "Build a candidate, recurse, undo on return." State the skeleton in one breath. Then specify: snapshot the partial when you emit it.
  3. "Complexity is exponential (O(N!) for permutations, O(2^N) for subsets), but pruning makes it tractable." Acknowledge the worst case honestly. Then describe where the pruning comes from for this specific problem.
  4. "Space is O(N) for the recursion plus O(N) for the partial." Output size is separate. Listing it is fine, but distinguish auxiliary space from output space.
  5. "For duplicates, sort first, then skip same-value siblings at the same level." Show you know the standard de-duplication trick before they ask.
  6. Edge cases to mention. Empty input (return [[]] for subsets, [] or [[]] for permutations depending on convention), single element, all-duplicate input, target unreachable in combination-sum problems.

16 · Where this gets asked

CompanyCommon framingWhy it fits their stack
GoogleWord Search (LC 79), Word Search II (LC 212), Generate Parentheses (LC 22), Restore IP Addresses (LC 93)Search/parsing teams. Constraint pruning on grids and on grammar-like outputs is bread-and-butter.
MetaLetter Combinations of a Phone Number (LC 17), Subsets (LC 78), Permutations (LC 46)Ad-targeting + experimentation: enumerating valid combinations of toggles is a recurring shape.
AmazonCombination Sum (LC 39), Partition to K Equal Sum Subsets (LC 698), Palindrome Partitioning (LC 131)Bin-packing and order-splitting flavours show up in fulfilment + pricing problems.
MicrosoftN-Queens (LC 51), Sudoku Solver (LC 37), Word Break II (LC 140)Constraint-satisfaction asked in person more than the LC averages suggest. Office's parser test suites use the same shapes.
AppleCombinations (LC 77), Subsets II (LC 90), Combination Sum II (LC 40)De-duplication during enumeration is the part Apple interviewers probe — "what if the input has duplicates?".
Bloomberg / Two SigmaExpression Add Operators (LC 282), Remove Invalid Parentheses (LC 301)Strict parsing of operator strings and structured expressions — fits a finance/quant pipeline cleanly.
Pattern of patterns. Three sub-shapes — (1) "all permutations / subsets" with a used-array or include-skip recursion, (2) "constraint placement" (N-Queens, Sudoku) where most candidates fail an O(1) check before recursing, (3) "string partition" where each branch consumes a prefix. Recognise which of the three you're in and the skeleton writes itself.

17 · Complexity, side by side

Backtracking complexity is almost always exponential in the worst case. The bounds below are worst-case; constraint pruning makes real inputs much cheaper.

Problem familyWorst-case timeSpaceEffective with pruning
Permutations (LC 46)O(n × n!)O(n) recursion + O(n × n!) outputNo pruning possible — every leaf is a valid permutation.
Subsets (LC 78)O(n × 2n)O(n) recursion + O(n × 2n) outputNo pruning — every subset is valid.
Combination Sum (LC 39)O(2n) per candidate × number of candidatesO(target) recursion depthPruning when partial sum exceeds target — often 10–100× speedup.
N-Queens (LC 51)O(n!) brute forceO(n) recursion~16k recursive calls at n=8 vs 16M brute force — three orders of magnitude.
Sudoku (LC 37)O(9k) where k = empty cellsO(81)Real solver completes in milliseconds; constraint propagation prunes 99%+ of branches.
Word Search (LC 79)O(rows × cols × 4L)O(L) recursionCharacter-match pruning kills most branches in the first 2-3 levels.
The worst-case-vs-typical gap. Backtracking's theoretical complexity is almost always frightening on paper but tractable in practice. The honest interview answer is "worst-case O(N!) but constraint pruning makes the typical case much smaller — here's the cheap check that prunes most branches early." Naming the specific pruning is the senior signal.

18 · Real-world backtracking, in production code

SAT and SMT solvers.
Z3, MiniSat, CryptoMiniSat — all use backtracking (DPLL algorithm) with sophisticated learning. When the solver hits a contradiction, it learns a "conflict clause" that prunes the rest of the search space. The same shape underlies hardware verification at every chip company.
Constraint solvers for scheduling.
Airline crew scheduling, factory job shop, exam timetabling — all model as CSPs with backtracking solvers. Google's OR-Tools and IBM's CPLEX include high-performance backtracking engines under the hood.
Regex engines.
PCRE, .NET's System.Text.RegularExpressions, and Python's re module use backtracking to handle features like backreferences. The "catastrophic backtracking" failure mode (regex DoS) is what happens when the search tree explodes — (a+)+b against aaaaaaaaaaaaaX takes O(2n) time.
Type inference in compilers.
OCaml, Haskell, and Rust's borrow checker use backtracking when type inference has multiple valid resolutions and one fails downstream. The compiler tries one, fails, undoes, tries the next.
Puzzle solvers.
Sudoku, crosswords, KenKen, the 15-puzzle. Norvig's classic "Solving Every Sudoku Puzzle" essay walks through constraint propagation + backtracking in 100 lines of Python; production crossword solvers extend the same shape with dictionary-aware pruning.
Stat worth knowing. Z3 can solve SAT problems with tens of millions of clauses in seconds. The raw search space is 2n, but conflict-driven clause learning prunes so aggressively that effective complexity is closer to polynomial on real-world inputs. Backtracking with smart pruning beats every other approach on structured constraint problems.

19 · The duplicates handling trick: LC 47 / LC 90

Permutations and subsets get a new wrinkle when the input has duplicates. Naively, you'd emit the same configuration multiple times. The fix is the same in both cases: sort the input, then at each recursive level, skip an element if it equals the previous element and the previous element is not in use at this level.

// LC 47 — Permutations II — input may contain duplicates
func permuteUnique(nums []int) [][]int {
    sort.Ints(nums)                       // sort so duplicates are adjacent
    n := len(nums)
    used := make([]bool, n)
    partial := []int{}
    out := [][]int{}

    var backtrack func()
    backtrack = func() {
        if len(partial) == n {
            cp := make([]int, n)
            copy(cp, partial)
            out = append(out, cp)
            return
        }
        for i := 0; i < n; i++ {
            if used[i] {
                continue
            }
            // KEY PRUNE — skip duplicates at the same level
            // (previous identical element is NOT used in this slot)
            if i > 0 && nums[i] == nums[i-1] && !used[i-1] {
                continue
            }
            used[i] = true
            partial = append(partial, nums[i])
            backtrack()
            partial = partial[:len(partial)-1]
            used[i] = false
        }
    }

    backtrack()
    return out
}
Why "previous not used" and not "previous used". Think about [1, 1, 2] in the first slot. The first 1 has index 0; the second 1 has index 1. When we're considering the first 1 (used[0] = false) for the slot, that's fine. When we later consider the second 1 for the slot (i = 1, used[0] = false because we already backtracked), we should skip — the permutation we'd build is identical to the one starting with the first 1. The check !used[i-1] means "the previous duplicate is not currently used in any earlier slot", which means "this branch is a duplicate of the branch that used i-1 instead".

The same pattern works for subsets with duplicates (LC 90) and combination sum with duplicates (LC 40). Sort first, then prune equal-and-previous-unused at each level.

20 · Try it yourself

Permutations · LC 46
The minimal backtracking template. Used-array, mark, recurse, unmark. Hint: snapshot the partial with partial[:] when you emit it — appending the live list is the textbook bug.
Subsets · LC 78
Two flavours. Emit at every node (not just leaves). Hint: start-index recursion is cleaner than include/skip — no need to copy state, and the order is enforced by the index.
Combination Sum · LC 39
Running sum + start index, with reuse. Hint: prune the moment partial_sum > target — without that you'll TLE on the larger cases.
N-Queens · LC 51
The constraint-pruning showcase. Hint: three sets (cols, diag1 = r+c, diag2 = r−c) make every placement check O(1); without them the recursion explodes.
Stretch — Word Search II · LC 212
Backtracking + Trie. The combination that earns its own follow-up rounds at Google. Hint: build a Trie of the dictionary, then DFS the grid pruning on Trie nodes — a path that has no matching Trie edge stops immediately.
How to practise. Write the skeleton from memory before you start any backtracking problem: def solve(partial): if done: emit; for choice in choices: choose; solve(partial); unchoose. Then specialise — what is "done", what are "choices", what does "emit" do. Most LC backtracking problems are 80% the same code with a different decision at each level.

Further reading

  • CLRS — Chapter 34 (NP-completeness, backtracking sketches). Less about pure backtracking than about why it is the honest tool when the problem is exponential. Worth a single read.
  • Sedgewick & Wayne — Algorithms. The N-Queens and 8-Puzzle treatments are the cleanest in print, with code you can lift straight into Python.
  • Adjacent: DFS. The base technique. Backtracking is DFS with mutation; understanding DFS first makes the undo step obvious.
  • Adjacent: Recursion. The mental model for the call stack. If recursion still feels mysterious, start there.
  • Adjacent: Dynamic programming. When the recursion tree has overlapping subproblems, memoisation usually beats backtracking. Worth knowing the boundary.
  • Semicolony: Algorithm Visualizer. Step through the recursion tree and watch the partial mutate. The undo step is much clearer when you can see it happen.
Found this useful?