21 / 22 · Tier B
Patterns / 21

Segment tree & Fenwick

Range queries with concurrent updates. The prefix-sum array gives you range-sum in O(1) — until the first update arrives and the whole array has to be rebuilt. Segment tree fixes that. Both query and update become O(log n). For the special case of range-sum + point-update, the Fenwick tree is shorter, faster, and uses half the memory.


1 · Why this pattern matters

Prefix sums are the lazy answer to "range sum from l to r". One subtraction, O(1) query. The moment a single update lands, the prefix array is stale from that index onward and you pay O(n) to rebuild. Anything that interleaves queries with updates needs a better structure.

Segment tree gives you both query and update in O(log n) for any associative aggregate — sum, min, max, GCD, XOR, matrix product, anything where combine(a, b) is associative. The classic interview shapes:

  • LC 307 Range Sum Query — Mutable. The textbook intro. Point update, range sum.
  • LC 315 Count of Smaller Numbers After Self. BIT over coordinate-compressed values, walked right-to-left.
  • LC 327 Count of Range Sum. BIT over prefix sums; count how many earlier prefixes fall in [prefix − upper, prefix − lower].
  • LC 218 The Skyline Problem. Sweep-line with a segment tree of maxes over coordinate-compressed x positions.
  • LC 715 Range Module. Interval set with add/remove/query — segment tree with lazy propagation is one of two clean answers.
The signal. Whenever you see "range query + point update" with n, q ≤ 10⁵–10⁶, the O(log n) per operation is your only viable budget. If the aggregate is sum and updates are point-only, reach for BIT — half the code. Otherwise, segment tree.

2 · How to recognise it

  • "Range aggregate" of any kind. Sum, min, max, GCD, XOR, count, product mod p. If you can phrase the question as combine over a contiguous slice, segment tree handles it.
  • Concurrent point updates. The array isn't read-only. Some operation mutates a single element between queries; prefix-sums die.
  • "Add v to every element in [l, r] then query". Range updates, range queries. This is lazy-propagation territory.
  • Order-statistics on a stream. "How many earlier elements were smaller than x?" Compress values, BIT over the compressed index, walk in stream order.
  • Constraints n, q ≤ 10⁵–10⁶. O(n) per query is too slow; O(log n) is required. The tell.
  • "Reverse engineer the permutation from counts of inversions to the right", "find the k-th 1 in a binary array under updates", "count distinct elements in [l, r]" — all segment-tree variants.
What it is not. If the array is static — no updates ever — a prefix-sum array or a sparse table is simpler and faster. Segment tree is overkill for the static case. The whole reason it exists is updates.

3 · The intuition

A segment tree is a binary tree where each node owns a contiguous range of the input array. The root owns [0, n−1]. Each internal node splits its range in half and gives one half to each child. Leaves own single elements. Every node stores the aggregate of its range.

A range query for [l, r] walks down from the root, descending into children only when its range overlaps [l, r]. Any node whose range is fully contained returns its stored aggregate immediately — no need to recurse further. Any node fully outside returns the neutral element. The query touches at most O(log n) nodes because at each level the boundary cuts at most two ranges.

A point update flips one leaf, then walks up the unique path to the root re-combining each parent. Also O(log n). Build is O(n) — every internal node is touched once.

Mental model. Think of the tree as precomputed answers at every power-of-two scale. The root is the scale-n answer. The middle layer is the scale-n/2 answers. The leaves are scale-1. Any query is reassembled from at most 2 nodes per layer; any update touches exactly one node per layer.

4 · Iterative segment tree

For non-lazy cases — point update plus range query — the iterative "bottom-up" segment tree is faster than the recursive version and shorter to write. Allocate 2n slots; leaves live at indices n .. 2n−1; internal node i has children 2i and 2i+1. The structure assumes n is the input size — no power-of-two padding needed.

// Iterative segment tree for range sum + point update.
// Indices: leaves at [n, 2n). Internal node i has children 2i, 2i+1.
type SegTree struct {
    n int
    t []int
}

func NewSegTree(a []int) *SegTree {
    n := len(a)
    t := make([]int, 2*n)
    // Build: copy leaves, then aggregate bottom-up
    for i, v := range a {
        t[n+i] = v
    }
    for i := n - 1; i > 0; i-- {
        t[i] = t[2*i] + t[2*i+1]   // combine = sum; swap for min/max/xor
    }
    return &SegTree{n: n, t: t}
}

// Point update: set a[i] = v.
func (s *SegTree) Update(i, v int) {
    i += s.n
    s.t[i] = v
    for i >>= 1; i > 0; i >>= 1 {
        s.t[i] = s.t[2*i] + s.t[2*i+1]
    }
}

// Range query: sum of a[l..r) on a half-open interval.
func (s *SegTree) Query(l, r int) int {
    res := 0                       // neutral element for sum
    for l, r = l+s.n, r+s.n; l < r; l, r = l>>1, r>>1 {
        if l&1 == 1 {              // l is a right child — include it
            res += s.t[l]
            l++
        }
        if r&1 == 1 {              // r is a right child — include r-1
            r--
            res += s.t[r]
        }
    }
    return res
}
The half-open trick. Query(l, r) here means [l, r) — exclusive on the right. Mixing this with problem statements that say "[l, r] inclusive" is the #1 source of off-by-ones. Pick one convention, write it on the whiteboard, stick with it.

Swapping the aggregate is a one-line change: replace + with min, max, gcd, xor, whatever. The neutral element changes too — 0 for sum/xor, math.MaxInt for min, math.MinInt for max, 0 for gcd (because gcd(0, x) = x).

5 · Lazy propagation — for range updates

Iterative segtree handles point updates. The moment the problem says "add v to every element in [l, r]", a naive implementation degrades to O(n) per update — you'd touch every leaf in the range. Lazy propagation fixes this by storing pending updates at internal nodes and pushing them down only when needed.

The recursive segment tree is cleaner here. Each node carries a lazy value: "I owe my subtree this much, but I haven't pushed it yet". On any visit to a node, first push down any pending lazy to its children, then proceed.

// Recursive segment tree with lazy propagation.
// Supports range add + range sum on [0, n-1] inclusive.
type LazySeg struct {
    n         int
    sum, lazy []int
}

func NewLazySeg(n int) *LazySeg {
    return &LazySeg{n: n, sum: make([]int, 4*n), lazy: make([]int, 4*n)}
}

// Push pending lazy at node down to its two children.
func (s *LazySeg) push(node, l, r int) {
    if s.lazy[node] == 0 {
        return
    }
    mid := (l + r) / 2
    left, right := 2*node, 2*node+1
    s.sum[left]  += s.lazy[node] * (mid - l + 1)
    s.lazy[left] += s.lazy[node]
    s.sum[right]  += s.lazy[node] * (r - mid)
    s.lazy[right] += s.lazy[node]
    s.lazy[node] = 0
}

// Range update: add v to every a[i] for i in [ql, qr].
func (s *LazySeg) Update(node, l, r, ql, qr, v int) {
    if qr < l || r < ql {
        return                       // no overlap
    }
    if ql <= l && r <= qr {
        s.sum[node]  += v * (r - l + 1)
        s.lazy[node] += v
        return                       // full overlap — defer to lazy
    }
    s.push(node, l, r)               // partial overlap — must descend
    mid := (l + r) / 2
    s.Update(2*node, l, mid, ql, qr, v)
    s.Update(2*node+1, mid+1, r, ql, qr, v)
    s.sum[node] = s.sum[2*node] + s.sum[2*node+1]
}

// Range query: sum of a[ql..qr].
func (s *LazySeg) Query(node, l, r, ql, qr int) int {
    if qr < l || r < ql {
        return 0
    }
    if ql <= l && r <= qr {
        return s.sum[node]
    }
    s.push(node, l, r)
    mid := (l + r) / 2
    return s.Query(2*node, l, mid, ql, qr) + s.Query(2*node+1, mid+1, r, ql, qr)
}

// Public wrappers: Update(ql, qr, v) and Query(ql, qr) call into node=1, [0, n-1].
The push-down rule. Always push before descending into children. Always recombine after returning. Forgetting either is the canonical lazy-propagation bug — symptoms include "queries return stale values" or "two updates in a row only show one". If your lazy code is wrong, this is almost always why.

Lazy propagation generalises beyond range-add + range-sum. The ingredients are: (1) a way to compose two pending updates into one (e.g. add v₁ then add v₂ is add v₁ + v₂), and (2) a way to apply a pending update to an aggregate without expanding to leaves (e.g. add v to a length-k range adds v·k to its sum). Find both, and lazy propagation handles range-assign, range-multiply, range-flip (XOR), or any combination.

6 · Fenwick tree (BIT)

When the aggregate is sum (or any invertible operation — XOR works too) and updates are point-only, the Fenwick tree is the right tool. Shorter code, smaller memory footprint (just n+1 ints, not 4n), better cache behaviour, and a smaller constant factor that often matters on tight time limits.

The trick is the lowbit. For any positive index i, i & -i isolates the lowest set bit. The Fenwick array stores partial sums indexed by these lowbit-aligned ranges; updates and queries walk by adding or subtracting the lowbit, climbing or descending through a logarithmic number of indices.

// Fenwick tree (Binary Indexed Tree). 1-INDEXED — bit[0] is unused.
// Supports point-update and prefix-sum query in O(log n).
type BIT struct {
    n   int
    bit []int
}

func NewBIT(n int) *BIT {
    return &BIT{n: n, bit: make([]int, n+1)}
}

// Add v to position i (1-indexed).
func (b *BIT) Update(i, v int) {
    for ; i <= b.n; i += i & -i {     // climb: jump by lowbit
        b.bit[i] += v
    }
}

// Prefix sum of positions [1..i].
func (b *BIT) Prefix(i int) int {
    s := 0
    for ; i > 0; i -= i & -i {        // descend: strip lowbit
        s += b.bit[i]
    }
    return s
}

// Range sum on [l, r], inclusive.
func (b *BIT) Range(l, r int) int {
    return b.Prefix(r) - b.Prefix(l-1)
}

Three loops, total. The whole structure fits in 20 lines. Compared to the ~80 lines for a recursive lazy segment tree, it's no contest if your problem only needs prefix-sum-style queries.

The 1-indexed gotcha. Fenwick trees are 1-indexed because lowbit(0) = 0 would cause infinite loops. Always shift your input by one — store position p at index p + 1. Forgetting this gives correct-looking code that silently returns garbage for index 0.

Two extensions are worth knowing. Range update, point query via a BIT on the difference array: update(l, +v) and update(r+1, −v) followed by prefix(i) gives a[i] after all range additions. Range update, range query requires two BITs and a small algebraic trick (see the further-reading section).

7 · Worked example 1 — LC 307 Range Sum Query Mutable (easy)

Problem. Given an integer array nums, support two operations: update(i, v) sets nums[i] = v, and sumRange(l, r) returns the sum of nums[l..r] inclusive.

This is the textbook BIT problem. The aggregate is sum, updates are point-only, no range updates. BIT in 20 lines beats segment tree in 80 lines.

Steps

  1. Build a BIT of size n. Initialise each position by calling Update(i, nums[i]); total cost O(n log n). (Or use the linear-time build by computing partial sums in place.)
  2. Maintain the original array. An update call sets a value, but BIT stores deltas. Store nums[i] separately so you can compute delta = newValue − nums[i] and apply that.
  3. sumRange uses prefix(r+1) − prefix(l), accounting for the 1-indexed shift.
type NumArray struct {
    bit  *BIT
    nums []int
}

func Constructor(nums []int) NumArray {
    n := len(nums)
    arr := NumArray{bit: NewBIT(n), nums: append([]int(nil), nums...)}
    for i, v := range nums {
        arr.bit.Update(i+1, v)        // 1-indexed shift
    }
    return arr
}

func (a *NumArray) Update(i, v int) {
    delta := v - a.nums[i]
    a.nums[i] = v
    a.bit.Update(i+1, delta)
}

func (a *NumArray) SumRange(l, r int) int {
    return a.bit.Range(l+1, r+1)
}

Build O(n log n), each query and update O(log n). Q operations total: O((n + q) log n). Comfortable for n, q = 10⁵.

8 · Worked example 2 — LC 315 Count Smaller After Self (medium)

Problem. Given an integer array nums, return an array counts where counts[i] is the number of elements nums[j] with j > i and nums[j] < nums[i].

The classic BIT-over-coordinate-compression problem. Walk the array right-to-left; for each element x, the answer is "how many already-seen values are strictly less than x?". A BIT indexed by rank gives prefix counts in O(log n). The trick is compressing the value space first because raw values can be up to 10⁴ or live in any range.

Steps

  1. Coordinate-compress the values. Sort unique values; map each value to its rank (1..k). Now the BIT only needs k+1 slots, where k is the number of distinct values.
  2. Walk right-to-left. For each x at position i: look up its rank r, answer counts[i] = prefix(r − 1) (count of strictly smaller already seen), then Update(r, +1).
  3. Return counts. Total time O(n log n) including the sort for compression.
import "sort"

func countSmaller(nums []int) []int {
    n := len(nums)
    // STEP 1 — coordinate-compress
    sorted := append([]int(nil), nums...)
    sort.Ints(sorted)
    sorted = uniq(sorted)            // remove duplicates
    rank := func(x int) int {
        // 1-indexed rank for BIT
        return sort.SearchInts(sorted, x) + 1
    }

    // STEP 2 — BIT walk right-to-left
    bit := NewBIT(len(sorted))
    out := make([]int, n)
    for i := n - 1; i >= 0; i-- {
        r := rank(nums[i])
        out[i] = bit.Prefix(r - 1)   // strictly smaller seen so far
        bit.Update(r, 1)
    }
    return out
}

func uniq(a []int) []int {
    out := a[:0]
    for i, v := range a {
        if i == 0 || a[i-1] != v {
            out = append(out, v)
        }
    }
    return out
}
Why right-to-left. The question asks for "smaller elements to the right". Inverting the direction turns it into "smaller elements already seen", which is the natural BIT operation. The same trick works for LC 493 (reverse pairs) and LC 327 (count of range sum) with the right preprocessing.

9 · Worked example 3 — LC 218 Skyline Problem (hard)

Problem. Given a list of buildings as (left, right, height) triples, return the silhouette of the skyline as a list of (x, height) "key points" where the visible height changes.

The classic sweep-line problem. The segment-tree solution: coordinate- compress the x positions, build a segment tree over them keyed by max height, and for each building do a range-update with chmax(building.height) on its [left, right − 1] span. After processing all buildings, walk the segment tree's leaves in order; any height change emits a key point.

Steps

  1. Collect all distinct x coordinates. For each building, add left and right. Sort and deduplicate; this becomes the index space for the segment tree.
  2. Build a segment tree of size m (number of distinct x's) over the intervals between consecutive x's. Each leaf i represents the gap [x[i], x[i+1]). Value stored: current max height covering that gap.
  3. For each building (l, r, h): find the leaf indices for l and r, then range-update [l_idx, r_idx − 1] with chmax(h). chmax(v) means "set each element to max(element, v)" — a monotone operation, which lets you skip subtrees whose stored max already meets v.
  4. Walk leaves in order. Emit a key point at x[i] whenever height[i] ≠ height[i − 1].
The chmax-segtree shortcut. A naive range-update with max is O(n) per update because there's no clean "lazy" composition. Ji Driver / Segment Tree Beats — the technique behind chmax/chmin updates — uses extra metadata (max, second-max, max count) to prune. The amortised cost is O((n + q) log² n). For LC 218 the inputs are small enough that a plain sweep-line + multiset solution wins on implementation simplicity; segment tree shines when there are 10⁶ buildings.

An alternative answer to LC 218 — and the one most people write — uses a sweep-line with a max-heap and lazy deletion. Walk events (left = +height, right = −height) sorted by x; maintain the current max-height; emit a key point whenever the max changes. O((n + q) log n) with much shorter code. The segment-tree framing is cleaner for generalisations like "skyline with deletions" or "max height in a range query while processing".

10 · Sister algorithms

Segment tree and BIT cover most of the interview surface. The neighbours are worth knowing by name in case the problem demands a heavier tool.

StructureQueryUpdateWhen to reach for it
Sqrt decompositionO(√n)O(√n) or O(1)Simpler to write than segtree; good for "Mo's algorithm" offline queries.
Sparse tableO(1) for idempotent (min/max/gcd)not supportedStatic array, no updates. Build O(n log n), query O(1).
Persistent segment treeO(log n)O(log n), creates new versionOffline k-th smallest in a range (Wavelet alternative). Each update returns a new "version" without mutating the old.
Merge sort treeO(log² n)not supported"How many elements in [l, r] are ≤ k?" — sorted lists at every node, binary search inside.
Wavelet treeO(log σ)not supportedK-th smallest in a range, rank/count of a value in a range. σ is the alphabet size.
2D segment tree / 2D BITO(log² n)O(log² n)Rectangular range queries with point updates. The constant factor is rough; profile before adopting.
Segment tree beats (Ji Driver)O(log² n) amortisedchmin / chmaxRange-min-update / range-max-update with sum query. Niche but appears in CF problems.
Treap / implicit treapO(log n)insert/delete anywhere O(log n)When the array itself changes shape — splice, reverse a subarray, insert mid-array.

11 · How to solve hard problems step-by-step

A meta-process for any "range query + update" problem. Five steps; follow them in order.

  1. Identify the aggregate. Sum, min, max, GCD, XOR, count, product. Write combine(a, b) in one line. If combine isn't associative, you can't use a segment tree directly — re-frame or pick a different structure.
  2. Identify the update type. Point update (set or add to one position), range update (add v to every element in [l, r]), or range assign (overwrite every element in [l, r] with v). Range updates require lazy propagation.
  3. Pick the structure. Sum + point-update → BIT, no question. Anything else + point-update → iterative segment tree. Range update → recursive segtree with lazy. Static array → prefix sums or sparse table.
  4. For range updates, decide lazy vs difference array. If updates are "range add, query prefix-sum at the end", a single BIT over the difference array is enough. If updates and queries interleave, lazy segtree.
  5. Code carefully. 1-indexed BIT; power-of-2 array sizes for the iterative segtree; 4n slots for the recursive segtree. Pick a convention for inclusive/exclusive ranges and stick with it. Write the neutral element comment at the top of the file so you don't second-guess it later.
The compression step. If the value space is huge (nums[i] up to 10⁹) but the array length is small (n ≤ 10⁵), coordinate-compress before building the BIT or segtree. Sort unique values, map each value to its rank, work in rank-space. This is the difference between a 10⁵-slot structure and a 10⁹-slot impossibility.

12 · Complexity

VariantBuildQueryUpdateSpace
Iterative segtree (point upd, range query)O(n)O(log n)O(log n)O(2n)
Recursive segtree (lazy, range upd, range query)O(n)O(log n)O(log n)O(4n)
Fenwick tree (BIT, point upd, prefix sum)O(n log n)O(log n)O(log n)O(n)
BIT with difference array (range upd, point query)O(n log n)O(log n)O(log n)O(n)
Two-BIT (range upd, range query)O(n log n)O(log n)O(log n)O(2n)
2D segment treeO(nm)O(log n · log m)O(log n · log m)O(4nm)
Persistent segtreeO(n log n)O(log n)O(log n) per versionO((n + q) log n)
Sqrt decompositionO(n)O(√n)O(√n) or O(1)O(n + √n)

A word on constants. BIT's constant is roughly half that of an iterative segment tree because each loop iteration is one bit shift + one array access. The recursive lazy segtree has a 3–4× constant over BIT due to recursion overhead and the push-down. For n = 10⁵, all three fit comfortably in a 1-second time limit; for n = 10⁶ the choice starts to matter.

13 · What breaks

  • Off-by-one with 1-indexed BIT. Forgetting to shift by 1 on input or on prefix queries. Symptom: index-0 element is always wrong. Fix: always shift, always.
  • Lazy propagation, missing push-down. Querying a node without first pushing its pending lazy returns stale values. Symptom: two updates in a row, second one's effect missing. Fix: push() as the very first line of any descent.
  • Integer overflow on aggregates. Sum of 10⁵ values each up to 10⁹ overflows int32. Use int64 (Go's int on 64-bit is fine; explicitly int64 in Java / C++).
  • Recursive stack depth. For n = 10⁶, the recursion tree is ~20 deep — fine. But naive "build by recursion" with build(1, 0, n − 1) uses O(n) stack for unbalanced trees; switch to iterative build or increase the stack.
  • Wrong neutral element. The "no overlap" case returns the neutral element of combine. 0 for sum, math.MaxInt for min, math.MinInt for max, 0 for XOR, 0 for GCD (because gcd(0, x) = x), 1 for product. Get this wrong and queries silently return garbage on edge ranges.
  • Mixing inclusive and half-open ranges. Iterative segtree queries are [l, r); problem statements usually say [l, r]. Always convert at the boundary, not in the middle of the loop.
  • Wrong segtree size. Recursive segtree needs 4n slots, not 2n. The factor-of-4 accommodates non-power-of-two n and the over-allocation of the leftmost subtree. Allocating 2n and indexing out of bounds is the canonical "passes small tests, crashes on large" bug.
  • Coordinate-compression collisions. If two different problem values compress to the same rank by accident, your counts are off. sort.SearchInts on a deduplicated array is the safe lookup; map[int]int built from a sorted unique slice works too.
  • BIT used for non-invertible aggregates. BIT supports prefix queries that subtract; it works for sum and XOR. It does not work for min/max — you can't subtract a min. For min/max range queries, use a segment tree or sparse table.

14 · Where this gets asked

CompanyCommon framingWhy it fits
GoogleCount of Smaller Numbers (LC 315), Range Sum Query Mutable (LC 307), Reverse Pairs (LC 493)BIT-over-compression is a senior interview lever. Often paired with merge-sort solutions to test recognition.
MetaRange Module (LC 715), Count of Range Sum (LC 327), My Calendar III (LC 732)Booking / scheduling systems are range-update + range-query at scale. Calendar problems are a Meta onsite staple.
AmazonSkyline (LC 218), Falling Squares (LC 699), Rectangle Area (LC 850)Geometric sweep-line + segment tree, calibrated to packing / warehouse layout problems.
MicrosoftRange Sum (LC 307), Number of Longest Increasing Subsequence (LC 673)Bounds-checking + LIS variants where segment tree replaces an O(n²) DP.
Stripe / Bloomberg / Two SigmaOrder-book queries, event-log aggregations, sliding-window percentilesFinancial pipelines do range aggregates on event streams — BIT or segment tree on coordinate-compressed timestamps is the standard structure.
Pattern of patterns. Three sub-shapes — (1) "range query + point update" with sum → BIT; (2) "range query + point update" with min/max/gcd/xor → iterative segtree; (3) "range update + range query" → lazy segtree. Recognise which one the problem is, write the structure from memory, fill in the aggregate. The intellectual work is the coordinate compression and the predicate, not the data structure itself.

15 · Further reading

  • CP Algorithms — Segment Tree. The reference. Iterative, recursive, lazy, 2D, persistent — all covered with code. cp-algorithms.com/data_structures/segment_tree.html.
  • CP Algorithms — Fenwick Tree. Same site, the canonical BIT writeup including range-update + range-query with two BITs. cp-algorithms.com/data_structures/fenwick.html.
  • CLRS — Chapter 14 (Augmenting Data Structures). The theory of "store extra info at internal nodes" that segment tree is a special case of.
  • Codeforces blog — "Efficient and easy segment trees" by AlBert Rocha. The 20-line iterative segment tree that the implementation here is adapted from.
  • Codeforces blog — Segment Tree Beats. For chmin/chmax range updates with sum queries. The amortised analysis is beautiful.
  • Adjacent: binary search. Binary search on the segment tree — "find the leftmost position with prefix-sum ≥ k" — is one of the highest-use tricks in the toolkit.
  • Adjacent: back to patterns index.
Found this useful?