Segment tree & Fenwick
Range queries with concurrent updates. The prefix-sum array gives you range-sum in O(1) — until the first update arrives and the whole array has to be rebuilt. Segment tree fixes that. Both query and update become O(log n). For the special case of range-sum + point-update, the Fenwick tree is shorter, faster, and uses half the memory.
1 · Why this pattern matters
Prefix sums are the lazy answer to "range sum from l to
r". One subtraction, O(1) query. The moment a single update
lands, the prefix array is stale from that index onward and you pay O(n)
to rebuild. Anything that interleaves queries with updates needs a
better structure.
Segment tree gives you both query and update in O(log n) for any
associative aggregate — sum, min, max, GCD, XOR, matrix product,
anything where combine(a, b) is associative. The classic
interview shapes:
- LC 307 Range Sum Query — Mutable. The textbook intro. Point update, range sum.
- LC 315 Count of Smaller Numbers After Self. BIT over coordinate-compressed values, walked right-to-left.
- LC 327 Count of Range Sum. BIT over prefix sums; count how many earlier prefixes fall in
[prefix − upper, prefix − lower]. - LC 218 The Skyline Problem. Sweep-line with a segment tree of maxes over coordinate-compressed x positions.
- LC 715 Range Module. Interval set with add/remove/query — segment tree with lazy propagation is one of two clean answers.
n, q ≤ 10⁵–10⁶, the O(log n) per operation is
your only viable budget. If the aggregate is sum and updates are
point-only, reach for BIT — half the code. Otherwise, segment tree.2 · How to recognise it
- "Range aggregate" of any kind. Sum, min, max, GCD, XOR, count, product mod p. If you can phrase the question as combine over a contiguous slice, segment tree handles it.
- Concurrent point updates. The array isn't read-only. Some operation mutates a single element between queries; prefix-sums die.
- "Add v to every element in [l, r] then query". Range updates, range queries. This is lazy-propagation territory.
- Order-statistics on a stream. "How many earlier elements were smaller than
x?" Compress values, BIT over the compressed index, walk in stream order. - Constraints
n, q ≤ 10⁵–10⁶. O(n) per query is too slow; O(log n) is required. The tell. - "Reverse engineer the permutation from counts of inversions to the right", "find the k-th 1 in a binary array under updates", "count distinct elements in [l, r]" — all segment-tree variants.
3 · The intuition
A segment tree is a binary tree where each node owns a contiguous range
of the input array. The root owns [0, n−1]. Each internal
node splits its range in half and gives one half to each child. Leaves
own single elements. Every node stores the aggregate of its range.
A range query for [l, r] walks down from the root,
descending into children only when its range overlaps [l, r].
Any node whose range is fully contained returns its stored aggregate
immediately — no need to recurse further. Any node fully outside returns
the neutral element. The query touches at most O(log n) nodes because at
each level the boundary cuts at most two ranges.
A point update flips one leaf, then walks up the unique path to the root re-combining each parent. Also O(log n). Build is O(n) — every internal node is touched once.
4 · Iterative segment tree
For non-lazy cases — point update plus range query — the iterative
"bottom-up" segment tree is faster than the recursive version and
shorter to write. Allocate 2n slots; leaves live at indices
n .. 2n−1; internal node i has children
2i and 2i+1. The structure assumes n
is the input size — no power-of-two padding needed.
// Iterative segment tree for range sum + point update.
// Indices: leaves at [n, 2n). Internal node i has children 2i, 2i+1.
type SegTree struct {
n int
t []int
}
func NewSegTree(a []int) *SegTree {
n := len(a)
t := make([]int, 2*n)
// Build: copy leaves, then aggregate bottom-up
for i, v := range a {
t[n+i] = v
}
for i := n - 1; i > 0; i-- {
t[i] = t[2*i] + t[2*i+1] // combine = sum; swap for min/max/xor
}
return &SegTree{n: n, t: t}
}
// Point update: set a[i] = v.
func (s *SegTree) Update(i, v int) {
i += s.n
s.t[i] = v
for i >>= 1; i > 0; i >>= 1 {
s.t[i] = s.t[2*i] + s.t[2*i+1]
}
}
// Range query: sum of a[l..r) on a half-open interval.
func (s *SegTree) Query(l, r int) int {
res := 0 // neutral element for sum
for l, r = l+s.n, r+s.n; l < r; l, r = l>>1, r>>1 {
if l&1 == 1 { // l is a right child — include it
res += s.t[l]
l++
}
if r&1 == 1 { // r is a right child — include r-1
r--
res += s.t[r]
}
}
return res
}Query(l, r) here
means [l, r) — exclusive on the right. Mixing this with
problem statements that say "[l, r] inclusive" is the #1 source of
off-by-ones. Pick one convention, write it on the whiteboard, stick
with it.Swapping the aggregate is a one-line change: replace + with
min, max, gcd, xor,
whatever. The neutral element changes too — 0 for sum/xor,
math.MaxInt for min, math.MinInt for max,
0 for gcd (because gcd(0, x) = x).
5 · Lazy propagation — for range updates
Iterative segtree handles point updates. The moment the problem says
"add v to every element in [l, r]", a naive
implementation degrades to O(n) per update — you'd touch every leaf in
the range. Lazy propagation fixes this by storing pending updates at
internal nodes and pushing them down only when needed.
The recursive segment tree is cleaner here. Each node carries a
lazy value: "I owe my subtree this much, but I haven't
pushed it yet". On any visit to a node, first push down any
pending lazy to its children, then proceed.
// Recursive segment tree with lazy propagation.
// Supports range add + range sum on [0, n-1] inclusive.
type LazySeg struct {
n int
sum, lazy []int
}
func NewLazySeg(n int) *LazySeg {
return &LazySeg{n: n, sum: make([]int, 4*n), lazy: make([]int, 4*n)}
}
// Push pending lazy at node down to its two children.
func (s *LazySeg) push(node, l, r int) {
if s.lazy[node] == 0 {
return
}
mid := (l + r) / 2
left, right := 2*node, 2*node+1
s.sum[left] += s.lazy[node] * (mid - l + 1)
s.lazy[left] += s.lazy[node]
s.sum[right] += s.lazy[node] * (r - mid)
s.lazy[right] += s.lazy[node]
s.lazy[node] = 0
}
// Range update: add v to every a[i] for i in [ql, qr].
func (s *LazySeg) Update(node, l, r, ql, qr, v int) {
if qr < l || r < ql {
return // no overlap
}
if ql <= l && r <= qr {
s.sum[node] += v * (r - l + 1)
s.lazy[node] += v
return // full overlap — defer to lazy
}
s.push(node, l, r) // partial overlap — must descend
mid := (l + r) / 2
s.Update(2*node, l, mid, ql, qr, v)
s.Update(2*node+1, mid+1, r, ql, qr, v)
s.sum[node] = s.sum[2*node] + s.sum[2*node+1]
}
// Range query: sum of a[ql..qr].
func (s *LazySeg) Query(node, l, r, ql, qr int) int {
if qr < l || r < ql {
return 0
}
if ql <= l && r <= qr {
return s.sum[node]
}
s.push(node, l, r)
mid := (l + r) / 2
return s.Query(2*node, l, mid, ql, qr) + s.Query(2*node+1, mid+1, r, ql, qr)
}
// Public wrappers: Update(ql, qr, v) and Query(ql, qr) call into node=1, [0, n-1].Lazy propagation generalises beyond range-add + range-sum. The ingredients are: (1) a way to compose two pending updates into one (e.g. add v₁ then add v₂ is add v₁ + v₂), and (2) a way to apply a pending update to an aggregate without expanding to leaves (e.g. add v to a length-k range adds v·k to its sum). Find both, and lazy propagation handles range-assign, range-multiply, range-flip (XOR), or any combination.
6 · Fenwick tree (BIT)
When the aggregate is sum (or any invertible operation — XOR works too)
and updates are point-only, the Fenwick tree is the right tool. Shorter
code, smaller memory footprint (just n+1 ints, not
4n), better cache behaviour, and a smaller constant factor
that often matters on tight time limits.
The trick is the lowbit. For any positive index i,
i & -i isolates the lowest set bit. The Fenwick array
stores partial sums indexed by these lowbit-aligned ranges; updates and
queries walk by adding or subtracting the lowbit, climbing or descending
through a logarithmic number of indices.
// Fenwick tree (Binary Indexed Tree). 1-INDEXED — bit[0] is unused.
// Supports point-update and prefix-sum query in O(log n).
type BIT struct {
n int
bit []int
}
func NewBIT(n int) *BIT {
return &BIT{n: n, bit: make([]int, n+1)}
}
// Add v to position i (1-indexed).
func (b *BIT) Update(i, v int) {
for ; i <= b.n; i += i & -i { // climb: jump by lowbit
b.bit[i] += v
}
}
// Prefix sum of positions [1..i].
func (b *BIT) Prefix(i int) int {
s := 0
for ; i > 0; i -= i & -i { // descend: strip lowbit
s += b.bit[i]
}
return s
}
// Range sum on [l, r], inclusive.
func (b *BIT) Range(l, r int) int {
return b.Prefix(r) - b.Prefix(l-1)
}Three loops, total. The whole structure fits in 20 lines. Compared to the ~80 lines for a recursive lazy segment tree, it's no contest if your problem only needs prefix-sum-style queries.
lowbit(0) = 0 would cause infinite loops. Always
shift your input by one — store position p at index
p + 1. Forgetting this gives correct-looking code that
silently returns garbage for index 0.Two extensions are worth knowing. Range update, point query
via a BIT on the difference array: update(l, +v)
and update(r+1, −v) followed by prefix(i)
gives a[i] after all range additions. Range update,
range query requires two BITs and a small algebraic trick (see
the further-reading section).
7 · Worked example 1 — LC 307 Range Sum Query Mutable (easy)
Problem. Given an integer array nums,
support two operations: update(i, v) sets
nums[i] = v, and sumRange(l, r) returns the
sum of nums[l..r] inclusive.
This is the textbook BIT problem. The aggregate is sum, updates are point-only, no range updates. BIT in 20 lines beats segment tree in 80 lines.
Steps
- Build a BIT of size n. Initialise each position by calling
Update(i, nums[i]); total cost O(n log n). (Or use the linear-time build by computing partial sums in place.) - Maintain the original array. An
updatecall sets a value, but BIT stores deltas. Storenums[i]separately so you can computedelta = newValue − nums[i]and apply that. - sumRange uses
prefix(r+1) − prefix(l), accounting for the 1-indexed shift.
type NumArray struct {
bit *BIT
nums []int
}
func Constructor(nums []int) NumArray {
n := len(nums)
arr := NumArray{bit: NewBIT(n), nums: append([]int(nil), nums...)}
for i, v := range nums {
arr.bit.Update(i+1, v) // 1-indexed shift
}
return arr
}
func (a *NumArray) Update(i, v int) {
delta := v - a.nums[i]
a.nums[i] = v
a.bit.Update(i+1, delta)
}
func (a *NumArray) SumRange(l, r int) int {
return a.bit.Range(l+1, r+1)
}Build O(n log n), each query and update O(log n). Q operations total: O((n + q) log n). Comfortable for n, q = 10⁵.
8 · Worked example 2 — LC 315 Count Smaller After Self (medium)
Problem. Given an integer array nums,
return an array counts where counts[i] is the
number of elements nums[j] with j > i and
nums[j] < nums[i].
The classic BIT-over-coordinate-compression problem. Walk the array
right-to-left; for each element x, the answer is "how many
already-seen values are strictly less than x?". A BIT
indexed by rank gives prefix counts in O(log n). The trick is
compressing the value space first because raw values can be up to 10⁴
or live in any range.
Steps
- Coordinate-compress the values. Sort unique values; map each value to its rank (1..k). Now the BIT only needs
k+1slots, wherekis the number of distinct values. - Walk right-to-left. For each
xat positioni: look up its rankr, answercounts[i] = prefix(r − 1)(count of strictly smaller already seen), thenUpdate(r, +1). - Return
counts. Total time O(n log n) including the sort for compression.
import "sort"
func countSmaller(nums []int) []int {
n := len(nums)
// STEP 1 — coordinate-compress
sorted := append([]int(nil), nums...)
sort.Ints(sorted)
sorted = uniq(sorted) // remove duplicates
rank := func(x int) int {
// 1-indexed rank for BIT
return sort.SearchInts(sorted, x) + 1
}
// STEP 2 — BIT walk right-to-left
bit := NewBIT(len(sorted))
out := make([]int, n)
for i := n - 1; i >= 0; i-- {
r := rank(nums[i])
out[i] = bit.Prefix(r - 1) // strictly smaller seen so far
bit.Update(r, 1)
}
return out
}
func uniq(a []int) []int {
out := a[:0]
for i, v := range a {
if i == 0 || a[i-1] != v {
out = append(out, v)
}
}
return out
}9 · Worked example 3 — LC 218 Skyline Problem (hard)
Problem. Given a list of buildings as
(left, right, height) triples, return the silhouette of
the skyline as a list of (x, height) "key points" where
the visible height changes.
The classic sweep-line problem. The segment-tree solution: coordinate-
compress the x positions, build a segment tree over them keyed by max
height, and for each building do a range-update with
chmax(building.height) on its [left, right − 1]
span. After processing all buildings, walk the segment tree's leaves in
order; any height change emits a key point.
Steps
- Collect all distinct x coordinates. For each building, add
leftandright. Sort and deduplicate; this becomes the index space for the segment tree. - Build a segment tree of size m (number of distinct x's) over the intervals between consecutive x's. Each leaf
irepresents the gap[x[i], x[i+1]). Value stored: current max height covering that gap. - For each building (l, r, h): find the leaf indices for
landr, then range-update[l_idx, r_idx − 1]withchmax(h).chmax(v)means "set each element to max(element, v)" — a monotone operation, which lets you skip subtrees whose stored max already meetsv. - Walk leaves in order. Emit a key point at
x[i]wheneverheight[i] ≠ height[i − 1].
An alternative answer to LC 218 — and the one most people write — uses a sweep-line with a max-heap and lazy deletion. Walk events (left = +height, right = −height) sorted by x; maintain the current max-height; emit a key point whenever the max changes. O((n + q) log n) with much shorter code. The segment-tree framing is cleaner for generalisations like "skyline with deletions" or "max height in a range query while processing".
10 · Sister algorithms
Segment tree and BIT cover most of the interview surface. The neighbours are worth knowing by name in case the problem demands a heavier tool.
| Structure | Query | Update | When to reach for it |
|---|---|---|---|
| Sqrt decomposition | O(√n) | O(√n) or O(1) | Simpler to write than segtree; good for "Mo's algorithm" offline queries. |
| Sparse table | O(1) for idempotent (min/max/gcd) | not supported | Static array, no updates. Build O(n log n), query O(1). |
| Persistent segment tree | O(log n) | O(log n), creates new version | Offline k-th smallest in a range (Wavelet alternative). Each update returns a new "version" without mutating the old. |
| Merge sort tree | O(log² n) | not supported | "How many elements in [l, r] are ≤ k?" — sorted lists at every node, binary search inside. |
| Wavelet tree | O(log σ) | not supported | K-th smallest in a range, rank/count of a value in a range. σ is the alphabet size. |
| 2D segment tree / 2D BIT | O(log² n) | O(log² n) | Rectangular range queries with point updates. The constant factor is rough; profile before adopting. |
| Segment tree beats (Ji Driver) | O(log² n) amortised | chmin / chmax | Range-min-update / range-max-update with sum query. Niche but appears in CF problems. |
| Treap / implicit treap | O(log n) | insert/delete anywhere O(log n) | When the array itself changes shape — splice, reverse a subarray, insert mid-array. |
11 · How to solve hard problems step-by-step
A meta-process for any "range query + update" problem. Five steps; follow them in order.
- Identify the aggregate. Sum, min, max, GCD, XOR, count, product. Write
combine(a, b)in one line. Ifcombineisn't associative, you can't use a segment tree directly — re-frame or pick a different structure. - Identify the update type. Point update (set or add to one position), range update (add v to every element in [l, r]), or range assign (overwrite every element in [l, r] with v). Range updates require lazy propagation.
- Pick the structure. Sum + point-update → BIT, no question. Anything else + point-update → iterative segment tree. Range update → recursive segtree with lazy. Static array → prefix sums or sparse table.
- For range updates, decide lazy vs difference array. If updates are "range add, query prefix-sum at the end", a single BIT over the difference array is enough. If updates and queries interleave, lazy segtree.
- Code carefully. 1-indexed BIT; power-of-2 array sizes for the iterative segtree;
4nslots for the recursive segtree. Pick a convention for inclusive/exclusive ranges and stick with it. Write the neutral element comment at the top of the file so you don't second-guess it later.
nums[i] up to 10⁹) but the array length is small
(n ≤ 10⁵), coordinate-compress before building the BIT or
segtree. Sort unique values, map each value to its rank, work in
rank-space. This is the difference between a 10⁵-slot structure and a
10⁹-slot impossibility.12 · Complexity
| Variant | Build | Query | Update | Space |
|---|---|---|---|---|
| Iterative segtree (point upd, range query) | O(n) | O(log n) | O(log n) | O(2n) |
| Recursive segtree (lazy, range upd, range query) | O(n) | O(log n) | O(log n) | O(4n) |
| Fenwick tree (BIT, point upd, prefix sum) | O(n log n) | O(log n) | O(log n) | O(n) |
| BIT with difference array (range upd, point query) | O(n log n) | O(log n) | O(log n) | O(n) |
| Two-BIT (range upd, range query) | O(n log n) | O(log n) | O(log n) | O(2n) |
| 2D segment tree | O(nm) | O(log n · log m) | O(log n · log m) | O(4nm) |
| Persistent segtree | O(n log n) | O(log n) | O(log n) per version | O((n + q) log n) |
| Sqrt decomposition | O(n) | O(√n) | O(√n) or O(1) | O(n + √n) |
A word on constants. BIT's constant is roughly half that of an iterative segment tree because each loop iteration is one bit shift + one array access. The recursive lazy segtree has a 3–4× constant over BIT due to recursion overhead and the push-down. For n = 10⁵, all three fit comfortably in a 1-second time limit; for n = 10⁶ the choice starts to matter.
13 · What breaks
- Off-by-one with 1-indexed BIT. Forgetting to shift by 1 on input or on prefix queries. Symptom: index-0 element is always wrong. Fix: always shift, always.
- Lazy propagation, missing push-down. Querying a node without first pushing its pending lazy returns stale values. Symptom: two updates in a row, second one's effect missing. Fix:
push()as the very first line of any descent. - Integer overflow on aggregates. Sum of 10⁵ values each up to 10⁹ overflows int32. Use int64 (Go's
inton 64-bit is fine; explicitlyint64in Java / C++). - Recursive stack depth. For
n = 10⁶, the recursion tree is ~20 deep — fine. But naive "build by recursion" withbuild(1, 0, n − 1)uses O(n) stack for unbalanced trees; switch to iterative build or increase the stack. - Wrong neutral element. The "no overlap" case returns the neutral element of
combine. 0 for sum,math.MaxIntfor min,math.MinIntfor max, 0 for XOR, 0 for GCD (because gcd(0, x) = x), 1 for product. Get this wrong and queries silently return garbage on edge ranges. - Mixing inclusive and half-open ranges. Iterative segtree queries are
[l, r); problem statements usually say[l, r]. Always convert at the boundary, not in the middle of the loop. - Wrong segtree size. Recursive segtree needs
4nslots, not2n. The factor-of-4 accommodates non-power-of-twonand the over-allocation of the leftmost subtree. Allocating2nand indexing out of bounds is the canonical "passes small tests, crashes on large" bug. - Coordinate-compression collisions. If two different problem values compress to the same rank by accident, your counts are off.
sort.SearchIntson a deduplicated array is the safe lookup;map[int]intbuilt from a sorted unique slice works too. - BIT used for non-invertible aggregates. BIT supports prefix queries that subtract; it works for sum and XOR. It does not work for min/max — you can't subtract a min. For min/max range queries, use a segment tree or sparse table.
14 · Where this gets asked
| Company | Common framing | Why it fits |
|---|---|---|
| Count of Smaller Numbers (LC 315), Range Sum Query Mutable (LC 307), Reverse Pairs (LC 493) | BIT-over-compression is a senior interview lever. Often paired with merge-sort solutions to test recognition. | |
| Meta | Range Module (LC 715), Count of Range Sum (LC 327), My Calendar III (LC 732) | Booking / scheduling systems are range-update + range-query at scale. Calendar problems are a Meta onsite staple. |
| Amazon | Skyline (LC 218), Falling Squares (LC 699), Rectangle Area (LC 850) | Geometric sweep-line + segment tree, calibrated to packing / warehouse layout problems. |
| Microsoft | Range Sum (LC 307), Number of Longest Increasing Subsequence (LC 673) | Bounds-checking + LIS variants where segment tree replaces an O(n²) DP. |
| Stripe / Bloomberg / Two Sigma | Order-book queries, event-log aggregations, sliding-window percentiles | Financial pipelines do range aggregates on event streams — BIT or segment tree on coordinate-compressed timestamps is the standard structure. |
15 · Further reading
- CP Algorithms — Segment Tree. The reference. Iterative, recursive, lazy, 2D, persistent — all covered with code.
cp-algorithms.com/data_structures/segment_tree.html. - CP Algorithms — Fenwick Tree. Same site, the canonical BIT writeup including range-update + range-query with two BITs.
cp-algorithms.com/data_structures/fenwick.html. - CLRS — Chapter 14 (Augmenting Data Structures). The theory of "store extra info at internal nodes" that segment tree is a special case of.
- Codeforces blog — "Efficient and easy segment trees" by AlBert Rocha. The 20-line iterative segment tree that the implementation here is adapted from.
- Codeforces blog — Segment Tree Beats. For chmin/chmax range updates with sum queries. The amortised analysis is beautiful.
- Adjacent: binary search. Binary search on the segment tree — "find the leftmost position with prefix-sum ≥ k" — is one of the highest-use tricks in the toolkit.
- Adjacent: back to patterns index.