Heap / priority queue
A binary heap is the structure that turns "find the largest" into O(n log k).
Top-K elements, k-way merge, running median, scheduling by priority: most of
these collapse to a size-K heap plus a single pass. Python's heapq
is a min-heap; for a max-heap, negate the values on the way in and out.
1 · Why heap matters: the intuition
A binary heap gives you O(1) peek at the minimum (or maximum) and O(log n) insert and delete. That sounds modest until you compare against the alternatives. A sorted array gives O(1) peek too, but insertion is O(n), since every new element shifts up to n others. A balanced BST gives O(log n) everywhere but pays in pointer overhead and cache misses. The heap is the structure you pick when you care only about the extreme, and you do many inserts and pops.
The trick that makes heaps practical: they store an implicit binary
tree as a flat array. No pointers, no node allocations. The parent
of index i is at (i - 1) / 2; children are
at 2i + 1 and 2i + 2. That's why
container/heap in Go is a small interface, and
heapq in Python is a 200-line file. The index math does
all the work, and the array gives perfect cache locality.
The invariant is local: every parent is ≤ both children (for a min-heap). Sibling order is unspecified; there is no global sort. So finding an arbitrary value is O(n); there's no key to guide a search. The heap is exactly what you need for "smallest so far", and exactly what you don't need for "find this specific value".
2 · How to recognise it
A heap is usually the right pattern when any of the following show up. The first two are the strongest signals; the rest support. Each row is a sub-pattern worth recognising on sight.
- "Top K" or "K largest / smallest / closest". LC 215 Kth Largest in an Array, LC 347 Top K Frequent Elements, LC 973 K Closest Points. Maintain a size-K heap; root is the answer. Beats full sorting by a factor of
log n / log k, which is enormous when k is small. - Merge K sorted lists / files / streams. LC 23 Merge K Sorted Lists is the canon. Heap of
(value, source_index, position)tuples. Pop the smallest, push the next from that source. The same trick underlies LSM-tree compaction in databases. - Median streaming. LC 295 Find Median from Data Stream. The two-heap pattern: max-heap on the lower half, min-heap on the upper half. O(log n) per insert, O(1) per query. The interview question that lands in 90% of streaming-systems rounds.
- Scheduling / interval merge. LC 253 Meeting Rooms II ("minimum rooms"). Sort intervals by start; min-heap keyed by end time. Pop expired meetings, push the new one. Same shape underlies real-time task schedulers.
- Dijkstra and Prim: priority queue is a heap with decrease-key. Standard binary heap supports decrease-key in O(log n); Fibonacci heap makes it O(1) amortised but is slower in practice on the typical graph sizes that matter.
- Sliding window maximum. LC 239. A monotonic deque is faster (O(n) total), but a heap works too: push everything, lazy-delete stale entries when they surface at the top.
- "What comes next by priority" / discrete-event simulators. Heap keyed by event time. Anything where you process events out of insertion order but in priority order.
3 · Sister algorithms
The binary heap is the workhorse. Other heap variants exist; most are theoretical curiosities, but knowing them by name signals depth in an interview.
| Variant | What it's optimised for | Reality check |
|---|---|---|
| Binary heap | The default. O(log n) push and pop, O(1) peek. | What Go's container/heap and Python's heapq implement. |
| Fibonacci heap | Theoretical O(1) amortised decrease-key. Improves Dijkstra to O(E + V log V). | Rarely implemented — high constant factors, cache-hostile pointer structure. Boost's implementation is one of the few production uses. |
| Pairing heap | Simpler than Fibonacci, similar bounds in practice. Used in some research code. | Beats binary heap on graph algorithms with heavy decrease-key, but rarely worth the complexity. |
| Binomial heap | O(log n) merge of two heaps. Useful when you need to combine heaps frequently. | Rare in practice — most workloads don't need the merge primitive. |
| Skew heap | Self-adjusting heap with simple recursive merge. O(log n) amortised. | An elegant teaching example; almost never used in production. |
| d-ary heap | Each node has d children instead of 2. Shorter trees, more comparisons per level. | Sometimes faster for decrease-key-heavy workloads (d=4 is a common choice). |
4 · Canonical example: Kth Largest Element (LC 215)
Problem. Given an integer array nums and an
integer k, return the k-th largest element. Note this is the
k-th largest in sorted order, not the k-th distinct element.
Input: nums = [3, 2, 1, 5, 6, 4], k = 2
Output: 5
Explanation: sorted descending is [6, 5, 4, 3, 2, 1]; the 2nd largest is 5.Maintain a min-heap of size K. Walk the array; push each element, then pop if the heap exceeds K. The smallest of the K largest you've seen sits on top, so after one pass the heap holds the K largest elements and the root is the K-th largest. Each push / pop is O(log k); the whole thing is O(n log k) time and O(k) space.
5 · Reference implementation
Three versions: Go (the interview default) and Python (faster to read). The size-K heap is the one to reach for when k is much smaller than n. The full-heap version is a tidy fit when k approaches n.
Go: using container/heap
import "container/heap"
// IntHeap implements heap.Interface as a MIN-heap of ints.
// For max-heap, invert Less to return h[i] > h[j].
type IntHeap []int
func (h IntHeap) Len() int { return len(h) }
func (h IntHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h IntHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *IntHeap) Push(x any) { *h = append(*h, x.(int)) }
func (h *IntHeap) Pop() any {
old := *h
n := len(old)
x := old[n-1] // last element, set by heap package
*h = old[:n-1]
return x
}
func findKthLargest(nums []int, k int) int {
h := &IntHeap{}
heap.Init(h)
for _, x := range nums {
heap.Push(h, x)
if h.Len() > k {
heap.Pop(h) // drop the smallest
}
}
return (*h)[0] // root is the K-th largest
}Python: using heapq
import heapq
def kth_largest_sizek(nums: list[int], k: int) -> int:
# Size-K min-heap. O(n log k) time, O(k) space.
heap: list[int] = []
for x in nums:
heapq.heappush(heap, x)
if len(heap) > k:
heapq.heappop(heap) # drop the smallest
return heap[0] # root is the K-th largest
def kth_largest_fullheap(nums: list[int], k: int) -> int:
# Build a max-heap by negating, then pop K times.
# O(n) to heapify + O(k log n) to pop. Wins when k is large.
neg = [-x for x in nums]
heapq.heapify(neg) # O(n)
for _ in range(k - 1):
heapq.heappop(neg)
return -neg[0]6 · Variations
What you push, what you compare, and how many heaps you keep changes by problem type. Six common variations:
| Variation | What changes | Example |
|---|---|---|
| Min vs max heap | Python's heapq is min-only. For a max-heap, push -x and negate on the way out, or wrap items in a comparator class. | LC 1046 Last Stone Weight |
| Top-K with size-K heap | Push, then pop if the heap exceeds K. Root holds the K-th best across the whole input. | LC 215 Kth Largest, LC 347 Top K Frequent |
| K-way merge | Heap of (value, source-index, iterator) tuples. Pop the smallest, push the next element from that source. | LC 23 Merge K Sorted Lists |
| Running median (two heaps) | Max-heap on the lower half, min-heap on the upper half. Rebalance after each insert so sizes differ by at most one. | LC 295 Find Median from Data Stream |
| Lazy deletion | Don't remove elements directly. Mark them stale and skip on pop. Useful when the underlying API doesn't support arbitrary deletion. | LC 1845 Seat Reservation Manager |
| Indexed heap | Hash map from element to heap index, updated on every sift. Supports decrease-key in O(log n) — the Dijkstra optimisation. | Dijkstra, Prim's MST |
7 · Worked example: Find Median from Data Stream (LC 295)
Problem. Build a data structure with two operations:
addNum(x) ingests a number, findMedian()
returns the median of all numbers seen so far. Both should be fast.
The two-heap pattern. Keep a max-heap for the lower half of the stream and a min-heap for the upper half. Maintain the invariant that their sizes differ by at most one. The median is then either the top of the larger heap (odd-sized stream) or the average of the two tops (even-sized).
import "container/heap"
// MaxHeap — invert Less for max-heap behaviour
type MaxHeap []int
func (h MaxHeap) Len() int { return len(h) }
func (h MaxHeap) Less(i, j int) bool { return h[i] > h[j] } // > for max
func (h MaxHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *MaxHeap) Push(x any) { *h = append(*h, x.(int)) }
func (h *MaxHeap) Pop() any { old := *h; n := len(old); x := old[n-1]; *h = old[:n-1]; return x }
// MinHeap
type MinHeap []int
func (h MinHeap) Len() int { return len(h) }
func (h MinHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h MinHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *MinHeap) Push(x any) { *h = append(*h, x.(int)) }
func (h *MinHeap) Pop() any { old := *h; n := len(old); x := old[n-1]; *h = old[:n-1]; return x }
type MedianFinder struct {
lo *MaxHeap // lower half — max on top
hi *MinHeap // upper half — min on top
}
func Constructor() MedianFinder {
return MedianFinder{lo: &MaxHeap{}, hi: &MinHeap{}}
}
func (m *MedianFinder) AddNum(num int) {
// 1. Push into lo (max-heap), then move its top to hi (min-heap)
heap.Push(m.lo, num)
heap.Push(m.hi, heap.Pop(m.lo))
// 2. Rebalance — lo must be >= hi in size (we hold extra on lo when odd)
if m.hi.Len() > m.lo.Len() {
heap.Push(m.lo, heap.Pop(m.hi))
}
}
func (m *MedianFinder) FindMedian() float64 {
if m.lo.Len() > m.hi.Len() {
return float64((*m.lo)[0])
}
return (float64((*m.lo)[0]) + float64((*m.hi)[0])) / 2
}8 · Worked example: Merge K Sorted Lists (LC 23)
Problem. Given k sorted linked lists,
merge them into one sorted list. Total elements is N;
naive concatenation-and-sort is O(N log N), but the structure of the
input lets you do O(N log k).
Seed the heap with the head of each list. Pop the smallest, append to the output, push that list's next node. Each push/pop is O(log k); N elements total, so O(N log k) overall.
import "container/heap"
type ListNode struct {
Val int
Next *ListNode
}
// NodeHeap holds (value, sequence, node) so equal values still order.
type NodeItem struct {
node *ListNode
seq int
}
type NodeHeap []NodeItem
func (h NodeHeap) Len() int { return len(h) }
func (h NodeHeap) Less(i, j int) bool {
if h[i].node.Val != h[j].node.Val {
return h[i].node.Val < h[j].node.Val
}
return h[i].seq < h[j].seq // tiebreaker — keep sort stable
}
func (h NodeHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *NodeHeap) Push(x any) { *h = append(*h, x.(NodeItem)) }
func (h *NodeHeap) Pop() any { old := *h; n := len(old); x := old[n-1]; *h = old[:n-1]; return x }
func mergeKLists(lists []*ListNode) *ListNode {
h := &NodeHeap{}
heap.Init(h)
seq := 0
for _, head := range lists {
if head != nil {
heap.Push(h, NodeItem{node: head, seq: seq})
seq++
}
}
dummy := &ListNode{}
tail := dummy
for h.Len() > 0 {
item := heap.Pop(h).(NodeItem)
tail.Next = item.node
tail = tail.Next
if item.node.Next != nil {
heap.Push(h, NodeItem{node: item.node.Next, seq: seq})
seq++
}
}
return dummy.Next
}Val, Go's heap would try to compare the
*ListNode pointers, which works (pointer comparison)
but is implementation-defined and brittle. The explicit
seq field forces a deterministic, stable ordering and
makes the code obvious to read.9 · Hard worked example: IPO (LC 502)
Problem. You start with capital w. There
are n projects, each with a profit p[i] and
a capital requirement c[i]. You can do at most k
projects. Each completed project adds its profit to your capital.
Maximise your final capital.
Insight. At any moment, you should pick the most profitable project you can afford. As your capital grows, new projects become affordable. The right structure: two heaps. A min-heap by capital requirement (so you can cheaply find what's just become affordable), and a max-heap by profit (so you can pick the best from what's affordable).
import (
"container/heap"
"sort"
)
// Min-heap by capital requirement
type CapHeap [][2]int // each entry: [capitalRequired, profit]
func (h CapHeap) Len() int { return len(h) }
func (h CapHeap) Less(i, j int) bool { return h[i][0] < h[j][0] }
func (h CapHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *CapHeap) Push(x any) { *h = append(*h, x.([2]int)) }
func (h *CapHeap) Pop() any { old := *h; n := len(old); x := old[n-1]; *h = old[:n-1]; return x }
// Max-heap by profit
type ProfitHeap []int
func (h ProfitHeap) Len() int { return len(h) }
func (h ProfitHeap) Less(i, j int) bool { return h[i] > h[j] } // > for max
func (h ProfitHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *ProfitHeap) Push(x any) { *h = append(*h, x.(int)) }
func (h *ProfitHeap) Pop() any { old := *h; n := len(old); x := old[n-1]; *h = old[:n-1]; return x }
func findMaximizedCapital(k, w int, profits, capital []int) int {
n := len(profits)
// Sort projects by capital required, ascending — feeds into the cap-heap
idx := make([]int, n)
for i := range idx {
idx[i] = i
}
sort.Slice(idx, func(i, j int) bool {
return capital[idx[i]] < capital[idx[j]]
})
capH := &CapHeap{}
profH := &ProfitHeap{}
heap.Init(capH)
heap.Init(profH)
next := 0 // pointer into the sorted-by-capital index
for i := 0; i < k; i++ {
// Move every now-affordable project into the profit-heap
for next < n && capital[idx[next]] <= w {
heap.Push(profH, profits[idx[next]])
next++
}
if profH.Len() == 0 {
break // no project affordable — done
}
w += heap.Pop(profH).(int)
}
return w
}10 · How to solve a hard heap problem, step by step
Five questions that turn a confusing prompt into a heap-shaped solution.
- What does the heap hold? A raw value? A tuple of
(priority, payload)? A struct with multiple fields? The answer constrains the comparator and the Less function. Always start here; most heap bugs are "I picked the wrong shape for the entry". - Min-heap or max-heap? "I want the smallest" → min-heap. "I want the largest" → max-heap (in Go, invert
Less; in Python, push negatives). "I want both" → two heaps, or a balanced BST. - When do you push and when do you pop? For top-K: push every element, pop when size exceeds K. For merge-K: push K initially, pop-then-push on each iteration. For two-heap median: push-pop-rebalance on every insert. The rhythm is problem-specific; write it out before coding.
- Two heaps? Then which goes where? The two-heap median is the canonical case. Lower half in a max-heap (so the top is the median candidate); upper half in a min-heap (so the top is the other median candidate). Rebalance after every insert so sizes differ by at most one. Generalise: one heap "in", one "out", invariant maintained between them.
- Lazy deletion when you can't update in place. If a priority changes (someone updates a task's deadline), the binary heap won't re-sift on external mutation. Two fixes: (a) push a new entry with the new priority and lazy-delete the old one when it surfaces at the top; (b) use an indexed heap with a hash map from element to position, manually sift. Pick (a) for interview code, (b) for production Dijkstra.
11 · Five-problem progression
| # | LC | Problem | Why it's here |
|---|---|---|---|
| 1 | LC 215 | Kth Largest Element in an Array | The minimal heap problem. Size-K min-heap; root is the answer. |
| 2 | LC 23 | Merge K Sorted Lists | K-way merge. Heap of list heads; pop the smallest, push its successor. |
| 3 | LC 347 | Top K Frequent Elements | Count first, then size-K heap on (frequency, value). Bucket sort is the O(n) alternative worth mentioning. |
| 4 | LC 295 | Find Median from Data Stream | The two-heap trick. Most-asked variation; expect a follow-up about a sliding window of size K. |
| 5 | LC 1851 | Minimum Interval to Include Each Query | Offline processing plus a heap keyed by interval size. The "sort queries, sweep intervals" idiom. |
12 · What breaks: the deep list
- Go's container/heap requires implementing the interface correctly. Five methods:
Len,Less,Swap,Push,Pop. The conventions are subtle.Pushmust append to the end (the heap package sifts up);Popmust remove the last element (the heap package put it there). Confusing the user-facingheap.Push(h, x)with the interface's(h *T) Push(x any)is a classic bug; always call the package function, never the method directly. - Less inverted for max-heap. For max-heap,
Less(i, j) bool { return h[i] > h[j] }. Forgetting the flip is the most common Go heap bug; the heap silently behaves as a min-heap. Test with a hand-traced example before trusting it. - Forgetting that Python's
heapqis min-only. Reaching for "max-heap" by name doesn't exist in the standard library. Negate the values, or wrap items in a class with__lt__reversed. - Stale entries not lazily deleted. If you update an entry's priority by pushing a new copy, the old entry stays in the heap. When it surfaces at the top, your
peek()returns garbage unless you check whether it's stale and pop+continue. The lazy-deletion idiom: keep aremovedset; on pop, if the top is in the set, pop again. - Tuple comparison ordering crashes on incomparable second elements. Go's heap.Less is your code, so it never crashes — but if you compare tuples by their first element and they tie, you need a deterministic tiebreaker. In Python, two equal-priority tuples cascade to comparing the second field; if it's a custom object without
__lt__, you get a TypeError. Push(priority, sequence, payload)with a monotonic counter. - The n-th-element interpretation off-by-one. "K-th largest" usually means 1-indexed: K=1 is the maximum, K=N is the minimum. Some problems use 0-indexed. Check the test cases, not the prose. The cost of being wrong is an off-by-one that passes most tests and fails one edge case.
- Confusing heap-of-size-K with full-heap. Size-K is O(n log k) and wins when k is small. Full-heap with
heapifyis O(n + k log n) and wins when k approaches n. State which you're using and why. - Forgetting the size-K invariant. Always push first, then pop if the heap exceeds K. Doing it the other way round drops elements that should have evicted the current root.
- Claiming the wrong complexity. "Top-K with a heap is O(n log n)" is wrong. The size-K bound is O(n log k). The distinction matters; interviewers tend to push on it.
- Mutating heap entries in place. Heaps don't re-sift on external mutation. If a priority changes, you either rebuild the heap or use lazy deletion plus re-insertion.
13 · Complexity, side by side
A heap looks like a sorted array but its complexity profile is sharply different. The table is worth memorising; it's the answer to "why heap and not BST or sorted array?"
| Operation | Heap | Sorted array | Balanced BST |
|---|---|---|---|
| Peek min/max | O(1) | O(1) | O(log n) |
| Push (insert) | O(log n) | O(n) | O(log n) |
| Pop min/max | O(log n) | O(n) (shift) | O(log n) |
| Heapify (build from n items) | O(n) | O(n log n) | O(n log n) |
| Search for arbitrary value | O(n) | O(log n) | O(log n) |
| Ordered iteration | O(n log n) (repeated pop) | O(n) | O(n) |
| Memory overhead | 0 — array, no pointers | 0 — flat array | ~3× — pointers, colors |
| Cache locality | Excellent — contiguous | Excellent — contiguous | Poor — heap-allocated nodes |
14 · The array-as-tree, made visual
A heap is a complete binary tree stored as a flat array. No pointers: child positions
are arithmetic on the index. This is what makes heapq a 200-line file
instead of a 2,000-line one.
The invariant is local: at every node, parent ≤ both children. Sibling order is unspecified; there's no global sort. This is why finding an arbitrary value is O(n); there's no comparison key to guide a search.
# sift-up — restore the invariant when a smaller value lands at index i
def sift_up(heap, i):
while i > 0:
parent = (i - 1) // 2
if heap[i] >= heap[parent]:
return
heap[i], heap[parent] = heap[parent], heap[i]
i = parent
# sift-down — restore the invariant from the root after a pop
def sift_down(heap, i, n):
while True:
l, r = 2*i + 1, 2*i + 2
smallest = i
if l < n and heap[l] < heap[smallest]: smallest = l
if r < n and heap[r] < heap[smallest]: smallest = r
if smallest == i:
return
heap[i], heap[smallest] = heap[smallest], heap[i]
i = smallest15 · Real-world heaps, in production code
Heaps are everywhere in production once you know where to look. Five places they earn their keep:
- Kernel run-queues (CFS, EEVDF).
- Linux's Completely Fair Scheduler stores runnable tasks in a red-black tree, but the EEVDF scheduler (Linux 6.6+) uses an explicit heap keyed by virtual deadline.
kernel/sched/eevdf.cis one of the cleanest production heap implementations to read. - Dijkstra's shortest path.
- Every graph routing library (networking stacks like OSPF link-state, Google Maps, route planners) uses a min-heap keyed by tentative distance. The decrease-key operation is O(log n) on a binary heap; Fibonacci heaps make it O(1) amortised but are slower in practice on the typical graph sizes that matter.
- Event loops and timers.
- Node.js, libuv, the JVM's
ScheduledThreadPoolExecutor, Go'stime.Timer: all use a min-heap keyed by deadline. Pop the earliest timer; if it's in the future, sleep until it; if it's due, fire and pop the next. The runtime's "next deadline" is always at heap[0]. - K-way merge for sorted streams.
- LSM-tree compaction merges N sorted SSTables into one. Each SSTable contributes its smallest unread key to a heap of size N; pop the global min, advance that table's pointer, push the new value. RocksDB, LevelDB, Cassandra, and ScyllaDB all do this. The throughput of compaction is bounded by the heap operation cost.
- Top-K aggregation in analytics.
- Spark's
top(), ClickHouse'stopK(), every "top 10 user agents" query: all run a partial sort with a heap of size K. Better than a full sort by a factor of n/K log n vs log K, which is enormous for streaming workloads.
heapq on a list of 1 million
integers: ~0.04 seconds for heapify (O(n), the build-heap trick), ~0.12
seconds for n pushes (O(n log n)). The factor-of-3 gap is why competitive
programmers always batch-build over incremental insertion when the data is available
upfront. Java's PriorityQueue is ~2× slower at the same operations because
of object boxing.16 · Three heap idioms worth knowing cold
Three small templates show up over and over. Memorise the shapes; whichever flavour of "find K" or "merge K" you get, one of these is almost always the answer.
A · Two heaps for running median
Max-heap on the lower half, min-heap on the upper half. After every insert, rebalance so the heaps differ by at most one in size. The median is the top of the larger heap (odd-sized stream) or the average of the two tops (even-sized).
import heapq
class MedianFinder:
def __init__(self):
self.lo = [] # max-heap (store negatives)
self.hi = [] # min-heap
def add_num(self, num):
# 1. push into max-heap, then move max to min-heap
heapq.heappush(self.lo, -num)
heapq.heappush(self.hi, -heapq.heappop(self.lo))
# 2. rebalance if min-heap is too big
if len(self.hi) > len(self.lo):
heapq.heappush(self.lo, -heapq.heappop(self.hi))
def find_median(self):
if len(self.lo) > len(self.hi):
return -self.lo[0]
return (-self.lo[0] + self.hi[0]) / 2The "push, move, rebalance" three-step is the whole trick. Per insert: three log-n operations. Per query: O(1).
B · K-way merge
Merge K sorted lists (or files, or streams) into one sorted output. Push the head of each list with its source index; pop the smallest, push the next element from that same source.
import heapq
def merge_k_sorted(lists: list[list[int]]) -> list[int]:
heap = []
# Seed with the first element from each non-empty list.
for i, lst in enumerate(lists):
if lst:
heapq.heappush(heap, (lst[0], i, 0)) # (value, source_idx, pos_in_source)
out = []
while heap:
val, src, pos = heapq.heappop(heap)
out.append(val)
if pos + 1 < len(lists[src]):
heapq.heappush(heap, (lists[src][pos + 1], src, pos + 1))
return outThe tuple's middle field (i) is a tiebreaker; without it, Python tries to compare list values (which works for ints) or hits a TypeError if the values aren't naturally ordered. Always carry a tiebreaker.
C · Custom comparator (the Java / C++ trick)
When the natural ordering of the value isn't what you want (e.g. order points by distance to the origin), wrap with a tuple in Python, or pass a comparator class in Java / C++.
# Python — closest K points to origin
def k_closest(points: list[list[int]], k: int) -> list[list[int]]:
heap = [] # min-heap of (dist_squared, point)
for x, y in points:
d = x*x + y*y
heapq.heappush(heap, (d, [x, y]))
return [pt for _, pt in heapq.nsmallest(k, heap)]
# Java — custom comparator on a PriorityQueue
PriorityQueue<int[]> pq = new PriorityQueue<>(
(a, b) -> Integer.compare(a[0]*a[0] + a[1]*a[1], b[0]*b[0] + b[1]*b[1])
);Python ships heapq.nsmallest(k, iterable) and heapq.nlargest(k, iterable). Both internally use a size-K heap. For "top-K" problems, mention them by name; you'll save lines and impress the interviewer who's seen sorted()[:k] a hundred times.
17 · What to say in the interview
- "Top-K → heap of size K." Name the pattern out loud. The interviewer wants to hear the recognition.
- State min-heap vs max-heap explicitly. "I'll use a min-heap of size K, so the smallest of the K largest sits on top." Don't make them infer it.
- "O(n log k) time, O(k) space." Both. Most candidates state time and skip space.
- Mention Python's quirk. "
heapqis a min-heap, so for max-heap behaviour I'll negate on insert." A one-liner; shows you've actually written this code. - Edge cases to mention.
k = 0(return early or trivially empty),k > n(the whole array is the answer), duplicates (usually fine for top-K, may matter for "K distinct"), single-element input. - For the two-heap median, walk the rebalance rule. "Insert into the max-heap; move its top to the min-heap; if the min-heap is now larger, move its top back." Three lines; covers every case.
18 · Where this gets asked
Heap problems are over-represented in big-tech interviews. The same five framings recur across hundreds of problem reports:
| Company | Common framing | Why it fits their stack |
|---|---|---|
| Top-K Frequent Elements (LC 347), Kth Largest in Stream (LC 703), Find Median From Data Stream (LC 295) | Search-quality and ML teams use Top-K constantly — every ranking query lands here. | |
| Meta | Merge K Sorted Lists (LC 23), Reorganize String (LC 767), Task Scheduler (LC 621) | News-feed ranking is a K-way merge; the canonical interview question for that team is some variant of LC 23. |
| Amazon | K Closest Points to Origin (LC 973), Top K Frequent Words (LC 692), Reorganize Distance (LC 1102) | Logistics + last-mile routing both reduce to "closest N delivery points" or "top-K busy zones". |
| Microsoft | Sliding Window Median (LC 480), Find Median From Stream (LC 295), Smallest Range Covering K Lists (LC 632) | Office collab and Bing relevance both lean on streaming-median and K-way merge. |
| Bloomberg / Two Sigma / Citadel | Trapping Rain Water II (LC 407), Find Median in Sliding Window, K-th Smallest in a Sorted Matrix (LC 378) | Trading pipelines do priority dispatch by deadline + price; heap is the canonical structure. |
| Uber / Airbnb | Meeting Rooms II (LC 253), Process Tasks Using Servers (LC 1882) | Dispatching and matching engines pop the earliest available worker — min-heap by free-time. |
19 · Lazy deletion: the priority queue trick
Binary heaps don't support "delete arbitrary element" or "update priority" in O(log n). The classic workaround: lazy deletion. Mark entries stale instead of removing them, and skip stale entries when they surface at the top.
Two flavours in practice. (1) When you need to remove a known element: keep a counter or set of "removed" entries; on pop, if the top is in the removed set, pop again and continue. (2) When you need to update a priority: push the new (priority, element) without touching the old entry; on pop, ignore entries whose stored priority doesn't match the element's current one.
import "container/heap"
// Priority queue with lazy deletion — useful for sliding-window problems
// where the "expiring" entry is hard to find.
type Item struct {
value int
seq int // unique id for this insertion
}
type PQ []Item
func (p PQ) Len() int { return len(p) }
func (p PQ) Less(i, j int) bool { return p[i].value > p[j].value } // max-heap
func (p PQ) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
func (p *PQ) Push(x any) { *p = append(*p, x.(Item)) }
func (p *PQ) Pop() any {
old := *p; n := len(old); x := old[n-1]; *p = old[:n-1]; return x
}
func slidingWindowMax(nums []int, k int) []int {
h := &PQ{}
heap.Init(h)
out := []int{}
for i, v := range nums {
heap.Push(h, Item{value: v, seq: i})
// Lazy-delete: pop entries whose seq is outside the current window
for h.Len() > 0 && (*h)[0].seq <= i-k {
heap.Pop(h)
}
if i >= k-1 {
out = append(out, (*h)[0].value)
}
}
return out
}20 · The indexed heap: Dijkstra's secret
Standard binary heap supports push, pop, and peek. It does NOT support "decrease the priority of an arbitrary element" in better than O(n) — you'd have to scan to find the element first. For Dijkstra and Prim, where the same node's distance gets updated multiple times, this is a bottleneck.
Two practical fixes. (1) Lazy reinsertion: just push the new (distance, node) pair and ignore stale entries on pop by comparing the popped distance against the node's current authoritative distance. Total work is O((V + E) log E), which is fine for most graphs. (2) Indexed heap: maintain a parallel hash map from node to its position in the heap array; update the position on every sift; this lets you implement true decrease-key in O(log n). The bookkeeping is fiddly but production Dijkstra implementations (e.g., Go's standard pathfinder libraries) use it.
// <a class="il" href="/simulators/dijkstra-pathfinding">Dijkstra</a> with LAZY reinsertion — the interview default
import "container/heap"
type Edge struct{ to, w int }
type State struct{ dist, node int }
type StateHeap []State
func (h StateHeap) Len() int { return len(h) }
func (h StateHeap) Less(i, j int) bool { return h[i].dist < h[j].dist }
func (h StateHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *StateHeap) Push(x any) { *h = append(*h, x.(State)) }
func (h *StateHeap) Pop() any {
old := *h; n := len(old); x := old[n-1]; *h = old[:n-1]; return x
}
func dijkstra(n, src int, graph [][]Edge) []int {
const INF = 1 << 30
dist := make([]int, n)
for i := range dist {
dist[i] = INF
}
dist[src] = 0
h := &StateHeap{{dist: 0, node: src}}
heap.Init(h)
for h.Len() > 0 {
cur := heap.Pop(h).(State)
if cur.dist > dist[cur.node] {
continue // STALE entry — skip
}
for _, e := range graph[cur.node] {
nd := cur.dist + e.w
if nd < dist[e.to] {
dist[e.to] = nd
heap.Push(h, State{dist: nd, node: e.to})
}
}
}
return dist
}if cur.dist > dist[cur.node] { continue }
is what makes lazy reinsertion correct. When we improved a node's
distance earlier, we pushed a new (better) entry without removing
the old one. The old one is still in the heap — but when it
surfaces, its dist field doesn't match the node's
current authoritative dist[] value, so we skip it.
Total work is O((V + E) log E) — slightly worse than the O((V + E)
log V) bound of a true indexed heap, but the constant factors and
implementation simplicity usually win in practice.21 · Why O(n) heapify works: the build-heap proof
It looks like building a heap by repeated heappush
should be O(n log n). It is. But heapify on an
existing array is O(n) — strictly faster. The proof is short and
worth knowing for the interviewer who pushes on complexity claims.
The standard heapify walks the array from index n/2 − 1
down to 0, calling sift-down on each. The cost of sifting down node
at depth d from the bottom is O(d). Summing across the
tree: half the nodes are leaves (cost 0), a quarter are at depth 1
(cost 1 each), an eighth at depth 2 (cost 2 each), and so on. Total:
Sum = (n/2)*0 + (n/4)*1 + (n/8)*2 + (n/16)*3 + ...
= n * sum_{k=1..inf} (k / 2^(k+1))
= n * 1
= O(n)The geometric series sum k/2k+1 converges
to 1, so total work is O(n). Heaps you build with heapify are
cheaper than heaps you build with repeated push by a factor of
log n — small but real.
heap.Init (Go) or
heapq.heapify (Python). When the data arrives
incrementally, you have no choice but to push one-by-one — that's
O(n log n) total. The factor-of-log n gap is exactly
why streaming top-K with a size-K heap is O(n log k), not O(n log n).22 · Try it yourself
Four problems, ordered by depth. Spend 30 minutes on each before checking the hint — the struggle is where the pattern internalises.
- Kth Largest Element · LC 215
- The canonical heap question. Min-heap of size K; push every element; pop when oversized; heap[0] is your answer. Hint: also doable in O(n) average with quickselect — worth mentioning out loud even if you implement the heap.
- K Closest Points to Origin · LC 973
- Same shape as Kth Largest but with a custom comparator (Euclidean distance, no square root needed). Hint: heapq with tuples in Python:
(distance_squared, point)sorts lexicographically and you never compute sqrt. - Merge K Sorted Lists · LC 23
- The K-way merge canon. Min-heap holding one node per list; pop the smallest, advance that list, push the next node. Hint: tuples need a tie-breaker — Python complains if two ListNodes compare equal but aren't orderable. Use
(node.val, index, node). - Find Median from Data Stream · LC 295
- The two-heap median. Max-heap for the lower half, min-heap for the upper half; rebalance after each insert so sizes differ by at most 1. Hint: median is heap[0] of the larger heap, or the average of the two roots if sizes are equal.
- Stretch — The Skyline Problem · LC 218
- One of the hardest heap-related questions. Sweep-line + max-heap; when a new building starts, push its (negative-height, end) onto the heap; pop expired entries lazily. Hint: the magic is the lazy deletion — never remove from the middle of a heap; just skip stale tops at the next pop.
Further reading
- CLRS — Chapter 6. The textbook treatment of binary heaps and heapsort. Worth reading once for the O(n)
build-heapproof. - Sedgewick & Wayne — Algorithms, Chapter 2.4. Priority queues with the indexed-heap variant used in Dijkstra and Prim.
- Adjacent: Tree DP. When the priority is computed over a subtree rather than over the input directly.
- Adjacent: Graph algorithms. Where Dijkstra and Prim live — both are heap-driven once you allow weighted edges.
- Semicolony: Heap simulator. Visualises push, pop, heapify, and the sift-up / sift-down operations. A few minutes here is usually faster than reading the chapter twice.
- Semicolony: Algorithm visualizer. Step through heapsort and top-K side by side; the size-K version is visibly cheaper as k shrinks.