Hash map / set
The pattern that turns a brute-force O(n²) into O(n) by paying constant extra memory per element. Hash maps and sets unlock four sub-patterns: counting (frequency tables), complement-lookup (Two Sum), dedupe (sets), and prefix-sum + map (subarray-sum-equals-K). One of these is in roughly one in three interview problems.
Try it: Two Sum, step by step
Drag through the canonical Two Sum execution. Watch the seen-map fill up as the pointer walks; the green flash is the moment the complement is found.
[3, 3] with target 6
match a single element with itself. Checking first means at iteration
i the seen-map only contains indices strictly less
than i — no self-match is possible.1 · Why hash maps matter
The hash map is the single most common reason a brute-force O(n²) collapses to O(n). The mechanism. Constant-time lookup by key. Average case. Done by hashing the key to a bucket index in a fixed-size table. The amortised analysis makes inserts O(1) even though occasional rehashes are O(n). The doubling-table pattern pays for itself geometrically.
- The space-time trade. O(n) extra memory buys O(n²) → O(n) time. Almost always worth it; only refuse the trade when the problem explicitly forbids extra space.
- The amortised reality. An individual
puton a hash map can be O(n): when the load factor crosses the threshold, the table doubles and every entry is rehashed. Over n inserts, the total work is still O(n), so each insert is amortised O(1). - When hashing breaks. Adversarial inputs can force every key into the same bucket, degrading lookups to O(n). Modern languages (Python ≥ 3.4, Java ≥ 8, Go) randomise the hash seed per process to make this hard. On unhardened implementations (or with predictable hash functions) the worst case is real and exploitable.
- The two ways the abstraction leaks. Iteration order (Go: deliberately randomised; Java HashMap: insertion order broken; Python ≥ 3.7: insertion order preserved by language guarantee). Mutating a key after insertion silently corrupts the map in most languages.
2 · How to recognise it: five distinct shapes
Hash map problems take a small number of recurring shapes. Naming them helps the trigger fire faster in interviews.
Shape A: count-and-check
Build a frequency map; ask boolean / count questions of it. "Are these anagrams" (compare two maps). "First unique character" (find the first key with count 1). "Top K frequent" (sort or heap on the map's values). The defining shape of counting.
Shape B: complement lookup
Walking the input, for each x, ask whether a specific
complement is already in the map. Two Sum (target − x).
"Pair with difference K" (x − K and x + K).
Contains Duplicate (x itself). The defining shape of
one-pass pair search.
Shape C: group by signature key
Derive a canonical form for each input element; use it as a map key to bucket equivalent elements. Group Anagrams (sorted-string or letter-count signature). Word Pattern (bijective mapping signature). Points on a Line (slope from origin). The defining shape of equivalence classes.
Shape D: first-occurrence index
Store value → index. Used when you need the position, not just the fact. Longest substring without repeat (record where each char was last seen; jump the left pointer past it). Two Sum (return the indices, not the values). Detect "second visit" patterns where the distance between visits matters.
Shape E: rolling pre-state
Maintain a running aggregate (prefix sum, prefix XOR, running hash);
use the map to count earlier states matching some predicate against
the current state. Subarray Sum Equals K (running prefix; look up
prefix − k). Subarray sum divisible by K (running prefix
mod K). Continuous Subarray Sum (LC 523). The defining shape of
"a subarray ending here is good iff some earlier prefix has
property X".
3 · Canonical example: Two Sum (LC 1)
Problem. Given an unsorted array and a target, return the indices of two numbers that sum to the target. Each input has exactly one solution; you can't use the same element twice.
Input: nums = [2, 7, 11, 15], target = 9
Output: [0, 1]
Explanation: nums[0] + nums[1] == 9.Why hash map
- Brute force: O(n²) over all pairs.
- Sort + two pointers: O(n log n), but the indices change after sorting, so you'd need to track originals. Doable but awkward.
- Hash map: O(n) time, O(n) space. The single-pass complement-lookup.
Why Two Sum is the canonical hash-map question
Two Sum is the first hash-map question every candidate sees because it isolates the pattern in its purest form: one map, one pass, one complement lookup. Every other hash-map shape is an elaboration on this skeleton: group-by replaces "lookup of one value" with "bucket into many"; prefix-sum replaces "x" with "running aggregate"; canonical-key replaces "x" with "signature(x)". Get this one in your fingers and the rest fall out.
Build the intuition: three questions
- What does
seenstore? Value → index, for every element we've already scanned past. Why both, why not just the values? - For each new element
x, what do you look up? The complement: the value that, paired withx, hits the target. What's its formula? - Check, then insert: why that order? What does
[3, 3]with target 6 do if you flip it?
Go scaffold
func twoSum(nums []int, target int) []int {
seen := make(map[int]int) // value -> index
for i, x := range nums {
// TODO: compute the complement
// TODO: check if complement is in seen — if yes, return the pair
// TODO: otherwise, record the current value/index
}
return nil // problem guarantees a solution; this is unreachable
}i, the seen-map only contains entries strictly before
i. So the complement (if found) is at a different index:
no risk of pairing an element with itself. Flipping the order
(insert first, check second) breaks this on inputs with the same value
twice.[3, 3] with target 6 would match
the same element with itself. Check, then insert.Reference solution: the complete one-pass
func twoSum(nums []int, target int) []int {
seen := make(map[int]int) // value -> index
for i, x := range nums {
complement := target - x
if j, ok := seen[complement]; ok {
return []int{j, i} // earlier index first, by convention
}
seen[x] = i
}
return nil // problem guarantees a solution
}Trace through nums = [2, 7, 11, 15], target = 9
| i | x | complement | in seen? | action | seen after |
|---|---|---|---|---|---|
| 0 | 2 | 7 | no | insert (2, 0) | {2: 0} |
| 1 | 7 | 2 | yes (0) | return [0, 1] | — |
Variants worth knowing
- LC 167 Two Sum II: input is sorted. Don't reach for a hash map; use two pointers for O(1) extra space. Recognising "the input is already sorted, so hash map is overkill" is the senior signal.
- LC 170 Two Sum III: design data structure. Build a class that supports
add(x)andfind(target). Trade-off between optimising add vs find; hash map of counts is the standard answer. - LC 653 Two Sum IV: BST input. In-order traversal yields sorted output → two pointers. Or hash set, walking the tree.
- LC 1099 Two Sum Less Than K. Sort + two pointers. Hash map doesn't help because the comparison is inequality, not equality.
- LC 15 / 16 / 18: N-Sum family. Reduce N-Sum to Two Sum recursively. For N = 3 or 4, sort + two pointers beats hash map on space. For very large N with small values, meet-in-the-middle.
4 · Five-problem progression
| Step | Problem | Difficulty | What's new |
|---|---|---|---|
| 1 | LC 1 · Two Sum | Easy | The canonical complement-lookup. |
| 2 | LC 49 · Group Anagrams | Medium | Group-by with a canonical key (sorted string or letter-count tuple). |
| 3 | LC 128 · Longest Consecutive | Medium | Set-membership with the "start of a run" trick — O(n) total despite the appearance of nesting. |
| 4 | LC 560 · Subarray Sum Equals K | Medium | Prefix-sum + hash map. The key insight: if prefix[j] - prefix[i] == K, then prefix[i] == prefix[j] - K. |
| 5 | LC 146 · LRU Cache | Medium | Hash map + doubly-linked list. The canonical "two structures composed" interview question. |
5 · The four sub-patterns, with templates
Each sub-pattern below maps to one of the shapes in section 2. The templates here are the implementation skeleton; the shapes are the recognition signal. When you spot the shape, reach for the matching template.
Sub-pattern A: counting
Build a frequency map; ask questions of it. "Are these anagrams?" → equal frequency maps. "Top K most frequent?" → sort the map by value and take the top K, or push into a min-heap of size K, or bucket-sort.
// Frequency-map shape — Go skeleton
func frequencyMap(s string) map[byte]int {
cnt := make(map[byte]int)
for i := 0; i < len(s); i++ {
cnt[s[i]]++
}
return cnt
}
// Use it to:
// anagram check (LC 242) — build maps for s and t, compare
// top-K frequent (LC 347) — heap of size K, or bucket-sort O(n)
// first unique character (LC 387) — single pass after building cntSub-pattern B: complement lookup
For each x, ask whether the value that completes the
constraint is already in the map. The complement formula is the part
that varies.
// Complement-lookup shape — Go skeleton
//
// for i, x := range nums {
// key := f(x, target) // e.g. target - x for "find pair summing to target"
// if j, ok := seen[key]; ok {
// // pair found: indices j and i
// }
// seen[x] = i
// }
//
// Variants of f(x, target):
// "pair summing to target" → target - x
// "pair with difference K" → x - K (and x + K)
// "first repeating element" → x itself
// "longest substring with at most…" → state derived from x and current stateSub-pattern C: group by canonical key
Derive a "normalised form" of the input: anything two equivalent inputs share. Use it as a map key; bucket the originals.
// Group-by-canonical-key shape — Go skeleton
//
// groups := make(map[KEY][]ITEM)
// for _, x := range items {
// key := canonicalForm(x) // <-- this is the whole problem
// groups[key] = append(groups[key], x)
// }
//
// Choosing the key IS the design problem:
// LC 49 group anagrams -> sorted-letters string OR length-26 [26]int
// LC 290 word pattern -> bijective letter-to-word mapping signature
// LC 149 max points on line -> slope from origin (gcd-rationalised)
// LC 1153 string transforms -> letter-pair mapping table
//
// In Go, map keys must be comparable. Arrays like [26]int work; slices do NOT.
// For variable-size keys, hash the string form yourself.
Sub-pattern D: prefix sum + map
Maintain a running prefix sum; ask the map "how many earlier prefix sums
equal (current − k)?". That's the count of subarrays ending here whose
sum is exactly k.
// Prefix-sum + count-map shape — Go skeleton
//
// cnt := map[int]int{0: 1} // SEED: empty prefix has sum 0, seen once
// prefix := 0
// answer := 0
// for _, x := range nums {
// prefix += x
// // # of earlier prefixes equal to (prefix - k)
// // = # of subarrays ending HERE summing to k
// answer += cnt[prefix-k]
// cnt[prefix]++
// }
//
// The seed cnt[0]=1 is what makes "subarray starting at index 0" countable.
// Forgetting it is THE canonical bug for this pattern.
# Time: O(n) · Space: O(n)6 · Walkthrough: Longest Consecutive Sequence (LC 128)
LC 128 is the canonical "looks O(n²) but is actually O(n)" hash-set problem. Given an unsorted array, find the longest run of consecutive integers, without sorting (which would be O(n log n) and fail the problem's stated complexity).
Input: [100, 4, 200, 1, 3, 2]
Output: 4
Explanation: The longest consecutive run is 1, 2, 3, 4.The "is this the start of a run?" trick
The naive idea: for each x in the set, count x, x+1,
x+2, ... until a number is missing. Worst case O(n²). The fix:
only kick off the inner loop if x - 1 is not in the
set, meaning x is the start of a run. Each element is then
visited at most twice across the whole algorithm: once as a start
candidate, once as part of an inner-loop walk.
Build the intuition
- Why a set, not the original array? Set-membership in O(1) is the lever. The original might have duplicates; the set won't.
- What's the wrong O(n²) version? For each
x, walkx, x+1, x+2, ...until missing. Worst case (sorted contiguous run): every walk is O(n). Total O(n²). - The fix: what condition makes
xworth walking from? Ifx − 1is not in the set, thenxis the start of a fresh run. Skip otherwise. - Why is this O(n)? Each element is visited at most twice: once as a candidate start (skipped if not), once as part of a walk (only if its predecessor isn't in the set, which happens at most once per run).
Go scaffold
func longestConsecutive(nums []int) int {
set := make(map[int]bool)
for _, x := range nums {
set[x] = true
}
best := 0
for x := range set {
// GUARD: only start a walk if x is the run's start
// TODO: skip if x-1 is in set
// WALK: count consecutive integers from x upward
cur := x
length := 1
for /* TODO: condition */ {
// TODO: advance cur and length
}
if length > best {
best = length
}
}
return best
}Trace through [100, 4, 200, 1, 3, 2]
| x | x − 1 in set? | Run length | best |
|---|---|---|---|
| 100 | no → start | 100 → 1 | 1 |
| 4 | yes (3) → skip | — | 1 |
| 200 | no → start | 200 → 1 | 1 |
| 1 | no → start | 1, 2, 3, 4 → 4 | 4 |
| 3 | yes (2) → skip | — | 4 |
| 2 | yes (1) → skip | — | 4 |
x is a start. Across all starts, the total inner-loop
work equals the sum of run lengths, which is at most n (each element is
in exactly one run). So total work is O(n) outer + O(n) inner = O(n).7 · Walkthrough: Subarray Sum Equals K (LC 560)
Prefix-sum + hash map is the technique for "count subarrays with sum equal to K", and many of its variants. The key identity:
prefix[j] − prefix[i] = K, then the subarray
nums[i+1..j] sums to K. Equivalently,
prefix[i] = prefix[j] − K. So at each j, the
number of valid subarrays ending at j equals the count of
earlier prefix-sum values equal to prefix[j] − K.Build the intuition: work through the identity
- The identity to internalise. If
prefix[j] − prefix[i] = K, thennums[i+1..j]sums toK. Equivalently:prefix[i] = prefix[j] − K. - What does the map count? For each prefix sum value, how many indices have produced it so far. Why count, not just "have we seen it"?
- Why the seed
cnt[0] = 1? The empty prefix has sum 0. Without the seed, subarrays starting at index 0 would be missed. Test onnums = [3],k = 3with and without the seed. - Why does sliding window fail here? When negatives are present, the prefix sum isn't monotone, so sliding window's "left edge only moves forward" invariant breaks.
Go scaffold
func subarraySum(nums []int, k int) int {
cnt := map[int]int{0: 1} // SEED: empty prefix has sum 0
prefix := 0
answer := 0
for _, x := range nums {
prefix += x
// # of subarrays ending HERE that sum to k
// = # of earlier indices where prefix was (prefix - k)
// TODO: add cnt[prefix - k] to answer
// RECORD this prefix
// TODO: increment cnt[prefix]
}
return answer
}Trace through nums = [1, 2, 3, -2, 1], k = 3
| i | x | prefix | prefix − k | cnt[prefix − k] | answer | cnt after |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | −2 | 0 | 0 | {0:1, 1:1} |
| 1 | 2 | 3 | 0 | 1 | 1 | {0:1, 1:1, 3:1} |
| 2 | 3 | 6 | 3 | 1 | 2 | {0:1, 1:1, 3:1, 6:1} |
| 3 | −2 | 4 | 1 | 1 | 3 | {0:1, 1:1, 3:1, 6:1, 4:1} |
| 4 | 1 | 5 | 2 | 0 | 3 | {0:1, 1:1, 3:1, 6:1, 4:1, 5:1} |
Three subarrays sum to 3: [1, 2] at indices 0..1,
[3] at index 2, and [2, 3, −2] at indices
1..3. The algorithm counted all three correctly. The map's role is
pure bookkeeping: it never needs to know which indices
produced each prefix sum, only how many.
The general shape of "subarray with X property"
LC 560 is the cleanest instance of a meta-pattern. Anything of the form "count subarrays where some-aggregate equals k" reduces to rolling pre-state if the aggregate is invertible. Substitute:
- Sum equals k → prefix sum; look up
prefix − k. - Sum divisible by k → prefix sum mod k; look up
prefix mod k(LC 974). - XOR equals k → prefix XOR; look up
prefix ⊕ k(LC 1310 / 1442). - Count of 1s equals count of 0s → prefix(# 1s − # 0s); look up the same value (LC 525).
- Product equals k → trickier: multiplication isn't invertible at zero. Either avoid zero or use logs (with floating-point caveats).
agg; maintain a map from "value of agg" to "count of
indices that produced it". At each step, look up the value that
would close a valid subarray ending here. Increment the count of
agg after the lookup. The seed entry corresponds to
"empty prefix"; forget it and you miss subarrays starting at index 0.cnt = {0: 1}.
Without it, subarrays starting at index 0 (like [1, 2] in
the trace) would be missed. The seed says "the empty prefix has been
seen once, with sum 0". When prefix == k at some
j, the lookup of prefix − k = 0 hits the seed
and counts the subarray from index 0 to j.When sliding window fails
Sliding window can solve "longest subarray with sum K" only when all values are non-negative (so the prefix sum is monotonically non-decreasing). With negatives in the mix, the prefix-sum-plus-hash-map technique is the only O(n) path. Recognise this distinction at the "recognise it" stage of the seven-thing checklist.
Visualising the rolling pre-state
The rolling pre-state shape is the most subtle hash-map idiom: the map's role is not to count items, but to count prior states such that current − prior matches the target. Internalise this and you unlock dozens of "count subarrays with property X" variants.
8 · Walkthrough: Group Anagrams (LC 49)
LC 49 demonstrates the "group by canonical key" sub-pattern in its cleanest form. Given a list of strings, group them so that anagrams (same letters, possibly different order) end up together.
Build the intuition: design the key
- What property do anagrams share? The same multiset of letters. Two strings are anagrams iff their multisets are equal.
- Two natural canonical forms. (a) Sort the letters → string key. (b) Count letters → fixed-size array key. Which is asymptotically faster, and by how much?
- In Go specifically: why
[26]intnot[]int? Map keys must be comparable. Arrays are; slices are not. - What if the input alphabet isn't a–z? Unicode? Use the sorted-string approach, or a wider fixed-size array indexed by code-point.
Go scaffold: two flavours
// Flavour 1: sorted-string key — O(n × k log k)
func groupAnagramsSorted(strs []string) [][]string {
groups := make(map[string][]string)
for _, s := range strs {
// TODO: sort the bytes of s, use the sorted form as the key
// TODO: append s to groups[key]
}
out := make([][]string, 0, len(groups))
for _, g := range groups {
out = append(out, g)
}
return out
}
// Flavour 2: count-array key — O(n × k); arrays are comparable in Go
func groupAnagramsCount(strs []string) [][]string {
groups := make(map[[26]int][]string)
for _, s := range strs {
var key [26]int
// TODO: fill key by counting each letter of s
groups[key] = append(groups[key], s)
}
// ... return as before
return nil // TODO
}The canonical-key trick generalises. Anywhere you can ask "what's the normalised form of this thing such that two equivalent inputs produce the same form?", you can group with a hash map. Examples: numbers grouped by their digit-sum; strings grouped by their pattern (LC 290); points grouped by their slope from a fixed origin (LC 149).
9 · Hard walkthrough: Minimum Window Substring (LC 76)
LC 76 is the canonical "hash map plus sliding window" hard problem.
It demonstrates the most subtle hash-map shape: using a counting
map as the condition for a window's validity, with a
second integer (have) caching the satisfaction state to
avoid re-checking the whole map on every step.
Problem. Given strings s and
t, return the shortest substring of s that
contains every character of t (counting multiplicities).
Empty string if no such window exists.
Input: s = "ADOBECODEBANC", t = "ABC"
Output: "BANC"
Explanation: every other window containing A, B, and C is longer.Why brute force is too slow
Brute force: enumerate every (left, right) pair, check if the
substring contains all characters of t. O(n² · |t|).
With n, |t| ≤ 10⁵, that's 10¹⁵ operations, not even
close. The sliding-window-plus-hashmap solution is O(n + |t|).
Build the intuition: the four moving parts
- The need map.
need[c]= how many copies ofcthe window still owest. Built once fromt. - The have map.
have[c]= how many copies ofcthe current window contains, capped atneed[c](extras don't help). - The matched counter. A single integer counting how many of
t's distinct chars are fully satisfied (i.e.have[c] == need[c]). Whenmatched == len(need), the window is valid. - The window pointers. Expand
rightuntil valid; contractleftwhile staying valid, recording the shortest. Each pointer moves O(n) times total, linear.
The matched-counter trick: why it's the right design
A naive implementation would re-compare have against
need on every window change: O(|t|) per step, O(n |t|)
total. The matched counter caches the answer to "are
all keys satisfied?". Incrementing or decrementing it only when a
single character flips from unsatisfied to satisfied (or vice versa)
makes each window step O(1).
Go scaffold: fill in the four branches
func minWindow(s, t string) string {
if len(s) < len(t) {
return ""
}
need := make(map[byte]int)
for i := 0; i < len(t); i++ {
need[t[i]]++
}
required := len(need) // # of distinct chars to satisfy
have := make(map[byte]int)
matched := 0 // # of distinct chars currently satisfied
bestLen := 1 << 30
bestL := 0
left := 0
for right := 0; right < len(s); right++ {
c := s[right]
// EXPAND — incorporate s[right] into the window
if _, ok := need[c]; ok {
have[c]++
// TODO: if have[c] == need[c], increment matched
}
// CONTRACT — while window is valid, try to shrink from the left
for matched == required {
// record best
if right-left+1 < bestLen {
bestLen = right - left + 1
bestL = left
}
d := s[left]
// TODO: if need[d] exists, decrement have[d]
// TODO: if have[d] drops below need[d], decrement matched
left++
}
}
if bestLen == 1<<30 {
return ""
}
return s[bestL : bestL+bestLen]
}Trace through s = "ADOBECODEBANC", t = "ABC"
| right | s[right] | have after | matched | contract? | best window |
|---|---|---|---|---|---|
| 0 | A | A:1 | 1 | no | — |
| 3 | B | A:1, B:1 | 2 | no | — |
| 5 | C | A:1, B:1, C:1 | 3 ✓ | yes → "ADOBEC" (6) | "ADOBEC" |
| 10 | B | (after contract) A:1, B:2, C:0 | matched dropped, then re-rises | — | (no improvement) |
| 12 | C | A:1, B:1, C:1 (after contracts) | 3 ✓ | yes → "BANC" (4) | "BANC" |
matched is have[c] == need[c], with
equality. If have[c] overshoots
need[c] (e.g. extra A's in the window), matched
must not double-count. Symmetrically when contracting: only decrement
matched when have[d] drops below need[d], not when it merely changes. Get one of these
four conditions wrong and the algorithm returns garbage.9b · Bonus: Continuous Subarray Sum (LC 523)
A close cousin of LC 560 showing the prefix-mod variant of the rolling pre-state shape. Worth seeing because it teaches a small but important twist: when two indices share a prefix-sum modulo k, the subarray between them sums to a multiple of k.
Problem. Given an integer array nums and
an integer k, return true if nums has a
continuous subarray of size at least 2 whose elements sum to a
multiple of k.
The identity
If prefix[j] mod k == prefix[i] mod k with
j > i, then (prefix[j] − prefix[i]) mod k == 0,
so nums[i+1..j] sums to a multiple of k. Map each
modular residue to its earliest index; when you see the
same residue again at index j, check that
j − earliest ≥ 2.
Go scaffold
func checkSubarraySum(nums []int, k int) bool {
// residue -> earliest index where this residue appeared
first := map[int]int{0: -1} // SEED: empty prefix has residue 0 at index -1
prefix := 0
for i, x := range nums {
prefix += x
r := prefix % k
if r < 0 { // Go's % on negatives is signed; normalise
r += k
}
if earliest, ok := first[r]; ok {
if i-earliest >= 2 {
return true
}
} else {
first[r] = i // record only on first sighting
}
}
return false
}first[0] = -1, a
subarray starting at index 0 with sum divisible by k would be missed:
there'd be no earlier "empty prefix" residue to match against.
Same trick as LC 560's cnt[0] = 1, different shape.10 · How to solve hard hash-map problems step-by-step
The recipe that collapses 90% of "feels like hash map but I'm stuck" into a clean solve. Run through the four questions in order; if any answer is unclear, slow down; the bug is in the design, not the code.
- What is the key? Possibilities: the raw value, a normalised key (sorted-letters string), a signature (count tuple), a derived pair (slope + intercept), a state (running prefix sum). The hardest hash-map problems are hard because the right key is non-obvious. LC 49 = letter-count tuple; LC 149 = slope as gcd-rationalised fraction; LC 560 = running prefix sum.
- What is the value? Possibilities: a count, an index, a list of items (group-by), an accumulator (running min / max), a node pointer (LRU). For complement-lookup, value = index. For counting, value = count. For group-by, value = list. For first-occurrence, value = index. Wrong choice here breaks the next step.
- When do you insert? When do you update? The "check-then-insert" mantra of Two Sum is the canonical example: flip the order and the algorithm breaks on duplicate inputs. For prefix-sum problems, always increment after looking up (otherwise a sum of 0 counts itself). For LRU, every operation both updates the map and reorders the list.
- One-pass or two-pass? One-pass is the goal, but some problems need a precomputation (LC 1 with hash) or two phases (Hopcroft-Karp). When in doubt, code the two-pass version first; collapse to one-pass after the algorithm works.
11 · Sister data structures
Hash map is the default. But several variants, and several entirely different data structures, fill specific gaps. Knowing when to reach past the hash map is a senior signal.
| Structure | Lookup | What it adds over hash map | When to reach for it |
|---|---|---|---|
| Sorted map (TreeMap / std::map / BTreeMap) | O(log n) | Ordered iteration; "smallest key ≥ X" in O(log n) | Time-based KV (LC 981), range queries, finding the next event after time T. |
| LinkedHashMap (Java) / OrderedDict (Python) | O(1) | Insertion-order iteration | LRU implementation in 5 lines (Python's OrderedDict.move_to_end is the LRU primitive); deterministic JSON output. |
| BiMap (Guava) / dual maps | O(1) both ways | Look up by key or by value | Bijection problems (LC 290 Word Pattern, LC 205 Isomorphic Strings, LC 890). Cheaper than maintaining two maps and keeping them in sync. |
Multimap (Guava) / map[K][]V | O(1) | Multiple values per key, with bulk operations | Group-by patterns where the bucket itself needs operations (sort, sum, min-heap). |
| Bloom filter | O(k) probes | Memory ≪ hash set; tolerates false positives | "Have I probably seen this URL?" at billion-element scale. Pre-filter before an expensive lookup. Used in databases (Cassandra), browsers (Chrome's Safe Browsing). |
| Count-Min Sketch | O(d) | Approximate counts in sub-linear memory | "Top-K frequent items" in a stream where you can't fit the full frequency table. Used in network monitoring, ad-tech, search analytics. |
| HyperLogLog | O(1) | Cardinality estimation in ~1.5 KB | "How many distinct users today?" at petabyte scale. Redis's PFCOUNT. |
| Consistent hash ring | O(log n) | Stable assignment when nodes are added/removed | Distributed caches, sharded databases. Adds 1/n keys on resize instead of all of them. |
| Trie | O(|key|) | Prefix iteration; shared prefixes share memory | Word search (LC 212), autocomplete, longest-common-prefix queries. Trie pattern covers the details. |
When the hash map is the wrong tool
- You need range queries. "Smallest key ≥ X" / "all keys in [a, b]": hash maps can't do these; sorted map is the move. Time-based key-value store (LC 981) is the canonical example.
- You need O(1) extra space. Constraint kills hash map. Reach for two pointers, in-place mutation, or bit-manipulation. LC 136 Single Number is the textbook: hash-map solves it in O(n) time + O(n) space; XOR over the array does it in O(n) + O(1).
- You need worst-case O(1). Hash map is average-case O(1); worst-case O(n). For hard real-time systems, reach for cuckoo hashing (guaranteed O(1) lookup) or a sorted-tree map (guaranteed O(log n)).
- You need persistent / functional access. Hash array mapped tries (HAMT), such as Clojure's
PersistentHashMapand Scala's immutable map, keep previous versions of the map alive with structural sharing. Each "update" returns a new map in O(log n) without copying.
12 · LRU cache: hash map + doubly-linked list (LC 146)
LC 146 is the canonical "two structures composed" hash-map question. Implement a fixed-capacity cache with O(1) get and put. The structures are a hash map (key → node) and a doubly-linked list (most recently used at the front, least recently used at the tail).
Build the intuition: answer six design questions
- Why two structures, not one? Hash map: O(1) lookup by key. Linked list: O(1) recency reordering. Neither alone gets both.
- Which end of the list is "most recent"? Pick a convention and stick to it. Most implementations: head = most-recent, tail = least-recent.
- What does
Get(key)do besides return the value? Move the accessed node to the most-recent side. Why? "Most recently used" should reflect access, not just insertion. - What does
Put(key, val)do on a key that's already there? Overwrite the value, AND move to most-recent. (Don't double-insert.) - What does
Put(key, val)do when at capacity? Evict the LRU (tail's predecessor), remove from the map, then insert. - The sentinel trick: what does it buy you? Two dummy nodes (head, tail). Every real node has non-nil prev and next. No "is this the first / last node?" branches in remove or insert.
Go scaffold: fill in the four list ops
type lruNode struct {
key, val int
prev, next *lruNode
}
type LRUCache struct {
cap int
m map[int]*lruNode
head, tail *lruNode // sentinels: head = most-recent side, tail = LRU side
}
func NewLRUCache(capacity int) *LRUCache {
head := &lruNode{}
tail := &lruNode{}
head.next = tail
tail.prev = head
return &LRUCache{
cap: capacity,
m: make(map[int]*lruNode),
head: head,
tail: tail,
}
}
// Detach a node from its current position
func (c *LRUCache) remove(n *lruNode) {
// TODO: splice n out of the doubly-linked list
}
// Insert a node right after head (most-recent position)
func (c *LRUCache) addFront(n *lruNode) {
// TODO: insert between head and head.next
}
func (c *LRUCache) Get(key int) int {
// TODO: if not present, return -1
// TODO: move node to front, return its value
return -1
}
func (c *LRUCache) Put(key, val int) {
// TODO: if key exists, remove old node from list
// TODO: else if at capacity, evict tail.prev (LRU) from list and map
// TODO: create new node, addFront it, record in map
}remove and addFront on
paper. Draw four nodes (sentinel head, A, B, sentinel tail).
Walk through Get(B): remove(B) then
addFront(B). Confirm the pointers end up correct. This is
where most implementations fail: the splice operations are short but
subtle.The two sentinel pattern
The sentinel head and tail nodes simplify edge cases. Without them, every insert/remove would need a "is this the first / last node?" check. With them, every real node has a real prev and next: no nulls to handle. The trick generalises to any doubly-linked list operation in interview-shaped problems.
12b · Hashing internals: what you should know
A senior interviewer occasionally asks "what's actually happening inside a hash map?". Two minutes of grounding to give a confident answer:
- The hash function turns a key into an integer (typically 32 or 64 bits). For string keys, that's something like FNV-1a or MurmurHash. For integer keys, often the integer itself, possibly XOR'd with a constant.
- The bucket index is the hash modulo the table size. Most implementations use a table size that's a power of 2, so modulo is a bitmask.
- Collisions are unavoidable (pigeonhole). Two strategies: chaining (each bucket is a linked list of (key, value) pairs) or open addressing (probe forward to find an empty slot — linear probing, quadratic, or double hashing).
- Load factor = entries ÷ buckets. When it exceeds a threshold (~0.75 in Java, ~0.6 in Go's map, varies elsewhere), the table is rehashed: a new table 2× larger is allocated, every entry is re-inserted. Rehash is O(n) but amortised O(1) per insert.
- Worst case is O(n) when an adversary picks keys that all collide. Pre-2012 PHP, Python, and others used predictable hashes vulnerable to this denial-of-service. Modern languages use a per-process random seed (HashDoS mitigation).
- Memory layout matters. Open-addressing tables (Robin Hood, Swiss Tables) are typically 2–3× faster than chaining because they're cache-friendly. Go's map and Rust's
HashMapuse chained-then-open-addressed hybrid designs.
For interview answers: state O(1) average for insert/lookup/delete, with a footnote about worst case and rehashing. Don't volunteer implementation depth unless asked; it derails the conversation. But when asked, a 30-second confident answer about chaining vs open addressing is a strong signal.
13 · Common pitfalls: what breaks
- Inserting before checking. The Two Sum bug. Always check, then insert.
[3, 3]with target 6 matches the same index against itself if you flip the order. - Forgetting the seed in prefix-sum maps.
cnt = {0: 1}is the standard initialiser: without it, subarrays starting at index 0 are missed. Test onnums = [k]for any k; without the seed it returns 0. - Updating
matchedon the wrong comparison. The LC 76 trap. Increment whenhave[c]hitsneed[c]exactly; decrement when it drops below. Off-by-one on either side returns nonsense. - Hashing mutable types. In Python, lists aren't hashable; use tuples. In Java, mutating a key after insertion corrupts the map — the key still hashes to the old bucket, but
equalscompares against the new value, so lookups silently fail. Go forbids slices as keys for the same reason; arrays (fixed size) are allowed because they're value types. - Adversarial hash collisions. Pre-2012, predictable hashes made HashDoS attacks practical: a malicious client sends keys all hashing to one bucket, degrading every operation to O(n). Modern languages randomise the seed per process. Competitive programmers sometimes exploit this in C++
unordered_mapon Codeforces, where the seed is fixed; usemap(red-black tree) or a custom hash if you suspect. - Go map iteration order non-determinism. Go's runtime deliberately randomises iteration order on every
rangeover a map. Don't write tests that depend on a specific order; don't write algorithms that depend on it either. If you need order, use a slice of keys you sort or a sorted map. - Iterating while mutating. Modifying a hash map during iteration over it usually throws (Python:
RuntimeError; Java:ConcurrentModificationException; Go: undefined behaviour — sometimes works, sometimes panics, sometimes silently produces wrong results). Iterate over a snapshot of the keys, or stage changes in a list and apply after the loop. - Floating-point keys. Don't. Equal-by-value floats can hash differently due to NaN, signed zero, or representation drift through arithmetic. Round to integer or use a string representation with controlled precision.
- Integer overflow in custom hash functions. Rolling hashes (Rabin-Karp), polynomial hashes for tuples, or any custom hash that multiplies and adds, will overflow 64-bit on long inputs. Use
uint64arithmetic and accept that hash collisions become possible — verify with the actual key on collision. - Off-by-one in counts. "How many anagrams" vs "how many groups": read the problem twice. The frequency-map shape is so common that off-by-one in the answer extraction is the most common bug.
- Mutating a struct that's a map key. In Go,
map[StructType]Vis allowed for comparable structs, but if you take a value out, mutate it, and try to put it back at the same key, the key has changed. Use the original key (immutable copy) to update. - Counting beyond what's needed in sliding-window. In LC 76, increment
have[c]unconditionally; only count it towardmatchedat the equality boundary. Cappinghave[c]atneed[c]works too but obscures the algorithm.
14 · Complexity
| Operation | Average | Worst case | Notes |
|---|---|---|---|
| Insert | O(1) | O(n) | Worst case is rare; requires adversarial keys. |
| Lookup | O(1) | O(n) | Same. |
| Delete | O(1) | O(n) | Same. |
| Iterate | O(n) | O(n) | No order guarantee in most languages. |
In interview answers, state O(1) for hash-map ops with a footnote about worst case. Anything else over-engineers the answer.
15 · Variants & related patterns
- Sorted-map (TreeMap, std::map). When you need ordered keys — "smallest key > X" in O(log n). Different structure, same family.
- Sliding window. Most variable windows track membership via a hash map.
- Two pointers. The O(1)-space alternative when you can sort first.
- Trie. Specialised structure for prefix-keyed lookups; replaces hash map for word-search-style problems.
- Bloom filter. When you need set-membership at scale and can tolerate false positives. Out of scope for LC, common in real systems.
- Adjacent: consistent-hashing simulator. Distribution-of-keys-across-nodes — system-design-shaped, but builds the cost-model intuition.
16 · Where this gets asked
Hash-map is the most-asked pattern in interview history. If you only memorise one pattern, this is the one. Six companies and what they tend to reach for:
| Company | Common framing | Why it fits their stack |
|---|---|---|
| Two Sum (LC 1), Subarray Sum Equals K (LC 560), Longest Consecutive Sequence (LC 128) | Indexing and ranking lookups — every hash is an inverted-index entry under the hood. | |
| Meta | Group Anagrams (LC 49), Valid Anagram (LC 242), Top K Frequent Elements (LC 347) | Feed-ranking dedup, content-signal hashing. |
| Amazon | Two Sum (LC 1) — quite literally the most-asked Amazon question, ever, First Unique Character (LC 387) | The phone-screen warm-up of choice for L4 hires. |
| Microsoft | LRU Cache (LC 146), Insert/Delete/GetRandom O(1) (LC 380) | Office's serialisation layer + Azure caching layer use these patterns directly. |
| Apple | Contains Duplicate (LC 217), Roman to Integer (LC 13), Isomorphic Strings (LC 205) | Lookup-table problems are L4 staple at Apple Services. |
| Stripe / Cloudflare / Datadog | Rate-limiter (sliding-window counter via hash), idempotency-key dedup, request fingerprinting | Every payment / edge / monitoring API has a per-key hash . |
17 · Try it yourself
Five problems, ordered by depth. Spend 30 minutes on each before checking the hint.
- Two Sum · LC 1
- The canonical hash-map warm-up. For each number, check if (target − num) is in the map; if not, add this num. Hint: walk the array once, building the map as you go — don't pre-populate it, you'd hit duplicate-index bugs on inputs like
[3,3]. - Group Anagrams · LC 49
- Key by sorted characters; values are the buckets. Hint: in Python,
defaultdict(list)avoids the boilerplate. The sorted-string key is O(k log k); a 26-element count tuple key is O(k) but uglier. - Subarray Sum Equals K · LC 560
- Prefix sum + hashmap. Track the running sum; the count of subarrays ending here is the number of previous prefix sums equal to (running − k). Hint: seed the map with
{0: 1}so subarrays starting at index 0 count correctly. - Longest Consecutive Sequence · LC 128
- Set of all numbers, then start a streak only from numbers whose predecessor isn't present. Hint: O(n) is achievable because each number is visited at most twice — once as a "start" candidate, once as a member of a streak. Don't sort.
- Stretch — LRU Cache · LC 146
- Hashmap + doubly-linked list = O(1) get + O(1) put with LRU eviction. Hint: use sentinel head/tail nodes; this avoids the if/null checks at boundary cases and makes the code 30% shorter.
Further reading
- CLRS — Introduction to Algorithms, Chapter 11. Hash tables, collision resolution, universal hashing. The reference.
- Sedgewick & Wayne — Algorithms (4th ed). Chapter 3.4 covers hash tables with separate chaining and linear probing.
- Knuth — TAOCP Vol 3. The original analysis of hashing. Mathematical, dense, foundational.
- Robin Hood hashing. A modern open-addressing variant with much better worst-case probe distance. Used in Rust's
HashMaphistorically and in many competitive HFT codebases. - Adjacent: how hash tables work. The mechanism explainer on this site, with diagrams.
- Adjacent: two pointers. The other Tier A pattern; sometimes the alternative.