03 / 20 · Tier A
Patterns / 03

Hash map / set

The pattern that turns a brute-force O(n²) into O(n) by paying constant extra memory per element. Hash maps and sets unlock four sub-patterns: counting (frequency tables), complement-lookup (Two Sum), dedupe (sets), and prefix-sum + map (subarray-sum-equals-K). One of these is in roughly one in three interview problems.


Try it: Two Sum, step by step

Drag through the canonical Two Sum execution. Watch the seen-map fill up as the pointer walks; the green flash is the moment the complement is found.

Interactive · Two Sum (LC 1) complement-lookup with a hash map; check before insert
array
0
3
1
8
2
2
3
7
4
11
5
4
6
9
seen map (value → index)
empty
start: seen-map is empty
step 1 / 4
Why "check, then insert". Inserting first would let an array like [3, 3] with target 6 match a single element with itself. Checking first means at iteration i the seen-map only contains indices strictly less than i — no self-match is possible.

1 · Why hash maps matter

The hash map is the single most common reason a brute-force O(n²) collapses to O(n). The mechanism. Constant-time lookup by key. Average case. Done by hashing the key to a bucket index in a fixed-size table. The amortised analysis makes inserts O(1) even though occasional rehashes are O(n). The doubling-table pattern pays for itself geometrically.

  • The space-time trade. O(n) extra memory buys O(n²) → O(n) time. Almost always worth it; only refuse the trade when the problem explicitly forbids extra space.
  • The amortised reality. An individual put on a hash map can be O(n): when the load factor crosses the threshold, the table doubles and every entry is rehashed. Over n inserts, the total work is still O(n), so each insert is amortised O(1).
  • When hashing breaks. Adversarial inputs can force every key into the same bucket, degrading lookups to O(n). Modern languages (Python ≥ 3.4, Java ≥ 8, Go) randomise the hash seed per process to make this hard. On unhardened implementations (or with predictable hash functions) the worst case is real and exploitable.
  • The two ways the abstraction leaks. Iteration order (Go: deliberately randomised; Java HashMap: insertion order broken; Python ≥ 3.7: insertion order preserved by language guarantee). Mutating a key after insertion silently corrupts the map in most languages.
The recognition trick. Read the problem statement and ask "what would I need to look up in O(1) for this to be linear?". If there's a clean answer ("I'd look up the complement", "I'd check if I've seen this prefix sum before"), that's the hash-map move. If the answer is "I'd look up the smallest key greater than X", you want a sorted map (TreeMap / BTreeMap), not a hash map.

2 · How to recognise it: five distinct shapes

Hash map problems take a small number of recurring shapes. Naming them helps the trigger fire faster in interviews.

Shape A: count-and-check

Build a frequency map; ask boolean / count questions of it. "Are these anagrams" (compare two maps). "First unique character" (find the first key with count 1). "Top K frequent" (sort or heap on the map's values). The defining shape of counting.

Shape B: complement lookup

Walking the input, for each x, ask whether a specific complement is already in the map. Two Sum (target − x). "Pair with difference K" (x − K and x + K). Contains Duplicate (x itself). The defining shape of one-pass pair search.

Shape C: group by signature key

Derive a canonical form for each input element; use it as a map key to bucket equivalent elements. Group Anagrams (sorted-string or letter-count signature). Word Pattern (bijective mapping signature). Points on a Line (slope from origin). The defining shape of equivalence classes.

Shape D: first-occurrence index

Store value → index. Used when you need the position, not just the fact. Longest substring without repeat (record where each char was last seen; jump the left pointer past it). Two Sum (return the indices, not the values). Detect "second visit" patterns where the distance between visits matters.

Shape E: rolling pre-state

Maintain a running aggregate (prefix sum, prefix XOR, running hash); use the map to count earlier states matching some predicate against the current state. Subarray Sum Equals K (running prefix; look up prefix − k). Subarray sum divisible by K (running prefix mod K). Continuous Subarray Sum (LC 523). The defining shape of "a subarray ending here is good iff some earlier prefix has property X".

The four-question filter. Before coding, write down (1) what are the keys, (2) what are the values, (3) when do I insert, (4) when do I look up. If you can't answer all four in one sentence, you don't have the shape yet. Slow down. Most hash-map bugs come from skipping this step.
SHAPE B — COMPLEMENT LOOKUP (Two Sum, target = 9)271115i = 2, x = 11lookup (9 − 11) = −2not in map → record (11, 2)seen map2 → 07 → 111 → 2 (just added)For each x: (1) look up complement; (2) if found, return the pair; (3) else insert (x, index).

3 · Canonical example: Two Sum (LC 1)

Problem. Given an unsorted array and a target, return the indices of two numbers that sum to the target. Each input has exactly one solution; you can't use the same element twice.

Input:  nums = [2, 7, 11, 15], target = 9
Output: [0, 1]
Explanation: nums[0] + nums[1] == 9.

Why hash map

  • Brute force: O(n²) over all pairs.
  • Sort + two pointers: O(n log n), but the indices change after sorting, so you'd need to track originals. Doable but awkward.
  • Hash map: O(n) time, O(n) space. The single-pass complement-lookup.

Why Two Sum is the canonical hash-map question

Two Sum is the first hash-map question every candidate sees because it isolates the pattern in its purest form: one map, one pass, one complement lookup. Every other hash-map shape is an elaboration on this skeleton: group-by replaces "lookup of one value" with "bucket into many"; prefix-sum replaces "x" with "running aggregate"; canonical-key replaces "x" with "signature(x)". Get this one in your fingers and the rest fall out.

Build the intuition: three questions

  • What does seen store? Value → index, for every element we've already scanned past. Why both, why not just the values?
  • For each new element x, what do you look up? The complement: the value that, paired with x, hits the target. What's its formula?
  • Check, then insert: why that order? What does [3, 3] with target 6 do if you flip it?

Go scaffold

func twoSum(nums []int, target int) []int {
    seen := make(map[int]int) // value -> index

    for i, x := range nums {
        // TODO: compute the complement
        // TODO: check if complement is in seen — if yes, return the pair
        // TODO: otherwise, record the current value/index
    }

    return nil // problem guarantees a solution; this is unreachable
}
The "check, then insert" mantra. At iteration i, the seen-map only contains entries strictly before i. So the complement (if found) is at a different index: no risk of pairing an element with itself. Flipping the order (insert first, check second) breaks this on inputs with the same value twice.
The order matters. Check before inserting. If you insert first, an array like [3, 3] with target 6 would match the same element with itself. Check, then insert.

Reference solution: the complete one-pass

func twoSum(nums []int, target int) []int {
    seen := make(map[int]int)            // value -> index

    for i, x := range nums {
        complement := target - x
        if j, ok := seen[complement]; ok {
            return []int{j, i}           // earlier index first, by convention
        }
        seen[x] = i
    }

    return nil                           // problem guarantees a solution
}

Trace through nums = [2, 7, 11, 15], target = 9

ixcomplementin seen?actionseen after
027noinsert (2, 0){2: 0}
172yes (0)return [0, 1]

Variants worth knowing

  • LC 167 Two Sum II: input is sorted. Don't reach for a hash map; use two pointers for O(1) extra space. Recognising "the input is already sorted, so hash map is overkill" is the senior signal.
  • LC 170 Two Sum III: design data structure. Build a class that supports add(x) and find(target). Trade-off between optimising add vs find; hash map of counts is the standard answer.
  • LC 653 Two Sum IV: BST input. In-order traversal yields sorted output → two pointers. Or hash set, walking the tree.
  • LC 1099 Two Sum Less Than K. Sort + two pointers. Hash map doesn't help because the comparison is inequality, not equality.
  • LC 15 / 16 / 18: N-Sum family. Reduce N-Sum to Two Sum recursively. For N = 3 or 4, sort + two pointers beats hash map on space. For very large N with small values, meet-in-the-middle.

4 · Five-problem progression

StepProblemDifficultyWhat's new
1LC 1 · Two SumEasyThe canonical complement-lookup.
2LC 49 · Group AnagramsMediumGroup-by with a canonical key (sorted string or letter-count tuple).
3LC 128 · Longest ConsecutiveMediumSet-membership with the "start of a run" trick — O(n) total despite the appearance of nesting.
4LC 560 · Subarray Sum Equals KMediumPrefix-sum + hash map. The key insight: if prefix[j] - prefix[i] == K, then prefix[i] == prefix[j] - K.
5LC 146 · LRU CacheMediumHash map + doubly-linked list. The canonical "two structures composed" interview question.

5 · The four sub-patterns, with templates

Each sub-pattern below maps to one of the shapes in section 2. The templates here are the implementation skeleton; the shapes are the recognition signal. When you spot the shape, reach for the matching template.

Sub-pattern A: counting

Build a frequency map; ask questions of it. "Are these anagrams?" → equal frequency maps. "Top K most frequent?" → sort the map by value and take the top K, or push into a min-heap of size K, or bucket-sort.

// Frequency-map shape — Go skeleton
func frequencyMap(s string) map[byte]int {
    cnt := make(map[byte]int)
    for i := 0; i < len(s); i++ {
        cnt[s[i]]++
    }
    return cnt
}

// Use it to:
//   anagram check (LC 242)         — build maps for s and t, compare
//   top-K frequent (LC 347)        — heap of size K, or bucket-sort O(n)
//   first unique character (LC 387) — single pass after building cnt

Sub-pattern B: complement lookup

For each x, ask whether the value that completes the constraint is already in the map. The complement formula is the part that varies.

// Complement-lookup shape — Go skeleton
//
// for i, x := range nums {
//     key := f(x, target)        // e.g. target - x for "find pair summing to target"
//     if j, ok := seen[key]; ok {
//         // pair found: indices j and i
//     }
//     seen[x] = i
// }
//
// Variants of f(x, target):
//   "pair summing to target"          → target - x
//   "pair with difference K"          → x - K (and x + K)
//   "first repeating element"         → x itself
//   "longest substring with at most…" → state derived from x and current state

Sub-pattern C: group by canonical key

Derive a "normalised form" of the input: anything two equivalent inputs share. Use it as a map key; bucket the originals.

// Group-by-canonical-key shape — Go skeleton
//
// groups := make(map[KEY][]ITEM)
// for _, x := range items {
//     key := canonicalForm(x)        // <-- this is the whole problem
//     groups[key] = append(groups[key], x)
// }
//
// Choosing the key IS the design problem:
//   LC 49  group anagrams       -> sorted-letters string OR length-26 [26]int
//   LC 290 word pattern         -> bijective letter-to-word mapping signature
//   LC 149 max points on line   -> slope from origin (gcd-rationalised)
//   LC 1153 string transforms   -> letter-pair mapping table
//
// In Go, map keys must be comparable. Arrays like [26]int work; slices do NOT.
// For variable-size keys, hash the string form yourself.

Sub-pattern D: prefix sum + map

Maintain a running prefix sum; ask the map "how many earlier prefix sums equal (current − k)?". That's the count of subarrays ending here whose sum is exactly k.

// Prefix-sum + count-map shape — Go skeleton
//
// cnt := map[int]int{0: 1}    // SEED: empty prefix has sum 0, seen once
// prefix := 0
// answer := 0
// for _, x := range nums {
//     prefix += x
//     // # of earlier prefixes equal to (prefix - k)
//     // = # of subarrays ending HERE summing to k
//     answer += cnt[prefix-k]
//     cnt[prefix]++
// }
//
// The seed cnt[0]=1 is what makes "subarray starting at index 0" countable.
// Forgetting it is THE canonical bug for this pattern.

# Time: O(n) · Space: O(n)

6 · Walkthrough: Longest Consecutive Sequence (LC 128)

LC 128 is the canonical "looks O(n²) but is actually O(n)" hash-set problem. Given an unsorted array, find the longest run of consecutive integers, without sorting (which would be O(n log n) and fail the problem's stated complexity).

Input:  [100, 4, 200, 1, 3, 2]
Output: 4
Explanation: The longest consecutive run is 1, 2, 3, 4.

The "is this the start of a run?" trick

The naive idea: for each x in the set, count x, x+1, x+2, ... until a number is missing. Worst case O(n²). The fix: only kick off the inner loop if x - 1 is not in the set, meaning x is the start of a run. Each element is then visited at most twice across the whole algorithm: once as a start candidate, once as part of an inner-loop walk.

Build the intuition

  • Why a set, not the original array? Set-membership in O(1) is the lever. The original might have duplicates; the set won't.
  • What's the wrong O(n²) version? For each x, walk x, x+1, x+2, ... until missing. Worst case (sorted contiguous run): every walk is O(n). Total O(n²).
  • The fix: what condition makes x worth walking from? If x − 1 is not in the set, then x is the start of a fresh run. Skip otherwise.
  • Why is this O(n)? Each element is visited at most twice: once as a candidate start (skipped if not), once as part of a walk (only if its predecessor isn't in the set, which happens at most once per run).

Go scaffold

func longestConsecutive(nums []int) int {
    set := make(map[int]bool)
    for _, x := range nums {
        set[x] = true
    }

    best := 0
    for x := range set {
        // GUARD: only start a walk if x is the run's start
        // TODO: skip if x-1 is in set

        // WALK: count consecutive integers from x upward
        cur := x
        length := 1
        for /* TODO: condition */ {
            // TODO: advance cur and length
        }

        if length > best {
            best = length
        }
    }

    return best
}

Trace through [100, 4, 200, 1, 3, 2]

xx − 1 in set?Run lengthbest
100no → start100 → 11
4yes (3) → skip1
200no → start200 → 11
1no → start1, 2, 3, 4 → 44
3yes (2) → skip4
2yes (1) → skip4
The amortised argument. The inner while-loop runs only when x is a start. Across all starts, the total inner-loop work equals the sum of run lengths, which is at most n (each element is in exactly one run). So total work is O(n) outer + O(n) inner = O(n).

7 · Walkthrough: Subarray Sum Equals K (LC 560)

Prefix-sum + hash map is the technique for "count subarrays with sum equal to K", and many of its variants. The key identity:

The identity. If prefix[j] − prefix[i] = K, then the subarray nums[i+1..j] sums to K. Equivalently, prefix[i] = prefix[j] − K. So at each j, the number of valid subarrays ending at j equals the count of earlier prefix-sum values equal to prefix[j] − K.

Build the intuition: work through the identity

  • The identity to internalise. If prefix[j] − prefix[i] = K, then nums[i+1..j] sums to K. Equivalently: prefix[i] = prefix[j] − K.
  • What does the map count? For each prefix sum value, how many indices have produced it so far. Why count, not just "have we seen it"?
  • Why the seed cnt[0] = 1? The empty prefix has sum 0. Without the seed, subarrays starting at index 0 would be missed. Test on nums = [3], k = 3 with and without the seed.
  • Why does sliding window fail here? When negatives are present, the prefix sum isn't monotone, so sliding window's "left edge only moves forward" invariant breaks.

Go scaffold

func subarraySum(nums []int, k int) int {
    cnt := map[int]int{0: 1}     // SEED: empty prefix has sum 0
    prefix := 0
    answer := 0

    for _, x := range nums {
        prefix += x

        // # of subarrays ending HERE that sum to k
        //   = # of earlier indices where prefix was (prefix - k)
        // TODO: add cnt[prefix - k] to answer

        // RECORD this prefix
        // TODO: increment cnt[prefix]
    }

    return answer
}

Trace through nums = [1, 2, 3, -2, 1], k = 3

ixprefixprefix − kcnt[prefix − k]answercnt after
011−200{0:1, 1:1}
123011{0:1, 1:1, 3:1}
236312{0:1, 1:1, 3:1, 6:1}
3−24113{0:1, 1:1, 3:1, 6:1, 4:1}
415203{0:1, 1:1, 3:1, 6:1, 4:1, 5:1}

Three subarrays sum to 3: [1, 2] at indices 0..1, [3] at index 2, and [2, 3, −2] at indices 1..3. The algorithm counted all three correctly. The map's role is pure bookkeeping: it never needs to know which indices produced each prefix sum, only how many.

The general shape of "subarray with X property"

LC 560 is the cleanest instance of a meta-pattern. Anything of the form "count subarrays where some-aggregate equals k" reduces to rolling pre-state if the aggregate is invertible. Substitute:

  • Sum equals k → prefix sum; look up prefix − k.
  • Sum divisible by k → prefix sum mod k; look up prefix mod k (LC 974).
  • XOR equals k → prefix XOR; look up prefix ⊕ k (LC 1310 / 1442).
  • Count of 1s equals count of 0s → prefix(# 1s − # 0s); look up the same value (LC 525).
  • Product equals k → trickier: multiplication isn't invertible at zero. Either avoid zero or use logs (with floating-point caveats).
The unifying skeleton. Maintain a running aggregate agg; maintain a map from "value of agg" to "count of indices that produced it". At each step, look up the value that would close a valid subarray ending here. Increment the count of agg after the lookup. The seed entry corresponds to "empty prefix"; forget it and you miss subarrays starting at index 0.
The seed entry cnt = {0: 1}. Without it, subarrays starting at index 0 (like [1, 2] in the trace) would be missed. The seed says "the empty prefix has been seen once, with sum 0". When prefix == k at some j, the lookup of prefix − k = 0 hits the seed and counts the subarray from index 0 to j.

When sliding window fails

Sliding window can solve "longest subarray with sum K" only when all values are non-negative (so the prefix sum is monotonically non-decreasing). With negatives in the mix, the prefix-sum-plus-hash-map technique is the only O(n) path. Recognise this distinction at the "recognise it" stage of the seven-thing checklist.

Visualising the rolling pre-state

SHAPE E — ROLLING PRE-STATE (Subarray Sum = K, k = 3)123-21p=1p=3p=6p=4p=5at p=6, look up cnt[6 − 3] = cnt[3] = 1cnt map at i=20 → 1 (seed)1 → 13 → 1 ← matches!(6 added after lookup)Each prefix sum lookup of (p − k) counts subarrays ending HERE that sum to k.

The rolling pre-state shape is the most subtle hash-map idiom: the map's role is not to count items, but to count prior states such that current − prior matches the target. Internalise this and you unlock dozens of "count subarrays with property X" variants.

8 · Walkthrough: Group Anagrams (LC 49)

LC 49 demonstrates the "group by canonical key" sub-pattern in its cleanest form. Given a list of strings, group them so that anagrams (same letters, possibly different order) end up together.

Build the intuition: design the key

  • What property do anagrams share? The same multiset of letters. Two strings are anagrams iff their multisets are equal.
  • Two natural canonical forms. (a) Sort the letters → string key. (b) Count letters → fixed-size array key. Which is asymptotically faster, and by how much?
  • In Go specifically: why [26]int not []int? Map keys must be comparable. Arrays are; slices are not.
  • What if the input alphabet isn't a–z? Unicode? Use the sorted-string approach, or a wider fixed-size array indexed by code-point.

Go scaffold: two flavours

// Flavour 1: sorted-string key — O(n × k log k)
func groupAnagramsSorted(strs []string) [][]string {
    groups := make(map[string][]string)
    for _, s := range strs {
        // TODO: sort the bytes of s, use the sorted form as the key
        // TODO: append s to groups[key]
    }
    out := make([][]string, 0, len(groups))
    for _, g := range groups {
        out = append(out, g)
    }
    return out
}

// Flavour 2: count-array key — O(n × k); arrays are comparable in Go
func groupAnagramsCount(strs []string) [][]string {
    groups := make(map[[26]int][]string)
    for _, s := range strs {
        var key [26]int
        // TODO: fill key by counting each letter of s

        groups[key] = append(groups[key], s)
    }
    // ... return as before
    return nil // TODO
}

The canonical-key trick generalises. Anywhere you can ask "what's the normalised form of this thing such that two equivalent inputs produce the same form?", you can group with a hash map. Examples: numbers grouped by their digit-sum; strings grouped by their pattern (LC 290); points grouped by their slope from a fixed origin (LC 149).

9 · Hard walkthrough: Minimum Window Substring (LC 76)

LC 76 is the canonical "hash map plus sliding window" hard problem. It demonstrates the most subtle hash-map shape: using a counting map as the condition for a window's validity, with a second integer (have) caching the satisfaction state to avoid re-checking the whole map on every step.

Problem. Given strings s and t, return the shortest substring of s that contains every character of t (counting multiplicities). Empty string if no such window exists.

Input:  s = "ADOBECODEBANC", t = "ABC"
Output: "BANC"
Explanation: every other window containing A, B, and C is longer.

Why brute force is too slow

Brute force: enumerate every (left, right) pair, check if the substring contains all characters of t. O(n² · |t|). With n, |t| ≤ 10⁵, that's 10¹⁵ operations, not even close. The sliding-window-plus-hashmap solution is O(n + |t|).

Build the intuition: the four moving parts

  • The need map. need[c] = how many copies of c the window still owes t. Built once from t.
  • The have map. have[c] = how many copies of c the current window contains, capped at need[c] (extras don't help).
  • The matched counter. A single integer counting how many of t's distinct chars are fully satisfied (i.e. have[c] == need[c]). When matched == len(need), the window is valid.
  • The window pointers. Expand right until valid; contract left while staying valid, recording the shortest. Each pointer moves O(n) times total, linear.

The matched-counter trick: why it's the right design

A naive implementation would re-compare have against need on every window change: O(|t|) per step, O(n |t|) total. The matched counter caches the answer to "are all keys satisfied?". Incrementing or decrementing it only when a single character flips from unsatisfied to satisfied (or vice versa) makes each window step O(1).

Go scaffold: fill in the four branches

func minWindow(s, t string) string {
    if len(s) < len(t) {
        return ""
    }
    need := make(map[byte]int)
    for i := 0; i < len(t); i++ {
        need[t[i]]++
    }
    required := len(need)              // # of distinct chars to satisfy

    have := make(map[byte]int)
    matched := 0                       // # of distinct chars currently satisfied
    bestLen := 1 << 30
    bestL := 0
    left := 0

    for right := 0; right < len(s); right++ {
        c := s[right]
        // EXPAND — incorporate s[right] into the window
        if _, ok := need[c]; ok {
            have[c]++
            // TODO: if have[c] == need[c], increment matched
        }

        // CONTRACT — while window is valid, try to shrink from the left
        for matched == required {
            // record best
            if right-left+1 < bestLen {
                bestLen = right - left + 1
                bestL = left
            }
            d := s[left]
            // TODO: if need[d] exists, decrement have[d]
            // TODO: if have[d] drops below need[d], decrement matched
            left++
        }
    }

    if bestLen == 1<<30 {
        return ""
    }
    return s[bestL : bestL+bestLen]
}

Trace through s = "ADOBECODEBANC", t = "ABC"

rights[right]have aftermatchedcontract?best window
0AA:11no
3BA:1, B:12no
5CA:1, B:1, C:13 ✓yes → "ADOBEC" (6)"ADOBEC"
10B(after contract) A:1, B:2, C:0matched dropped, then re-rises(no improvement)
12CA:1, B:1, C:1 (after contracts)3 ✓yes → "BANC" (4)"BANC"
The off-by-one trap. The condition to increment matched is have[c] == need[c], with equality. If have[c] overshoots need[c] (e.g. extra A's in the window), matched must not double-count. Symmetrically when contracting: only decrement matched when have[d] drops below need[d], not when it merely changes. Get one of these four conditions wrong and the algorithm returns garbage.
Why this is hash-map, not just sliding-window. The pattern is a sliding window around a hash-map invariant. The hash map encodes "what does a valid window look like"; the window pointers exploit it for O(n) traversal. Pure sliding-window without the hash map can't express "contains every char of t". It could only express conditions on a single character or sum. The combination unlocks the class of "find the smallest / largest substring with property X" problems.

9b · Bonus: Continuous Subarray Sum (LC 523)

A close cousin of LC 560 showing the prefix-mod variant of the rolling pre-state shape. Worth seeing because it teaches a small but important twist: when two indices share a prefix-sum modulo k, the subarray between them sums to a multiple of k.

Problem. Given an integer array nums and an integer k, return true if nums has a continuous subarray of size at least 2 whose elements sum to a multiple of k.

The identity

If prefix[j] mod k == prefix[i] mod k with j > i, then (prefix[j] − prefix[i]) mod k == 0, so nums[i+1..j] sums to a multiple of k. Map each modular residue to its earliest index; when you see the same residue again at index j, check that j − earliest ≥ 2.

Go scaffold

func checkSubarraySum(nums []int, k int) bool {
    // residue -> earliest index where this residue appeared
    first := map[int]int{0: -1}          // SEED: empty prefix has residue 0 at index -1
    prefix := 0

    for i, x := range nums {
        prefix += x
        r := prefix % k
        if r < 0 {                       // Go's % on negatives is signed; normalise
            r += k
        }
        if earliest, ok := first[r]; ok {
            if i-earliest >= 2 {
                return true
            }
        } else {
            first[r] = i                 // record only on first sighting
        }
    }
    return false
}
The "earliest index only" trick. Unlike LC 560 where we count all earlier matches, here we want the longest possible subarray for a fixed residue, so we only record the first time each residue appears. Recording later occurrences would only shrink the window. This is a different value shape (first-occurrence index, shape D) on the same rolling-state skeleton.
The −1 seed. Without first[0] = -1, a subarray starting at index 0 with sum divisible by k would be missed: there'd be no earlier "empty prefix" residue to match against. Same trick as LC 560's cnt[0] = 1, different shape.

10 · How to solve hard hash-map problems step-by-step

The recipe that collapses 90% of "feels like hash map but I'm stuck" into a clean solve. Run through the four questions in order; if any answer is unclear, slow down; the bug is in the design, not the code.

  1. What is the key? Possibilities: the raw value, a normalised key (sorted-letters string), a signature (count tuple), a derived pair (slope + intercept), a state (running prefix sum). The hardest hash-map problems are hard because the right key is non-obvious. LC 49 = letter-count tuple; LC 149 = slope as gcd-rationalised fraction; LC 560 = running prefix sum.
  2. What is the value? Possibilities: a count, an index, a list of items (group-by), an accumulator (running min / max), a node pointer (LRU). For complement-lookup, value = index. For counting, value = count. For group-by, value = list. For first-occurrence, value = index. Wrong choice here breaks the next step.
  3. When do you insert? When do you update? The "check-then-insert" mantra of Two Sum is the canonical example: flip the order and the algorithm breaks on duplicate inputs. For prefix-sum problems, always increment after looking up (otherwise a sum of 0 counts itself). For LRU, every operation both updates the map and reorders the list.
  4. One-pass or two-pass? One-pass is the goal, but some problems need a precomputation (LC 1 with hash) or two phases (Hopcroft-Karp). When in doubt, code the two-pass version first; collapse to one-pass after the algorithm works.
The hard problem decision tree. "Find pair / triple summing to X" → complement lookup (or sort + two pointers). "Find subarray with property X" → rolling pre-state (prefix sum / xor / mod). "Group equivalent items" → canonical signature as key. "Constant-time lookup with recency order" → hash map plus linked list. "Constant-time random sample plus delete" → hash map plus array (LC 380). Naming the family is half the work; the rest is filling in the key and value.

11 · Sister data structures

Hash map is the default. But several variants, and several entirely different data structures, fill specific gaps. Knowing when to reach past the hash map is a senior signal.

StructureLookupWhat it adds over hash mapWhen to reach for it
Sorted map (TreeMap / std::map / BTreeMap)O(log n)Ordered iteration; "smallest key ≥ X" in O(log n)Time-based KV (LC 981), range queries, finding the next event after time T.
LinkedHashMap (Java) / OrderedDict (Python)O(1)Insertion-order iterationLRU implementation in 5 lines (Python's OrderedDict.move_to_end is the LRU primitive); deterministic JSON output.
BiMap (Guava) / dual mapsO(1) both waysLook up by key or by valueBijection problems (LC 290 Word Pattern, LC 205 Isomorphic Strings, LC 890). Cheaper than maintaining two maps and keeping them in sync.
Multimap (Guava) / map[K][]VO(1)Multiple values per key, with bulk operationsGroup-by patterns where the bucket itself needs operations (sort, sum, min-heap).
Bloom filterO(k) probesMemory ≪ hash set; tolerates false positives"Have I probably seen this URL?" at billion-element scale. Pre-filter before an expensive lookup. Used in databases (Cassandra), browsers (Chrome's Safe Browsing).
Count-Min SketchO(d)Approximate counts in sub-linear memory"Top-K frequent items" in a stream where you can't fit the full frequency table. Used in network monitoring, ad-tech, search analytics.
HyperLogLogO(1)Cardinality estimation in ~1.5 KB"How many distinct users today?" at petabyte scale. Redis's PFCOUNT.
Consistent hash ringO(log n)Stable assignment when nodes are added/removedDistributed caches, sharded databases. Adds 1/n keys on resize instead of all of them.
TrieO(|key|)Prefix iteration; shared prefixes share memoryWord search (LC 212), autocomplete, longest-common-prefix queries. Trie pattern covers the details.
The probabilistic trade. Bloom filter, Count-Min Sketch, and HyperLogLog all sacrifice exactness for memory. They never give false negatives on "no" answers, but they can lie in the "yes" direction (Bloom) or under/overestimate (CMS, HLL). They show up in system-design rounds much more than LC; for the latter, hash map and sorted map cover everything.

When the hash map is the wrong tool

  • You need range queries. "Smallest key ≥ X" / "all keys in [a, b]": hash maps can't do these; sorted map is the move. Time-based key-value store (LC 981) is the canonical example.
  • You need O(1) extra space. Constraint kills hash map. Reach for two pointers, in-place mutation, or bit-manipulation. LC 136 Single Number is the textbook: hash-map solves it in O(n) time + O(n) space; XOR over the array does it in O(n) + O(1).
  • You need worst-case O(1). Hash map is average-case O(1); worst-case O(n). For hard real-time systems, reach for cuckoo hashing (guaranteed O(1) lookup) or a sorted-tree map (guaranteed O(log n)).
  • You need persistent / functional access. Hash array mapped tries (HAMT), such as Clojure's PersistentHashMap and Scala's immutable map, keep previous versions of the map alive with structural sharing. Each "update" returns a new map in O(log n) without copying.

12 · LRU cache: hash map + doubly-linked list (LC 146)

LC 146 is the canonical "two structures composed" hash-map question. Implement a fixed-capacity cache with O(1) get and put. The structures are a hash map (key → node) and a doubly-linked list (most recently used at the front, least recently used at the tail).

Build the intuition: answer six design questions

  1. Why two structures, not one? Hash map: O(1) lookup by key. Linked list: O(1) recency reordering. Neither alone gets both.
  2. Which end of the list is "most recent"? Pick a convention and stick to it. Most implementations: head = most-recent, tail = least-recent.
  3. What does Get(key) do besides return the value? Move the accessed node to the most-recent side. Why? "Most recently used" should reflect access, not just insertion.
  4. What does Put(key, val) do on a key that's already there? Overwrite the value, AND move to most-recent. (Don't double-insert.)
  5. What does Put(key, val) do when at capacity? Evict the LRU (tail's predecessor), remove from the map, then insert.
  6. The sentinel trick: what does it buy you? Two dummy nodes (head, tail). Every real node has non-nil prev and next. No "is this the first / last node?" branches in remove or insert.

Go scaffold: fill in the four list ops

type lruNode struct {
    key, val   int
    prev, next *lruNode
}

type LRUCache struct {
    cap        int
    m          map[int]*lruNode
    head, tail *lruNode // sentinels: head = most-recent side, tail = LRU side
}

func NewLRUCache(capacity int) *LRUCache {
    head := &lruNode{}
    tail := &lruNode{}
    head.next = tail
    tail.prev = head
    return &LRUCache{
        cap:  capacity,
        m:    make(map[int]*lruNode),
        head: head,
        tail: tail,
    }
}

// Detach a node from its current position
func (c *LRUCache) remove(n *lruNode) {
    // TODO: splice n out of the doubly-linked list
}

// Insert a node right after head (most-recent position)
func (c *LRUCache) addFront(n *lruNode) {
    // TODO: insert between head and head.next
}

func (c *LRUCache) Get(key int) int {
    // TODO: if not present, return -1
    // TODO: move node to front, return its value
    return -1
}

func (c *LRUCache) Put(key, val int) {
    // TODO: if key exists, remove old node from list
    // TODO: else if at capacity, evict tail.prev (LRU) from list and map
    // TODO: create new node, addFront it, record in map
}
Test your remove and addFront on paper. Draw four nodes (sentinel head, A, B, sentinel tail). Walk through Get(B): remove(B) then addFront(B). Confirm the pointers end up correct. This is where most implementations fail: the splice operations are short but subtle.
Why both structures. The hash map gives O(1) lookup by key. The linked list gives O(1) recency reordering. Either alone won't work: a hash map can't track recency in O(1); a linked list can't look up by key in O(1). Composing them is the lever.

The two sentinel pattern

The sentinel head and tail nodes simplify edge cases. Without them, every insert/remove would need a "is this the first / last node?" check. With them, every real node has a real prev and next: no nulls to handle. The trick generalises to any doubly-linked list operation in interview-shaped problems.

12b · Hashing internals: what you should know

A senior interviewer occasionally asks "what's actually happening inside a hash map?". Two minutes of grounding to give a confident answer:

  • The hash function turns a key into an integer (typically 32 or 64 bits). For string keys, that's something like FNV-1a or MurmurHash. For integer keys, often the integer itself, possibly XOR'd with a constant.
  • The bucket index is the hash modulo the table size. Most implementations use a table size that's a power of 2, so modulo is a bitmask.
  • Collisions are unavoidable (pigeonhole). Two strategies: chaining (each bucket is a linked list of (key, value) pairs) or open addressing (probe forward to find an empty slot — linear probing, quadratic, or double hashing).
  • Load factor = entries ÷ buckets. When it exceeds a threshold (~0.75 in Java, ~0.6 in Go's map, varies elsewhere), the table is rehashed: a new table 2× larger is allocated, every entry is re-inserted. Rehash is O(n) but amortised O(1) per insert.
  • Worst case is O(n) when an adversary picks keys that all collide. Pre-2012 PHP, Python, and others used predictable hashes vulnerable to this denial-of-service. Modern languages use a per-process random seed (HashDoS mitigation).
  • Memory layout matters. Open-addressing tables (Robin Hood, Swiss Tables) are typically 2–3× faster than chaining because they're cache-friendly. Go's map and Rust's HashMap use chained-then-open-addressed hybrid designs.

For interview answers: state O(1) average for insert/lookup/delete, with a footnote about worst case and rehashing. Don't volunteer implementation depth unless asked; it derails the conversation. But when asked, a 30-second confident answer about chaining vs open addressing is a strong signal.

13 · Common pitfalls: what breaks

  • Inserting before checking. The Two Sum bug. Always check, then insert. [3, 3] with target 6 matches the same index against itself if you flip the order.
  • Forgetting the seed in prefix-sum maps. cnt = {0: 1} is the standard initialiser: without it, subarrays starting at index 0 are missed. Test on nums = [k] for any k; without the seed it returns 0.
  • Updating matched on the wrong comparison. The LC 76 trap. Increment when have[c] hits need[c] exactly; decrement when it drops below. Off-by-one on either side returns nonsense.
  • Hashing mutable types. In Python, lists aren't hashable; use tuples. In Java, mutating a key after insertion corrupts the map — the key still hashes to the old bucket, but equals compares against the new value, so lookups silently fail. Go forbids slices as keys for the same reason; arrays (fixed size) are allowed because they're value types.
  • Adversarial hash collisions. Pre-2012, predictable hashes made HashDoS attacks practical: a malicious client sends keys all hashing to one bucket, degrading every operation to O(n). Modern languages randomise the seed per process. Competitive programmers sometimes exploit this in C++ unordered_map on Codeforces, where the seed is fixed; use map (red-black tree) or a custom hash if you suspect.
  • Go map iteration order non-determinism. Go's runtime deliberately randomises iteration order on every range over a map. Don't write tests that depend on a specific order; don't write algorithms that depend on it either. If you need order, use a slice of keys you sort or a sorted map.
  • Iterating while mutating. Modifying a hash map during iteration over it usually throws (Python: RuntimeError; Java: ConcurrentModificationException; Go: undefined behaviour — sometimes works, sometimes panics, sometimes silently produces wrong results). Iterate over a snapshot of the keys, or stage changes in a list and apply after the loop.
  • Floating-point keys. Don't. Equal-by-value floats can hash differently due to NaN, signed zero, or representation drift through arithmetic. Round to integer or use a string representation with controlled precision.
  • Integer overflow in custom hash functions. Rolling hashes (Rabin-Karp), polynomial hashes for tuples, or any custom hash that multiplies and adds, will overflow 64-bit on long inputs. Use uint64 arithmetic and accept that hash collisions become possible — verify with the actual key on collision.
  • Off-by-one in counts. "How many anagrams" vs "how many groups": read the problem twice. The frequency-map shape is so common that off-by-one in the answer extraction is the most common bug.
  • Mutating a struct that's a map key. In Go, map[StructType]V is allowed for comparable structs, but if you take a value out, mutate it, and try to put it back at the same key, the key has changed. Use the original key (immutable copy) to update.
  • Counting beyond what's needed in sliding-window. In LC 76, increment have[c] unconditionally; only count it toward matched at the equality boundary. Capping have[c] at need[c] works too but obscures the algorithm.

14 · Complexity

OperationAverageWorst caseNotes
InsertO(1)O(n)Worst case is rare; requires adversarial keys.
LookupO(1)O(n)Same.
DeleteO(1)O(n)Same.
IterateO(n)O(n)No order guarantee in most languages.

In interview answers, state O(1) for hash-map ops with a footnote about worst case. Anything else over-engineers the answer.

15 · Variants & related patterns

  • Sorted-map (TreeMap, std::map). When you need ordered keys — "smallest key > X" in O(log n). Different structure, same family.
  • Sliding window. Most variable windows track membership via a hash map.
  • Two pointers. The O(1)-space alternative when you can sort first.
  • Trie. Specialised structure for prefix-keyed lookups; replaces hash map for word-search-style problems.
  • Bloom filter. When you need set-membership at scale and can tolerate false positives. Out of scope for LC, common in real systems.
  • Adjacent: consistent-hashing simulator. Distribution-of-keys-across-nodes — system-design-shaped, but builds the cost-model intuition.

16 · Where this gets asked

Hash-map is the most-asked pattern in interview history. If you only memorise one pattern, this is the one. Six companies and what they tend to reach for:

CompanyCommon framingWhy it fits their stack
GoogleTwo Sum (LC 1), Subarray Sum Equals K (LC 560), Longest Consecutive Sequence (LC 128)Indexing and ranking lookups — every hash is an inverted-index entry under the hood.
MetaGroup Anagrams (LC 49), Valid Anagram (LC 242), Top K Frequent Elements (LC 347)Feed-ranking dedup, content-signal hashing.
AmazonTwo Sum (LC 1) — quite literally the most-asked Amazon question, ever, First Unique Character (LC 387)The phone-screen warm-up of choice for L4 hires.
MicrosoftLRU Cache (LC 146), Insert/Delete/GetRandom O(1) (LC 380)Office's serialisation layer + Azure caching layer use these patterns directly.
AppleContains Duplicate (LC 217), Roman to Integer (LC 13), Isomorphic Strings (LC 205)Lookup-table problems are L4 staple at Apple Services.
Stripe / Cloudflare / DatadogRate-limiter (sliding-window counter via hash), idempotency-key dedup, request fingerprintingEvery payment / edge / monitoring API has a per-key hash .
Pattern of patterns. Four sub-shapes do almost all the work — (1) complement lookup (Two Sum), (2) seen-set deduplication (Contains Duplicate), (3) prefix-sum + hashmap (Subarray Sum Equals K), (4) hashmap + linked-list (LRU Cache). Master those four and you can implement any hash-map question from scratch.

17 · Try it yourself

Five problems, ordered by depth. Spend 30 minutes on each before checking the hint.

Two Sum · LC 1
The canonical hash-map warm-up. For each number, check if (target − num) is in the map; if not, add this num. Hint: walk the array once, building the map as you go — don't pre-populate it, you'd hit duplicate-index bugs on inputs like [3,3].
Group Anagrams · LC 49
Key by sorted characters; values are the buckets. Hint: in Python, defaultdict(list) avoids the boilerplate. The sorted-string key is O(k log k); a 26-element count tuple key is O(k) but uglier.
Subarray Sum Equals K · LC 560
Prefix sum + hashmap. Track the running sum; the count of subarrays ending here is the number of previous prefix sums equal to (running − k). Hint: seed the map with {0: 1} so subarrays starting at index 0 count correctly.
Longest Consecutive Sequence · LC 128
Set of all numbers, then start a streak only from numbers whose predecessor isn't present. Hint: O(n) is achievable because each number is visited at most twice — once as a "start" candidate, once as a member of a streak. Don't sort.
Stretch — LRU Cache · LC 146
Hashmap + doubly-linked list = O(1) get + O(1) put with LRU eviction. Hint: use sentinel head/tail nodes; this avoids the if/null checks at boundary cases and makes the code 30% shorter.
How to practise. First pass — write from scratch. Second pass — explain the invariant aloud while coding ("the map holds X seen-so-far"). Third pass — rewrite from memory. By the third pass the hashmap reflex is automatic; the only per-problem work is "what's the key, what's the value?"

Further reading

  • CLRS — Introduction to Algorithms, Chapter 11. Hash tables, collision resolution, universal hashing. The reference.
  • Sedgewick & Wayne — Algorithms (4th ed). Chapter 3.4 covers hash tables with separate chaining and linear probing.
  • Knuth — TAOCP Vol 3. The original analysis of hashing. Mathematical, dense, foundational.
  • Robin Hood hashing. A modern open-addressing variant with much better worst-case probe distance. Used in Rust's HashMap historically and in many competitive HFT codebases.
  • Adjacent: how hash tables work. The mechanism explainer on this site, with diagrams.
  • Adjacent: two pointers. The other Tier A pattern; sometimes the alternative.
Found this useful?