04 / 07
OOD / 04

Design a hash map

Bucket array + collision handling + a rehash policy. The interview tests three things: the hash → bucket math, how collisions are resolved, and when (and how) the table grows. Most candidates skip the third; it's the differentiator.


1 · Clarifying questions

Operations?put(k, v), get(k), remove(k), contains(k), size(). Generic over K and V.
Average complexity?O(1) for all of the above; O(n) worst-case under a bad hash.
Memory budget?Discuss. Open addressing is more cache-friendly; chaining is simpler to reason about.
Initial capacity?Default 16; power of two so hash mod n can be a bitwise AND.
Thread-safe?No, by default. Mention ConcurrentHashMap-style striping as an extension.

2 · Entities

  • Entry<K, V> — holds a key, value, the key's cached hash, and a next pointer for chaining.
  • HashMap<K, V> — owns the bucket array, the load factor, the size.

3 · Separate-chaining sketch

class Entry<K, V>:
  hash: int
  key: K
  value: V
  next: Entry | None

class HashMap<K, V>:
  DEFAULT_CAPACITY = 16
  LOAD_FACTOR = 0.75

  buckets: list[Entry | None]   # length = power of two
  size: int = 0

  def __init__(capacity=16):
    self.buckets = [None] * capacity

  def _index(key):
    h = hash(key)
    # spread: mix high bits down so weak hashes still scatter
    h ^= (h >>> 16)
    return h & (len(self.buckets) - 1)   # power-of-two AND

  def put(k, v):
    i = self._index(k)
    e = self.buckets[i]
    while e is not None:
      if e.key == k:
        e.value = v
        return
      e = e.next
    new_entry = Entry(hash(k), k, v, self.buckets[i])
    self.buckets[i] = new_entry
    self.size += 1
    if self.size > LOAD_FACTOR * len(self.buckets):
      self._rehash(len(self.buckets) * 2)

  def get(k):
    i = self._index(k)
    e = self.buckets[i]
    while e is not None:
      if e.key == k: return e.value
      e = e.next
    return None

  def remove(k):
    i = self._index(k)
    prev, e = None, self.buckets[i]
    while e is not None:
      if e.key == k:
        if prev: prev.next = e.next
        else:    self.buckets[i] = e.next
        self.size -= 1
        return e.value
      prev, e = e, e.next
    return None

  def _rehash(new_cap):
    old = self.buckets
    self.buckets = [None] * new_cap
    self.size = 0
    for head in old:
      while head is not None:
        nxt = head.next
        i = head.hash & (new_cap - 1)
        head.next = self.buckets[i]
        self.buckets[i] = head
        head = nxt
    # size will be re-counted as we re-put? No — keep size accurate by
    # incrementing in the loop. Or, store size = sum-of-buckets and skip rebuild.

Key details: the hash is cached on the entry (don't recompute during rehash), and the spread step (h ^= h >>> 16) is the same trick Java's HashMap uses. It prevents a hash that only varies in its high bits from collapsing to one bucket when you AND with a small mask.

4 · Chaining vs open addressing

AspectSeparate chainingOpen addressing (linear probe)
Bucket holdsHead of a linked listThe entry itself, inline
Load factor target~0.75~0.5 (any higher and probes grow)
Memory overheadExtra pointer per entryNone
Cache localityWorse (chase pointers)Better (probe walks contiguous memory)
Delete isEasy — unlinkHarder — must tombstone
Worst caseO(n) chain length under adversarial keysO(n) cluster length under poor hash

Open addressing tends to win in modern hardware (cache effects dominate). Java's HashMap chains; Python's dict uses open addressing. Either is a defensible answer if you explain the trade.

5 · Rehash

Triggered when size > load_factor × capacity. Double capacity, re-bucket every entry. Cost: O(n) once per amortised O(1) insert; the amortised analysis is a standard exam answer.

Tricks worth mentioning:

  • Power-of-two capacity means a bit-shift instead of modulo, and rehashing only needs to look at one extra bit per key to know its new bucket.
  • Incremental rehashing (Redis-style) spreads the O(n) cost across the next many operations, so no single put stalls. Real production hash tables do this.
  • Treeified buckets. When a chain gets long (≥8), Java converts it to a red-black tree. Defends against hash-collision DoS attacks.

6 · Edge cases

  • Mutable key. If the key changes after insertion, its hash changes; you'll lose track of it. Document that keys must be immutable, or use a defensive copy.
  • Null key / value. Allowed? Java allows one null key. Be explicit.
  • Hash collision attack. All keys hashing to one bucket → O(n) per operation, DoS the server. Defence: random hash seed per process (Python's PYTHONHASHSEED), or treeify under contention.
  • Iteration during mutation. Standard library maps fail-fast on modification during iteration; reasonable to do the same.

7 · Concurrency

Two real strategies:

  • Stripe locks. N independent locks, key → lock by hash. Java's pre-Java-8 ConcurrentHashMap. Concurrency proportional to N stripes.
  • CAS on bucket head. Java 8+ ConcurrentHashMap uses fine-grained CAS plus tree fallback. Lock-free in the fast path.

8 · What gets graded

  • Did you mention load factor and rehash?
  • Did you pick a collision strategy and explain the trade?
  • Did you handle remove correctly (esp. tombstones if you went open addressing)?
  • Did you flag the hash-collision attack and the mitigation?
Found this useful?