Design a hash map
Bucket array + collision handling + a rehash policy. The interview tests three things: the hash → bucket math, how collisions are resolved, and when (and how) the table grows. Most candidates skip the third; it's the differentiator.
1 · Clarifying questions
| Operations? | put(k, v), get(k), remove(k), contains(k), size(). Generic over K and V. |
| Average complexity? | O(1) for all of the above; O(n) worst-case under a bad hash. |
| Memory budget? | Discuss. Open addressing is more cache-friendly; chaining is simpler to reason about. |
| Initial capacity? | Default 16; power of two so hash mod n can be a bitwise AND. |
| Thread-safe? | No, by default. Mention ConcurrentHashMap-style striping as an extension. |
2 · Entities
Entry<K, V>— holds a key, value, the key's cached hash, and anextpointer for chaining.HashMap<K, V>— owns the bucket array, the load factor, the size.
3 · Separate-chaining sketch
class Entry<K, V>:
hash: int
key: K
value: V
next: Entry | None
class HashMap<K, V>:
DEFAULT_CAPACITY = 16
LOAD_FACTOR = 0.75
buckets: list[Entry | None] # length = power of two
size: int = 0
def __init__(capacity=16):
self.buckets = [None] * capacity
def _index(key):
h = hash(key)
# spread: mix high bits down so weak hashes still scatter
h ^= (h >>> 16)
return h & (len(self.buckets) - 1) # power-of-two AND
def put(k, v):
i = self._index(k)
e = self.buckets[i]
while e is not None:
if e.key == k:
e.value = v
return
e = e.next
new_entry = Entry(hash(k), k, v, self.buckets[i])
self.buckets[i] = new_entry
self.size += 1
if self.size > LOAD_FACTOR * len(self.buckets):
self._rehash(len(self.buckets) * 2)
def get(k):
i = self._index(k)
e = self.buckets[i]
while e is not None:
if e.key == k: return e.value
e = e.next
return None
def remove(k):
i = self._index(k)
prev, e = None, self.buckets[i]
while e is not None:
if e.key == k:
if prev: prev.next = e.next
else: self.buckets[i] = e.next
self.size -= 1
return e.value
prev, e = e, e.next
return None
def _rehash(new_cap):
old = self.buckets
self.buckets = [None] * new_cap
self.size = 0
for head in old:
while head is not None:
nxt = head.next
i = head.hash & (new_cap - 1)
head.next = self.buckets[i]
self.buckets[i] = head
head = nxt
# size will be re-counted as we re-put? No — keep size accurate by
# incrementing in the loop. Or, store size = sum-of-buckets and skip rebuild.Key details: the hash is cached on the entry (don't recompute during rehash), and the spread step (h ^= h >>> 16) is the same trick Java's HashMap uses. It prevents a hash that only varies in its high bits from collapsing to one bucket when you AND with a small mask.
4 · Chaining vs open addressing
| Aspect | Separate chaining | Open addressing (linear probe) |
|---|---|---|
| Bucket holds | Head of a linked list | The entry itself, inline |
| Load factor target | ~0.75 | ~0.5 (any higher and probes grow) |
| Memory overhead | Extra pointer per entry | None |
| Cache locality | Worse (chase pointers) | Better (probe walks contiguous memory) |
| Delete is | Easy — unlink | Harder — must tombstone |
| Worst case | O(n) chain length under adversarial keys | O(n) cluster length under poor hash |
Open addressing tends to win in modern hardware (cache effects dominate). Java's HashMap chains; Python's dict uses open addressing. Either is a defensible answer if you explain the trade.
5 · Rehash
Triggered when size > load_factor × capacity. Double capacity, re-bucket every entry. Cost: O(n) once per amortised O(1) insert; the amortised analysis is a standard exam answer.
Tricks worth mentioning:
- Power-of-two capacity means a bit-shift instead of modulo, and rehashing only needs to look at one extra bit per key to know its new bucket.
- Incremental rehashing (Redis-style) spreads the O(n) cost across the next many operations, so no single
putstalls. Real production hash tables do this. - Treeified buckets. When a chain gets long (≥8), Java converts it to a red-black tree. Defends against hash-collision DoS attacks.
6 · Edge cases
- Mutable key. If the key changes after insertion, its hash changes; you'll lose track of it. Document that keys must be immutable, or use a defensive copy.
- Null key / value. Allowed? Java allows one null key. Be explicit.
- Hash collision attack. All keys hashing to one bucket → O(n) per operation, DoS the server. Defence: random hash seed per process (Python's
PYTHONHASHSEED), or treeify under contention. - Iteration during mutation. Standard library maps fail-fast on modification during iteration; reasonable to do the same.
7 · Concurrency
Two real strategies:
- Stripe locks. N independent locks, key → lock by hash. Java's pre-Java-8 ConcurrentHashMap. Concurrency proportional to N stripes.
- CAS on bucket head. Java 8+ ConcurrentHashMap uses fine-grained CAS plus tree fallback. Lock-free in the fast path.
8 · What gets graded
- Did you mention load factor and rehash?
- Did you pick a collision strategy and explain the trade?
- Did you handle
removecorrectly (esp. tombstones if you went open addressing)? - Did you flag the hash-collision attack and the mitigation?