Bit manipulation
Niche, but when it lands, usually the cleanest solution. XOR's
self-cancellation collapses paired elements to zero; bitmasks turn a small
set into an integer you can enumerate or memoise on; a handful of two-token
idioms (n & (n-1), n & -n) replace whole
loops. The recognition signals are narrow, but the wins on time and space
tend to be large.
1 · Why bit manipulation matters
Bits are the cheapest representation of state a computer has. A 64-bit
register holds an entire 64-element subset, and the CPU manipulates it
with a single AND, OR, or XOR
instruction — one machine cycle, no cache miss, no allocation. Nothing
else in the algorithmic toolkit is that fast.
Three properties make bit manipulation worth knowing even when the problem doesn't obviously call for it:
- Bits as set algebra. A 32-bit integer is a subset of
{0, 1, ... 31}. Union is|; intersection is&; symmetric difference is^; complement is~; membership is(x >> i) & 1. Set theory at machine speed. - XOR as algebra over GF(2). XOR is addition mod 2. It's commutative, associative, and self-inverse:
a ^ a = 0. That single algebraic property is what every "find the unique element" or "detect the swap" problem leans on. No hash map. No sort. One pass, one variable. - Constant-time tricks nothing else matches. Lowest set bit in O(1):
x & -x. Clear lowest set bit:x & (x-1). Power-of-two test:x > 0 && (x & (x-1)) == 0. Each replaces a loop with a two-token expression that the compiler maps directly to a single instruction.
This is why bit manipulation dominates two interview categories that look unrelated: low-level systems (Linux kernel, JVM internals, network protocols, compression codecs) and graphics (pixel formats, alpha blending, sprite collision masks). Both demand worst-case performance where the difference between O(n) and O(1) on a hot path multiplies across billions of invocations per second.
Production examples: where bit tricks pay rent
The pattern shows up in real production systems more often than its LeetCode frequency suggests. A short tour of the places it earns its keep. Each of these is "an interview hashmap solution would work but the real system uses bitmaps":
- Bloom filters. A bit array plus k hash functions. "Is x probably in the set?" answered in O(k) bit-test operations with zero false negatives and tunable false-positive rate. Used in Bigtable, Cassandra, HBase, Postgres to skip SSTables that definitely don't contain a key. The whole datastructure is a bitmap and a handful of
(hash >> i) & 1reads. - Roaring bitmaps in OLAP. Apache Druid, Apache Pinot, ClickHouse, and Elasticsearch all use Roaring bitmaps to represent which rows match a filter. A query like "WHERE country = 'IN' AND tier = 'premium'" reduces to an
ANDof two bitmaps, then a popcount to count the result rows. 64 row-membership tests per machine instruction. The original paper (Lemire 2016) showed 10-100× compression vs. integer arrays for sparse sets, and 4× faster aggregations than the array path. - Compact ACLs in network firewalls. Cisco's NX-OS, Juniper's Junos, and the Linux netfilter
nftablesbackend encode access-control rules as bitmaps over packet fields (src IP class, dst port range, protocol). Matching a packet against thousands of rules becomes a fewANDoperations over precomputed bitmaps, fast enough to run at line rate on a 100 Gbps interface. - File-system free-space bitmaps. ext4, XFS, NTFS, APFS: every modern filesystem tracks which blocks are free with a bitmap. "Allocate the next k contiguous free blocks" becomes a scan for k consecutive zero bits; the implementations use SWAR +
POPCNT+BSFinstructions to do this in cache-friendly 64-bit chunks. The ext4 mballoc code is a half-page of bit twiddling that handles 16 TB filesystems. - Network packet header parsing. Every TCP/IP, IPv6, and HTTP/3 (QUIC) header is a tightly packed bit-field structure. Linux's kernel networking stack uses macros like
FIELD_GET/FIELD_PREPthat compile down to a shift + mask. Zero allocations, zero branches; just shift, mask, and compare. - Compiler register allocation. Liveness analysis represents "which variables are live at this program point" as a bitset over the variable index. Set union (variable becomes live in either successor) is a bitmap OR; set difference (variable killed by an assignment) is AND-NOT. LLVM's register allocator, GCC's gimple pass, and the Go compiler's SSA pass all use this representation. Typical bitset widths are in the hundreds.
- Chess engine bitboards. Stockfish, Leela, every modern chess engine represents the board as twelve 64-bit integers (one bit per square, one bitmap per piece type per colour). "Find all squares attacked by white pawns" is a shift-and-OR. "Detect check" is a bitmap intersection. The de Bruijn trick for square indexing is what makes move generation fast enough to evaluate ~100 million positions per second per core.
- Compression codecs. Zstandard, LZ4, Snappy, Brotli all encode literal-vs-match flags as bits in a header byte, walked with shifts and masks. Huffman decoders use canonical tables indexed by the next N bits of the input stream. simdjson's tokeniser uses SWAR to find all
"and\bytes in a 64-byte chunk with five instructions.
2 · Intuition: every integer is a small array of booleans
A 32-bit integer is just 32 boolean values welded together. The CPU
reads, writes, and operates on all 32 at once — no loop, no array
bounds check. Once you see an int as a 32-element bit
array, the operations stop looking like cryptic symbols and start
looking like ordinary set algebra. 5 is the same thing as
[0, 0, 0, 0, 0, 1, 0, 1]; XOR is "pair-wise NOT-EQUAL";
AND is "pair-wise AND". That's all there is.
Once you see this, the four operations carry their meaning without
translation. x | y is set union — every bit set in either
input is set in the output. x & y is set intersection,
only bits set in both. x ^ y is symmetric difference,
bits set in exactly one. ~x is the complement under the
32- or 64-bit universe. Test "is element i in the set?" with
(x >> i) & 1. Add element i with
x | (1 << i). Remove with x & ~(1 << i).
Toggle with x ^ (1 << i).
3 · How to recognise it
Bit manipulation is rarely the first tool to reach for, but the signals are distinctive. Ten recognisable shapes recur across decades of interview problems; if any of these match the prompt, spend a minute thinking in bits before defaulting to a hash set or sort.
Shape A: state as a bitmask (subset enumeration)
- The input size is
n ≤ 20–25, with phrasing like "all subsets", "all permutations", or "every combination". - Each subset becomes an integer in
[0, 2^n). Bitiset means elementiis in the subset. - Enumerate with a single loop:
for mask := 0; mask < (1 << n); mask++. Often the state for DP.dp[mask]indexes a 2^n-sized table. - Canonical: LC 78 Subsets, LC 847 Shortest Path Visiting All Nodes, LC 1125 Smallest Sufficient Team. All are bitmask enumeration in disguise.
Shape B: find the missing or duplicate via XOR
- The prompt says "every element appears twice except one" or "every integer in [0, n] except one".
- XOR every element together. Pairs cancel (
x ^ x = 0); the loner survives. O(n) time, O(1) space. - Canonical: LC 136 Single Number, LC 268 Missing Number, LC 389 Find the Difference. Each is one line of code once you spot the framing.
Shape C: counting set bits (popcount)
- The prompt asks for the Hamming weight, "number of 1-bits", or related metric.
- Brian Kernighan's loop:
n &= n - 1clears the lowest set bit each iteration; the iteration count is the popcount. O(popcount(n)) rather than O(width). - Pre-computed popcount:
dp[i] = dp[i >> 1] + (i & 1). O(n) preprocessing, O(1) lookup: the LC 338 DP trick. - Canonical: LC 191 Number of 1 Bits, LC 338 Counting Bits, LC 461 Hamming Distance, LC 477 Total Hamming Distance.
Shape D: lowest set bit isolation (x & -x)
- The prompt involves Fenwick trees (BIT) or "find the rightmost set bit".
x & -xisolates the lowest set bit by exploiting two's-complement:-x = ~x + 1, so all bits below the lowest set bit flip and propagate. - Used in the Fenwick / Binary Indexed Tree: the index update sequence is
i += i & -i(climb the implicit binary tree by jumping to the next responsible node). - Submask enumeration: iterate every subset of a mask with
for sub := mask; sub > 0; sub = (sub - 1) & mask. Total work across all masks is O(3^n), useful for sum-over-subsets DP.
Shape E: parity / Gray code
- The prompt asks for "odd or even count", "balance the parities", or "binary-reflected sequence".
- Parity: XOR collapses any sequence to a single parity bit. Pairing arguments often reduce to parity invariants. See the chessboard "can a knight reach this square" family.
- Gray code: successive values differ in exactly one bit. The encoding is
g(n) = n ^ (n >> 1); the decoding accumulates XOR. Used in mechanical encoders, Karnaugh maps, and LC 89.
Shape F: constant-time arithmetic tricks
- Multiply / divide by powers of two.
x << kis multiply by 2^k;x >> kis divide by 2^k (rounded toward −∞ for arithmetic shift, toward 0 for logical shift on positive values). - Fast mod for powers of two.
x % (1 << k)equalsx & ((1 << k) - 1). Hash tables routinely size their bucket count as a power of two for exactly this reason. - Sign extension. Promoting a k-bit signed value to a wider type:
(n << (W - k)) >> (W - k)with arithmetic shift. Used everywhere in codec / decoder logic. - Swap without a temp.
a ^= b; b ^= a; a ^= b. Cute, rarely useful in modern code (compilers optimise the temp away), but interviewers occasionally ask for the derivation as an XOR-algebra warm-up.
Shape G: bitmask DP for subset enumeration (n ≤ 20)
- Tell. Small
n(typically 15–22), phrasing like "optimal ordering", "minimum cost to cover all", "best subset", or "visit every X exactly once". The brute force isO(n!); the bitmask version isO(n · 2^n)— astronomically smaller at n = 20. - State design.
dp[mask][i]= best result when the subset of items represented bymaskhas been processed and the last one chosen wasi. Transitions: pick the next un-set bit, OR it in, recurse. - Memory budget.
2^20 = ~10^6states fits comfortably in memory;2^22 = ~4·10^6is the practical ceiling at most problem time limits. Beyond 24, bitmask DP doesn't fit. - Canonical: LC 847 Shortest Path Visiting All Nodes (TSP-ish), LC 1125 Smallest Sufficient Team, LC 1986 Minimum Number of Work Sessions. Every "visit all and minimise" problem with small n.
Shape H: state compression (pack multiple flags into one int)
- Tell. The problem has several small enumerated dimensions — say, a 4-state direction, a 2-state has-key flag, and a 32-state visited bitmap. Stuffing them into a single integer key turns multi-dimensional hashmap memoisation into an array lookup.
- Encoding. Allocate bit ranges per field:
key = direction | (hasKey << 2) | (visited << 3). Decoding is shift + mask. Saves the per-key allocation of a tuple or struct. - When it pays off. Memoised BFS where the state has 3–4 small components, and the total state space is a few million. Replacing
map[stateKey]intwith[N]intis typically 5–10× faster because of cache behaviour. - Canonical: LC 864 Shortest Path to Get All Keys (visited keys as a bitmask), LC 691 Stickers to Spell Word (target-letter mask), state-compressed grid-path DPs.
Shape I: XOR for "find the unique" / "find the missing"
- Tell. The prompt enumerates a structured set (0..n, pairs, triples) and says "one is missing", "one appears once, the others twice", "two appear once, the others twice", or "find which two were swapped".
- The lever. XOR is self-inverse and commutative, so XORing a structured multiset against an actual multiset cancels everything that matches and leaves only the differences. Often one variable, one pass, no extra memory.
- The two-unique trick (LC 260). XOR everything → you get
a ^ bwhere a and b are the two uniques. Find any set bit of that result, then partition the array on that bit and XOR each half independently. Each half contains exactly one unique. - Canonical: LC 136 Single Number, LC 137 Single Number II, LC 260 Single Number III, LC 268 Missing Number, LC 389 Find the Difference, LC 1486 XOR Operation in an Array.
Shape J: popcount-based scoring / parity reasoning
- Tell. The objective is "count of items with property X", or "number of differences between two states", or "Hamming distance between configurations". The natural representation is a bitmap and the objective is popcount on it.
- The win. Hardware
POPCNT(x86, ARM, RISC-V) does 64-bit popcount in one cycle. Go'smath/bits.OnesCount64compiles to it. A loop overnentries collapses ton/64popcounts plus a horizontal sum. - Monotonicity. Popcount is monotone non-decreasing along the lattice of "set i to 1". Some DP problems (LC 338 Counting Bits) exploit the recurrence
popcount(2k) = popcount(k)andpopcount(2k+1) = popcount(k) + 1. A single shift + low-bit reveals the whole bit count. - Canonical: LC 477 Total Hamming Distance, LC 1356 Sort Integers by Number of 1 Bits, LC 1310 XOR Queries (combined with prefix XOR).
a ^ a = 0 and a ^ 0 = a. That single algebraic
property is what every "appears twice except one", "find missing number",
and "find the swapped pair" problem leans on. Half of the six shapes
above are XOR specialisations.4 · Sister algorithms
Bit manipulation sits at the intersection of low-level systems programming and bit-twiddling tricks. Five sister techniques are worth knowing by name. They're what you reach for when basic bit operations don't quite close the problem.
- Brian Kernighan's bit count. The fundamental popcount loop:
while x > 0 { x &= x - 1; count++ }. Each iteration clears exactly one set bit, so the loop runs popcount(x) times — much faster than scanning all 32 (or 64) bits when the population is sparse. Named after the same Kernighan of K&R fame; the trick predates him but he popularised it in The C Programming Language. - de Bruijn sequences for log2. Computing
log2(x)whenxis a power of two (equivalently, finding the index of the only set bit) looks like it needs a loop. A de Bruijn sequence reduces it to O(1): multiplyxby a magic 64-bit constant, shift right by 58, look up the result in a 64-entry table. The constant is chosen so that every 6-bit window of the shifted product is unique. Used in chess engines for bitboard square indexing, in Java'sInteger.numberOfTrailingZeros, and in any code that needs to find the lowest set bit's position fast without a hardwareBSFinstruction. - SWAR (SIMD Within A Register). Treating a 64-bit integer as eight 8-bit lanes and operating on all of them in parallel with bit tricks. The classic example: counting set bits in 64 bits with five constant-time operations using nibble-swap and parallel addition, the algorithm Hacker's Delight calls
popcount_3. Modern CPUs have a dedicatedPOPCNTinstruction that beats SWAR, but the technique generalises: parallel comparison, parallel addition, branchless string parsing in JSON tokenisers (simdjson) all use SWAR. - Bitmask DP. When the problem says "n ≤ 20" and asks for "the optimal ordering" or "the cheapest subset", the state is usually a bitmask of which elements have been used so far. Travelling Salesman (LC 943, LC 847), Smallest Sufficient Team (LC 1125), Partition to K Equal Sum Subsets (LC 698). The DP table is
dp[mask]with size2^n; transitions iterate over which element to add next. See the advanced DP page for the full treatment. - Bit tries (binary trie). A trie indexed bit-by-bit rather than character-by-character. Stores 32 bits per integer in 32 levels; supports "maximum XOR with any stored value" in O(32) time per query by greedily picking the opposite bit at each level. Used in routing tables (IPv4 prefix matching) and in LC 421 Maximum XOR of Two Numbers in an Array.
BSF, reach for de Bruijn. If it asks for an optimisation
over subsets with n ≤ 20, reach for bitmask DP. If it asks for "max
XOR pair", reach for a bit trie. The choice is rarely between two of
these. The problem shape picks one cleanly.5 · Canonical example: Single Number (LC 136)
Problem. Given a non-empty array where every element appears twice except for one, find that one. Required: linear time, constant extra space.
Input: [4, 1, 2, 1, 2]
Output: 4
Explanation: 1 and 2 each appear twice; 4 is the odd one out.XOR every element together. Pairs cancel (x ^ x = 0),
order doesn't matter (XOR is commutative and associative), and the lone
element ends up XORed with zero — which is itself. The hash-set solution
is O(n) time and O(n) space; this is O(n) time and O(1) space, and the
code is one line.
6 · Reference implementation
Single Number first, then three companion snippets that cover the rest of the working vocabulary.
from functools import reduce
from operator import xor
def single_number(nums: list[int]) -> int:
return reduce(xor, nums, 0)
# or, explicitly:
# acc = 0
# for n in nums: acc ^= n
# return accCounting set bits (Brian Kernighan). Each iteration clears the lowest set bit. Runs in O(popcount(n)) rather than O(width).
def popcount(n: int) -> int:
count = 0
while n:
n &= n - 1 # clears the lowest set bit
count += 1
return countIs power of two. A power of two has exactly one set bit. n & (n-1) clears that bit, leaving zero.
def is_power_of_two(n: int) -> bool:
return n > 0 and (n & (n - 1)) == 0Subset enumeration. Every integer in [0, 2^n) is a subset; bit i set means element i is included.
def all_subsets(elems: list) -> list[list]:
n = len(elems)
out = []
for mask in range(1 << n):
out.append([elems[i] for i in range(n) if mask & (1 << i)])
return outmask & (1 << i)
needs the inner parentheses in C, Java, Go, and most C-family languages —
& has lower precedence than ==, so
mask & 1 << i == 0 parses in surprising ways. When in
doubt, parenthesise.7 · Variations
Seven idioms that cover most bit-manipulation problems you'll see in an interview.
| Idiom | What it does | Where it shows up |
|---|---|---|
| XOR pair-cancellation | x ^ x = 0, x ^ 0 = x. Pairs collapse; the unpaired element survives. | LC 136 Single Number, LC 268 Missing Number |
| XOR find missing | XOR all values 0..n together with the array; the missing index falls out. | LC 268 Missing Number, LC 287 Find the Duplicate (variant) |
| Brian Kernighan popcount | n &= n - 1 clears the lowest set bit. Loop count is the popcount. | LC 191 Number of 1 Bits, LC 461 Hamming Distance |
| Subset enumeration | for mask in range(1 << n). Each mask is one subset. | LC 78 Subsets, LC 698 Partition to K Equal Subsets |
| Bitmask DP | State is a bitmask of which elements have been used. dp[mask] memoises the best result so far. | LC 847 Shortest Path Visiting All Nodes, LC 1494 Parallel Courses II |
| Pre-computed popcount | dp[i] = dp[i >> 1] + (i & 1). O(n) preprocessing, O(1) lookup. | LC 338 Counting Bits |
| Lowest set bit | n & -n isolates the lowest set bit. Two's-complement makes this work. | Fenwick trees, LC 187 Repeated DNA (variant) |
7a · The bit tricks cheat sheet
Most bit-manipulation problems combine three or four of the operations below. Memorise the pattern — they recur across decades of problems from low-level systems code to LeetCode hards.
n by repeatedly doing n &= n-1 and
counting iterations. Each iteration clears exactly one set bit, so the
loop runs as many times as there are 1-bits — much faster than
scanning all 32 bits when the population count is low. This is the
"clear lowest bit" trick used as a counting primitive.7b · The bit operations cheat table
| Operation | Expression | Use case |
|---|---|---|
| Set bit i | n | (1 << i) | Mark item i in subset bitmask |
| Clear bit i | n & ~(1 << i) | Remove item i |
| Toggle bit i | n ^ (1 << i) | Flip state — used in subsets enumeration |
| Check bit i | (n >> i) & 1 | Is item i in the set? |
| Lowest set bit isolated | n & -n | Fenwick tree, find rightmost set bit |
| Clear lowest set bit | n & (n - 1) | Count bits (Brian Kernighan); power-of-2 test |
| Right-propagate lowest bit | n | (n - 1) | Round up to next 2^k − 1 |
| Power of 2? | n > 0 && (n & (n - 1)) == 0 | Two-line check |
| XOR swap | a ^= b; b ^= a; a ^= b | No-temp swap (cute, rarely useful) |
| Count subsets of mask | for sub = mask; sub; sub = (sub - 1) & mask | Iterate every subset of a mask in O(3^n) |
| Sign extension | (n << (32 - k)) >> (32 - k) | Sign-extend a k-bit signed value |
| Absolute value (branchless) | (n ^ (n >> 31)) - (n >> 31) | Avoid mispredicted branch in tight loop |
8 · Five-problem progression
| # | LC | Problem | Why it's here |
|---|---|---|---|
| 1 | LC 136 | Single Number | The XOR-cancellation trick in its purest form. One line of code, O(1) space. |
| 2 | LC 191 | Number of 1 Bits | Brian Kernighan's n & (n-1) loop. The simplest popcount. |
| 3 | LC 338 | Counting Bits | Compute popcount for every i in [0, n]. The DP recurrence is dp[i] = dp[i >> 1] + (i & 1) — shift drops the low bit, the parity adds it back. |
| 4 | LC 78 | Subsets | Bitmask enumeration. 2^n masks, build each subset by checking which bits are set. |
| 5 | LC 421 | Maximum XOR of Two Numbers | Bit-by-bit greedy via a bit trie. Where bit manipulation hands off to a more structured data structure. |
9 · Worked example: LC 191 Number of 1 Bits (Brian Kernighan)
Problem. Given an unsigned integer n,
return the number of set bits (the Hamming weight). The naive solution
scans all 32 bits in a loop; the Brian Kernighan trick scans only as
many bits as are actually set.
Input: n = 11 (binary: 1011)
Output: 3
Input: n = 128 (binary: 10000000)
Output: 1The trick: why n & (n-1) clears one bit at a time
Subtracting 1 from n flips the lowest set bit to 0 and
turns every bit below it from 0 to 1. ANDing with the original
n kills both: the lowest set bit (it's 0 in n-1)
and everything below it (those bits are 0 in n). Result: a
version of n with exactly one fewer set bit. Loop until
n == 0; the iteration count is the popcount.
// LC 191 — Brian Kernighan
func hammingWeight(n uint32) int {
count := 0
for n != 0 {
n &= n - 1 // clear the lowest set bit
count++
}
return count
}
// Time: O(popcount(n)) — at most 32 iterations on a uint32
// Space: O(1)
// Alternative — using the hardware POPCNT instruction (Go std lib)
import "math/bits"
func hammingWeightFast(n uint32) int {
return bits.OnesCount32(n)
}
// Time: O(1) — single machine instruction on x86/ARM/RISC-V with POPCNTTrace: n = 13 = 0b1101
| iter | n (binary) | n - 1 (binary) | n & (n-1) | count after |
|---|---|---|---|---|
| 1 | 1101 | 1100 | 1100 | 1 |
| 2 | 1100 | 1011 | 1000 | 2 |
| 3 | 1000 | 0111 | 0000 | 3 |
| 4 | 0000 | — | — | loop exits |
for i := 0; i < 32; i++ {...}. On sparse
bitmaps in compiler liveness analysis or filesystem free-space scans,
that 10× speed-up is the whole point of using a bitmap.10 · Worked example: LC 78 Subsets (bitmask enumeration)
Problem. Given an array nums of distinct
integers, return all 2^n subsets (the power set).
Recursion-based backtracking works; the bitmask version is shorter,
loop-only, and the same complexity.
Input: nums = [1, 2, 3]
Output: [[], [1], [2], [1,2], [3], [1,3], [2,3], [1,2,3]]
Each subset corresponds to a 3-bit mask:
000 = [] 100 = [3]
001 = [1] 101 = [1,3]
010 = [2] 110 = [2,3]
011 = [1,2] 111 = [1,2,3]The trick: every mask is a subset
Iterate every integer from 0 to 2^n - 1. For
each, bit i set means element i is in the
subset. Build the subset by checking each bit; emit it. Two nested
loops, no recursion.
// LC 78 — Subsets via bitmask enumeration
func subsets(nums []int) [][]int {
n := len(nums)
out := make([][]int, 0, 1<<n)
for mask := 0; mask < (1 << n); mask++ {
sub := []int{}
for i := 0; i < n; i++ {
if mask&(1<<i) != 0 {
sub = append(sub, nums[i])
}
}
out = append(out, sub)
}
return out
}
// Time: O(n · 2^n) — 2^n masks, n work to materialise each subset
// Space: O(n · 2^n) for the output
// Alternative — iterate bits with Brian Kernighan (faster on sparse subsets)
func subsetsFast(nums []int) [][]int {
n := len(nums)
out := make([][]int, 0, 1<<n)
for mask := 0; mask < (1 << n); mask++ {
sub := []int{}
m := mask
for m != 0 {
i := bits.TrailingZeros(uint(m)) // index of lowest set bit
sub = append(sub, nums[i])
m &= m - 1 // clear it
}
out = append(out, sub)
}
return out
}Submask iteration: a related trick
Sometimes you need every subset of a given mask rather than
every subset of {0..n-1}. The idiom:
for sub := mask; sub > 0; sub = (sub - 1) & mask.
It walks every submask in decreasing order, no duplicates, no out-of-
mask bits. Total work across all masks of a size-n universe is
O(3^n) — the textbook "sum over subsets" complexity.
// Iterate every submask of mask, including mask itself but excluding 0
for sub := mask; sub > 0; sub = (sub - 1) & mask {
// process sub
}
// Include 0: change the loop to "for { ... if sub == 0 { break } ... }"
// Use case — SOS DP (sum over subsets):
// For each mask, sum f[sub] over every sub ⊆ mask.
// Naive: O(4^n). With this iteration: O(3^n). With layered SOS: O(n · 2^n).11 · Worked example: LC 691 Stickers to Spell Word (HARD)
Problem. Given an array of sticker strings and a
target word, return the minimum number of stickers (with repetition
allowed) whose letters can be rearranged to spell the target. Return
-1 if impossible. Target length is up to 15.
Input: stickers = ["with","example","science"], target = "thehat"
Output: 3
Explanation: "with" + "example" + "example" gives w,i,t,h,e,x,a,m,p,l,e,e,x,a,m,p,l,e
— enough to spell "thehat".Why this is a bitmask problem
- Target length ≤ 15. That single constraint is the giveaway.
2^15 = 32768— a state space small enough to enumerate. Encode "which target letters still need to be covered" as a 15-bit mask. - State.
dp[mask]= minimum stickers needed whenmaskis the set of target letter positions still uncovered. Goal:dp[(1<<n)-1]; answer atdp[0]. - Transition. For each sticker, compute the subset of uncovered letters this sticker can fill (greedily, position by position). OR-NOT it out of the current mask. The new mask gets one sticker added to the cost.
- BFS over masks. Treat masks as nodes; an edge exists from
masktonextMaskif some sticker reducesmasktonextMask. The minimum sticker count is the shortest path fromfullmask to0.
Go scaffold: BFS over bitmask states
func minStickers(stickers []string, target string) int {
n := len(target)
full := (1 << n) - 1
// BFS: each layer is one additional sticker
dp := make([]int, 1<<n)
for i := range dp {
dp[i] = -1
}
dp[0] = 0
for mask := 0; mask <= full; mask++ {
if dp[mask] == -1 {
continue
}
// Try every sticker. For each, compute new mask after greedy fill.
for _, sticker := range stickers {
newMask := mask
// Count letters available in sticker
avail := [26]int{}
for _, c := range sticker {
avail[c-'a']++
}
// For each uncovered target position, cover if letter is available
for i := 0; i < n; i++ {
if newMask&(1<<i) == 0 && avail[target[i]-'a'] > 0 {
newMask |= 1 << i
avail[target[i]-'a']--
}
}
// Update if we made progress
if newMask != mask && (dp[newMask] == -1 || dp[newMask] > dp[mask]+1) {
dp[newMask] = dp[mask] + 1
}
}
}
return dp[full]
}
// Time: O(2^n · |stickers| · L) where L is max sticker length
// Space: O(2^n)12 · How to solve hard bit-manipulation problems, step-by-step
Hard bit problems are rarely about inventing a new trick. They're about recognising which of the four signals the problem is sending, then applying the matching template. Four diagnostic questions, in order:
- Is there a small
n(≤ 20) suggesting bitmask DP? The number 20 (or 15, or 22) in the constraints is the loudest signal in this pattern.2^nstates fit in memory; transitions iterate over which bit to flip. If you see "n ≤ 20" and "optimal ordering / cover all / visit each once", reach for bitmask DP before anything else. LC 847, LC 691, LC 1125 all live here. - Is there an XOR-cancellation property? The prompt mentions "every element appears twice except", "exactly one is missing from 0..n", "find the one that's been swapped". XOR everything and let the structure cancel — usually one line of code, O(1) space. If the structure isn't pair-wise, sometimes a higher-order trick works: LC 137 (every element appears three times except one) uses two bitmask accumulators that track "seen once" and "seen twice" mod 3.
- Can multi-state be packed into bits? The problem has several small enumerated dimensions (direction × has-key × visited-set). Encoding them as bit ranges in a single integer turns a multi-key memo into an array lookup — 5–10× faster in cache and 0 allocations. The encoding is
key = a | (b << bitsA) | (c << (bitsA + bitsB)); the decoding is shift + mask. - Is there popcount monotonicity? Some DPs proceed by popcount layer: process all masks with popcount k before any with popcount k+1. This is natural when transitions only add bits (subset enumeration, "fill in one more"). The order is: sort masks by popcount, fill in that order; this is what enables the SOS DP family in
O(n · 2^n)rather thanO(3^n).
13 · Common pitfalls
- Signed vs unsigned integer width. Python's ints are unbounded; C, Java, and Go fix integers at 32 or 64 bits. A popcount loop that works in Python can wedge in Java if the input is negative — Java's
intis signed, andn >> 1sign-extends. Usen >>> 1for logical shift, or mask off the high bit. - Operator precedence.
&,|, and^have lower precedence than==and!=in the C family.x & 1 == 0parses asx & (1 == 0). Always parenthesise:(x & 1) == 0. - Forgetting that
n & (n-1)clears the lowest set bit. Easy to misremember as "clears the lowest bit" (i.e. bit 0). It clears the lowest set bit, wherever that happens to be, which is what makes Kernighan's loop work. - Arithmetic vs logical shift in signed languages. Right-shifting a negative number with
>>in Java or C preserves the sign bit (arithmetic shift). For unsigned semantics use>>>in Java or cast to unsigned in C. Python and most dynamic languages dodge this entirely. - Subset enumeration on n > 30.
2^30is roughly a billion masks — too many. Bitmask DP only works when n is small (usually ≤ 20–22 for tight time limits, ≤ 25 for loose ones). - Confusing XOR with addition. XOR is addition mod 2 without carry.
1 + 1 = 2;1 ^ 1 = 0. Useful when you want pairs to cancel; wrong when you want a sum. - Signed-vs-unsigned shift in Go vs Java. Go has no
>>>operator; signedness is a property of the type, not the operation.int32(-1) >> 1in Go is-1(arithmetic shift, sign-extends);uint32(0xFFFFFFFF) >> 1is0x7FFFFFFF(logical shift, fills with 0). In Java,>>is arithmetic and>>>is logical regardless of operand type. If you're porting Java bit code to Go, convert touint32/uint64before the shift, then back if needed. - Off-by-one in bit-position indexing. Bit 0 is the lowest bit (worth 1, not 2). A 32-bit int has bits 0 through 31, not 1 through 32.
1 << 32on a Goint32is undefined (compile-time error in Go 1.17+ when the shift amount is a constant), and on dynamic shifts produces0or wraps. The most-significant bit of auint32is1 << 31, not1 << 32. - Overflow when shifting by ≥ word size. In Go,
uint32(1) << 32is0at runtime (the shift amount is masked to 5 bits for uint32). In C/C++, shifting by ≥ width is undefined behaviour — your compiler is free to produce0, the original value, or set your house on fire. Always check the shift amount:if k >= 32 { return 0 }. - The
int32(1) << 31overflow gotcha. In Go,int32(1) << 31is-2147483648(the sign bit is set, producing INT_MIN). If your code path uses this as a "max bit" sentinel and later adds 1 to it, you wrap to-2147483647rather than overflow-trap. Useuint32(1) << 31when you actually want the value2147483648. The same trap exists atint64(1) << 63. - Forgetting to handle
n = 0in Brian Kernighan loops.n & (n-1)whenn == 0is0 & (-1) = 0in two's-complement, which is fine, but the loop conditionfor n != 0already excludes this. The bug is when you writefor n > 0on a signed type and pass a negative input. The loop never runs becausenis already ≤ 0.
14 · What to say in the interview
- "XOR / bitmask → bit manipulation." Name the recognition signal. "Each element appears twice except one — that's XOR cancellation." Or: "n is small and we need all subsets — that's a bitmask."
- State the specific trick. "XOR is its own inverse, so pairs cancel." "Brian Kernighan's loop clears the lowest set bit in one step." "Bit i of the mask means element i is included." The trick is short; saying it shows you know why the code works.
- Call out the space win. The bit-manipulation solution to Single Number is O(1) space versus O(n) for the hash-set approach. Same for Missing Number. Interviewers usually want to hear you recognise this.
- Talk about width. "In Java I'd use
>>>for logical shift". A small thing, but it signals you've written bit code in a real language and not just Python. - Don't overreach. If the obvious solution is fine and bit manipulation is a marginal win, say so. "Hash set is O(n) space but reads more clearly; XOR is O(1) space if we need it." Trade-offs over verdicts.
15 · Where this gets asked
| Company | Common framing | Why it fits their stack |
|---|---|---|
| Single Number (LC 136, 137, 260), Sum of Two Integers without + (LC 371), Counting Bits (LC 338) | The XOR trick + bit-DP show up at every level. Google L4 phone screens favour Single Number variants. | |
| Meta | Subsets (LC 78 via bitmask), Number of Steps to Reduce a Number to Zero (LC 1342) | Subset enumeration via bitmask is the "show you can think in bits" interview test. |
| Amazon | Number of 1 Bits (LC 191), Reverse Bits (LC 190), Bitwise AND of Numbers Range (LC 201) | The bit-fundamentals show up in AWS storage layout questions + binary-protocol parsing. |
| Microsoft | Power of Two (LC 231), Hamming Distance (LC 461), Total Hamming Distance (LC 477) | The classic n & (n-1) tricks. Used in spell-check distance metrics + fuzzy match. |
| Apple | Maximum XOR of Two Numbers (LC 421 via bit trie), Number Complement (LC 476) | Bit-trie XOR shows up routinely. Apple's compression + cryptography work uses these primitives. |
| Bloomberg / Citadel | Maximum XOR Subarray (variant of LC 421), Subsets via bitmask DP, Travelling Salesman bitmask DP | Bitmask DP for ≤ 20 nodes shows up in HFT route-optimisation interviews routinely. |
n & (n-1) clears the lowest set bit (LC 191, 231), (3) bitmask as set membership for n ≤ 20 (LC 78, TSP), (4) bit trie for XOR maximisation (LC 421), (5) bit-DP where state is "subset of items used so far" (TSP, Smallest Sufficient Team). Knowing all five covers ~95% of bit-manipulation problems.16 · Try it yourself
- Single Number · LC 136
- The XOR cancellation drill. Hint:
a ^ a = 0anda ^ 0 = a. XOR every number; the duplicates cancel; the unique one remains. O(n) time, O(1) space: the clean solution. - Number of 1 Bits · LC 191
- Two solutions worth knowing. Hint: naive, shift right 32 times, count 1s. Better,
n &= n - 1clears the lowest set bit each iteration; count how many iterations until n = 0. Best, Brian Kernighan's algorithm. - Counting Bits · LC 338
- The bit-DP shortcut. Hint:
bits[i] = bits[i >> 1] + (i & 1). Each number's bit count equals the count of its right-shift plus the lowest bit. O(n) total, not O(n log n). - Sum of Two Integers Without + · LC 371
- Half-adder logic in software. Hint:
a ^ b= sum without carry;(a & b) << 1= carry. Loop while carry ≠ 0. Mind language-specific signed integer behaviour; Python needs a 32-bit mask to handle negatives. - Stretch: Maximum XOR of Two Numbers · LC 421
- Bit trie. Hint: insert each number bit-by-bit (MSB first) into a 32-level trie. For each number, walk the trie greedily picking the OPPOSITE bit at each level when available. That bit XORs to 1, maximising the result.
f"{n:032b}" in Python shows you what's actually happening. Most bit bugs come from misreading the bit order or forgetting that integers are 32 (or 64) bits in your language. The print is cheap; the confusion is expensive.Further reading
- Hacker's Delight (Henry S. Warren). The reference. Every bit trick you've heard of and many you haven't, with derivations. Worth owning if you work close to the metal.
- Wikipedia — Bit manipulation. Decent overview of the idioms; good index of where each one shows up in practice.
- Sean Anderson — Bit Twiddling Hacks. Stanford's curated page. Popcount variants, lowest-set-bit, sign-extension: usually the first hit when you search for any specific trick.
- Adjacent: Dynamic programming. Bitmask DP lives at the intersection. Once the state is a bitmask, the rest is regular DP.
- Adjacent: Hash map. The no-bit alternative for most single-number and missing-number problems. Easier to read, O(n) extra space: usually the right call unless space is tight.
- Adjacent: Trie. The bit-trie is the bridge from bit manipulation to structured data: LC 421 (Maximum XOR) is the canonical example.