16 / 20 · Tier C

Patterns / 16

Bit manipulation

Niche, but when it lands, usually the cleanest solution. XOR's self-cancellation collapses paired elements to zero; bitmasks turn a small set into an integer you can enumerate or memoise on; a handful of two-token idioms (n & (n-1), n & -n) replace whole loops. The recognition signals are narrow, but the wins on time and space tend to be large.

1 · Why bit manipulation matters

Bits are the cheapest representation of state a computer has. A 64-bit register holds an entire 64-element subset, and the CPU manipulates it with a single AND, OR, or XOR instruction — one machine cycle, no cache miss, no allocation. Nothing else in the algorithmic toolkit is that fast.

Three properties make bit manipulation worth knowing even when the problem doesn't obviously call for it:

Bits as set algebra. A 32-bit integer is a subset of {0, 1, ... 31}. Union is |; intersection is &; symmetric difference is ^; complement is ~; membership is (x >> i) & 1. Set theory at machine speed.
XOR as algebra over GF(2). XOR is addition mod 2. It's commutative, associative, and self-inverse: a ^ a = 0. That single algebraic property is what every "find the unique element" or "detect the swap" problem leans on. No hash map. No sort. One pass, one variable.
Constant-time tricks nothing else matches. Lowest set bit in O(1): x & -x. Clear lowest set bit: x & (x-1). Power-of-two test: x > 0 && (x & (x-1)) == 0. Each replaces a loop with a two-token expression that the compiler maps directly to a single instruction.

This is why bit manipulation dominates two interview categories that look unrelated: low-level systems (Linux kernel, JVM internals, network protocols, compression codecs) and graphics (pixel formats, alpha blending, sprite collision masks). Both demand worst-case performance where the difference between O(n) and O(1) on a hot path multiplies across billions of invocations per second.

Production examples: where bit tricks pay rent

The pattern shows up in real production systems more often than its LeetCode frequency suggests. A short tour of the places it earns its keep. Each of these is "an interview hashmap solution would work but the real system uses bitmaps":

Bloom filters. A bit array plus k hash functions. "Is x probably in the set?" answered in O(k) bit-test operations with zero false negatives and tunable false-positive rate. Used in Bigtable, Cassandra, HBase, Postgres to skip SSTables that definitely don't contain a key. The whole datastructure is a bitmap and a handful of (hash >> i) & 1 reads.
Roaring bitmaps in OLAP. Apache Druid, Apache Pinot, ClickHouse, and Elasticsearch all use Roaring bitmaps to represent which rows match a filter. A query like "WHERE country = 'IN' AND tier = 'premium'" reduces to an AND of two bitmaps, then a popcount to count the result rows. 64 row-membership tests per machine instruction. The original paper (Lemire 2016) showed 10-100× compression vs. integer arrays for sparse sets, and 4× faster aggregations than the array path.
Compact ACLs in network firewalls. Cisco's NX-OS, Juniper's Junos, and the Linux netfilter nftables backend encode access-control rules as bitmaps over packet fields (src IP class, dst port range, protocol). Matching a packet against thousands of rules becomes a few AND operations over precomputed bitmaps, fast enough to run at line rate on a 100 Gbps interface.
File-system free-space bitmaps. ext4, XFS, NTFS, APFS: every modern filesystem tracks which blocks are free with a bitmap. "Allocate the next k contiguous free blocks" becomes a scan for k consecutive zero bits; the implementations use SWAR + POPCNT + BSF instructions to do this in cache-friendly 64-bit chunks. The ext4 mballoc code is a half-page of bit twiddling that handles 16 TB filesystems.
Network packet header parsing. Every TCP/IP, IPv6, and HTTP/3 (QUIC) header is a tightly packed bit-field structure. Linux's kernel networking stack uses macros like FIELD_GET / FIELD_PREP that compile down to a shift + mask. Zero allocations, zero branches; just shift, mask, and compare.
Compiler register allocation. Liveness analysis represents "which variables are live at this program point" as a bitset over the variable index. Set union (variable becomes live in either successor) is a bitmap OR; set difference (variable killed by an assignment) is AND-NOT. LLVM's register allocator, GCC's gimple pass, and the Go compiler's SSA pass all use this representation. Typical bitset widths are in the hundreds.
Chess engine bitboards. Stockfish, Leela, every modern chess engine represents the board as twelve 64-bit integers (one bit per square, one bitmap per piece type per colour). "Find all squares attacked by white pawns" is a shift-and-OR. "Detect check" is a bitmap intersection. The de Bruijn trick for square indexing is what makes move generation fast enough to evaluate ~100 million positions per second per core.
Compression codecs. Zstandard, LZ4, Snappy, Brotli all encode literal-vs-match flags as bits in a header byte, walked with shifts and masks. Huffman decoders use canonical tables indexed by the next N bits of the input stream. simdjson's tokeniser uses SWAR to find all " and \ bytes in a 64-byte chunk with five instructions.

The senior-interview signal. Reaching for bit manipulation when it's appropriate isn't about cleverness — it's about recognising when state-as-bitmask compresses an O(n) scan into an O(1) machine instruction. Interviewers at infra-heavy teams (databases, storage, kernels, browsers, game engines) will route you through at least one bit-manipulation problem to see whether you reach for it naturally or default to a hash set.

2 · Intuition: every integer is a small array of booleans

A 32-bit integer is just 32 boolean values welded together. The CPU reads, writes, and operates on all 32 at once — no loop, no array bounds check. Once you see an int as a 32-element bit array, the operations stop looking like cryptic symbols and start looking like ordinary set algebra. 5 is the same thing as [0, 0, 0, 0, 0, 1, 0, 1]; XOR is "pair-wise NOT-EQUAL"; AND is "pair-wise AND". That's all there is.

Once you see this, the four operations carry their meaning without translation. x | y is set union — every bit set in either input is set in the output. x & y is set intersection, only bits set in both. x ^ y is symmetric difference, bits set in exactly one. ~x is the complement under the 32- or 64-bit universe. Test "is element i in the set?" with (x >> i) & 1. Add element i with x | (1 << i). Remove with x & ~(1 << i). Toggle with x ^ (1 << i).

The mental model that unlocks everything else. Every "clever" bit trick on this page is an obvious operation on a small boolean array, dressed in two-token syntax. Bitmask DP, popcount, Brian Kernighan, XOR cancellation: they're all consequences of "an int is a set of booleans, and the CPU operates on the whole set in one cycle". Re-read each trick with that lens; the cryptic-symbol feeling goes away.

3 · How to recognise it

Bit manipulation is rarely the first tool to reach for, but the signals are distinctive. Ten recognisable shapes recur across decades of interview problems; if any of these match the prompt, spend a minute thinking in bits before defaulting to a hash set or sort.

Shape A: state as a bitmask (subset enumeration)

The input size is n ≤ 20–25, with phrasing like "all subsets", "all permutations", or "every combination".
Each subset becomes an integer in [0, 2^n). Bit i set means element i is in the subset.
Enumerate with a single loop: for mask := 0; mask < (1 << n); mask++. Often the state for DP. dp[mask] indexes a 2^n-sized table.
Canonical: LC 78 Subsets, LC 847 Shortest Path Visiting All Nodes, LC 1125 Smallest Sufficient Team. All are bitmask enumeration in disguise.

Shape B: find the missing or duplicate via XOR

The prompt says "every element appears twice except one" or "every integer in [0, n] except one".
XOR every element together. Pairs cancel (x ^ x = 0); the loner survives. O(n) time, O(1) space.
Canonical: LC 136 Single Number, LC 268 Missing Number, LC 389 Find the Difference. Each is one line of code once you spot the framing.

Shape C: counting set bits (popcount)

The prompt asks for the Hamming weight, "number of 1-bits", or related metric.
Brian Kernighan's loop: n &= n - 1 clears the lowest set bit each iteration; the iteration count is the popcount. O(popcount(n)) rather than O(width).
Pre-computed popcount: dp[i] = dp[i >> 1] + (i & 1). O(n) preprocessing, O(1) lookup: the LC 338 DP trick.
Canonical: LC 191 Number of 1 Bits, LC 338 Counting Bits, LC 461 Hamming Distance, LC 477 Total Hamming Distance.

Shape D: lowest set bit isolation (x & -x)

The prompt involves Fenwick trees (BIT) or "find the rightmost set bit". x & -x isolates the lowest set bit by exploiting two's-complement: -x = ~x + 1, so all bits below the lowest set bit flip and propagate.
Used in the Fenwick / Binary Indexed Tree: the index update sequence is i += i & -i (climb the implicit binary tree by jumping to the next responsible node).
Submask enumeration: iterate every subset of a mask with for sub := mask; sub > 0; sub = (sub - 1) & mask. Total work across all masks is O(3^n), useful for sum-over-subsets DP.

Shape E: parity / Gray code

The prompt asks for "odd or even count", "balance the parities", or "binary-reflected sequence".
Parity: XOR collapses any sequence to a single parity bit. Pairing arguments often reduce to parity invariants. See the chessboard "can a knight reach this square" family.
Gray code: successive values differ in exactly one bit. The encoding is g(n) = n ^ (n >> 1); the decoding accumulates XOR. Used in mechanical encoders, Karnaugh maps, and LC 89.

Shape F: constant-time arithmetic tricks

Multiply / divide by powers of two. x << k is multiply by 2^k; x >> k is divide by 2^k (rounded toward −∞ for arithmetic shift, toward 0 for logical shift on positive values).
Fast mod for powers of two. x % (1 << k) equals x & ((1 << k) - 1). Hash tables routinely size their bucket count as a power of two for exactly this reason.
Sign extension. Promoting a k-bit signed value to a wider type: (n << (W - k)) >> (W - k) with arithmetic shift. Used everywhere in codec / decoder logic.
Swap without a temp. a ^= b; b ^= a; a ^= b. Cute, rarely useful in modern code (compilers optimise the temp away), but interviewers occasionally ask for the derivation as an XOR-algebra warm-up.

Shape G: bitmask DP for subset enumeration (n ≤ 20)

Tell. Small n (typically 15–22), phrasing like "optimal ordering", "minimum cost to cover all", "best subset", or "visit every X exactly once". The brute force is O(n!); the bitmask version is O(n · 2^n) — astronomically smaller at n = 20.
State design. dp[mask][i] = best result when the subset of items represented by mask has been processed and the last one chosen was i. Transitions: pick the next un-set bit, OR it in, recurse.
Memory budget. 2^20 = ~10^6 states fits comfortably in memory; 2^22 = ~4·10^6 is the practical ceiling at most problem time limits. Beyond 24, bitmask DP doesn't fit.
Canonical: LC 847 Shortest Path Visiting All Nodes (TSP-ish), LC 1125 Smallest Sufficient Team, LC 1986 Minimum Number of Work Sessions. Every "visit all and minimise" problem with small n.

Shape H: state compression (pack multiple flags into one int)

Tell. The problem has several small enumerated dimensions — say, a 4-state direction, a 2-state has-key flag, and a 32-state visited bitmap. Stuffing them into a single integer key turns multi-dimensional hashmap memoisation into an array lookup.
Encoding. Allocate bit ranges per field: key = direction | (hasKey << 2) | (visited << 3). Decoding is shift + mask. Saves the per-key allocation of a tuple or struct.
When it pays off. Memoised BFS where the state has 3–4 small components, and the total state space is a few million. Replacing map[stateKey]int with [N]int is typically 5–10× faster because of cache behaviour.
Canonical: LC 864 Shortest Path to Get All Keys (visited keys as a bitmask), LC 691 Stickers to Spell Word (target-letter mask), state-compressed grid-path DPs.

Shape I: XOR for "find the unique" / "find the missing"

Tell. The prompt enumerates a structured set (0..n, pairs, triples) and says "one is missing", "one appears once, the others twice", "two appear once, the others twice", or "find which two were swapped".
The lever. XOR is self-inverse and commutative, so XORing a structured multiset against an actual multiset cancels everything that matches and leaves only the differences. Often one variable, one pass, no extra memory.
The two-unique trick (LC 260). XOR everything → you get a ^ b where a and b are the two uniques. Find any set bit of that result, then partition the array on that bit and XOR each half independently. Each half contains exactly one unique.
Canonical: LC 136 Single Number, LC 137 Single Number II, LC 260 Single Number III, LC 268 Missing Number, LC 389 Find the Difference, LC 1486 XOR Operation in an Array.

Shape J: popcount-based scoring / parity reasoning

Tell. The objective is "count of items with property X", or "number of differences between two states", or "Hamming distance between configurations". The natural representation is a bitmap and the objective is popcount on it.
The win. Hardware POPCNT (x86, ARM, RISC-V) does 64-bit popcount in one cycle. Go's math/bits.OnesCount64 compiles to it. A loop over n entries collapses to n/64 popcounts plus a horizontal sum.
Monotonicity. Popcount is monotone non-decreasing along the lattice of "set i to 1". Some DP problems (LC 338 Counting Bits) exploit the recurrence popcount(2k) = popcount(k) and popcount(2k+1) = popcount(k) + 1. A single shift + low-bit reveals the whole bit count.
Canonical: LC 477 Total Hamming Distance, LC 1356 Sort Integers by Number of 1 Bits, LC 1310 XOR Queries (combined with prefix XOR).

Why XOR is the workhorse. XOR is its own inverse: a ^ a = 0 and a ^ 0 = a. That single algebraic property is what every "appears twice except one", "find missing number", and "find the swapped pair" problem leans on. Half of the six shapes above are XOR specialisations.

4 · Sister algorithms

Bit manipulation sits at the intersection of low-level systems programming and bit-twiddling tricks. Five sister techniques are worth knowing by name. They're what you reach for when basic bit operations don't quite close the problem.

Brian Kernighan's bit count. The fundamental popcount loop: while x > 0 { x &= x - 1; count++ }. Each iteration clears exactly one set bit, so the loop runs popcount(x) times — much faster than scanning all 32 (or 64) bits when the population is sparse. Named after the same Kernighan of K&R fame; the trick predates him but he popularised it in The C Programming Language.
de Bruijn sequences for log2. Computing log2(x) when x is a power of two (equivalently, finding the index of the only set bit) looks like it needs a loop. A de Bruijn sequence reduces it to O(1): multiply x by a magic 64-bit constant, shift right by 58, look up the result in a 64-entry table. The constant is chosen so that every 6-bit window of the shifted product is unique. Used in chess engines for bitboard square indexing, in Java's Integer.numberOfTrailingZeros, and in any code that needs to find the lowest set bit's position fast without a hardware BSF instruction.
SWAR (SIMD Within A Register). Treating a 64-bit integer as eight 8-bit lanes and operating on all of them in parallel with bit tricks. The classic example: counting set bits in 64 bits with five constant-time operations using nibble-swap and parallel addition, the algorithm Hacker's Delight calls popcount_3. Modern CPUs have a dedicated POPCNT instruction that beats SWAR, but the technique generalises: parallel comparison, parallel addition, branchless string parsing in JSON tokenisers (simdjson) all use SWAR.
Bitmask DP. When the problem says "n ≤ 20" and asks for "the optimal ordering" or "the cheapest subset", the state is usually a bitmask of which elements have been used so far. Travelling Salesman (LC 943, LC 847), Smallest Sufficient Team (LC 1125), Partition to K Equal Sum Subsets (LC 698). The DP table is dp[mask] with size 2^n; transitions iterate over which element to add next. See the advanced DP page for the full treatment.
Bit tries (binary trie). A trie indexed bit-by-bit rather than character-by-character. Stores 32 bits per integer in 32 levels; supports "maximum XOR with any stored value" in O(32) time per query by greedily picking the opposite bit at each level. Used in routing tables (IPv4 prefix matching) and in LC 421 Maximum XOR of Two Numbers in an Array.

The composition rule. If a problem asks for popcount of many values, precompute with the LC 338 DP. If it asks for lowest-set-bit position fast and you don't have hardware BSF, reach for de Bruijn. If it asks for an optimisation over subsets with n ≤ 20, reach for bitmask DP. If it asks for "max XOR pair", reach for a bit trie. The choice is rarely between two of these. The problem shape picks one cleanly.

5 · Canonical example: Single Number (LC 136)

Problem. Given a non-empty array where every element appears twice except for one, find that one. Required: linear time, constant extra space.

Input:  [4, 1, 2, 1, 2]
Output: 4
Explanation: 1 and 2 each appear twice; 4 is the odd one out.

XOR every element together. Pairs cancel (x ^ x = 0), order doesn't matter (XOR is commutative and associative), and the lone element ends up XORed with zero — which is itself. The hash-set solution is O(n) time and O(n) space; this is O(n) time and O(1) space, and the code is one line.

6 · Reference implementation

Single Number first, then three companion snippets that cover the rest of the working vocabulary.

from functools import reduce
from operator import xor

def single_number(nums: list[int]) -> int:
    return reduce(xor, nums, 0)
    # or, explicitly:
    # acc = 0
    # for n in nums: acc ^= n
    # return acc

Counting set bits (Brian Kernighan). Each iteration clears the lowest set bit. Runs in O(popcount(n)) rather than O(width).

def popcount(n: int) -> int:
    count = 0
    while n:
        n &= n - 1     # clears the lowest set bit
        count += 1
    return count

Is power of two. A power of two has exactly one set bit. n & (n-1) clears that bit, leaving zero.

def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0

Subset enumeration. Every integer in [0, 2^n) is a subset; bit i set means element i is included.

def all_subsets(elems: list) -> list[list]:
    n = len(elems)
    out = []
    for mask in range(1 << n):
        out.append([elems[i] for i in range(n) if mask & (1 << i)])
    return out

The parens matter. mask & (1 << i) needs the inner parentheses in C, Java, Go, and most C-family languages — & has lower precedence than ==, so mask & 1 << i == 0 parses in surprising ways. When in doubt, parenthesise.

7 · Variations

Seven idioms that cover most bit-manipulation problems you'll see in an interview.

Idiom	What it does	Where it shows up
XOR pair-cancellation	`x ^ x = 0`, `x ^ 0 = x`. Pairs collapse; the unpaired element survives.	LC 136 Single Number, LC 268 Missing Number
XOR find missing	XOR all values `0..n` together with the array; the missing index falls out.	LC 268 Missing Number, LC 287 Find the Duplicate (variant)
Brian Kernighan popcount	`n &= n - 1` clears the lowest set bit. Loop count is the popcount.	LC 191 Number of 1 Bits, LC 461 Hamming Distance
Subset enumeration	`for mask in range(1 << n)`. Each mask is one subset.	LC 78 Subsets, LC 698 Partition to K Equal Subsets
Bitmask DP	State is a bitmask of which elements have been used. `dp[mask]` memoises the best result so far.	LC 847 Shortest Path Visiting All Nodes, LC 1494 Parallel Courses II
Pre-computed popcount	`dp[i] = dp[i >> 1] + (i & 1)`. O(n) preprocessing, O(1) lookup.	LC 338 Counting Bits
Lowest set bit	`n & -n` isolates the lowest set bit. Two's-complement makes this work.	Fenwick trees, LC 187 Repeated DNA (variant)

7a · The bit tricks cheat sheet

Most bit-manipulation problems combine three or four of the operations below. Memorise the pattern — they recur across decades of problems from low-level systems code to LeetCode hards.

Brian Kernighan's algorithm. Count the number of set bits in n by repeatedly doing n &= n-1 and counting iterations. Each iteration clears exactly one set bit, so the loop runs as many times as there are 1-bits — much faster than scanning all 32 bits when the population count is low. This is the "clear lowest bit" trick used as a counting primitive.

7b · The bit operations cheat table

Operation	Expression	Use case
Set bit i	`n \| (1 << i)`	Mark item i in subset bitmask
Clear bit i	`n & ~(1 << i)`	Remove item i
Toggle bit i	`n ^ (1 << i)`	Flip state — used in subsets enumeration
Check bit i	`(n >> i) & 1`	Is item i in the set?
Lowest set bit isolated	`n & -n`	Fenwick tree, find rightmost set bit
Clear lowest set bit	`n & (n - 1)`	Count bits (Brian Kernighan); power-of-2 test
Right-propagate lowest bit	`n \| (n - 1)`	Round up to next 2^k − 1
Power of 2?	`n > 0 && (n & (n - 1)) == 0`	Two-line check
XOR swap	`a ^= b; b ^= a; a ^= b`	No-temp swap (cute, rarely useful)
Count subsets of mask	`for sub = mask; sub; sub = (sub - 1) & mask`	Iterate every subset of a mask in O(3^n)
Sign extension	`(n << (32 - k)) >> (32 - k)`	Sign-extend a k-bit signed value
Absolute value (branchless)	`(n ^ (n >> 31)) - (n >> 31)`	Avoid mispredicted branch in tight loop

8 · Five-problem progression

#	LC	Problem	Why it's here
1	LC 136	Single Number	The XOR-cancellation trick in its purest form. One line of code, O(1) space.
2	LC 191	Number of 1 Bits	Brian Kernighan's `n & (n-1)` loop. The simplest popcount.
3	LC 338	Counting Bits	Compute popcount for every i in `[0, n]`. The DP recurrence is `dp[i] = dp[i >> 1] + (i & 1)` — shift drops the low bit, the parity adds it back.
4	LC 78	Subsets	Bitmask enumeration. `2^n` masks, build each subset by checking which bits are set.
5	LC 421	Maximum XOR of Two Numbers	Bit-by-bit greedy via a bit trie. Where bit manipulation hands off to a more structured data structure.

9 · Worked example: LC 191 Number of 1 Bits (Brian Kernighan)

Problem. Given an unsigned integer n, return the number of set bits (the Hamming weight). The naive solution scans all 32 bits in a loop; the Brian Kernighan trick scans only as many bits as are actually set.

Input:  n = 11  (binary: 1011)
Output: 3

Input:  n = 128 (binary: 10000000)
Output: 1

The trick: why n & (n-1) clears one bit at a time

Subtracting 1 from n flips the lowest set bit to 0 and turns every bit below it from 0 to 1. ANDing with the original n kills both: the lowest set bit (it's 0 in n-1) and everything below it (those bits are 0 in n). Result: a version of n with exactly one fewer set bit. Loop until n == 0; the iteration count is the popcount.

// LC 191 — Brian Kernighan
func hammingWeight(n uint32) int {
    count := 0
    for n != 0 {
        n &= n - 1   // clear the lowest set bit
        count++
    }
    return count
}
// Time:  O(popcount(n)) — at most 32 iterations on a uint32
// Space: O(1)

// Alternative — using the hardware POPCNT instruction (Go std lib)
import "math/bits"

func hammingWeightFast(n uint32) int {
    return bits.OnesCount32(n)
}
// Time:  O(1) — single machine instruction on x86/ARM/RISC-V with POPCNT

Trace: n = 13 = 0b1101

iter	n (binary)	n - 1 (binary)	n & (n-1)	count after
1	1101	1100	1100	1
2	1100	1011	1000	2
3	1000	0111	0000	3
4	0000	—	—	loop exits

The win over the naive loop. A 32-bit integer with three set bits (most real-world flags-style integers) takes 3 iterations under Kernighan, 32 iterations under for i := 0; i < 32; i++ {...}. On sparse bitmaps in compiler liveness analysis or filesystem free-space scans, that 10× speed-up is the whole point of using a bitmap.

10 · Worked example: LC 78 Subsets (bitmask enumeration)

Problem. Given an array nums of distinct integers, return all 2^n subsets (the power set). Recursion-based backtracking works; the bitmask version is shorter, loop-only, and the same complexity.

Input:  nums = [1, 2, 3]
Output: [[], [1], [2], [1,2], [3], [1,3], [2,3], [1,2,3]]

Each subset corresponds to a 3-bit mask:
  000 = []          100 = [3]
  001 = [1]         101 = [1,3]
  010 = [2]         110 = [2,3]
  011 = [1,2]       111 = [1,2,3]

The trick: every mask is a subset

Iterate every integer from 0 to 2^n - 1. For each, bit i set means element i is in the subset. Build the subset by checking each bit; emit it. Two nested loops, no recursion.

// LC 78 — Subsets via bitmask enumeration
func subsets(nums []int) [][]int {
    n := len(nums)
    out := make([][]int, 0, 1<<n)
    for mask := 0; mask < (1 << n); mask++ {
        sub := []int{}
        for i := 0; i < n; i++ {
            if mask&(1<<i) != 0 {
                sub = append(sub, nums[i])
            }
        }
        out = append(out, sub)
    }
    return out
}
// Time:  O(n · 2^n) — 2^n masks, n work to materialise each subset
// Space: O(n · 2^n) for the output

// Alternative — iterate bits with Brian Kernighan (faster on sparse subsets)
func subsetsFast(nums []int) [][]int {
    n := len(nums)
    out := make([][]int, 0, 1<<n)
    for mask := 0; mask < (1 << n); mask++ {
        sub := []int{}
        m := mask
        for m != 0 {
            i := bits.TrailingZeros(uint(m)) // index of lowest set bit
            sub = append(sub, nums[i])
            m &= m - 1                        // clear it
        }
        out = append(out, sub)
    }
    return out
}

Submask iteration: a related trick

Sometimes you need every subset of a given mask rather than every subset of {0..n-1}. The idiom: for sub := mask; sub > 0; sub = (sub - 1) & mask. It walks every submask in decreasing order, no duplicates, no out-of- mask bits. Total work across all masks of a size-n universe is O(3^n) — the textbook "sum over subsets" complexity.

// Iterate every submask of mask, including mask itself but excluding 0
for sub := mask; sub > 0; sub = (sub - 1) & mask {
    // process sub
}
// Include 0: change the loop to "for { ... if sub == 0 { break } ... }"

// Use case — SOS DP (sum over subsets):
//   For each mask, sum f[sub] over every sub ⊆ mask.
//   Naive: O(4^n). With this iteration: O(3^n). With layered SOS: O(n · 2^n).

Why this beats backtracking in interview practice. The bitmask version is six lines. There's no recursion stack to explain, no path-state to push and pop. When a Go interviewer asks "can you also do it without recursion?", you can write this from memory and be done in 90 seconds. The recursive backtracking version is fine too, but this one always gets the "elegant" comment.

11 · Worked example: LC 691 Stickers to Spell Word (HARD)

Problem. Given an array of sticker strings and a target word, return the minimum number of stickers (with repetition allowed) whose letters can be rearranged to spell the target. Return -1 if impossible. Target length is up to 15.

Input:  stickers = ["with","example","science"], target = "thehat"
Output: 3
Explanation: "with" + "example" + "example" gives w,i,t,h,e,x,a,m,p,l,e,e,x,a,m,p,l,e
             — enough to spell "thehat".

Why this is a bitmask problem

Target length ≤ 15. That single constraint is the giveaway. 2^15 = 32768 — a state space small enough to enumerate. Encode "which target letters still need to be covered" as a 15-bit mask.
State. dp[mask] = minimum stickers needed when mask is the set of target letter positions still uncovered. Goal: dp[(1<<n)-1]; answer at dp[0].
Transition. For each sticker, compute the subset of uncovered letters this sticker can fill (greedily, position by position). OR-NOT it out of the current mask. The new mask gets one sticker added to the cost.
BFS over masks. Treat masks as nodes; an edge exists from mask to nextMask if some sticker reduces mask to nextMask. The minimum sticker count is the shortest path from full mask to 0.

Go scaffold: BFS over bitmask states

func minStickers(stickers []string, target string) int {
    n := len(target)
    full := (1 << n) - 1

    // BFS: each layer is one additional sticker
    dp := make([]int, 1<<n)
    for i := range dp {
        dp[i] = -1
    }
    dp[0] = 0

    for mask := 0; mask <= full; mask++ {
        if dp[mask] == -1 {
            continue
        }
        // Try every sticker. For each, compute new mask after greedy fill.
        for _, sticker := range stickers {
            newMask := mask
            // Count letters available in sticker
            avail := [26]int{}
            for _, c := range sticker {
                avail[c-'a']++
            }
            // For each uncovered target position, cover if letter is available
            for i := 0; i < n; i++ {
                if newMask&(1<<i) == 0 && avail[target[i]-'a'] > 0 {
                    newMask |= 1 << i
                    avail[target[i]-'a']--
                }
            }
            // Update if we made progress
            if newMask != mask && (dp[newMask] == -1 || dp[newMask] > dp[mask]+1) {
                dp[newMask] = dp[mask] + 1
            }
        }
    }
    return dp[full]
}
// Time:  O(2^n · |stickers| · L) where L is max sticker length
// Space: O(2^n)

Why this is the canonical "bitmask DP + BFS" problem. Three ideas have to compose cleanly: (1) recognise that target length ≤ 15 is the bitmask signal, (2) state the DP (which target positions are still uncovered) as a bitmask, (3) realise the transition is a greedy fill per sticker, and the overall structure is shortest-path BFS on the mask graph. Few problems test all three at once. Memoised top-down DFS also works and is often cleaner; choose whichever you'll write correctly under pressure.

12 · How to solve hard bit-manipulation problems, step-by-step

Hard bit problems are rarely about inventing a new trick. They're about recognising which of the four signals the problem is sending, then applying the matching template. Four diagnostic questions, in order:

Is there a small n (≤ 20) suggesting bitmask DP? The number 20 (or 15, or 22) in the constraints is the loudest signal in this pattern. 2^n states fit in memory; transitions iterate over which bit to flip. If you see "n ≤ 20" and "optimal ordering / cover all / visit each once", reach for bitmask DP before anything else. LC 847, LC 691, LC 1125 all live here.
Is there an XOR-cancellation property? The prompt mentions "every element appears twice except", "exactly one is missing from 0..n", "find the one that's been swapped". XOR everything and let the structure cancel — usually one line of code, O(1) space. If the structure isn't pair-wise, sometimes a higher-order trick works: LC 137 (every element appears three times except one) uses two bitmask accumulators that track "seen once" and "seen twice" mod 3.
Can multi-state be packed into bits? The problem has several small enumerated dimensions (direction × has-key × visited-set). Encoding them as bit ranges in a single integer turns a multi-key memo into an array lookup — 5–10× faster in cache and 0 allocations. The encoding is key = a | (b << bitsA) | (c << (bitsA + bitsB)); the decoding is shift + mask.
Is there popcount monotonicity? Some DPs proceed by popcount layer: process all masks with popcount k before any with popcount k+1. This is natural when transitions only add bits (subset enumeration, "fill in one more"). The order is: sort masks by popcount, fill in that order; this is what enables the SOS DP family in O(n · 2^n) rather than O(3^n).

The diagnostic-first habit. Bit problems are short but rarely flow naturally from "let me just code it". Spending 30 seconds answering these four questions in your head before writing a line is what separates the candidate who reaches the elegant solution from the one who writes a hash-set baseline and never sees the bitmask version. Interviewers at infra teams listen for the diagnosis out loud.

13 · Common pitfalls

Signed vs unsigned integer width. Python's ints are unbounded; C, Java, and Go fix integers at 32 or 64 bits. A popcount loop that works in Python can wedge in Java if the input is negative — Java's int is signed, and n >> 1 sign-extends. Use n >>> 1 for logical shift, or mask off the high bit.
Operator precedence. &, |, and ^ have lower precedence than == and != in the C family. x & 1 == 0 parses as x & (1 == 0). Always parenthesise: (x & 1) == 0.
Forgetting that n & (n-1) clears the lowest set bit. Easy to misremember as "clears the lowest bit" (i.e. bit 0). It clears the lowest set bit, wherever that happens to be, which is what makes Kernighan's loop work.
Arithmetic vs logical shift in signed languages. Right-shifting a negative number with >> in Java or C preserves the sign bit (arithmetic shift). For unsigned semantics use >>> in Java or cast to unsigned in C. Python and most dynamic languages dodge this entirely.
Subset enumeration on n > 30. 2^30 is roughly a billion masks — too many. Bitmask DP only works when n is small (usually ≤ 20–22 for tight time limits, ≤ 25 for loose ones).
Confusing XOR with addition. XOR is addition mod 2 without carry. 1 + 1 = 2; 1 ^ 1 = 0. Useful when you want pairs to cancel; wrong when you want a sum.
Signed-vs-unsigned shift in Go vs Java. Go has no >>> operator; signedness is a property of the type, not the operation. int32(-1) >> 1 in Go is -1 (arithmetic shift, sign-extends); uint32(0xFFFFFFFF) >> 1 is 0x7FFFFFFF (logical shift, fills with 0). In Java, >> is arithmetic and >>> is logical regardless of operand type. If you're porting Java bit code to Go, convert to uint32/uint64 before the shift, then back if needed.
Off-by-one in bit-position indexing. Bit 0 is the lowest bit (worth 1, not 2). A 32-bit int has bits 0 through 31, not 1 through 32. 1 << 32 on a Go int32 is undefined (compile-time error in Go 1.17+ when the shift amount is a constant), and on dynamic shifts produces 0 or wraps. The most-significant bit of a uint32 is 1 << 31, not 1 << 32.
Overflow when shifting by ≥ word size. In Go, uint32(1) << 32 is 0 at runtime (the shift amount is masked to 5 bits for uint32). In C/C++, shifting by ≥ width is undefined behaviour — your compiler is free to produce 0, the original value, or set your house on fire. Always check the shift amount: if k >= 32 { return 0 }.
The int32(1) << 31 overflow gotcha. In Go, int32(1) << 31 is -2147483648 (the sign bit is set, producing INT_MIN). If your code path uses this as a "max bit" sentinel and later adds 1 to it, you wrap to -2147483647 rather than overflow-trap. Use uint32(1) << 31 when you actually want the value 2147483648. The same trap exists at int64(1) << 63.
Forgetting to handle n = 0 in Brian Kernighan loops. n & (n-1) when n == 0 is 0 & (-1) = 0 in two's-complement, which is fine, but the loop condition for n != 0 already excludes this. The bug is when you write for n > 0 on a signed type and pass a negative input. The loop never runs because n is already ≤ 0.

14 · What to say in the interview

"XOR / bitmask → bit manipulation." Name the recognition signal. "Each element appears twice except one — that's XOR cancellation." Or: "n is small and we need all subsets — that's a bitmask."
State the specific trick. "XOR is its own inverse, so pairs cancel." "Brian Kernighan's loop clears the lowest set bit in one step." "Bit i of the mask means element i is included." The trick is short; saying it shows you know why the code works.
Call out the space win. The bit-manipulation solution to Single Number is O(1) space versus O(n) for the hash-set approach. Same for Missing Number. Interviewers usually want to hear you recognise this.
Talk about width. "In Java I'd use >>> for logical shift". A small thing, but it signals you've written bit code in a real language and not just Python.
Don't overreach. If the obvious solution is fine and bit manipulation is a marginal win, say so. "Hash set is O(n) space but reads more clearly; XOR is O(1) space if we need it." Trade-offs over verdicts.

15 · Where this gets asked

Company	Common framing	Why it fits their stack
Google	Single Number (LC 136, 137, 260), Sum of Two Integers without + (LC 371), Counting Bits (LC 338)	The XOR trick + bit-DP show up at every level. Google L4 phone screens favour Single Number variants.
Meta	Subsets (LC 78 via bitmask), Number of Steps to Reduce a Number to Zero (LC 1342)	Subset enumeration via bitmask is the "show you can think in bits" interview test.
Amazon	Number of 1 Bits (LC 191), Reverse Bits (LC 190), Bitwise AND of Numbers Range (LC 201)	The bit-fundamentals show up in AWS storage layout questions + binary-protocol parsing.
Microsoft	Power of Two (LC 231), Hamming Distance (LC 461), Total Hamming Distance (LC 477)	The classic `n & (n-1)` tricks. Used in spell-check distance metrics + fuzzy match.
Apple	Maximum XOR of Two Numbers (LC 421 via bit trie), Number Complement (LC 476)	Bit-trie XOR shows up routinely. Apple's compression + cryptography work uses these primitives.
Bloomberg / Citadel	Maximum XOR Subarray (variant of LC 421), Subsets via bitmask DP, Travelling Salesman bitmask DP	Bitmask DP for ≤ 20 nodes shows up in HFT route-optimisation interviews routinely.

Pattern of patterns. Five bit sub-shapes: (1) XOR cancels duplicates (LC 136, 268), (2) n & (n-1) clears the lowest set bit (LC 191, 231), (3) bitmask as set membership for n ≤ 20 (LC 78, TSP), (4) bit trie for XOR maximisation (LC 421), (5) bit-DP where state is "subset of items used so far" (TSP, Smallest Sufficient Team). Knowing all five covers ~95% of bit-manipulation problems.

16 · Try it yourself

Single Number · LC 136: The XOR cancellation drill. Hint: a ^ a = 0 and a ^ 0 = a. XOR every number; the duplicates cancel; the unique one remains. O(n) time, O(1) space: the clean solution.
Number of 1 Bits · LC 191: Two solutions worth knowing. Hint: naive, shift right 32 times, count 1s. Better, n &= n - 1 clears the lowest set bit each iteration; count how many iterations until n = 0. Best, Brian Kernighan's algorithm.
Counting Bits · LC 338: The bit-DP shortcut. Hint: bits[i] = bits[i >> 1] + (i & 1). Each number's bit count equals the count of its right-shift plus the lowest bit. O(n) total, not O(n log n).
Sum of Two Integers Without + · LC 371: Half-adder logic in software. Hint: a ^ b = sum without carry; (a & b) << 1 = carry. Loop while carry ≠ 0. Mind language-specific signed integer behaviour; Python needs a 32-bit mask to handle negatives.
Stretch: Maximum XOR of Two Numbers · LC 421: Bit trie. Hint: insert each number bit-by-bit (MSB first) into a 32-level trie. For each number, walk the trie greedily picking the OPPOSITE bit at each level when available. That bit XORs to 1, maximising the result.

How to practise. Print binary representations when you debug bit code. f"{n:032b}" in Python shows you what's actually happening. Most bit bugs come from misreading the bit order or forgetting that integers are 32 (or 64) bits in your language. The print is cheap; the confusion is expensive.

Bit manipulation

1 · Why bit manipulation matters

Production examples: where bit tricks pay rent

2 · Intuition: every integer is a small array of booleans

3 · How to recognise it

Shape A: state as a bitmask (subset enumeration)

Shape B: find the missing or duplicate via XOR

Shape C: counting set bits (popcount)

Shape D: lowest set bit isolation (x & -x)

Shape E: parity / Gray code

Shape F: constant-time arithmetic tricks

Shape G: bitmask DP for subset enumeration (n ≤ 20)

Shape H: state compression (pack multiple flags into one int)

Shape I: XOR for "find the unique" / "find the missing"

Shape J: popcount-based scoring / parity reasoning

4 · Sister algorithms

5 · Canonical example: Single Number (LC 136)

6 · Reference implementation

7 · Variations

7a · The bit tricks cheat sheet

7b · The bit operations cheat table

8 · Five-problem progression

9 · Worked example: LC 191 Number of 1 Bits (Brian Kernighan)

The trick: why n & (n-1) clears one bit at a time

Trace: n = 13 = 0b1101

10 · Worked example: LC 78 Subsets (bitmask enumeration)

The trick: every mask is a subset

Submask iteration: a related trick

11 · Worked example: LC 691 Stickers to Spell Word (HARD)

Why this is a bitmask problem

Go scaffold: BFS over bitmask states

12 · How to solve hard bit-manipulation problems, step-by-step

13 · Common pitfalls

14 · What to say in the interview

15 · Where this gets asked

16 · Try it yourself

Further reading

Linked list: pointer manipulation