Consistent hashing.
Naive sharding by modulo means adding one node remaps almost every key in the cluster. Consistent hashing keeps that to ~1/N. Watch a 4-node ring add a 5th, lose one, and finally turn on virtual nodes for smooth distribution.
Each node sits at a position on a 0-360° ring (in real systems the ring is 0 to 2^160 or similar). Each key hashes to a position too. To find which node owns a key: walk clockwise from the key's position; first node you hit is the owner. Even with random hashes the distribution is roughly balanced as long as you have enough nodes.
- Consistent hashing
- A hashing scheme where adding/removing a node only relocates a small fraction of keys — roughly 1/N of them. Naive modulo hashing relocates ALL keys when N changes.
- Hash ring
- A circular keyspace where both nodes and keys hash to positions. Ownership = "next node clockwise from this key."
Why naive modulo sharding hurts so much
If you do node = hash(key) % N, every change in N reshuffles almost every key. Going from 4 nodes to 5 doesn\'t move 20% of keys — it moves about 80%, because the formula\'s output changes for most inputs. Every reshuffle = cache misses, network traffic, downtime risk. Consistent hashing keeps the disruption bounded to roughly 1/N of keys. That\'s why every modern distributed cache and database uses some flavor of it.
Variations you\'ll meet
Jump consistent hash (Google, 2014) doesn\'t need a ring or virtual nodes — just an algorithm that, given a key and an N, returns a bucket. Very fast, very compact, but it can\'t handle weighted nodes or arbitrary node IDs. Rendezvous hashing (HRW) picks the node with the highest hash(key, node_id). No ring, no rebalancing math, simple to implement. Maglev hashing (used at Google for L4 load balancing) builds a lookup table that\'s evenly distributed and minimally disrupted on node changes.
Where it shows up
memcached client libraries (libketama), DynamoDB and Cassandra (partition keys land on virtual nodes), Riak, Discord\'s message sharding, CDN cache server selection, ride-share dispatch (which server handles which geographic region). Anywhere you have "N servers, K things to assign, occasionally add/remove servers" — consistent hashing is the default tool.
Partitioning strategies →
Range vs hash partitioning, consistent hashing variants in depth, hot-spot avoidance, repartitioning during scale-out, weighted nodes.
Open the Codex →