Storage & Databases

Sharding

Splitting data across machines by key.


In plain terms

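The three common strategies are range, hash, and directory (lookup-table) sharding. Choosing one is the easy part. The harder problems are rebalancing when shards are added or removed, queries that span several shards, and hot keys that overload a single shard.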
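A minimal Python sketch of the three strategies, plus a rough measure of why rebalancing is hard under naive hash-modulo placement. The shard names, range boundaries, directory entries, and function names are all hypothetical, chosen only for illustration; real systems typically use consistent hashing or fixed virtual buckets to limit key movement.

```python
import bisect
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical shard names


def stable_hash(key: str) -> int:
    """Process-independent hash (Python's built-in hash() is salted per run)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hash_shard(key: str) -> str:
    """Hash sharding: spreads keys evenly, but loses key ordering."""
    return SHARDS[stable_hash(key) % len(SHARDS)]


# Range sharding: sorted split points. Keys below "g" go to shard-0,
# keys from "g" up to (not including) "n" go to shard-1, and so on.
RANGE_BOUNDS = ["g", "n", "t"]  # assumed split points


def range_shard(key: str) -> str:
    """Range sharding: keeps neighbouring keys together, good for scans."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key)]


# Directory sharding: an explicit lookup table, here a plain dict.
DIRECTORY = {"tenant-a": "shard-2", "tenant-b": "shard-0"}  # assumed entries


def directory_shard(key: str) -> str:
    """Directory sharding: flexible placement, but the table must be kept
    available and consistent. Unknown keys raise KeyError."""
    return DIRECTORY[key]


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard index changes when the shard count
    changes under plain hash-modulo placement."""
    moved = sum(1 for k in keys if stable_hash(k) % old_n != stable_hash(k) % new_n)
    return moved / len(keys)


sample_keys = [f"user-{i}" for i in range(1000)]
fraction = moved_fraction(sample_keys, 4, 5)  # most keys move going from 4 to 5 shards
```

With hash-modulo placement, a key stays put only when its hash gives the same remainder for both shard counts, so growing from 4 to 5 shards relocates most of the data. That is the rebalancing cost the paragraph above refers to.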

Origin

The term gained currency in the mid-2000s around the Google Bigtable and Yahoo PNUTS work. Database literature of the 1990s called the same idea "horizontal partitioning"; the principle goes back further.

Where it shows up in production
  • MongoDB: Range or hash-based shard keys; the mongos router directs each query to the relevant shards.
  • Vitess: Sharding layer in front of MySQL; built at YouTube to scale its MySQL deployment.
  • Discord: Messages stored in Cassandra, later ScyllaDB, partitioned by channel_id plus a time bucket; documented in engineering blogs.
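
The routing idea shared by these systems can be sketched in a few lines of Python. This is a toy in-memory model, not how mongos or Vitess is implemented: the ShardRouter class, the tenant_id shard key, and the document fields are assumptions made for illustration. It shows why a query that filters on the shard key touches one shard while any other query must fan out to all of them.

```python
import hashlib
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


class ShardRouter:
    """Toy router: hash-places documents and routes lookups (hypothetical)."""

    def __init__(self, n_shards: int, shard_key: str) -> None:
        self.shard_key = shard_key
        self.shards: List[List[Doc]] = [[] for _ in range(n_shards)]

    def _index(self, value: Any) -> int:
        # Stable hash of the shard-key value, modulo the shard count.
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def insert(self, doc: Doc) -> None:
        self.shards[self._index(doc[self.shard_key])].append(doc)

    def find(self, field: str, value: Any) -> Tuple[List[Doc], int]:
        """Return matching docs and the number of shards contacted."""
        if field == self.shard_key:
            targets = [self._index(value)]           # targeted: one shard
        else:
            targets = list(range(len(self.shards)))  # scatter-gather: all shards
        hits = [d for i in targets for d in self.shards[i] if d.get(field) == value]
        return hits, len(targets)


router = ShardRouter(n_shards=4, shard_key="tenant_id")
for i in range(20):
    router.insert({"tenant_id": f"t{i % 5}", "author": f"user{i % 3}", "seq": i})

by_tenant, touched_targeted = router.find("tenant_id", "t1")  # contacts 1 shard
by_author, touched_scatter = router.find("author", "user0")   # contacts all 4
```

The scatter-gather path grows more expensive with every shard added, which is one reason shard-key choice is usually driven by the most frequent query pattern.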