Storage & Databases

Erasure coding

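Store k data shards + m parity shards computed from them; with an MDS code such as Reed-Solomon, any k of the k + m shards rebuild the data, so any m shard losses are tolerated.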
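A minimal sketch of the k + m idea, assuming an evaluation-form Reed-Solomon code over the prime field GF(257) instead of the GF(2^8) arithmetic production libraries use (so a parity symbol can equal 256 and would not fit in a byte). The k data symbols are treated as the values of a polynomial of degree below k at x = 0..k-1. The parity shards are its values at x = k..k+m-1, and any k surviving shards pin the polynomial down again. `encode` and `decode` are illustrative names, and each shard here is a single symbol rather than a block of bytes.

```python
# Evaluation-form Reed-Solomon over the prime field GF(257).
# Assumption: real systems use GF(2^8) so every symbol fits in one byte.
P = 257

def _lagrange_at(points, x):
    """Evaluate the unique polynomial of degree < len(points) passing through
    `points` (a list of (xi, yi) pairs) at position x, working mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, m):
    """Systematic encoding: shard i holds the polynomial's value at x = i.
    Shards 0..k-1 are the data itself; shards k..k+m-1 are parity."""
    k = len(data)
    points = list(enumerate(data))
    parity = [_lagrange_at(points, x) for x in range(k, k + m)]
    return list(data) + parity

def decode(shards, k):
    """Recover the k data symbols from any k surviving shards.
    `shards` maps shard index -> value; lost shards are simply absent."""
    if len(shards) < k:
        raise ValueError("need at least k shards to reconstruct")
    points = sorted(shards.items())[:k]
    return [_lagrange_at(points, x) for x in range(k)]

data = [104, 105, 33, 7, 0, 255]      # k = 6 data symbols
shards = encode(data, m=3)            # 6+3 layout: 9 shards
survivors = {i: s for i, s in enumerate(shards) if i not in (0, 4, 7)}  # lose 3
print(decode(survivors, k=6) == data)
```

Because the first k shards are the data itself, a read with no failures needs no decoding at all; the interpolation cost is paid only when some of those shards are missing.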


In plain terms

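Reed-Solomon for cold storage. A 6+3 layout costs ~1.5× the raw data size versus 3× for triple replication, at the cost of extra compute and network reads when data is read or repaired after failures, because a lost shard must be rebuilt from k surviving shards.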
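The overhead figure is just (k + m)/k, and the read penalty comes from rebuilding a shard out of k others. A small calculator, assuming an illustrative 64 MiB shard size (`shard_mib` is a hypothetical parameter, not any system's default):

```python
def ec_profile(k, m, shard_mib=64):
    """Storage and repair cost of a k+m MDS code such as Reed-Solomon."""
    return {
        "overhead": (k + m) / k,           # raw bytes stored per user byte
        "tolerates": m,                    # any m shard losses are survivable
        "repair_read_mib": k * shard_mib,  # classical RS reads k shards to rebuild one
    }

def replication_profile(copies=3, shard_mib=64):
    """The same metrics for plain n-way replication."""
    return {
        "overhead": float(copies),
        "tolerates": copies - 1,
        "repair_read_mib": shard_mib,      # copy one surviving replica
    }

for k, m in [(6, 3), (10, 4), (17, 3)]:
    print(f"{k}+{m}", ec_profile(k, m))
print("3x replication", replication_profile())
```

For 6+3 this gives 1.5× storage and 384 MiB read to rebuild one 64 MiB shard, against 3× storage and a 64 MiB read for triple replication: erasure coding trades repair traffic for capacity.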

Origin

Reed-Solomon codes (Reed & Solomon, 1960) are the workhorse. The canonical application to disk storage was Patterson, Gibson, and Katz's RAID paper (1988), whose parity-based levels are the single-parity (m = 1) case; large-scale "exotic" codes like LRC followed in the 2010s.

Where it shows up in production
  • Amazon S3 / Google Colossus: Reed-Solomon-style codes (6+3, 10+4) instead of 3× replication, for roughly half the raw storage.
  • Microsoft Azure LRC: Local Reconstruction Codes, which repair a single lost fragment faster than classical RS by reading only the fragments in its local group.
  • Backblaze B2: public hardware write-ups describe 17+3 Reed-Solomon across "Storage Pods."
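The faster single-loss repair in the LRC bullet comes from adding a cheap parity per small group of data shards. A simplified sketch, assuming a hypothetical layout loosely modeled on Azure's published LRC(12,2,2): 12 data shards in two local groups of 6, each protected by a plain XOR local parity. The two global parities, and whatever finite-field coefficients the real code uses, are omitted, so this shows only the local repair path. `GROUPS`, `local_parities`, and `repair_local` are illustrative names.

```python
from functools import reduce
from operator import xor

# Hypothetical layout loosely modeled on LRC(12,2,2): 12 data shards split into
# two local groups of 6. Global parities are omitted; only local repair is shown.
GROUPS = [list(range(0, 6)), list(range(6, 12))]

def local_parities(data):
    """One XOR parity per local group (a simplification of real LRC coefficients)."""
    return [reduce(xor, (data[i] for i in g), 0) for g in GROUPS]

def repair_local(data, lost, parities):
    """Rebuild data shard `lost` from its own group only; data[lost] is never read.
    Returns (recovered value, number of shards read)."""
    for group, parity in zip(GROUPS, parities):
        if lost in group:
            peers = [data[i] for i in group if i != lost]
            return reduce(xor, peers, parity), len(peers) + 1
    raise ValueError(f"shard {lost} is not in any local group")

data = [(i * 37) % 256 for i in range(12)]  # one symbol per shard, for brevity
parities = local_parities(data)
value, reads = repair_local(data, 3, parities)
print(value == data[3], reads)
```

Rebuilding shard 3 reads 6 shards (its 5 group peers plus the local parity), where a classical RS code with 12 data shards would read 12.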