Erasure coding
Store k data shards + m parity shards; tolerate any m losses.
Origin
Reed-Solomon codes (Reed & Solomon, 1960) are the workhorse. The application to disk storage was Patterson, Gibson, and Katz's RAID paper (1988); large-scale "exotic" codes like LRC followed in the 2010s.
Where it shows up in production
- Amazon S3 / Google Colossus Reed-Solomon-style codes (6+3, 10+4) instead of 3× replication. Half the storage cost.
- Microsoft Azure LRC Local Reconstruction Codes — faster repair on single-disk loss than classical RS.
- Backblaze B2 Public hardware writeups showing 17+3 Reed-Solomon across "Storage Pods."
Sources & further reading
Found this useful?