GFS (also: Google File System)

The 2003 paper that founded scale-out storage.


In plain terms

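A single master keeps the metadata; chunkservers hold the data. Files are split into 64 MB chunks, each replicated (three copies by default). The blueprint for HDFS and Colossus, and an influence on most modern object stores.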
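The master/chunk split can be illustrated with a small sketch. It assumes a toy in-memory master; the class name `Master`, the tables `chunks` and `locations`, and the handle and server names are all hypothetical and only stand in for the real metadata structures. It shows how a client could turn a byte offset into a chunk index and then ask the master where that chunk lives.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks


def chunk_index(byte_offset: int) -> int:
    """Return the index of the chunk that holds this byte offset."""
    return byte_offset // CHUNK_SIZE


@dataclass
class Master:
    """Toy metadata server (hypothetical names, not the real GFS structures)."""
    # (file path, chunk index) -> chunk handle
    chunks: dict = field(default_factory=dict)
    # chunk handle -> list of chunkserver names holding a replica
    locations: dict = field(default_factory=dict)

    def lookup(self, path: str, byte_offset: int):
        """Metadata-only lookup: no file data passes through the master."""
        handle = self.chunks[(path, chunk_index(byte_offset))]
        return handle, self.locations[handle]


# Illustrative data: one file made of two chunks, three replicas each.
master = Master()
master.chunks[("/logs/web.log", 0)] = "h-001"
master.chunks[("/logs/web.log", 1)] = "h-002"
master.locations["h-001"] = ["cs-a", "cs-b", "cs-c"]
master.locations["h-002"] = ["cs-b", "cs-d", "cs-e"]

# Byte 70,000,000 lies past the first 64 MB, so it falls in chunk index 1.
handle, replicas = master.lookup("/logs/web.log", 70_000_000)
```

After the lookup, the client would read the bytes directly from one of the returned chunkservers, which is what keeps the master off the data path.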

Origin

Ghemawat, Gobioff, and Leung, "The Google File System," SOSP 2003.

Where it shows up in production
  • HDFS: Open-source implementation modeled on GFS; the storage layer for Hadoop, and widely used with Spark, for over a decade.
  • Colossus: Google's successor to GFS (since ~2010); still the storage layer beneath Google's services.
  • Amazon S3: Different design, but a similar shape: metadata is kept separate from the data, and the data is stored redundantly across failure domains.
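Spreading replicas across failure domains such as racks, as mentioned in the list above, can be sketched as follows. This is a simplified illustration, not the placement policy of GFS, HDFS, or S3: the function name `place_replicas` and the cluster layout are hypothetical, and real systems also weigh disk usage and load. The sketch only takes one server per rack before reusing any rack.

```python
def place_replicas(servers_by_rack: dict, replicas: int = 3) -> list:
    """Choose servers for one chunk, touching as many distinct racks as possible.

    servers_by_rack maps a rack name to the list of servers in that rack.
    Racks are visited in sorted order so the result is deterministic.
    """
    racks = sorted(r for r, servers in servers_by_rack.items() if servers)
    total = sum(len(servers_by_rack[r]) for r in racks)
    if replicas > total:
        raise ValueError("not enough servers for the requested replica count")

    chosen = []
    depth = 0  # position within each rack's server list for the current pass
    while len(chosen) < replicas:
        # One pass takes at most one server from every rack.
        for rack in racks:
            servers = servers_by_rack[rack]
            if depth < len(servers) and len(chosen) < replicas:
                chosen.append(servers[depth])
        depth += 1
    return chosen


# Hypothetical cluster: three racks, four servers.
cluster = {"rack1": ["cs-a", "cs-b"], "rack2": ["cs-c"], "rack3": ["cs-d"]}
placement = place_replicas(cluster)
```

With three racks and three replicas, each copy lands in a different rack, so losing a single rack leaves two copies intact. A rack is reused only when the replica count exceeds the number of racks.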