F1, SQL on Spanner.

AdWords ran on sharded MySQL. By 2010 Google had ~100 shards, each manually managed. Every schema change required engineering effort across a hundred running databases. F1 was the bet that a distributed SQL database could replace this — and that Spanner could be the storage layer underneath it. The paper documents the migration: how F1 looks like MySQL to applications, but underneath is a globally-replicated, externally-consistent system.

Authors Jeff Shute, Radek Vingralek, Bart Samwel, Ben Handy, Chad Whipkey, Eric Rollins, et al

Year 2013

Venue VLDB

PDF →

TL;DR

F1 is the SQL execution engine that sits on top of Spanner's storage. To applications it looks like a MySQL-compatible database with ACID transactions; underneath, every write goes through Spanner's Paxos groups and TrueTime-ordered commits. The paper covers F1's schema model (hierarchically clustered tables, like Bigtable's row keys but with multi-level locality), its handling of online schema changes (with global coordination via a special schema table), the parallel query execution layer, and the performance trade-offs the AdWords migration accepted (higher write latency, but no more manual sharding). Most importantly: the paper proved distributed SQL was possible, kicking off a decade of work on CockroachDB, YugabyteDB, TiDB, and Spanner-as-a-service.

The problem

AdWords is Google's revenue engine. By 2010 it ran on a sharded-MySQL fleet that was painful in three specific ways. First, schema changes required a months-long manual rollout per shard. Second, cross-shard joins were impossible — every report that crossed customers required custom code. Third, operational toil: every shard split, every replica failure, every backup, was a human-driven process at Google's scale.

The team had two options. Scale MySQL further with more sharding tooling (Vitess-style), or replace it with something that handled distribution natively. F1 + Spanner was the latter bet. The paper documents how they made the migration without rewriting the application — F1 spoke MySQL-compatible SQL and looked the same from the application's vantage point.

The key idea

F1 is a stateless SQL engine; Spanner is the durable, globally-consistent storage. F1 servers route queries to the appropriate Spanner shards, plan distributed query execution, and translate SQL operations into Spanner reads and writes. Because Spanner provides external consistency, F1 inherits ACID transactions across any set of rows in the database — no application-level coordination needed.

The schema model is built around interleaved tables. A child table can be declared as INTERLEAVE IN PARENT, which means its rows are physically co-located with the parent row's rows. AdWords's customers, campaigns, ads, and clicks are organised this way: a customer's entire data lives in one Spanner directory, on one Paxos group. Queries that scope to a customer never cross shards. Queries that scan across customers go parallel.

Online schema changes are the most operationally interesting part. F1 uses a coordinated multi-phase rollout: add the new column in DELETE_ONLY state (writes ignore it, reads return defaults), promote to WRITE_ONLY (writes update both old and new, reads ignore), then WRITE_AND_VALIDATE (consistency check), then READ_WRITE. Each phase is rolled out atomically across the entire fleet using a special schema table; F1 servers refuse to serve writes during a schema transition if their local copy of the schema doesn't match the cluster-coordinated version.

The change-history trick. F1 maintains a change history table for every regular table — every write also writes a change record. This sounds expensive but enables incremental view maintenance, the ad-serving feature where partial query results stream to clients as they change. The change-history pattern (CDC, change streams) later showed up in DynamoDB Streams, Spanner change streams, MongoDB change streams, and Postgres logical replication. F1 productionised the pattern at scale.

Contributions

Proof of concept for distributed SQL. Until F1, the consensus view was that scale-out SQL was theoretically possible but practically broken — too slow, too inconsistent, too operationally complex. F1 + Spanner powered the largest revenue stream at Google, including ACID transactions across continents. The paper is the production evidence that the architecture works.

Interleaved tables. The schema declaration that physically co-locates child rows with their parent. This pattern — locality via schema — shows up in CockroachDB (INTERLEAVE clause), YugabyteDB, and TiDB.

Online schema change protocol. The DELETE_ONLY → WRITE_ONLY → WRITE_AND_VALIDATE → READ_WRITE state machine for adding columns and indexes without downtime. CockroachDB and YugabyteDB implement the same protocol.

Change history and streaming reads. Every write produces a change record; subscribers can stream the changes. The ancestor of CDC pipelines, change streams, and event-sourcing architectures over relational databases.

Operational economics. The paper makes the case that the higher per-write latency of distributed SQL (5–10 ms vs MySQL's 1 ms) is a worthwhile trade for the elimination of manual sharding and the addition of cross-shard transactions. The argument has held up.

Criticisms and limitations

F1's write latency is significantly higher than sharded MySQL's — typically 5–10 ms per commit, versus 1 ms for a sharded-MySQL write. For OLTP workloads that need sub-millisecond writes, the architecture is overkill.

The paper assumes you can co-design schema with locality (interleaved tables). For workloads that don't have a natural hierarchical organisation, the locality argument doesn't apply and queries pay full distributed-execution cost.

The dependency on Spanner means F1 inherits Spanner's hardware requirements — GPS receivers, atomic clocks per datacenter, custom networking. The open-source descendants (CockroachDB, YugabyteDB) use HLC (hybrid logical clocks) instead of TrueTime, accepting slightly weaker consistency guarantees in exchange for being deployable on commodity infrastructure.

The paper is fairly short on the query engine's distributed planning algorithms. F1's follow-up paper, "Spanner: Becoming a SQL System" (2017), fills in those details.

Where it shows up today

F1 itself is still running AdWords, with extensions for AdSense and other Google ads products.

CockroachDB — explicitly inspired by F1 and Spanner. Implements interleaved tables, online schema changes via the same state machine, change feeds.

YugabyteDB and TiDB — both descendants in design. Use HLC instead of TrueTime, range-shard with auto-rebalancing.

Spanner SQL (the productised Cloud Spanner) — the same engine, externally available.

Snowflake, Databricks SQL — different workload (OLAP not OLTP), but borrow the architectural pattern: stateless compute, distributed durable storage, ACID over a distributed log.

Follow-up reading

Spanner: Google's Globally Distributed Database — Corbett et al · 2012 · OSDI. The storage layer F1 sits on. Read this one first. Annotated.
Spanner: Becoming a SQL System — Bacon et al · 2017 · SIGMOD. The follow-up that documents Spanner's migration from a KV API to the F1-style SQL layer. Now part of Cloud Spanner.
Online Asynchronous Schema Change in F1 — Rae et al · 2013 · VLDB. The dedicated paper on F1's schema change protocol.
CockroachDB: The Resilient Geo-Distributed SQL Database — Taft et al · 2020 · SIGMOD. The most direct open-source descendant. Annotated production-grade.
TiDB: A Raft-based HTAP Database — Huang et al · 2020 · VLDB. Another descendant, more focused on hybrid transactional/analytical workloads.

More annotated papers

Back to the papers index

Foundational distributed-systems and database papers, read and annotated.

← All papers

Found this useful?