Stonebraker et al · 2005
Paper · Storage · Analytics

C-Store, columns beat rows for analytics.

Row-oriented databases store every column of a record together — perfect for OLTP workloads where you read whole rows. Analytical queries, the paper argues, do the opposite: they scan one or two columns across millions of rows. C-Store flips the storage layout: store each column separately, scan only what the query touches, compress aggressively. The paper proves the architecture beats commercial row stores by 10× on TPC-H-class queries and kicks off the modern OLAP era — Vertica, ClickHouse, BigQuery, Snowflake all descend from this design.

Authors Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, et al
Year 2005
Venue VLDB

TL;DR

C-Store stores each column as a separate stream rather than packing rows together. Queries that touch only a few columns read only those columns from disk — much less I/O than a row store reads. Compression works well on columns because values within a column tend to be similar (dates near each other, status codes from a small set). C-Store introduces projections (materialised column groupings), shared-everything storage, and a write store / read store split for OLTP+OLAP. The paper benchmarks the prototype against a commercial row-store on TPC-H and shows ~10× speedup on the analytical queries — and the result was so striking that within a decade every major analytical database had reorganised around column-oriented storage.

The problem

In 2005, OLAP queries were running on row-oriented databases. A typical analytical query — "SUM(revenue) WHERE date BETWEEN X AND Y" — touches two columns out of perhaps fifty. A row store reads all fifty columns (the storage unit is the row) and discards the unneeded forty-eight. The wasted I/O dominated query latency.

Earlier projects (Sybase IQ in the 1990s) had experimented with columnar storage but mostly as a vertical-partitioning trick rather than a fundamental redesign. The Stonebraker team proposed: what if everything — the storage layer, the compression, the query optimiser, the transaction layer — were designed around columns from the start?

The key idea

Each column is its own file. Storage is a stream of values for one attribute. Reading a column doesn't pay any I/O cost for other columns. Compression is per-column; since values in a column share a domain (dates, status codes, names), compression ratios are much better than row-level.

Compression-aware execution. The query engine operates on compressed data wherever possible. Run-length-encoded columns can be aggregated without decompressing every value. Bit-vector or dictionary-encoded columns allow operations to be expressed as bit operations on the encoded representation. The paper argues this is a >5× CPU win on common analytical operations.

Projections, not tables. The user declares logical tables, but C-Store stores them as projections — column groupings sorted by chosen keys. Multiple overlapping projections of the same table can coexist (one sorted by date, one by customer-id). The query optimiser picks the best-suited projection for each query. This is the "data redundancy for read performance" trade.

WS / RS split. A small in-memory Write Store absorbs inserts and updates; a much larger Read Store on disk handles analytical queries. Background merge moves data from WS to RS in batches. The split lets OLTP-style writes coexist with OLAP-style scans without locking.

Snapshot isolation by default. Reads see a consistent snapshot of the database as of the last merge boundary. Writes go to WS; reads merge WS and RS. This avoids the lock conflicts that would otherwise occur when long analytical scans and short OLTP writes overlap.

Compression is the win. On TPC-H-like workloads, C-Store achieved 4–8× better compression than a row store, plus far less data read per query, plus CPU operations on compressed values. The combined effect is the 10× speedup the paper benchmarks. Column compression works because: dates in a date column are close to each other (delta-encoding works), enum/status values are from a small domain (dictionary-encoding works), numeric measures are often skewed (run-length-encoding works). Row-level compression works on none of these because adjacent bytes are from different domains.

Contributions

Column-oriented storage proven at production scale. The paper's benchmarks weren't toy; they ran TPC-H scale-factor 100 against a commercial row store. The 10× claim was specific, reproducible, and shocked the database community.

Compression as a first-class concern. Designing storage around compressibility, designing the engine to operate on compressed data, treating compression as a CPU optimization rather than a storage optimization. The lineage from C-Store to ClickHouse to BigQuery to DuckDB is direct.

Projections. The idea that the user's logical table can be physically realised as several overlapping column groupings, each with its own sort order, picked per query. Vertica productised this.

WS/RS split. A way to do mixed OLTP+OLAP on the same data without lock contention. The pattern recurs in HTAP databases (TiDB, SingleStore, MariaDB ColumnStore).

Snapshot-based isolation. Reads on RS see a consistent point-in-time; updates go to WS. Avoids the traditional read-write lock contention pattern of row stores.

Criticisms and limitations

Writes are slow. Inserts go to WS, get merged into RS in batches. For workloads with many small inserts per second, the merge becomes a bottleneck. Pure column stores still struggle with high-rate OLTP.

Multi-row updates are awkward. A single update has to touch every column file. Coupled with the snapshot semantics, this makes single-row updates 5–10× slower than a row store would handle them.

Projections are space-hungry. Multiple sorted copies of the same data multiply storage. Vertica's production tuning is famously about choosing the right projections; pick wrong and you either waste space or query slowly.

The paper is an academic prototype. Many production details (concurrency, recovery, durability) are sketched rather than worked out. Vertica filled in those details over the next decade; the paper itself is more of a research statement than a system manual.

Where it shows up today

Vertica — the commercial descendant, founded by Stonebraker and team. Acquired by HP, now an enterprise analytical database.

ClickHouse — open-source column store, originally built at Yandex. Now everywhere from telemetry pipelines to product analytics.

Apache Parquet, Apache ORC — file formats that take C-Store's column-oriented + compression-aware design and bring it to data-lake / Spark workloads.

Google BigQuery, Snowflake, AWS Redshift, Azure Synapse — all use column-oriented storage with C-Store-style compression and CPU-conscious execution.

DuckDB — embedded column-store, designed for single-node analytics.

Apache Arrow — the in-memory columnar format used by Pandas, DuckDB, BigQuery client libraries, and most modern data tooling. Direct descendant of C-Store ideas about CPU operations on columns.

Follow-up reading

More annotated papers
Back to the papers index
Foundational distributed-systems and database papers, read and annotated.
Found this useful?