
Compaction

Merging LSM SSTables to bound read amplification.
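At its core, compaction is a k-way merge of sorted runs in which the newest version of each key wins. A minimal Python sketch follows. It assumes each SSTable is an in-memory list of (key, value) pairs sorted by key, that runs are passed newest first, and that `None` marks a tombstone (a delete marker). `compact` is a hypothetical name, not any engine's API.

```python
import heapq
from typing import List, Optional, Tuple

# Assumption of this sketch: an SSTable is a key-sorted list of (key, value);
# a value of None is a tombstone (delete marker).
Entry = Tuple[str, Optional[str]]


def compact(runs: List[List[Entry]], drop_tombstones: bool = True) -> List[Entry]:
    """Merge key-sorted runs into one sorted run.

    runs[0] is the newest run. When several runs hold the same key, the
    newest version is kept and older versions are discarded.
    """
    # Heap items are (key, run_index, position). A lower run_index means a
    # newer run, so for equal keys the newest version is popped first.
    heap = [(run[0][0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out: List[Entry] = []
    last_key: Optional[str] = None
    while heap:
        key, i, pos = heapq.heappop(heap)
        value = runs[i][pos][1]
        if key != last_key:
            # First (newest) sighting of this key: keep it unless it is a
            # tombstone we are allowed to drop.
            last_key = key
            if value is not None or not drop_tombstones:
                out.append((key, value))
        # Older duplicates of last_key reach here and are silently skipped.
        if pos + 1 < len(runs[i]):
            heapq.heappush(heap, (runs[i][pos + 1][0], i, pos + 1))
    return out
```

For example, `compact([[("a", "1"), ("c", None)], [("a", "0"), ("b", "2"), ("c", "3")]])` returns `[("a", "1"), ("b", "2")]`: the newer `a` shadows the older one and the tombstone removes `c`. Dropping tombstones is only safe when no older data below the output could still hold the key, which is why engines typically do it only when compacting into the bottommost level.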


In plain terms

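Tiered (size-tiered): merge files of similar size into one larger file. Writes are cheap, but several overlapping sorted runs can coexist, so a read may have to check many files. Levelled (RocksDB default): each level is K× the size of the previous one and holds a single sorted run, so a point read checks at most one file in each level below L0. The price is higher write amplification, because data is rewritten every time it moves down a level.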
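The two policies can be contrasted in a few lines of Python. This is a simplified sketch. The size-tiered part groups files whose size lies within `bucket_low`..`bucket_high` times a bucket's average and compacts a bucket once it holds `min_threshold` files; the parameter names echo Cassandra's STCS options, but the logic is a simplification. The levelled part computes static per-level target sizes from a base size and a multiplier `k`, roughly what RocksDB's `max_bytes_for_level_base` and `max_bytes_for_level_multiplier` control. All function names are hypothetical.

```python
from typing import List


def size_tiered_buckets(sizes: List[int], bucket_low: float = 0.5,
                        bucket_high: float = 1.5) -> List[List[int]]:
    """Tiered: group SSTable sizes into buckets of 'similar size'.

    A file joins the first bucket whose current average size is within
    [bucket_low, bucket_high] times the file's size range; otherwise it
    starts a new bucket.
    """
    buckets: List[List[int]] = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def pick_tiered_compaction(sizes: List[int], min_threshold: int = 4) -> List[int]:
    """Tiered: return the first bucket with at least min_threshold files, or []."""
    for bucket in size_tiered_buckets(sizes):
        if len(bucket) >= min_threshold:
            return bucket
    return []


def level_targets(base: int, k: int, total: int) -> List[int]:
    """Levelled: target sizes of L1, L2, ... until `total` bytes fit.

    L1 holds `base`; each later level is k times the previous one.
    """
    targets = [base]
    while sum(targets) < total:
        targets.append(targets[-1] * k)
    return targets
```

With a base of 256 and `k = 10`, 50,000 units of data need four levels (256, 2,560, 25,600, 256,000). Each extra level multiplies capacity by K, so the number of levels, and with it the worst-case number of files a point read probes, grows only logarithmically with data size.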

Origin

The idea comes from the original LSM-tree paper (O'Neil et al., 1996). The RocksDB documentation is a good reference for the modern trade-offs between tiered (called "universal" in RocksDB) and levelled compaction.

Where it shows up in production
  • RocksDB: levelled compaction by default; background threads merge SSTables from level N into the overlapping files of level N+1.
  • Cassandra: STCS (size-tiered), LCS (levelled) and TWCS (time-window) strategies, chosen per table to match the workload.
  • ScyllaDB: improves on Cassandra's compaction by parallelising it across shards (one per CPU core), each compacting its own SSTables.
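TWCS suits time-series data, where rows arrive roughly in time order and often expire together. Below is a minimal sketch of its windowing idea. It assumes each SSTable is described by a hypothetical `(name, max_timestamp)` pair in seconds. Files are grouped by the window containing their newest write, and any window that has closed but still holds more than one file becomes a candidate for merging into a single SSTable. Real TWCS also compacts the current window using size-tiered rules, which this sketch leaves out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical SSTable descriptor: (name, newest write timestamp in seconds).
SSTable = Tuple[str, int]


def time_windows(sstables: List[SSTable], window_seconds: int) -> Dict[int, List[str]]:
    """Group SSTables by the start of the window holding their newest write."""
    windows: Dict[int, List[str]] = defaultdict(list)
    for name, max_ts in sstables:
        window_start = max_ts - (max_ts % window_seconds)
        windows[window_start].append(name)
    return dict(windows)


def closed_windows_to_compact(sstables: List[SSTable], window_seconds: int,
                              now: int) -> Dict[int, List[str]]:
    """Closed windows (older than the current one) with more than one file."""
    current = now - (now % window_seconds)
    return {start: names
            for start, names in time_windows(sstables, window_seconds).items()
            if start < current and len(names) > 1}
```

Because each closed window ends up as one file covering a bounded time range, data that expires by TTL can be dropped a whole SSTable at a time instead of being rewritten by further merges.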