Storage & Databases

Cardinality

Number of distinct values in a set.


In plain terms

High-cardinality columns make poor partition keys (uneven distribution); low-cardinality columns make poor indexes (each key matches many rows).

Origin

Database term as old as the field. The query-planner concern (high vs low cardinality columns) shows up in System R's 1979 cost model.

Where it shows up in production
  • Postgres ANALYZE Builds per-column statistics including approximate distinct counts; drives the planner.
  • Time-series databases High-cardinality label sets (per-user, per-request-id) are a known performance killer in Prometheus.
On Semicolony
Sources & further reading
Found this useful?