Storage & Databases also: HyperLogLog

HLL

Cardinality estimator in 12 KB with ~1% error.


In plain terms

Flajolet et al, INRIA 2007. Mergeable across shards. Powers every "unique users" dashboard at internet scale.

Origin

Flajolet, Fusy, Gandouet, and Meunier, "HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm," AofA 2007. Successor to LogLog (2003) and PCSA (1985).

Where it shows up in production
  • Redis PFCOUNT Built-in HLL — count unique items in 12 KB of memory regardless of cardinality.
  • BigQuery / Snowflake APPROX_COUNT_DISTINCT HLL under the hood for cardinality on petabyte-scale tables.
  • Google Analytics unique users HLLs are why "unique users" loads in 100ms over billions of events.
On Semicolony
Sources & further reading
Found this useful?