HLL
Cardinality estimator in 12 KB with ~1% error.
Origin
Flajolet, Fusy, Gandouet, and Meunier, "HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm," AofA 2007. Successor to LogLog (2003) and PCSA (1985).
Where it shows up in production
- Redis PFCOUNT Built-in HLL — count unique items in 12 KB of memory regardless of cardinality.
- BigQuery / Snowflake APPROX_COUNT_DISTINCT HLL under the hood for cardinality on petabyte-scale tables.
- Google Analytics unique users HLLs are why "unique users" loads in 100ms over billions of events.
On Semicolony
Sources & further reading
Found this useful?