Storage & Databases

Bloom filter

Probabilistic set: "definitely not" or "probably yes" in tiny space.


In plain terms

Burton Bloom, 1970. Cassandra row caches, every LSM-tree, Bitcoin SPV, Chrome safe-browsing. ~10 bits per element for 1% false-positive.

Origin

Burton Howard Bloom, MIT 1970. The original paper described it as a way to do spell-checking in 8 KB of RAM — a problem that sounds quaint until you remember the same trick now sits in front of every LSM-tree to decide whether to read from disk.

Where it shows up in production
  • Cassandra & RocksDB A per-SSTable Bloom filter answers "could this key be here?" without touching disk.
  • Chrome Safe Browsing A Bloom filter of malware URLs ships in the browser; full check only on a "maybe."
  • Bitcoin SPV Light wallets send a Bloom filter to full nodes so they only see relevant transactions.
On Semicolony
Sources & further reading
Found this useful?