Lab Notebook · Vol. XIV · 2026 · Reproducible numbers, cited · 32 experiments · 9 sections

Volume XIV — opening pages

Numbers, measured.

Production decisions need orders of magnitude, not vibes. This notebook collects published, citable measurements for the systems we describe elsewhere — memory hierarchy, B-tree depth, Bloom filter accuracy, compression speed, queueing latency, fan-out tails, write amplification.

I

Storage & data structures.

How fast is each layer, and where the structure pays off.

EXP 01

The memory hierarchy

Each level is roughly an order of magnitude slower.

Question

How long does it take to read one byte from each level of the memory hierarchy on a modern server?

Setup

Numbers are typical for an x86-64 cloud VM circa 2024 (Skylake-class CPU, 3 GHz, NVMe SSD, 10 Gbit network). Absolute latency varies with hardware; the ratios between levels are stable.

Readings (ns)
L1 cache
1
~0.5 ns; predictable
L2 cache
4
~10 cycles
L3 cache
12
Shared across cores
Main memory (RAM)
100
DRAM access
NVMe SSD (random read)
100,000
~100 µs
Same-DC network (10 GbE)
500,000
~500 µs round trip
HDD (random seek)
10,000,000
~10 ms; rotational
Cross-region (US-EU)
100,000,000
~100 ms; speed of light
Takeaway

Cache locality dominates algorithmic complexity for small n. A well-laid-out array often beats a clever tree of pointers up to roughly 10,000 elements, because a sequential scan stays in cache and prefetches well while the tree can pay an L2, L3 or RAM penalty on every pointer chase.

EXP 02

B-tree depth vs row count

Why your index never gets deep.

Question

How deep does a B+tree get for a billion-row index?

Setup

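Each internal node holds up to 200 keys (typical for 4 KiB pages with 8-byte keys + 8-byte child pointers). B-tree nodes are guaranteed to be at least half full, so the readings use the worst-case fan-out b = 100. Each leaf holds rows. Depth = ⌈log_b(n)⌉, counting the leaf level.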

Readings (levels)
1 thousand rows
2
2 page reads
100 thousand rows
3
3 page reads
10 million rows
4
4 page reads
1 billion rows
5
5 page reads
100 billion rows
6
6 page reads
10 trillion rows
7
7 page reads
Takeaway

B-trees stay shallow because of the wide fan-out. A binary tree at 10 million rows is 24 levels deep; the B+tree is 4. That's the difference between 24 disk seeks and 4 — the order-of-magnitude that makes databases feasible.
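
The depth arithmetic can be sketched with integer multiplication rather than floating-point logarithms. The function name `tree_depth` is illustrative, and the fan-out of 100 is an assumption (a 200-key node at the half-full minimum), chosen because it reproduces the readings above.

```python
def tree_depth(rows: int, fanout: int) -> int:
    """Smallest number of levels d such that fanout**d >= rows."""
    depth, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout
        depth += 1
    return depth


# Compare an assumed fan-out of 100 against a binary tree (fan-out 2).
for rows in (10**3, 10**5, 10**7, 10**9, 10**11, 10**13):
    print(f"{rows:>18,} rows: B+tree {tree_depth(rows, 100)} levels, "
          f"binary tree {tree_depth(rows, 2)} levels")
```

Integer arithmetic avoids the off-by-one errors that `math.log` can produce at exact powers of the fan-out.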

EXP 03

Bloom filter false-positive rate

10 bits per element ≈ 1% false positive.

Question

Given m bits of filter for n elements (m/n bits per element) and k hash functions, what false-positive rate do you get?

Setup

Standard Bloom filter formula: FPR ≈ (1 − e^(−kn/m))^k, with optimal k = (m/n) ln 2. Numbers below assume optimal k.

Readings (% false positive)
4 bits/elt
14.6
optimal k=3
8 bits/elt
2.16
optimal k=6
10 bits/elt
0.82
optimal k=7 · the popular default
16 bits/elt
0.046
optimal k=11
24 bits/elt
0.001
optimal k=17
32 bits/elt
0.00002
optimal k=22
Takeaway

Doubling the bits-per-element roughly squares the FPR. Cassandra targets about 1% FPR by default, and 10 bits/elt is the usual RocksDB setting. Higher precision is rarely worth it: the cost of the rare false positive is one extra disk read.
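
A short sketch of the formula from the setup, evaluated with the nearest integer to the optimal k (real filters cannot use a fractional number of hash functions). The name `bloom_fpr` is illustrative, not taken from any library.

```python
import math


def bloom_fpr(bits_per_element: float) -> tuple[int, float]:
    """Return (k, false-positive rate) for the integer k nearest the optimum."""
    k = max(1, round(bits_per_element * math.log(2)))
    # With m/n bits per element, kn/m is simply k / bits_per_element.
    fpr = (1.0 - math.exp(-k / bits_per_element)) ** k
    return k, fpr


for bits in (4, 8, 10, 16, 24, 32):
    k, fpr = bloom_fpr(bits)
    print(f"{bits:>2} bits/elt  k={k:<2}  FPR={fpr * 100:.5f}%")
```

Rounding k is why a computed rate can differ slightly from the closed-form 0.6185^(m/n) approximation.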

EXP 03b

Hash table — collisions vs load factor

Past 0.85, every insert is a fight.

Question

How many probes does an open-addressing hash table take per lookup as load factor α rises?

Setup

Linear probing with random hashing. Expected probes per successful lookup ≈ ½(1 + 1/(1−α)). Numbers below are mean probes for a successful lookup.

Readings (probes)
α = 0.50
1.5
comfortable
α = 0.75
2.5
Java HashMap default
α = 0.85
3.8
starting to bite
α = 0.90
5.5
visible slowdown
α = 0.95
10.5
do not run here
α = 0.99
50.5
cliff
Takeaway

Resize before α crosses 0.75. Open-addressing tables go unstable in their tail, not their mean — at α = 0.9 the p99 of a probe sequence is well into the dozens. Robin-hood hashing softens the tail; it does not move the wall.
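
A minimal sketch of the probe-count estimates. `probes_hit` is the formula from the setup; `probes_miss` is an addition not in the readings above: Knuth's companion estimate for unsuccessful lookups and inserts under linear probing, included because it shows why inserts degrade faster than reads.

```python
def probes_hit(alpha: float) -> float:
    """Expected probes for a successful lookup under linear probing."""
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha))


def probes_miss(alpha: float) -> float:
    """Expected probes for an unsuccessful lookup or an insert."""
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha) ** 2)


for alpha in (0.50, 0.75, 0.85, 0.90, 0.95, 0.99):
    print(f"alpha={alpha:.2f}  hit={probes_hit(alpha):6.1f}  "
          f"miss/insert={probes_miss(alpha):8.1f}")
```

The squared denominator in the miss case is what turns a tolerable mean lookup into a painful insert well before the table is full.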

EXP 03c

LSM read amplification by levels

Bloom filters are why LSMs are usable.

Question

How many SSTables does a point lookup touch in an LSM tree, with and without bloom filters?

Setup

Worst case: N levels means N SSTable lookups. With bloom filters at FPR p, expected disk reads ≈ p × (N − 1) + 1.

Readings (avg disk reads)
4 levels · no bloom
4
always touches all
4 levels · bloom 1%
1.03
almost always one read
7 levels · no bloom
7
classic RocksDB shape
7 levels · bloom 1%
1.06
still ~one read
10 levels · bloom 1%
1.09
why bloom matters
10 levels · bloom 0.1%
1.009
diminishing returns
Takeaway

Bloom filters are not optional for LSM point reads. They convert what would be 7-10× read amplification into ~1 disk read on the average path, at the cost of ~10 bits/key in RAM. Production RocksDB configurations almost always turn them on for exactly this reason.

II

Compression.

Speed-vs-size, with real bytes.

EXP 04

gzip vs brotli vs zstd

Compression ratio vs encode/decode speed.

Question

How much does each modern compressor reduce a 1 MB JSON document, and at what speed?

Setup

Squash benchmark v0.7, x86-64, default level for each. JSON corpus from the Squash dataset. Numbers are MB/s for compress and decompress.

Readings (mixed)
gzip · ratio
5.4
×
gzip · compress
95
MB/s
gzip · decompress
360
MB/s
brotli · ratio
6.5
× — best for static assets
brotli · compress
0.9
MB/s — slow at default
brotli · decompress
380
MB/s
zstd · ratio
5.7
× — at default
zstd · compress
530
MB/s — fastest mainstream encode
zstd · decompress
1,500
MB/s — fastest decode
Takeaway

For real-time encode (logs, RPC bodies), zstd is the clear winner — same ratio as gzip, 5x encode speed, 4x decode. Brotli only wins for static web assets where you encode once and decode billions of times.

EXP 05

JSON vs Protobuf wire size

Protobuf lands at roughly 30–45% of the JSON size.

Question

How much smaller is a Protobuf message vs the equivalent JSON for typical record shapes?

Setup

Compared on five common shapes: a payments record, an analytics event, a user profile, a search hit, a config blob. Sizes are uncompressed.

Readings (% of JSON)
Payments record
38
% of JSON size
Analytics event
32
%
User profile
41
%
Search hit
28
%
Config blob
45
%
Takeaway

Protobuf typically comes in at 28–45% of the JSON size, a 55–72% reduction. Small numeric values benefit most (varint encoding); deeply nested optional fields benefit less. After gzip, the gap narrows considerably: gzipped JSON is often within 10% of gzipped Protobuf. Protobuf still wins on parse cost and schema discipline.

EXP 05b

Columnar compression on time-series

A row-store loses to Parquet by 10×.

Question

How much smaller is the same data in a columnar format with appropriate encodings?

Setup

Same 1 GB time-series table (timestamp, sensor_id, value, tag). Stored in CSV, Parquet+Snappy (default), Parquet+ZSTD, ORC. ClickBench-style measurements.

Readings (MB on disk)
CSV · uncompressed
1,024
MB · baseline
CSV · gzip
280
MB · 3.7× smaller
Parquet · Snappy
110
MB · 9.3× smaller
Parquet · ZSTD
78
MB · 13× smaller
ORC · ZSTD
71
MB · best columnar
Takeaway

Columnar formats win by storing each column contiguously and applying type-aware encodings (RLE, delta, dictionary) before the entropy coder runs. The roughly 10× edge over raw CSV, and 2.5–4× over gzipped CSV, is why every analytics warehouse (BigQuery, Snowflake, ClickHouse, Redshift) is column-oriented.

EXP 05c

JSON encoding overhead per row

Field names cost more than values.

Question

How much of a JSON payload is structural overhead (keys, braces, quotes, commas) for typical small rows?

Setup

Five representative rows of varying field count and value type. "Value bytes" is the sum of value lengths; "structural" is everything else.

Readings (% structural)
3 fields · short str
64
% structural
6 fields · numeric
71
%
10 fields · short str
58
%
20 fields · mixed
49
%
50 fields · nested
41
%
Takeaway

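A "JSON event" is 50-70% braces and key names. This is why schema-based formats such as Avro and Protobuf can shrink small payloads dramatically: the field name is replaced by a tag that is typically one byte (Protobuf) or omitted entirely (Avro). MessagePack and CBOR trim the punctuation and encode numbers compactly, but still carry the key strings. Compression flattens this advantage; uncompressed wires do not.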
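
One way to measure the structural share of a row, as a sketch under stated assumptions: rows are flat (no nesting), the encoding is compact JSON with no whitespace, and the quotes around string values count as structure, following the definition in the setup. The function name `structural_share` and the sample row are illustrative.

```python
import json


def structural_share(row: dict) -> float:
    """Fraction of a compact JSON encoding that is not value characters."""
    encoded = json.dumps(row, separators=(",", ":"))
    value_chars = 0
    for value in row.values():
        text = json.dumps(value)
        # Quotes around strings are structure, so exclude them from the value.
        value_chars += len(text) - 2 if isinstance(value, str) else len(text)
    return 1.0 - value_chars / len(encoded)


row = {"id": 42, "ok": True, "name": "ada"}
print(f"{structural_share(row):.0%} structural")
```

Lengths are counted in characters, which equals bytes for the ASCII output that `json.dumps` produces by default.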

III

Concurrency & tail latency.

When parallelism stops paying.

EXP 06

M/M/1 queue: latency at high utilisation

You cannot run a server at 99% utilisation.

Question

How does mean wait time grow as utilisation ρ approaches 1?

Setup

Standard M/M/1 result: mean wait W = (1/μ) × ρ/(1−ρ). Below: ρ vs the multiplier on baseline service time.

Readings (× service time)
ρ = 0.50
1
1× service time wait
ρ = 0.70
2.3
2.3×
ρ = 0.80
4
4×
ρ = 0.90
9
9×
ρ = 0.95
19
19×
ρ = 0.99
99
99× — and growing
ρ = 0.999
999
999×
Takeaway

Aim for 60–70% utilisation for latency-sensitive services. The (1−ρ) denominator makes the curve blow up past 0.9: each halving of the remaining headroom roughly doubles the wait. A "well-utilised" 95% server is in the latency death zone.
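
The multiplier in the readings follows directly from the M/M/1 formula in the setup. A minimal sketch, with `wait_multiplier` as an illustrative name:

```python
def wait_multiplier(rho: float) -> float:
    """Mean queueing wait as a multiple of the mean service time 1/mu."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return rho / (1.0 - rho)


for rho in (0.50, 0.70, 0.80, 0.90, 0.95, 0.99, 0.999):
    print(f"rho={rho:<5}  wait = {wait_multiplier(rho):7.1f} x service time")
```

The guard matters: at ρ = 1 the queue has no steady state, so the formula is undefined rather than merely large.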

EXP 07

Tail latency for fan-out queries

At N = 100, the system median is already past one worker's p99.

Question

If you fan out a query to N parallel workers and wait for all, how does the system's p99 latency change?

Setup

Each worker has independent latency with a heavy-tailed distribution (Weibull-like, p99 ≈ 5× p50). System latency is the max of N samples.

Readings (× single-worker p50)
N = 1
5
p99 ≈ 5× p50
N = 10
9
9× · system p99 is one worker's p99.9
N = 100
13
13×
N = 1,000
17
17×
N = 10,000
22
22×
Takeaway

In fan-out, the tail dominates. Mitigations from Dean & Barroso 2013: hedged requests (send a second copy once the first is slower than expected, take whichever answers first), bounded latency (time out and return a partial result), tied requests (queue the request on two servers and cancel the other copy as soon as one starts executing). Without them, p99 of the system explodes.
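
Because system latency is the max of N independent samples, its distribution is F(x)^N. The sketch below uses only that identity, so it holds for any latency distribution; the function names are illustrative.

```python
def share_hit_by_slow_worker(n_workers: int, quantile: float = 0.99) -> float:
    """Probability that at least one of n independent workers exceeds its own quantile."""
    return 1.0 - quantile ** n_workers


def worker_quantile_for(system_quantile: float, n_workers: int) -> float:
    """Per-worker quantile that a given system-level quantile corresponds to."""
    return system_quantile ** (1.0 / n_workers)


for n in (1, 10, 100, 1_000):
    print(f"N={n:<5} requests slower than a worker's p99: "
          f"{share_hit_by_slow_worker(n):6.1%}   "
          f"system p99 = worker p{worker_quantile_for(0.99, n) * 100:.4f}")
```

The second function shows what has to be tuned: to hold the system p99 at N = 100, each worker's p99.99 has to be brought under control.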

EXP 07b

Amdahl's law

5% serial means a hard ceiling at 20×.

Question

If a fraction s of work is inherently serial, how much speedup can N parallel cores give you?

Setup

Amdahl 1967: speedup = 1 / (s + (1−s)/N). Below: speedup at N = 16, 64 and ∞ for typical serial fractions.

Readings (× speedup)
s = 1% · N=16
13.9
~87% efficiency
s = 1% · N=∞
100
hard ceiling
s = 5% · N=16
9.1
~57%
s = 5% · N=∞
20
ceiling at 20×
s = 10% · N=64
8.8
~14%
s = 10% · N=∞
10
no point past 32 cores
s = 25% · N=∞
4
parallel barely helps
Takeaway

Adding cores past Amdahl's ceiling only burns money. The bigger the system gets, the more important the serial 1-5% becomes — locks, cross-shard coordination, leader-only operations. The optimisation that pays at scale is removing the serial fraction, not adding cores.
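
A minimal sketch of the formula from the setup. `amdahl_speedup` is an illustrative name; passing `float("inf")` for the core count gives the ceiling 1/s.

```python
def amdahl_speedup(serial: float, cores: float) -> float:
    """Speedup on `cores` cores when a fraction `serial` of the work cannot be parallelised."""
    return 1.0 / (serial + (1.0 - serial) / cores)


for serial, cores in ((0.01, 16), (0.05, 16), (0.10, 64), (0.10, float("inf"))):
    speedup = amdahl_speedup(serial, cores)
    print(f"s={serial:.0%}  N={cores}  speedup={speedup:5.1f}x")
```

Dividing the speedup by N gives the efficiency figures quoted in the readings.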

EXP 07c

False sharing — cache line contention

Two cores writing 4 bytes apart, 25× slowdown.

Question

How much does false sharing cost when two cores update independent variables that happen to share a 64-byte cache line?

Setup

Microbenchmark: each core does 10⁸ atomic increments on its own counter. "Same cache line" places counters within 64 bytes; "padded" pads each counter to its own line.

Readings (normalised throughput)
1 core
1
baseline · ~10⁸ ops/s
2 cores · padded
1
no sharing — perfect scale
2 cores · same line
0.04
25× slowdown · ping-pong
4 cores · same line
0.012
~80× slowdown
8 cores · same line
0.005
~200× slowdown
Takeaway

Pad hot per-thread state to a full cache line (64 B on x86, 128 B on Apple silicon). Java has @Contended; C++ has alignas(64) or, since C++17, alignas(std::hardware_destructive_interference_size). The cost of getting it wrong is invisible on a single core and catastrophic at scale.

IV

Storage amplification.

What you write isn't what hits disk.

EXP 08

LSM-tree write amplification

Flushed once, compacted many times.

Question

How many bytes hit disk per byte logically written, by compaction strategy?

Setup

RocksDB with default level-style compaction, ~10 levels, 10× growth factor. Numbers from Facebook's 2016 RocksDB paper and follow-ups.

Readings (× write amp)
Level (default)
22
each byte rewritten ~22× across compactions
Universal (size-tiered)
8
fewer rewrites, more space amp
FIFO
1
no compaction, oldest segment dropped
B-tree (Postgres)
2
one page write + one WAL write
Takeaway

Choosing between B-tree and LSM is choosing what to amplify. B-tree: low write amp, higher random I/O. LSM: high write amp, sequential I/O that suits SSDs. Size-tiered compaction (RocksDB's "universal" mode, and the default strategy in Cassandra and ScyllaDB) trades space for write amp.

EXP 08b

Replication factor — bytes vs durability

Eleven nines costs less than you think.

Question

How does storage cost change as you add replicas or switch to erasure coding?

Setup

Per-byte storage overhead for each scheme. Durability target: probability the data is lost in a year given typical disk AFR ~1%.

Readings (% of raw bytes)
Single copy
100
% · ~1% loss/yr · don't
2× replication
200
% · ~10⁻⁴ loss/yr
3× replication
300
% · ~10⁻⁶ loss/yr · classic HDFS
EC 6+3 (Reed-Solomon)
150
% · ~10⁻¹¹ · S3 / Hadoop EC
EC 10+4
140
% · ~10⁻¹³ · cold tier
Takeaway

Erasure coding gives more durability than 3× replication at half the storage cost. The trade is CPU on writes (encoding) and slower repair (decode N shards to rebuild one). For warm/cold tiers EC dominates; for hot reads 3× replication still wins on tail latency.

EXP 08c

SSD endurance — DWPD vs lifetime

Write amp eats your warranty.

Question

Given a drive's DWPD (drive writes per day) rating, how does database write amplification translate into lifetime years?

Setup

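Modern enterprise SSD: 1.92 TB capacity, rated 1 DWPD over a 5-year warranty. "Effective DWPD" = logical writes per day × the database's write-amp factor ÷ capacity. The readings assume 0.2 drive capacities (~384 GB) of logical writes per day, so lifetime = 5 years ÷ (0.2 × WA).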

Readings (years to wear out)
Postgres B-tree · 2× WA
12.5
years · comfortable
MyRocks LSM · 8× WA
3.1
years · borderline
RocksDB level · 22× WA
1.1
years · DWPD limited
Cassandra LCS · 30× WA
0.83
years · need 3 DWPD drive
Idle replica · 0.1× WA
250
years · NAND cell decays first
Takeaway

On write-heavy LSM workloads, drive endurance — not capacity — sizes the cluster. Buy 3 DWPD drives or pre-shrink write amp with universal compaction. The hidden cost of "we're replacing drives every year" is one of the strongest arguments for tuning compaction.
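
The lifetime arithmetic can be sketched as below. The default of 0.2 drive capacities of logical writes per day is an assumption inferred from the readings (it is the rate at which a 2× write amp gives 12.5 years); the function and parameter names are illustrative.

```python
def lifetime_years(write_amp: float, logical_dwpd: float = 0.2,
                   rated_dwpd: float = 1.0, warranty_years: float = 5.0) -> float:
    """Years until the drive's rated write budget is consumed."""
    effective_dwpd = logical_dwpd * write_amp
    return warranty_years * rated_dwpd / effective_dwpd


for name, wa in (("Postgres B-tree", 2), ("MyRocks LSM", 8),
                 ("RocksDB level", 22), ("Cassandra LCS", 30)):
    print(f"{name:<16} WA {wa:>2}x  1 DWPD: {lifetime_years(wa):5.2f} y   "
          f"3 DWPD: {lifetime_years(wa, rated_dwpd=3.0):5.2f} y")
```

Because write amp and rated DWPD enter as a simple ratio, halving write amp buys exactly as much lifetime as doubling the drive's endurance rating.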

V

Network.

Where the speed of light is the actual limit.

EXP 09

TCP throughput vs RTT (BDP)

A 100ms link with a 64KB window can do 5 Mbps. That's it.

Question

For a TCP connection with window W and round-trip time R, what is the maximum throughput?

Setup

Throughput ≤ W / R. Below: max throughput at common RTTs for typical windows. The default Linux TCP receive window is auto-tuned up to 6 MB.

Readings (Mbps)
64 KB window · 1 ms
524
Mbps · same rack
64 KB window · 10 ms
52
Mbps · same region
64 KB window · 100 ms
5.2
Mbps · cross-region — fixed window kills you
6 MB window · 100 ms
480
Mbps · auto-tuned
6 MB window · 200 ms
240
Mbps · transpacific
BBR · 200 ms
950
Mbps · congestion-based
Takeaway

Throughput on long-fat pipes is window-bound, not bandwidth-bound. Confirm receive-buffer auto-tuning is on (it is by default on Linux ≥ 2.6.17) and consider BBR for high-RTT high-bandwidth paths. The classic "my 1 Gbps link only does 50 Mbps" story is almost always a window problem.
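
A small sketch of the W / R bound and its inverse, the window needed to fill a link (the bandwidth-delay product). Function names are illustrative; units are bytes, seconds and bits per second.

```python
def max_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput for a given window and round-trip time."""
    return window_bytes * 8 / rtt_seconds / 1e6


def window_needed_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: the window required to keep the link full."""
    return link_bps * rtt_seconds / 8


print(f"64 KB window, 100 ms RTT: {max_throughput_mbps(64 * 1024, 0.100):.1f} Mbps")
print(f"1 Gbps link, 100 ms RTT needs a window of "
      f"{window_needed_bytes(1e9, 0.100) / 1e6:.1f} MB")
```

The second figure explains why a 6 MB auto-tuned ceiling still cannot saturate a 1 Gbps path at 100 ms.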

EXP 10

TLS handshake cost

TLS 1.3 cut the cold handshake in half.

Question

How much does each TLS variant cost in round trips and milliseconds on a 50 ms RTT link?

Setup

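Round trips before first application byte. ms = RTTs × 50 ms (the dominant cost). TLS-over-TCP rows exclude the TCP handshake itself, which adds one more RTT; QUIC rows include transport setup. CPU cost not included.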

Readings (ms before first byte)
TLS 1.2 · cold
100
ms · 2 RTTs
TLS 1.2 · session resume
50
ms · 1 RTT
TLS 1.3 · cold
50
ms · 1 RTT
TLS 1.3 · 0-RTT resume
0
ms · piggybacked
QUIC · cold (HTTP/3)
50
ms · 1 RTT incl. transport
QUIC · 0-RTT resume
0
ms · single packet
Takeaway

Default to TLS 1.3 everywhere; the cold handshake savings alone are worth the migration. QUIC (HTTP/3) folds the transport handshake into the TLS handshake — one round trip instead of two — which is why CDN providers default to it on lossy networks.

EXP 10b

HTTP versions — concurrent streams

HTTP/2 turned 6 connections into one.

Question

How many concurrent in-flight requests can a single client maintain?

Setup

Browser-default behaviour against a single origin. Numbers reflect Chrome / Firefox 2024 settings.

Readings (concurrent streams)
HTTP/1.1 · pipelined
0
effectively zero — disabled in browsers
HTTP/1.1 · parallel
6
connections per origin
HTTP/2 · single conn
100
streams default · MAX_CONCURRENT_STREAMS
HTTP/2 · server-push
100
optional, deprecated by Chrome
HTTP/3 · QUIC streams
100
no head-of-line blocking on loss
Takeaway

HTTP/2 multiplexing eliminated the "six connections per origin" hack browsers used for a decade. HTTP/3 fixes the remaining issue: a packet loss in one HTTP/2 stream blocked all the others because TCP delivered bytes in order. QUIC streams are independent.

EXP 10c

Cross-region round trips

Speed of light in fibre is ~200,000 km/s.

Question

What round-trip latency floor does the speed of light impose between regions?

Setup

Great-circle distance × 2 ÷ 200,000 km/s gives the theoretical floor. Real RTT is typically 1.3-1.5× this due to non-direct fibre paths.

Readings (ms RTT)
us-east ↔ us-east-2 (Ohio)
12
ms · floor 8 ms
us-east ↔ us-west
60
ms · floor 40 ms
us-east ↔ eu-west (Ireland)
75
ms · transatlantic
us-east ↔ ap-northeast (Tokyo)
150
ms · transpacific
eu-west ↔ ap-south (Mumbai)
120
ms · backbone limited
us-east ↔ ap-southeast (Sydney)
200
ms · half the globe
Takeaway

No software optimisation can beat physics. A user in Sydney hitting a US-east API will see at least 200 ms before any code runs. Multi-region strategies — read replicas, edge compute, anycast — exist because of this floor, not despite it.
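
The floor calculation from the setup, as a sketch. The 4,000 km and 16,000 km distances are round illustrative figures rather than measured great-circle distances, and the 1.3–1.5× path factor is the one quoted in the setup.

```python
FIBRE_KM_PER_S = 200_000  # speed of light in fibre, from the setup above


def rtt_floor_ms(distance_km: float) -> float:
    """Theoretical round-trip floor for a straight fibre run of the given length."""
    return 2 * distance_km / FIBRE_KM_PER_S * 1000


for label, km in (("coast to coast (illustrative)", 4_000),
                  ("half the globe (illustrative)", 16_000)):
    floor = rtt_floor_ms(km)
    print(f"{label:<32} floor {floor:5.0f} ms, "
          f"realistic {floor * 1.3:.0f}-{floor * 1.5:.0f} ms")
```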

VI

Caching.

Hit ratio is the bill payer.

EXP 11

Zipfian hit ratio vs cache size

A small cache catches a huge fraction of traffic.

Question

Under a Zipf-distributed access pattern, what hit ratio does a cache of size k achieve out of N keys?

Setup

Zipf with α = 1.0 (typical for web traffic). Hit ratio = sum of probability mass for the top-k keys, ≈ H(k)/H(N) where H is the harmonic number.

Readings (% hit)
0.1% of keyspace
27
% hit · the long tail dominates here
1% of keyspace
51
% hit · half of traffic
5% of keyspace
71
%
10% of keyspace
79
%
25% of keyspace
89
%
50% of keyspace
95
%
Takeaway

A cache holding 1% of the keyspace catches half the traffic for typical Zipf-distributed access. This is why CDNs work — most users want a small popular subset. The long tail is real, but it is not the bill.
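
The H(k)/H(N) estimate is easy to evaluate directly. Note that the ratio depends on the keyspace size N, which the setup does not state; N = 7,000 below is an assumption, picked only because it lands near the first two readings. Larger keyspaces give higher hit ratios for the same fraction cached.

```python
def harmonic(n: int) -> float:
    """n-th harmonic number, summed directly."""
    return sum(1.0 / i for i in range(1, n + 1))


def zipf_hit_ratio(cache_keys: int, total_keys: int) -> float:
    """Share of Zipf(alpha=1) traffic that falls on the top `cache_keys` keys."""
    return harmonic(cache_keys) / harmonic(total_keys)


total = 7_000  # assumed keyspace size, not given in the source
for fraction in (0.001, 0.01, 0.05, 0.10, 0.25, 0.50):
    keys = max(1, round(total * fraction))
    print(f"{fraction:6.1%} of keyspace -> {zipf_hit_ratio(keys, total):.0%} hit")
```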

EXP 12

LRU vs LFU vs W-TinyLFU on a real trace

Modern admission policies beat both classics.

Question

On a real workload, what hit ratio does each eviction policy achieve at the same cache size?

Setup

ARC and W-TinyLFU benchmarks from the Caffeine project, search trace, cache size = 1% of unique keys.

Readings (% hit)
FIFO
41.2
% hit
LRU
47.1
% hit · the baseline
LFU
50.4
% hit · pure frequency
ARC
53.6
% hit · adaptive replacement
W-TinyLFU
57.8
% hit · admission + LRU
Belady (oracle)
62.2
% hit · upper bound
Takeaway

Modern admission policies (W-TinyLFU, used by Caffeine) close ~70% of the gap between LRU and the optimal Belady oracle. Free wins for any cache that's big enough to matter. The implementation tradeoff is some metadata per entry — usually 2-4 bits.

EXP 12b

CDN tiered cache — origin offload

Two tiers cut origin requests to 1%.

Question

How does adding a regional cache between edge and origin change the origin request rate?

Setup

Each layer's miss ratio multiplies: only requests that miss every tier reach the origin. Cloudflare Tiered Cache and similar architectures from Fastly, AWS CloudFront.

Readings (% origin load)
No CDN
100
% of requests reach origin
Edge only · 90% hit
10
% · classic CDN
Edge + regional · 95% × 80%
1
% · two-tier hit
Three-tier · 95% × 85% × 80%
0.15
% · super-PoP shielded
Takeaway

Tiered caching is multiplicative. The edge catches the easy 95%; the regional tier catches 80% of what the edge missed; the result is a 100× origin offload. CDN providers price tiered cache as a premium feature for exactly this reason: it is the largest single cost lever for high-traffic sites.
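
The origin load is the product of the per-tier miss ratios. A minimal sketch, with `origin_share` as an illustrative name and the hit ratios taken from the readings:

```python
import math


def origin_share(hit_ratios: list[float]) -> float:
    """Fraction of requests that miss every cache tier and reach the origin."""
    return math.prod(1.0 - hit for hit in hit_ratios)


for label, tiers in (("edge only", [0.90]),
                     ("edge + regional", [0.95, 0.80]),
                     ("three-tier", [0.95, 0.85, 0.80])):
    print(f"{label:<16} {origin_share(tiers):.2%} of requests reach origin")
```

The product form assumes each tier's hit ratio is measured on the traffic it actually receives, which is how the readings are stated.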

VII

Distributed systems.

Coordination is the cost.

EXP 13

Consensus round-trip cost

A Paxos write is at least one cross-AZ RTT.

Question

How much latency does a quorum write add over a single-node write?

Setup

Paxos / Raft both require a round trip from leader to a majority of followers. Numbers below assume 2 ms intra-AZ, 4 ms cross-AZ in a 3-node, 3-AZ cluster.

Readings (ms write latency)
Single-node write
2
ms · fsync
Quorum write · 3 AZs
6
ms · leader + 1 follower
Quorum write · cross-region
80
ms · multi-region Spanner-style
Two-region · 1 follower far
80
ms · still 80, no win
Spanner global · TrueTime
10
ms · uncertainty bound
Takeaway

Strong consistency costs at least one cross-zone RTT per write. Multi-region strong consistency costs at least one cross-region RTT — typically 80-200 ms. Designs like Spanner buy back some of this with TrueTime; most systems accept the cost or relax to eventual consistency.

EXP 13b

Replication lag at saturation

Lag grows hyperbolically as primary fills up.

Question

How does async replication lag behave as the primary approaches its write capacity?

Setup

Same M/M/1 dynamics as EXP 06, applied to the WAL streaming pipeline. Lag scales as ρ/(1−ρ) once the replica is the bottleneck.

Readings (s lag)
ρ = 0.30
0.05
s · imperceptible
ρ = 0.60
0.2
s · normal
ρ = 0.80
0.6
s · still acceptable
ρ = 0.90
1.5
s · alerting territory
ρ = 0.95
4.5
s · users notice stale reads
ρ = 0.99
30
s · pager goes off
Takeaway

Replication lag is a leading indicator of primary saturation. Alert at 5 seconds; page at 30. The fix is almost never "make replication faster" — it is "shed load from the primary" (cache, batch, async writes).

EXP 13c

Clock skew — NTP vs PTP vs TrueTime

NTP is good to milliseconds. TrueTime needs atomic clocks.

Question

How well-synchronised are clocks across a fleet?

Setup

Typical observed skew between independent servers, by sync method.

Readings (ms skew)
No sync
60,000
ms · drift over a day
NTP · WAN
30
ms typical
NTP · LAN
1
ms · with chronyd
PTP (IEEE 1588)
0.001
ms · sub-microsecond on hardware
Spanner TrueTime
5
ms uncertainty bound (ε of roughly 1–7 ms) · GPS+atomic
Takeaway

Trusting wall-clock time across servers is a bug. Use logical clocks (Lamport, vector) or hybrid logical clocks (as in CockroachDB) for ordering. If you must use wall-clock time, allow at least 30 ms of skew, mark events as "happened within a window," and never rely on millisecond-precise causality.

EXP 13d

SLO error budget — minutes per quarter

Three nines is 2.2 hours. Five nines is 78 seconds.

Question

What downtime budget does each common SLO actually allow?

Setup

Allowed downtime per quarter (90 days) for each availability target.

Readings (min/quarter)
99% (two nines)
1,296
min / quarter · 21.6 hrs
99.9% (three nines)
130
min / quarter — 2.2 hrs
99.95%
65
min / quarter
99.99% (four nines)
13
min / quarter — one bad deploy
99.999% (five nines)
1.3
min / quarter — 78 seconds
99.9999% (six nines)
0.13
min / quarter — needs fault-tolerance, not luck
Takeaway

Each extra nine costs roughly 10× the engineering. Pick SLOs by user impact, not aspiration. Most consumer products live happily at 99.9%; payments and healthcare push for 99.99%; only telcos and stock exchanges need five-nines architectures.
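
The budget is simply the unavailable fraction of the window. A minimal sketch using the 90-day quarter from the setup; `downtime_minutes` is an illustrative name.

```python
def downtime_minutes(availability: float, days: int = 90) -> float:
    """Allowed downtime in minutes over a window of `days` days."""
    return (1.0 - availability) * days * 24 * 60


for target in (0.99, 0.999, 0.9995, 0.9999, 0.99999, 0.999999):
    minutes = downtime_minutes(target)
    print(f"{target:.4%}  {minutes:10.2f} min / quarter  ({minutes * 60:,.0f} s)")
```

Changing `days` to 30 or 365 gives the monthly and yearly budgets that are often quoted instead.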

VIII

Observability.

What you can see vs what you missed.

EXP 14

Cardinality blowup in metrics

A label per user_id, a Prometheus on fire.

Question

How many unique time-series do common label-set patterns create?

Setup

Each combination of label values creates a distinct time-series. Prometheus / Mimir scrape memory grows roughly linearly in series count.

Readings (distinct time-series)
method × status (3×6)
18
tiny · sane
+ endpoint (3×6×30)
540
still fine
+ service (3×6×30×100)
54,000
~50k · OK on a single node
+ pod_id (×500)
27,000,000
27M · Prometheus dies
+ user_id (×1M)
27,000,000,000,000
27 trillion · don't
Takeaway

Label values must come from a closed, small set. user_id, request_id, session_id are forbidden as labels — those belong in logs or traces, never metrics. Prometheus reference: target ≤ 10M series total per instance.
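
Worst-case series count is the product of the label cardinalities, which makes it cheap to check before a new label ships. A sketch with illustrative names; the cardinalities are the ones from the readings.

```python
import math


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on distinct time-series: every label combination occurs."""
    return math.prod(label_cardinalities.values())


labels: dict[str, int] = {}
for name, cardinality in (("method", 3), ("status", 6), ("endpoint", 30),
                          ("service", 100), ("pod_id", 500), ("user_id", 1_000_000)):
    labels[name] = cardinality
    print(f"+ {name:<9} -> {series_count(labels):>22,} series")
```

Real deployments usually see fewer series than this bound because not every combination occurs, but unbounded labels such as `user_id` erase that slack quickly.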

EXP 15

Log volume vs cost

A debug log left on costs more than the engineer who left it.

Question

What does logging cost in commercial SaaS observability platforms?

Setup

Datadog Logs at standard pricing (mid-2024). Average log line ~250 bytes. 30-day retention. Volumes per service per day.

Readings ($ / month)
1 GB/day · 30 days
18
$/month — small service
10 GB/day · 30 days
180
$/month
100 GB/day · 30 days
1,800
$/month — typical mid-size
1 TB/day · 30 days
18,000
$/month — annual ~$220k
10 TB/day · 30 days
180,000
$/month — find a cheaper plan
Takeaway

Sample. Tier hot vs cold logs. Move debug-level off the hot path entirely. The "log everything, forever" default at scale is a six-figure mistake. Self-hosted alternatives (Loki, Vector + S3) cost 20-50× less but require operational effort.

EXP 16

Trace sampling — what 1% catches

Head-sampling ~1% misses the worst traces.

Question

For a service handling N requests, what fraction of "interesting" traces does each sampling strategy keep?

Setup

Imagine a 1M-request hour, with 1% being "slow" (>p99) and 0.1% being "errored". Sampling fractions kept by each strategy.

Readings (% of total traces)
Head sample 1%
1
% kept · independent of trace outcome
Head sample 0.1%
0.1
% kept · loses most errors
Tail sample errors-only
0.1
% kept · 100% of errors retained
Tail sample slow + errors
1.1
% kept · 100% of interesting traces
Adaptive (Honeycomb-style)
0.5
% kept · keeps representative + interesting
Takeaway

Head-based sampling is cheap but loses tail and error events. Tail-based sampling sees the whole trace before deciding to keep it, which is much more useful but requires a buffering layer. Production setups usually run an adaptive policy: keep all errors, all slow traces, plus a fixed % of the fast happy path.

IX

Security parameters.

How long until brute force wins.

EXP 17

Password entropy vs cracking time

A 10-char password without a passphrase is sand.

Question

For a password of given alphabet and length, how long does brute force take at 1 trillion guesses/second?

Setup

Modern hashcat on a single 8×RTX-4090 rig achieves on the order of 10¹² SHA-256 guesses/second; bcrypt at cost 12 is many orders of magnitude slower. Time to exhaust the keyspace, in years.

Readings (years to exhaust)
8-char alphanumeric (62^8)
0
years · ~3.5 minutes — broken
8-char with symbols (94^8)
0.0002
years · ~1.7 hours
12-char alphanumeric
110
years
12-char with symbols
14,000
years
4-word diceware (~52 bits)
0.00014
years · ~75 minutes
6-word diceware (~78 bits)
9,600
years
Takeaway

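Length beats character variety: a four-word random passphrase beats a human-chosen "Tr0ub@dor"-style 10-char password by orders of magnitude, because predictable substitutions add little entropy while each diceware word adds about 12.9 bits. Use a password manager, generate 16+ chars or 5+ diceware words. Use bcrypt / Argon2 with cost ≥ 12 to slow the attacker by several orders of magnitude.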
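
The exhaustion times are keyspace size divided by guess rate. A sketch assuming the 10¹² guesses/second from the setup and uniformly random passwords (human-chosen ones fall far sooner); the function names are illustrative, and 7,776 is the standard diceware word-list size.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def entropy_bits(alphabet: int, length: int) -> float:
    """Entropy of a uniformly random string of `length` symbols."""
    return length * math.log2(alphabet)


def exhaust_seconds(alphabet: int, length: int,
                    guesses_per_second: float = 1e12) -> float:
    """Time to try every candidate in the keyspace."""
    return alphabet ** length / guesses_per_second


for label, alphabet, length in (("8-char alphanumeric", 62, 8),
                                ("8-char with symbols", 94, 8),
                                ("12-char alphanumeric", 62, 12),
                                ("4-word diceware", 7776, 4),
                                ("6-word diceware", 7776, 6)):
    seconds = exhaust_seconds(alphabet, length)
    print(f"{label:<22} {entropy_bits(alphabet, length):5.1f} bits  "
          f"{seconds / SECONDS_PER_YEAR:14.6f} years")
```

On average an attacker finds the password after searching half the keyspace, so expected cracking time is half the figure printed here.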

EXP 18

TLS handshake — cert chain size

A long chain costs an extra round trip.

Question

How many bytes does each TLS cert-chain configuration add to the first round trip?

Setup

Server certificate plus its chain, sent in the server's first flight. RSA-2048 cert ≈ 1.5 KB; ECDSA P-256 cert ≈ 0.5 KB. Initial congestion window is typically 10 segments × 1460 bytes ≈ 14.6 KB.

Readings (KB cert + chain)
ECDSA leaf only
0.5
KB · single cert
ECDSA leaf + 1 intermediate
1
KB · clean
RSA leaf only
1.5
KB · still 1 round-trip
RSA leaf + 2 intermediates
4.5
KB · 1 RT
RSA leaf + 4 intermediates
7.5
KB · still 1 RT
RSA + many intermediates (>14 KB)
16
KB · 2 round-trips · slow start kicks in
Takeaway

A bloated cert chain crosses the initcwnd boundary and triggers an extra round trip on the very first connection. Use ECDSA where possible (3× smaller). Strip unnecessary intermediates from your chain. Test with ssllabs.com.

Adjacent

From the numbers, to the systems.

Every measurement here corresponds to a guide, a simulator, or a foundation entry that explains the underlying mechanism. Read alongside.
