Volume XIV — opening pages

Numbers, measured.

Production decisions need orders of magnitude, not vibes. This notebook collects published, citable measurements for the systems we describe elsewhere — memory hierarchy, B-tree depth, Bloom filter accuracy, compression speed, queueing latency, fan-out tails, write amplification.

Storage & data structures.

How fast is each layer, and where the structure pays off.

EXP 01

The memory hierarchy

Each level is roughly an order of magnitude slower.

Question

How long does it take to read one byte from each level of the memory hierarchy on a modern server?

Setup

Numbers are typical for an x86-64 cloud VM circa 2024 (Skylake-class CPU, 3 GHz, NVMe SSD, 25 Gbit network). Latency varies with hardware; ratios are stable.

Readings (ns)

L1 cache

~0.5 ns; predictable

L2 cache

~10 cycles

L3 cache

Shared across cores

Main memory (RAM)

100

DRAM access

NVMe SSD (random read)

100,000

~100 µs

Same-DC network (10 GbE)

500,000

~500 µs round trip

HDD (random seek)

10,000,000

~10 ms; rotational

Cross-region (US-EU)

100,000,000

~100 ms; speed of light

log scale

Takeaway

Cache locality dominates algorithmic complexity for small n. A well-laid-out array beats a smart tree of pointers up to roughly 10,000 elements because a sequential scan stays in cache and prefetches well, while the tree pays L2 / L3 / RAM penalties on every pointer chase.

→ source

EXP 02

B-tree depth vs row count

Why your index never gets deep.

Question

How deep does a B+tree get for a billion-row index?

Setup

Each internal node holds 200 keys (typical for 4 KiB pages with 8-byte keys + 8-byte child pointers). Each leaf holds r rows. Depth = ⌈log_b(n / r)⌉ + 1, where b is the fan-out and n the row count; the readings below are consistent with r ≈ 20.

Readings (levels)

1 thousand rows

2 page reads

100 thousand rows

3 page reads

10 million rows

4 page reads

1 billion rows

5 page reads

100 billion rows

6 page reads

10 trillion rows

7 page reads

Takeaway

B-trees stay shallow because of the wide fan-out. A binary tree at 10 million rows is 24 levels deep; the B+tree is 4. That's the difference between 24 disk seeks and 4 — the order-of-magnitude that makes databases feasible.
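The depth readings can be reproduced with integer arithmetic. This is a minimal sketch, assuming the fan-out of 200 from the setup and a hypothetical leaf capacity of 20 rows; `rows_per_leaf` is not stated in the text and is chosen only because it matches the readings.

```python
def btree_depth(rows: int, fanout: int = 200, rows_per_leaf: int = 20) -> int:
    """Levels in a B+tree: one leaf level plus enough internal levels to reach every leaf.

    rows_per_leaf=20 is an assumption for illustration, not a figure from the text.
    """
    leaves = -(-rows // rows_per_leaf)  # ceiling division
    depth, reachable = 1, 1             # a single leaf is a one-level tree
    while reachable < leaves:
        reachable *= fanout             # each internal level multiplies reach by the fan-out
        depth += 1
    return depth


def binary_tree_depth(rows: int) -> int:
    """Levels needed by a perfectly balanced binary tree: ceil(log2(rows))."""
    return (rows - 1).bit_length()


if __name__ == "__main__":
    for n in (10**3, 10**5, 10**7, 10**9, 10**11, 10**13):
        print(f"{n:>18,} rows: B+tree {btree_depth(n)}, binary {binary_tree_depth(n)}")
```

Changing `fanout` or `rows_per_leaf` shifts where each extra level appears, but the depth still grows by one only when the row count grows by a factor of the fan-out.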

→ source

EXP 03

Bloom filter false-positive rate

10 bits per element ≈ 1% false positive.

Question

Given m bits per element and k hash functions, what false-positive rate do you get?

Setup

Standard Bloom filter formula: FPR ≈ (1 − e^(−kn/m))^k, with optimal k = (m/n) ln 2. Numbers below assume optimal k.

Readings (% false positive)

4 bits/elt

14.6

optimal k=3

8 bits/elt

2.16

optimal k=6

10 bits/elt

0.82

optimal k=7 — the popular default

16 bits/elt

0.046

optimal k=11

24 bits/elt

0.001

optimal k=17

32 bits/elt

optimal k=22

log scale

Takeaway

Doubling the bits-per-element roughly squares the FPR. Cassandra targets about 1% FPR by default, and 10 bits/elt is the usual RocksDB setting. Higher precision is rarely worth it: the cost of the rare false positive is one extra disk read.
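The formula in the setup is easy to evaluate directly. The sketch below assumes the standard approximation quoted there and rounds the optimal k to the nearest integer; the function names are illustrative.

```python
import math


def optimal_k(bits_per_element: float) -> int:
    """Hash-function count that minimises FPR: (m/n) * ln 2, rounded to an integer."""
    return max(1, round(bits_per_element * math.log(2)))


def bloom_fpr(bits_per_element: float, k: int) -> float:
    """Approximate false-positive rate (1 - e^(-k n / m))^k, with m/n = bits_per_element."""
    return (1.0 - math.exp(-k / bits_per_element)) ** k


if __name__ == "__main__":
    for bits in (4, 8, 10, 16, 24, 32):
        k = optimal_k(bits)
        print(f"{bits:>2} bits/elt  k={k:<2}  FPR={100 * bloom_fpr(bits, k):.4f}%")
```

Because the optimal-k FPR is roughly 0.6185 raised to the bits-per-element, doubling the bits squares the rate, which is the relationship the takeaway describes.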

→ source

EXP 03b

Hash table — collisions vs load factor

Past 0.85, every insert is a fight.

Question

How many probes does an open-addressing hash table take per lookup as load factor α rises?

Setup

Linear probing with random hashing. Expected probes per successful lookup ≈ ½(1 + 1/(1−α)). Numbers below are mean probes for a successful lookup.

Readings (probes)

α = 0.50

1.5

comfortable

α = 0.75

2.5

Java HashMap default load factor (HashMap chains rather than probes, so a reference point only)

α = 0.85

3.8

starting to bite

α = 0.90

5.5

visible slowdown

α = 0.95

10.5

do not run here

α = 0.99

50.5

cliff

Takeaway

Resize before α crosses 0.75. Open-addressing tables go unstable in their tail, not their mean — at α = 0.9 the p99 of a probe sequence is well into the dozens. Robin-hood hashing softens the tail; it does not move the wall.
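The probe counts follow directly from the formula in the setup. The sketch also includes Knuth's companion estimate for unsuccessful lookups (and therefore inserts) under linear probing, ½(1 + 1/(1−α)²), which is not in the readings but shows why inserts hurt sooner than hits.

```python
def probes_successful(alpha: float) -> float:
    """Expected probes for a successful lookup under linear probing."""
    if not 0 <= alpha < 1:
        raise ValueError("load factor must be in [0, 1)")
    return 0.5 * (1 + 1 / (1 - alpha))


def probes_unsuccessful(alpha: float) -> float:
    """Expected probes for a miss or an insert (Knuth's estimate, not from the text)."""
    if not 0 <= alpha < 1:
        raise ValueError("load factor must be in [0, 1)")
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)


if __name__ == "__main__":
    for alpha in (0.50, 0.75, 0.85, 0.90, 0.95, 0.99):
        print(f"alpha={alpha:.2f}  hit={probes_successful(alpha):6.1f}  "
              f"miss/insert={probes_unsuccessful(alpha):8.1f}")
```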

→ source

EXP 03c

LSM read amplification by levels

Bloom filters are why LSMs are usable.

Question

How many SSTables does a point lookup touch in an LSM tree, with and without bloom filters?

Setup

Worst case: N levels means N SSTable lookups. With bloom filters at FPR p, expected disk reads ≈ p × (N − 1) + 1.

Readings (avg disk reads)

4 levels · no bloom

always touches all

4 levels · bloom 1%

1.03

almost always one read

7 levels · no bloom

classic RocksDB shape

7 levels · bloom 1%

1.06

still ~one read

10 levels · bloom 1%

1.09

why bloom matters

10 levels · bloom 0.1%

1.009

diminishing returns

Takeaway

Bloom filters are not optional for LSM point reads. They convert what would be 7-10× read amplification into ~1 disk read on the average path, at the cost of ~10 bits/key in RAM. LSM engines such as RocksDB and Cassandra build them in for exactly this reason.
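The expected-read model from the setup can be written as a one-line function. This is a sketch of that model only: it assumes the key exists in exactly one level and that each other level is checked through an independent filter with the same FPR.

```python
from typing import Optional


def expected_disk_reads(levels: int, bloom_fpr: Optional[float] = None) -> float:
    """Expected SSTable reads for a point lookup in an LSM tree with `levels` levels.

    Without filters the worst case touches every level; with filters, each of the
    other (levels - 1) levels costs a read only on a false positive.
    """
    if bloom_fpr is None:
        return float(levels)
    return bloom_fpr * (levels - 1) + 1


if __name__ == "__main__":
    for levels, fpr in ((4, None), (4, 0.01), (7, None), (7, 0.01), (10, 0.01), (10, 0.001)):
        label = "no bloom" if fpr is None else f"bloom {fpr:.1%}"
        print(f"{levels:>2} levels, {label:<11}: {expected_disk_reads(levels, fpr):.3f} reads")
```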

→ source

Compression.

Speed-vs-size, with real bytes.

EXP 04

gzip vs brotli vs zstd

Compression ratio vs encode/decode speed.

Question

How much does each modern compressor reduce a 1 MB JSON document, and at what speed?

Setup

Squash benchmark v0.7, x86-64, default level for each. JSON corpus from the Squash dataset. Numbers are MB/s for compress and decompress.

Readings (mixed)

gzip · ratio

5.4

gzip · compress

MB/s

gzip · decompress

360

MB/s

brotli · ratio

6.5

× — best for static assets

brotli · compress

0.9

MB/s — slow at default

brotli · decompress

380

MB/s

zstd · ratio

5.7

× — at default

zstd · compress

530

MB/s — fastest mainstream encode

zstd · decompress

1,500

MB/s — fastest decode

Takeaway

For real-time encode (logs, RPC bodies), zstd is the clear winner: a slightly better ratio than gzip (5.7× vs 5.4×), about 5x the encode speed and 4x the decode speed. Brotli only wins for static web assets where you encode once and decode billions of times.

→ source

EXP 05

JSON vs Protobuf wire size

A 25–45% reduction, give or take.

Question

How much smaller is a Protobuf message vs the equivalent JSON for typical record shapes?

Setup

Compared on five common shapes: a payments record, an analytics event, a user profile, a search hit, a config blob. Sizes are uncompressed.

Readings (% of JSON)

Payments record

% of JSON size

Analytics event

User profile

Search hit

Config blob

Takeaway

25–45% reduction is typical. Small numeric values benefit most (varint encoding); deeply nested optional fields benefit less. After gzip, the gap narrows considerably — gzipped JSON is often within 10% of gzipped Protobuf. Protobuf still wins on parse cost and schema discipline.

→ source

EXP 05b

Columnar compression on time-series

A row-store loses to Parquet by 10×.

Question

How much smaller is the same data in a columnar format with appropriate encodings?

Setup

Same 1 GB time-series table (timestamp, sensor_id, value, tag). Stored in CSV, Parquet+Snappy (default), Parquet+ZSTD, ORC. ClickBench-style measurements.

Readings (MB on disk)

CSV · uncompressed

1,024

MB · baseline

CSV · gzip

280

MB · 3.7× smaller

Parquet · Snappy

110

MB · 9.3× smaller

Parquet · ZSTD

MB · 13× smaller

ORC · ZSTD

MB · best columnar

Takeaway

Columnar formats win by storing each column contiguously and applying type-aware encodings (RLE, delta, dictionary) before the entropy coder runs. The roughly 10× edge over raw CSV (about 2.5-3.5× over gzipped CSV) is why every analytics warehouse, including BigQuery, Snowflake, ClickHouse and Redshift, is column-oriented.

→ source

EXP 05c

JSON encoding overhead per row

Field names cost more than values.

Question

How much of a JSON payload is structural overhead (keys, braces, quotes, commas) for typical small rows?

Setup

Five representative rows of varying field count and value type. "Value bytes" is the sum of value lengths; "structural" is everything else.

Readings (% structural)

3 fields · short str

% structural

6 fields · numeric

10 fields · short str

20 fields · mixed

50 fields · nested

Takeaway

A "JSON event" is 50-70% braces and key names. This is why MessagePack, CBOR, Avro and Protobuf can shrink small payloads dramatically — the field tag is a single byte, not a quoted string. Compression flattens this advantage; uncompressed wires do not.
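The split between value bytes and structural bytes can be measured on any row. The sketch below assumes flat rows with ASCII strings and compact separators; the sample row and its field names are hypothetical.

```python
import json


def structural_fraction(row: dict) -> float:
    """Share of the compact JSON encoding that is not value content.

    Value bytes count string contents without their quotes and the literal
    text of numbers; keys, quotes, braces, colons and commas are structural.
    """
    encoded = json.dumps(row, separators=(",", ":"))
    value_bytes = sum(
        len(v) if isinstance(v, str) else len(json.dumps(v))
        for v in row.values()
    )
    return 1 - value_bytes / len(encoded)


if __name__ == "__main__":
    sample = {"user_id": 42, "status": "ok", "latency_ms": 17}  # hypothetical row
    print(f"{structural_fraction(sample):.0%} structural")
```

Short values with descriptive keys push the structural share up, which is why schema-based formats help most on small rows.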

→ source

III

Concurrency & tail latency.

When parallelism stops paying.

EXP 06

M/M/1 queue: latency at high utilisation

You cannot run a server at 99% utilisation.

Question

How does mean wait time grow as utilisation ρ approaches 1?

Setup

Standard M/M/1 result: mean wait W = (1/μ) × ρ/(1−ρ). Below: ρ vs the multiplier on baseline service time.

Readings (× service time)

ρ = 0.50

1× service time wait

ρ = 0.70

2.3

2.3×

ρ = 0.80

4×

ρ = 0.90

9×

ρ = 0.95

19×

ρ = 0.99

99× — and growing

ρ = 0.999

999

999×

Takeaway

Aim for 60–70% utilisation for latency-sensitive services. The (1−ρ) denominator makes the curve climb steeply past 0.9: going from 0.90 to 0.95 roughly doubles the wait. A "well-utilised" 95% server is in the latency death zone.
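The multipliers in the readings are just ρ/(1−ρ). A minimal sketch, assuming the M/M/1 model from the setup (Poisson arrivals, exponential service, one server):

```python
def wait_multiplier(rho: float) -> float:
    """Mean queueing wait in an M/M/1 queue, in units of the mean service time."""
    if not 0 <= rho < 1:
        raise ValueError("utilisation must be in [0, 1); at 1 the queue never drains")
    return rho / (1 - rho)


if __name__ == "__main__":
    for rho in (0.50, 0.70, 0.80, 0.90, 0.95, 0.99, 0.999):
        print(f"rho={rho:<5}  wait = {wait_multiplier(rho):7.1f} x service time")
```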

→ source

EXP 07

Tail latency for fan-out queries

p99 of one becomes p50 of 100.

Question

If you fan out a query to N parallel workers and wait for all, how does the system's p99 latency change?

Setup

Each worker has independent latency with a heavy-tailed distribution (Weibull-like, p99 ≈ 5× p50). System latency is the max of N samples.

Readings (× single-worker p50)

N = 1

p99 ≈ 5× p50

N = 10

9× · system p99 is now roughly the single-worker p99.9

N = 100

13×

N = 1,000

17×

N = 10,000

22×

Takeaway

In fan-out, the tail dominates. Mitigations from Dean & Barroso 2013: hedged requests (fire two, take the first), bounded latency (timeout and partial result), tied requests (cancel the slow one when the first returns). Without them, p99 of the system explodes.
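The quantile arithmetic behind fan-out does not depend on the latency distribution, only on independence between workers. The sketch below assumes that independence; the function names are illustrative.

```python
def p_any_worker_slow(n: int, quantile: float = 0.99) -> float:
    """Probability that at least one of n independent workers exceeds its own `quantile`."""
    return 1 - quantile ** n


def worker_quantile_for_system(n: int, system_quantile: float = 0.99) -> float:
    """Single-worker quantile that every worker must beat to hit `system_quantile` overall."""
    return system_quantile ** (1 / n)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"N={n:<5} P(some worker above its p99)={p_any_worker_slow(n):.3f}  "
              f"system p99 = worker p{100 * worker_quantile_for_system(n):.4f}")
```

At N = 100, well over half of all requests wait on at least one worker that is past its own p99, which is the sense in which the single-worker tail becomes the system's typical case.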

→ source

EXP 07b

Amdahl's law

5% serial means a hard ceiling at 20×.

Question

If a fraction s of work is inherently serial, how much speedup can N parallel cores give you?

Setup

Amdahl 1967: speedup = 1 / (s + (1−s)/N). Below: speedup at N = 16, 64, 256, ∞ for typical serial fractions.

Readings (× speedup)

s = 1% · N=16

13.9

~87% efficiency

s = 1% · N=∞

100

hard ceiling

s = 5% · N=16

9.1

~57%

s = 5% · N=∞

ceiling at 20×

s = 10% · N=64

8.8

~14%

s = 10% · N=∞

no point past 32 cores

s = 25% · N=∞

parallel barely helps

Takeaway

Adding cores past Amdahl's ceiling only burns money. The bigger the system gets, the more important the serial 1-5% becomes — locks, cross-shard coordination, leader-only operations. The optimisation that pays at scale is removing the serial fraction, not adding cores.
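Amdahl's formula from the setup, written out as a short sketch. Efficiency here means speedup divided by core count, which is how the percentages in the readings appear to be derived.

```python
def amdahl_speedup(serial_fraction: float, cores: float) -> float:
    """Speedup = 1 / (s + (1 - s) / N)."""
    return 1 / (serial_fraction + (1 - serial_fraction) / cores)


def amdahl_ceiling(serial_fraction: float) -> float:
    """Limit of the speedup as the core count goes to infinity: 1 / s."""
    return 1 / serial_fraction


if __name__ == "__main__":
    for s, n in ((0.01, 16), (0.05, 16), (0.10, 64), (0.25, 256)):
        sp = amdahl_speedup(s, n)
        print(f"s={s:.0%} N={n:<3} speedup={sp:5.1f}  efficiency={sp / n:.0%}  "
              f"ceiling={amdahl_ceiling(s):.0f}x")
```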

→ source

Storage amplification.

What you write isn't what hits disk.

EXP 08

LSM-tree write amplification

Flushed once, compacted many times.

Question

How many bytes hit disk per byte logically written, by compaction strategy?

Setup

RocksDB with default level-style compaction, ~10 levels, 10× growth factor. Numbers from Facebook's 2016 RocksDB paper and follow-ups.

Readings (× write amp)

Level (default)

each byte rewritten ~22× across compactions

Universal (size-tiered)

fewer rewrites, more space amp

FIFO

no compaction, oldest segment dropped

B-tree (Postgres)

one page write + one WAL write

Takeaway

Choosing between B-tree and LSM is choosing what to amplify. B-tree: low write amp, higher random I/O. LSM: high write amp, sequential I/O, which suits SSDs as long as endurance is budgeted for. Size-tiered compaction (RocksDB's universal style; STCS in ScyllaDB) trades space for write amp.

→ source

EXP 08b

Replication factor — bytes vs durability

Eleven nines costs less than you think.

Question

How does storage cost change as you add replicas or switch to erasure coding?

Setup

Per-byte storage overhead for each scheme. Durability target: probability the data is lost in a year given typical disk AFR ~1%.

Readings (% of raw bytes)

Single copy

100

% · ~1% loss/yr · don't

2× replication

200

% · ~10⁻⁴ loss/yr

3× replication

300

% · ~10⁻⁶ loss/yr · classic HDFS

EC 6+3 (Reed-Solomon)

150

% · ~10⁻¹¹ · S3 / Hadoop EC

EC 10+4

140

% · ~10⁻¹³ · cold tier

Takeaway

Erasure coding gives more durability than 3× replication at half the storage cost. The trade is CPU on writes (encoding) and slower repair (read k surviving shards to rebuild one). For warm/cold tiers EC dominates; for hot reads 3× replication still wins on tail latency.

→ source

EXP 08c

SSD endurance — DWPD vs lifetime

Write amp eats your warranty.

Question

Given a drive's DWPD (drive writes per day) rating, how does database write amplification translate into lifetime years?

Setup

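Modern enterprise SSD: 1.92 TB capacity, 1 DWPD over 5-year warranty. "Effective DWPD" = logical host writes ÷ capacity × the database's write-amp factor. The readings are consistent with a logical write rate of 0.2 drive capacities per day (about 384 GB/day), so lifetime ≈ 25 ÷ WA years.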

Readings (years to wear out)

Postgres B-tree · 2× WA

12.5

years · comfortable

MyRocks LSM · 8× WA

3.1

years · borderline

RocksDB level · 22× WA

1.1

years · DWPD limited

Cassandra LCS · 30× WA

0.83

years · need 3 DWPD drive

Idle replica · 0.1× WA

250

years · NAND cell decays first

Takeaway

On write-heavy LSM workloads, drive endurance — not capacity — sizes the cluster. Buy 3 DWPD drives or pre-shrink write amp with universal compaction. The hidden cost of "we're replacing drives every year" is one of the strongest arguments for tuning compaction.
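The lifetime readings can be reproduced from the drive rating. This sketch assumes a logical write rate of 0.2 drive capacities per day; that figure is not stated in the text and is inferred from the readings, which all equal 25 ÷ WA.

```python
def ssd_lifetime_years(
    write_amp: float,
    logical_dwpd: float = 0.2,    # assumption inferred from the readings, not stated in the text
    rated_dwpd: float = 1.0,
    warranty_years: float = 5.0,
) -> float:
    """Years until the rated endurance (rated_dwpd over the warranty) is used up."""
    effective_dwpd = logical_dwpd * write_amp
    return warranty_years * rated_dwpd / effective_dwpd


if __name__ == "__main__":
    for name, wa in (("Postgres B-tree", 2), ("MyRocks LSM", 8),
                     ("RocksDB level", 22), ("Cassandra LCS", 30)):
        print(f"{name:<16} WA={wa:<3} lifetime = {ssd_lifetime_years(wa):5.2f} years")
```

Passing `rated_dwpd=3.0` models the 3 DWPD drive mentioned in the readings: it triples every lifetime.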

→ source

Network.

Where the speed of light is the actual limit.

EXP 09

TCP throughput vs RTT (BDP)

A 100ms link with a 64KB window can do 5 Mbps. That's it.

Question

For a TCP connection with window W and round-trip time R, what is the maximum throughput?

Setup

Throughput ≤ W / R. Below: max throughput at common RTTs for typical windows. The default Linux TCP receive window is auto-tuned up to 6 MB.

Readings (Mbps)

64 KB window · 1 ms

524

Mbps · same rack

64 KB window · 10 ms

Mbps · same region

64 KB window · 100 ms

5.2

Mbps · cross-region — fixed window kills you

6 MB window · 100 ms

480

Mbps · auto-tuned

6 MB window · 200 ms

240

Mbps · transpacific

BBR · 200 ms

950

Mbps · congestion-based

Takeaway

Throughput on long-fat pipes is window-bound, not bandwidth-bound. Confirm receive-buffer auto-tuning is on (it is by default on Linux ≥ 2.6.17) and consider BBR for high-RTT high-bandwidth paths. The classic "my 1 Gbps link only does 50 Mbps" story is almost always a window problem.
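The window bound and its inverse, the bandwidth-delay product, are simple enough to compute by hand or in a few lines. A minimal sketch, assuming one connection, no loss, and a window that stays fixed at the given size:

```python
def max_throughput_mbps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput: one window per round trip, in megabits per second."""
    return window_bytes * 8 / rtt_seconds / 1e6


def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Window needed to keep a link of the given bandwidth full at the given RTT."""
    return bandwidth_bps * rtt_seconds / 8


if __name__ == "__main__":
    for rtt_ms in (1, 10, 100):
        print(f"64 KiB window, {rtt_ms:>3} ms RTT: "
              f"{max_throughput_mbps(64 * 1024, rtt_ms / 1000):7.1f} Mbps")
    print(f"Window to fill 1 Gbps at 100 ms: {bdp_bytes(1e9, 0.100) / 1e6:.1f} MB")
```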

→ source

EXP 10

TLS handshake cost

TLS 1.3 cut the cold handshake in half.

Question

How much does each TLS variant cost in round trips and milliseconds on a 50 ms RTT link?

Setup

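Round trips before first application byte. ms = RTTs × 50 ms (the dominant cost). CPU cost not included. The TLS rows count only the TLS handshake; the TCP handshake underneath adds one more RTT, which QUIC absorbs.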

Readings (ms before first byte)

TLS 1.2 · cold

100

ms · 2 RTTs

TLS 1.2 · session resume

ms · 1 RTT

TLS 1.3 · cold

ms · 1 RTT

TLS 1.3 · 0-RTT resume

ms · piggybacked

QUIC · cold (HTTP/3)

ms · 1 RTT incl. transport

QUIC · 0-RTT resume

ms · single packet

Takeaway

Default to TLS 1.3 everywhere; the cold handshake savings alone are worth the migration. QUIC (HTTP/3) folds the transport handshake into the TLS handshake — one round trip instead of two — which is why CDN providers default to it on lossy networks.

→ source

EXP 10b

HTTP versions — concurrent streams

HTTP/2 turned 6 connections into one.

Question

How many concurrent in-flight requests can a single client maintain?

Setup

Browser-default behaviour against a single origin. Numbers reflect Chrome / Firefox 2024 settings.

Readings (concurrent streams)

HTTP/1.1 · pipelined

effectively zero — disabled in browsers

HTTP/1.1 · parallel

connections per origin

HTTP/2 · single conn

100

streams default · MAX_CONCURRENT_STREAMS

HTTP/2 · server-push

100

optional, deprecated by Chrome

HTTP/3 · QUIC streams

100

no head-of-line blocking on loss

Takeaway

HTTP/2 multiplexing eliminated the "six connections per origin" hack browsers used for a decade. HTTP/3 fixes the remaining issue: under HTTP/2, one lost packet stalls every stream on the connection because TCP delivers bytes in order. QUIC streams are independent.

→ source

EXP 10c

Cross-region round trips

Speed of light in fibre is ~200,000 km/s.

Question

What round-trip latency floor does the speed of light impose between regions?

Setup

Great-circle distance × 2 ÷ 200,000 km/s gives the theoretical floor. Real RTT is typically 1.3-1.5× this due to non-direct fibre paths.

Readings (ms RTT)

us-east ↔ us-east-2 (Ohio)

ms · floor 8 ms

us-east ↔ us-west

ms · floor 40 ms

us-east ↔ eu-west (Ireland)

ms · transatlantic

us-east ↔ ap-northeast (Tokyo)

150

ms · transpacific

eu-west ↔ ap-south (Mumbai)

120

ms · backbone limited

us-east ↔ ap-southeast (Sydney)

200

ms · half the globe

Takeaway

No software optimisation can beat physics. A user in Sydney hitting a US-east API will see roughly 200 ms of round-trip latency before any code runs. Multi-region strategies (read replicas, edge compute, anycast) exist because of this floor, not despite it.
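The floor calculation from the setup, as a short sketch. The 1.3-1.5× path factor comes from the setup; the distance in the example is a hypothetical round number, not a measured route.

```python
FIBRE_KM_PER_S = 200_000  # speed of light in fibre, as quoted in the text


def rtt_floor_ms(great_circle_km: float) -> float:
    """Theoretical minimum round trip: there and back at the speed of light in fibre."""
    return 2 * great_circle_km / FIBRE_KM_PER_S * 1000


def rtt_typical_range_ms(great_circle_km: float) -> tuple:
    """Typical real-world RTT range, using the 1.3-1.5x path factor from the setup."""
    floor = rtt_floor_ms(great_circle_km)
    return (1.3 * floor, 1.5 * floor)


if __name__ == "__main__":
    distance_km = 10_000  # hypothetical intercontinental distance
    low, high = rtt_typical_range_ms(distance_km)
    print(f"{distance_km:,} km: floor {rtt_floor_ms(distance_km):.0f} ms, "
          f"typical {low:.0f}-{high:.0f} ms")
```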

→ source

Caching.

Hit ratio is the bill payer.

EXP 11

Zipfian hit ratio vs cache size

A small cache catches a huge fraction of traffic.

Question

Under a Zipf-distributed access pattern, what hit ratio does a cache of size k achieve out of N keys?

Setup

Zipf with α = 1.0 (typical for web traffic). Hit ratio = sum of probability mass for the top-k keys, ≈ H(k)/H(N) where H is the harmonic number.

Readings (% hit)

0.1% of keyspace

% hit · the long tail dominates here

1% of keyspace

% hit · half of traffic

5% of keyspace

10% of keyspace

25% of keyspace

50% of keyspace

Takeaway

A cache holding 1% of the keyspace catches half the traffic for typical Zipf-distributed access. This is why CDNs work — most users want a small popular subset. The long tail is real, but it is not the bill.
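The H(k)/H(N) estimate from the setup can be evaluated exactly for modest key counts. This sketch assumes α = 1.0 and an ideal cache that always holds the k most popular keys; the keyspace size of 10,000 in the example is an assumption, and the result shifts with N.

```python
def harmonic(n: int) -> float:
    """H(n) = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))


def zipf_hit_ratio(cached_keys: int, total_keys: int) -> float:
    """Probability mass of the top `cached_keys` keys under Zipf with alpha = 1."""
    return harmonic(cached_keys) / harmonic(total_keys)


if __name__ == "__main__":
    total = 10_000  # assumed keyspace size; larger keyspaces give higher ratios
    for pct in (0.1, 1, 5, 10, 25, 50):
        k = max(1, int(total * pct / 100))
        print(f"{pct:>4}% of keyspace ({k:>5} keys): {zipf_hit_ratio(k, total):.0%} hit")
```

Real caches fall short of this bound because they have to discover which keys are popular, which is what the eviction-policy comparison in EXP 12 measures.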

→ source

EXP 12

LRU vs LFU vs W-TinyLFU on a real trace

Modern admission policies beat both classics.

Question

On a real workload, what hit ratio does each eviction policy achieve at the same cache size?

Setup

ARC and W-TinyLFU benchmarks from the Caffeine project, search trace, cache size = 1% of unique keys.

Readings (% hit)

FIFO

41.2

% hit

LRU

47.1

% hit · the baseline

LFU

50.4

% hit · pure frequency

ARC

53.6

% hit · adaptive replacement

W-TinyLFU

57.8

% hit · admission + LRU

Belady (oracle)

62.2

% hit · upper bound

Takeaway

Modern admission policies (W-TinyLFU, used by Caffeine) close ~70% of the gap between LRU and the optimal Belady oracle. Free wins for any cache that's big enough to matter. The implementation tradeoff is some metadata per entry — usually 2-4 bits.

→ source

EXP 12b

CDN tiered cache — origin offload

Two tiers cut origin requests to 1%.

Question

How does adding a regional cache between edge and origin change the origin request rate?

Setup

Each layer's miss ratio multiplies: origin load = (1 − h₁) × (1 − h₂) × … for per-tier hit ratios h. Cloudflare Tiered Cache and similar architectures from Fastly, AWS CloudFront.

Readings (% origin load)

No CDN

100

% of requests reach origin

Edge only · 90% hit

% · classic CDN

Edge + regional · 95% × 80%

% · two-tier hit

Three-tier · 95% × 85% × 80%

0.15

% · super-PoP shielded

Takeaway

Tiered caching is multiplicative. The edge catches the easy 95%; the regional tier catches 80% of what the edge missed; the result is a 100× origin offload. CDN providers price tiered cache as a premium feature for exactly this reason: it is the largest single cost lever for high-traffic sites.
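Origin load is the product of each tier's miss ratio. A minimal sketch, assuming each tier's hit ratio is measured on the traffic that actually reaches it:

```python
import math
from typing import Sequence


def origin_fraction(hit_ratios: Sequence[float]) -> float:
    """Fraction of client requests that fall through every cache tier to the origin."""
    return math.prod(1 - h for h in hit_ratios)


if __name__ == "__main__":
    for name, tiers in (("No CDN", []), ("Edge only", [0.90]),
                        ("Edge + regional", [0.95, 0.80]),
                        ("Three-tier", [0.95, 0.85, 0.80])):
        frac = origin_fraction(tiers)
        print(f"{name:<16} origin load {frac:8.2%}  offload {1 / frac:6.0f}x")
```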

→ source

VII

Distributed systems.

Coordination is the cost.

EXP 13

Consensus round-trip cost

A Paxos write is at least one cross-AZ RTT.

Question

How much latency does a quorum write add over a single-node write?

Setup

Paxos / Raft both require a round trip from leader to a majority of followers. Numbers below assume 2 ms intra-AZ, 4 ms cross-AZ in a 3-node, 3-AZ cluster.

Readings (ms write latency)

Single-node write

ms · fsync

Quorum write · 3 AZs

ms · leader + 1 follower

Quorum write · cross-region

ms · multi-region Spanner-style

Two-region · 1 follower far

ms · still 80, no win

Spanner global · TrueTime

ms · uncertainty bound

Takeaway

Strong consistency costs at least one cross-zone RTT per write. Multi-region strong consistency costs at least one cross-region RTT — typically 80-200 ms. Designs like Spanner buy back some of this with TrueTime; most systems accept the cost or relax to eventual consistency.
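Why the quorum cost is set by the nearest majority rather than the slowest replica can be shown with a small model. This is a simplified sketch, not a description of any particular Paxos or Raft implementation: it assumes the leader persists locally, sends to all followers in parallel, and commits once a majority (leader included) has acknowledged; follower disk time is ignored, and all latencies in the example are hypothetical.

```python
from typing import Sequence


def quorum_write_latency_ms(leader_fsync_ms: float, follower_rtts_ms: Sequence[float]) -> float:
    """Commit latency: local fsync plus the RTT of the slowest follower the quorum needs."""
    cluster_size = len(follower_rtts_ms) + 1
    acks_needed = cluster_size // 2          # majority minus the leader's own vote
    if acks_needed == 0:
        return leader_fsync_ms               # single-node "cluster"
    return leader_fsync_ms + sorted(follower_rtts_ms)[acks_needed - 1]


if __name__ == "__main__":
    print(quorum_write_latency_ms(1.0, []))            # single node
    print(quorum_write_latency_ms(1.0, [4.0, 4.0]))    # 3 nodes across AZs
    print(quorum_write_latency_ms(1.0, [80.0, 80.0]))  # 3 nodes, both followers far away
```

In this model, adding replicas never lowers the commit latency below the RTT to the nearest set of followers that completes a majority.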

→ source

EXP 13b

Replication lag at saturation

Lag grows hyperbolically as primary fills up.

Question

How does async replication lag behave as the primary approaches its write capacity?

Setup

Same M/M/1 dynamics as EXP 06, applied to the WAL streaming pipeline. Lag scales as ρ/(1−ρ) once the replica is the bottleneck.

Readings (s lag)

ρ = 0.30

0.05

s · imperceptible

ρ = 0.60

0.2

s · normal

ρ = 0.80

0.6

s · still acceptable

ρ = 0.90

1.5

s · alerting territory

ρ = 0.95

4.5

s · users notice stale reads

ρ = 0.99

s · pager goes off

Takeaway

Replication lag is a leading indicator of primary saturation. Alert at 5 seconds; page at 30. The fix is almost never "make replication faster" — it is "shed load from the primary" (cache, batch, async writes).

→ source

EXP 13c

Clock skew — NTP vs PTP vs TrueTime

NTP is good to milliseconds. TrueTime needs atomic clocks.

Question

How well-synchronised are clocks across a fleet?

Setup

Typical observed skew between independent servers, by sync method.

Readings (ms skew)

No sync

60,000

ms · drift over a day

NTP · WAN

ms typical

NTP · LAN

ms · with chronyd

PTP (IEEE 1588)

0.001

ms · sub-microsecond on hardware

Spanner TrueTime

0.005

ms uncertainty bound · GPS+atomic

Takeaway

Trusting wall-clock time across servers is a bug. Use logical clocks (Lamport, vector) or hybrid logical clocks (as in CockroachDB) for ordering. If you must use wall-clock time, allow at least 30 ms of skew, mark events as "happened within a window," and never rely on millisecond-precise causality.

→ source

EXP 13d

SLO error budget — minutes per quarter

Three nines is 2.2 hours. Five nines is 78 seconds.

Question

What downtime budget does each common SLO actually allow?

Setup

Allowed downtime per quarter (90 days) for each availability target.

Readings (min/quarter)

99% (two nines)

1,296

min / quarter · 21.6 hrs

99.9% (three nines)

130

min / quarter — 2.2 hrs

99.95%

min / quarter

99.99% (four nines)

min / quarter — one bad deploy

99.999% (five nines)

1.3

min / quarter — 78 seconds

99.9999% (six nines)

0.13

min / quarter — needs fault-tolerance, not luck

Takeaway

Each extra nine costs roughly 10× the engineering. Pick SLOs by user impact, not aspiration. Most consumer products live happily at 99.9%; payments and healthcare push for 99.99%; only telcos and stock exchanges need five-nines architectures.
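The budget is the window length times the allowed failure fraction. A minimal sketch, assuming the 90-day quarter from the setup and that every minute of the window counts equally:

```python
def downtime_budget_minutes(slo_percent: float, window_days: float = 90) -> float:
    """Allowed downtime in minutes for an availability target over the window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - slo_percent / 100)


if __name__ == "__main__":
    for slo in (99, 99.9, 99.95, 99.99, 99.999, 99.9999):
        minutes = downtime_budget_minutes(slo)
        print(f"{slo:>8}%  {minutes:10.2f} min/quarter  ({minutes * 60:,.0f} s)")
```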

→ source

VIII

Observability.

What you can see vs what you missed.

EXP 14

Cardinality blowup in metrics

A label per user_id, a Prometheus on fire.

Question

How many unique time-series do common label-set patterns create?

Setup

Each combination of label values creates a distinct time-series. Prometheus / Mimir scrape memory grows roughly linearly in series count.

Readings (distinct time-series)

method × status (3×6)

tiny · sane

+ endpoint (3×6×30)

540

still fine

+ service (3×6×30×100)

54,000

~50k · OK on a single node

+ pod_id (×500)

27,000,000

27M · Prometheus dies

+ user_id (×1M)

27,000,000,000,000

27 trillion · don't

log scale

Takeaway

Label values must come from a closed, small set. user_id, request_id, session_id are forbidden as labels — those belong in logs or traces, never metrics. Prometheus reference: target ≤ 10M series total per instance.
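Series count is the product of per-label cardinalities, which is why one unbounded label is enough to sink a metric. The sketch below uses the label cardinalities from the readings and assumes the worst case, where every combination of values actually occurs.

```python
import math


def series_count(label_cardinalities: dict) -> int:
    """Upper bound on distinct time-series for one metric: product of label cardinalities."""
    return math.prod(label_cardinalities.values())


if __name__ == "__main__":
    labels = {}
    for name, cardinality in (("method", 3), ("status", 6), ("endpoint", 30),
                              ("service", 100), ("pod_id", 500), ("user_id", 1_000_000)):
        labels[name] = cardinality
        print(f"+ {name:<9} -> {series_count(labels):>22,} series")
```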

→ source

EXP 15

Log volume vs cost

A debug log left on costs more than the engineer who left it.

Question

What does logging cost in commercial SaaS observability platforms?

Setup

Datadog Logs at standard pricing (mid-2024). Average log line ~250 bytes. 30-day retention. Volumes per service per day.

Readings ($ / month)

1 GB/day · 30 days

$/month — small service

10 GB/day · 30 days

180

$/month

100 GB/day · 30 days

1,800

$/month — typical mid-size

1 TB/day · 30 days

18,000

$/month — annual ~$220k

10 TB/day · 30 days

180,000

$/month — find a cheaper plan

Takeaway

Sample. Tier hot vs cold logs. Move debug-level off the hot path entirely. The "log everything, forever" default at scale is a six-figure mistake. Self-hosted alternatives (Loki, Vector + S3) cost 20-50× less but require operational effort.

→ source

EXP 16

Trace sampling — what 1% catches

Head-sampling ~1% misses the worst traces.

Question

For a service handling N requests, what fraction of "interesting" traces does each sampling strategy keep?

Setup

Imagine a 1M-request hour, with 1% being "slow" (>p99) and 0.1% being "errored". Sampling fractions kept by each strategy.

Readings (% of total traces)

Head sample 1%

% kept · independent of trace outcome

Head sample 0.1%

0.1

% kept · loses most errors

Tail sample errors-only

0.1

% kept · 100% of errors retained

Tail sample slow + errors

1.1

% kept · 100% of interesting traces

Adaptive (Honeycomb-style)

0.5

% kept · keeps representative + interesting

Takeaway

Head-based sampling is cheap but loses tail and error events. Tail-based sampling sees the whole trace before deciding to keep it, which is much more useful but requires a buffering layer. Production tracing setups typically settle on adaptive sampling: keep all errors, all slow traces, plus a fixed % of the fast happy path.
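The difference between the strategies shows up when counting how many interesting traces survive. This sketch uses the 1M-request hour from the setup and assumes slow and errored traces do not overlap; the function names are illustrative.

```python
def head_sample(total: int, slow_frac: float, error_frac: float, rate: float) -> dict:
    """Expected counts when each trace is kept with probability `rate`, blind to outcome."""
    return {
        "kept": total * rate,
        "slow_kept": total * slow_frac * rate,
        "errors_kept": total * error_frac * rate,
    }


def tail_sample(total: int, slow_frac: float, error_frac: float) -> dict:
    """Counts when every slow or errored trace is kept and nothing else (assumes no overlap)."""
    return {
        "kept": total * (slow_frac + error_frac),
        "slow_kept": total * slow_frac,
        "errors_kept": total * error_frac,
    }


if __name__ == "__main__":
    total, slow, err = 1_000_000, 0.01, 0.001
    print("head 1%  :", head_sample(total, slow, err, 0.01))
    print("head 0.1%:", head_sample(total, slow, err, 0.001))
    print("tail     :", tail_sample(total, slow, err))
```

Head sampling at 1% stores about as many traces as tail sampling here, but retains only about 10 of the 1,000 errors.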

→ source

Security parameters.

How long until brute force wins.

EXP 17

Password entropy vs cracking time

A 10-char password without a passphrase is sand.

Question

For a password of given alphabet and length, how long does brute force take at 1 trillion guesses/second?

Setup

Modern hashcat on a single 8×RTX-4090 rig achieves ~10¹² SHA-256 guesses/second (~10⁹ for bcrypt at cost 12). Time to exhaust the keyspace, in years.

Readings (years to exhaust)

8-char alphanumeric (62^8)

years · ~3.5 minutes — broken

8-char with symbols (94^8)

0.0002

years · ~1.7 hours

12-char alphanumeric

110

years

12-char with symbols

14,000

years

4-word diceware (~52 bits)

0.00014

years

6-word diceware (~78 bits)

9,600

years

log scale

Takeaway

Entropy is what counts, and length is the cheapest way to get it: a human-chosen "Tr0ub@dor" type password has far less entropy than its character set suggests, while each diceware word adds about 12.9 bits. Use a password manager, generate 16+ chars or 5+ diceware words. Use bcrypt with cost ≥ 12 (or Argon2) to slow the attacker by ~1000×.
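The exhaustion times are keyspace divided by guess rate. A minimal sketch, assuming uniformly random passwords, the 10¹² guesses/second rate from the setup, and the standard 7,776-word diceware list:

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
DICEWARE_WORDS = 7776  # standard diceware list size (6^5)


def seconds_to_exhaust(alphabet_size: int, length: int, guesses_per_second: float = 1e12) -> float:
    """Time to try every password of the given alphabet and length."""
    return alphabet_size ** length / guesses_per_second


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a uniformly random choice from alphabet_size ** length options."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    for name, alphabet, length in (("8-char alphanumeric", 62, 8), ("8-char with symbols", 94, 8),
                                   ("12-char alphanumeric", 62, 12), ("12-char with symbols", 94, 12),
                                   ("4-word diceware", DICEWARE_WORDS, 4),
                                   ("6-word diceware", DICEWARE_WORDS, 6)):
        secs = seconds_to_exhaust(alphabet, length)
        print(f"{name:<21} {entropy_bits(alphabet, length):5.1f} bits  "
              f"{secs / SECONDS_PER_YEAR:.3g} years")
```

Passing `guesses_per_second=1e9` models the bcrypt rate from the setup and multiplies every result by 1,000.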

→ source

EXP 18

TLS handshake — cert chain size

A long chain costs an extra round trip.

Question

How many bytes does each TLS cert-chain configuration add to the first round trip?

Setup

Server certificate plus its chain, sent in the server's first flight (the Certificate message). RSA-2048 cert ≈ 1.5 KB; ECDSA P-256 cert ≈ 0.5 KB. Initial congestion window is typically 10 segments × 1460 bytes ≈ 14.6 KB.

Readings (KB cert + chain)

ECDSA leaf only

0.5

KB · single cert

ECDSA leaf + 1 intermediate

KB · clean

RSA leaf only

1.5

KB · still 1 round-trip

RSA leaf + 2 intermediates

4.5

KB · 1 RT

RSA leaf + 4 intermediates

7.5

KB · still 1 RT

RSA + many intermediates (>14 KB)

KB · 2 round-trips · slow start kicks in

Takeaway

A bloated cert chain crosses the initcwnd boundary and triggers an extra round trip on the very first connection. Use ECDSA where possible (3× smaller by the sizes above). Strip unnecessary intermediates from your chain. Test with ssllabs.com.
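Whether a chain fits in the first flight is a size comparison against the initial congestion window. This sketch uses the per-certificate sizes and the 10 × 1460-byte window from the setup; it is a simplification that ignores the other handshake bytes sharing that window, so real headroom is somewhat smaller.

```python
INITCWND_BYTES = 10 * 1460  # typical initial congestion window, as quoted in the setup
CERT_BYTES = {"rsa2048": 1500, "ecdsa_p256": 500}  # approximate sizes from the setup


def chain_bytes(key_type: str, intermediates: int) -> int:
    """Approximate size of a leaf plus `intermediates` certificates of the same key type."""
    return CERT_BYTES[key_type] * (1 + intermediates)


def fits_in_first_flight(total_bytes: int) -> bool:
    """True if the chain alone fits within the initial congestion window."""
    return total_bytes <= INITCWND_BYTES


if __name__ == "__main__":
    for key_type, intermediates in (("ecdsa_p256", 0), ("ecdsa_p256", 1), ("rsa2048", 0),
                                    ("rsa2048", 2), ("rsa2048", 4), ("rsa2048", 9)):
        size = chain_bytes(key_type, intermediates)
        verdict = "1 round trip" if fits_in_first_flight(size) else "extra round trip"
        print(f"{key_type:<11} + {intermediates} intermediates: {size / 1000:5.1f} KB  {verdict}")
```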

→ source

Adjacent

From the numbers, to the systems.

Every measurement here corresponds to a guide, a simulator, or a foundation entry that explains the underlying mechanism. Read alongside.
