Napkin math, fully shown.
Every system-design round eventually asks you to estimate. DAU to QPS. QPS to storage. Storage to shards. Shards to instance count. The math is simple. The reps are what make it automatic. Keep this page open as a reference card: powers of two, latency numbers drawn to scale, the five derivations, a live worksheet, and three worked examples.
Powers of two — the table you carry
Memorise five numbers: 2^10 ≈ 1 K, 2^20 ≈ 1 M, 2^30 ≈ 1 B, 2^32 ≈ 4 B, 2^40 ≈ 1 T. Everything else follows. The pattern: every 10 extra bits is roughly three orders of magnitude, because 1024 ≈ 1000.
Practical use: a 64-bit signed ID holds 2^63 values, about 9 quintillion, effectively infinite for any system you'll ever ship. A 32-bit signed user ID holds 2^31, about 2 billion, which is the ceiling Facebook hit. When you're picking between int and bigint for a primary key, this table tells you whether you'll regret it in five years.
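A minimal sketch of the int-versus-bigint check, assuming signed integer keys. The helper names `bits_needed` and `fits_int32` are illustrative, not from any library.

```python
INT32_IDS = 2**31  # non-negative values in a signed 32-bit integer
INT64_IDS = 2**63  # non-negative values in a signed 64-bit integer


def bits_needed(count: int) -> int:
    """Smallest number of bits that can label `count` distinct values."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1).bit_length()


def fits_int32(expected_rows: int) -> bool:
    """True if a signed 32-bit key can still number every row."""
    return expected_rows <= INT32_IDS


print(bits_needed(1_000_000))     # a million values need 20 bits
print(fits_int32(3_000_000_000))  # 3 billion rows overflow a signed int
```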
Latency numbers, with proportional bars
Jeff Dean's "numbers every programmer should know," redrawn so you can see the orders-of-magnitude gaps. The bars are on a log scale — each tick is 10×. Memorise the gaps, not the exact numbers.
The five derivations
- 01 · DAU → average QPS: avgQPS = DAU × actions_per_user_per_day / 86,400
Each daily-active user does some number of actions: page views, messages sent, likes. Multiply, then divide by seconds-in-a-day (86,400, close enough to 100 K to round when you need to). 10M DAU × 5 actions = 50M actions/day ≈ 580 QPS average.
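The same arithmetic as a short sketch, using the 10M DAU example from the paragraph above. The function name `average_qps` is an illustrative choice. It also shows what the 100 K rounding shortcut costs.

```python
SECONDS_PER_DAY = 86_400


def average_qps(dau: int, actions_per_user_per_day: float) -> float:
    """Derivation 01: spread a day's actions evenly over the day."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY


exact = average_qps(10_000_000, 5)   # about 579 QPS
shortcut = 10_000_000 * 5 / 100_000  # 500 QPS, rounding a day to 100 K seconds
print(f"exact ~ {exact:.0f} QPS, shortcut = {shortcut:.0f} QPS")
```

The shortcut lands about 14% low, which is usually well inside the error bars of the input assumptions.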
- 02 · Average → peak QPS: peakQPS = avgQPS × peak_multiplier (typically 3–5×)
Traffic is never uniform. Peak hour of peak day usually lands 3–5× the daily average — more if your product has a strong daily rhythm (commute apps spike at 8am and 6pm). Size compute for peak with headroom. Size storage for average.
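A small sketch that turns the average into a peak range rather than a single number. The default multiplier of 4 is an assumption picked from the middle of the 3–5× band.

```python
def peak_qps(avg_qps: float, peak_multiplier: float = 4.0) -> float:
    """Derivation 02: scale average traffic up to the busiest period."""
    if peak_multiplier < 1.0:
        raise ValueError("peak cannot be below average")
    return avg_qps * peak_multiplier


# Bracket the estimate with the low and high ends of the typical band.
low, high = peak_qps(580, 3.0), peak_qps(580, 5.0)
print(f"peak between {low:.0f} and {high:.0f} QPS")
```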
- 03 · QPS → storage growth: daily_storage = writes_per_day × avg_record_size × replication_factor × index_overhead
Average record size has to include the value, index entries, replication overhead (typically 3×), and the metadata dark matter you never quite remember to count. A common mistake: thinking storage is "rows × size" when the real answer is closer to "rows × size × 3 × 1.3" for a replicated system with one secondary index.
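A sketch of the storage formula with the "× 3 × 1.3" multipliers from the paragraph above as defaults. The 1M writes/day and 1,000-byte record are made-up inputs for illustration.

```python
def daily_storage_bytes(
    writes_per_day: int,
    avg_record_bytes: int,
    replication_factor: int = 3,
    index_overhead: float = 1.3,
) -> float:
    """Derivation 03: bytes added per day, including replicas and indexes."""
    return writes_per_day * avg_record_bytes * replication_factor * index_overhead


naive = 1_000_000 * 1_000                      # "rows x size": 1 GB/day
real = daily_storage_bytes(1_000_000, 1_000)   # about 3.9 GB/day
print(f"naive {naive / 1e9:.1f} GB/day, real {real / 1e9:.1f} GB/day")
```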
- 04 · QPS → instance count: instances = ceil(peakQPS / sustainable_QPS_per_instance)
Per-instance throughput depends on the workload (CPU-bound or IO-bound), but as a starting point: 100–500 QPS for a typical web service, 1000–2000 for a tight Go service, 10–50 for anything doing real work per request. Round up and add a 30% headroom buffer.
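One way to sketch the instance count with the headroom folded in before rounding up. Applying the 30% to the peak rather than to the rounded count is an assumption, and the 300 QPS per instance is an illustrative value from inside the "typical web service" range.

```python
import math


def instance_count(
    peak_qps: float, qps_per_instance: float, headroom: float = 0.30
) -> int:
    """Derivation 04: instances needed at peak, padded by `headroom`."""
    if qps_per_instance <= 0:
        raise ValueError("qps_per_instance must be positive")
    return math.ceil(peak_qps * (1 + headroom) / qps_per_instance)


# 2,900 peak QPS on instances that each sustain about 300 QPS.
print(instance_count(2_900, 300))
```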
- 05 · Storage → shard count: shards = ceil(total_storage / storage_per_shard)
A single shard caps around 500 GB to 1 TB before rebalancing turns painful. 50 TB of data needs 50–100 shards. Pick the next power of two above the calculation (32, 64, 128…) because resharding by doubling is much cleaner than by some arbitrary factor.
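A sketch of the shard calculation followed by the round-up to a power of two. Decimal units (1 TB = 10^12 bytes) are assumed, and `next_power_of_two` returns the input unchanged when it is already a power of two.

```python
GB = 10**9
TB = 10**12


def shard_count(total_bytes: int, bytes_per_shard: int) -> int:
    """Derivation 05: integer ceiling division, no float rounding."""
    return -(-total_bytes // bytes_per_shard)


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is greater than or equal to n."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


for per_shard in (1 * TB, 500 * GB):
    raw = shard_count(50 * TB, per_shard)
    print(f"{raw} shards by the formula, provision {next_power_of_two(raw)}")
```

For 50 TB this gives 50 or 100 shards by the formula, provisioned as 64 or 128.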
Try it — the live worksheet
Pick a DAU. The five derivations update live. Every assumption is a slider, so you can defend the input and let the math fall out.
Every number is just multiplication and division. The interview skill is naming the inputs explicitly — "5 actions per user per day, peak 4× average" — so the room argues with the number instead of the math.
Three worked examples
Three real product shapes, with the numbers run end to end. Same formulas. Wildly different system shapes once the input assumptions land.
- Inputs. 250M DAU. Each user views a timeline 10× per day (2.5B reads); writes a tweet 0.2× per day (50M writes). Read:write ratio ≈ 50:1.
- Average QPS. 2.5B reads / 86,400 ≈ 29 K reads/sec. 50M writes / 86,400 ≈ 580 writes/sec.
- Peak QPS. Twitter has a 5× peak multiplier (big events, breaking news). 145 K reads/sec peak, 3 K writes/sec peak.
- Storage. (280-byte tweet + 200-byte metadata) × 50M tweets/day = 24 GB/day. With 3× replication and an index, ~100 GB/day, ~36 TB/year. Modest.
- Instance count. Read tier at 1000 QPS/instance: 145 K / 1000 = 145 instances minimum, 200 with headroom. Write tier sharded by user ID: ~20 instances.
- Shape conclusion. Read-heavy means caching is the whole game. Memcache layer in front of the timeline; pre-computed timelines for active users; lazy fan-out for celebrities. Sharding is by user ID, not by tweet.
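The read-heavy example above, run end to end as a sketch. All inputs come from the bullets; the 1.3 index overhead is borrowed from the storage derivation and is an assumption here.

```python
import math

SECONDS_PER_DAY = 86_400

dau = 250_000_000
reads_per_day = dau * 10        # 10 timeline views per user per day
writes_per_day = dau * 2 // 10  # 0.2 tweets per user per day

avg_read_qps = reads_per_day / SECONDS_PER_DAY
avg_write_qps = writes_per_day / SECONDS_PER_DAY
peak_read_qps = avg_read_qps * 5    # 5x peak multiplier
peak_write_qps = avg_write_qps * 5

raw_bytes_per_day = (280 + 200) * writes_per_day      # tweet plus metadata
stored_bytes_per_day = raw_bytes_per_day * 3 * 1.3    # replication, one index

read_instances = math.ceil(peak_read_qps / 1_000)     # before headroom

print(f"reads  {avg_read_qps:,.0f}/s avg, {peak_read_qps:,.0f}/s peak")
print(f"writes {avg_write_qps:,.0f}/s avg, {peak_write_qps:,.0f}/s peak")
print(f"storage {stored_bytes_per_day / 1e9:.1f} GB/day")
print(f"read tier {read_instances} instances minimum")
```

The unrounded storage figure comes out near 94 GB/day; the bullets round it up to ~100 GB/day before annualising, which is the right direction to round for capacity.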
- Inputs. 500M DAU. 2 photo uploads per user per day = 1B uploads/day. Average photo 2 MB processed → 200 KB delivered. Each photo viewed by 20 followers on average.
- QPS. Writes: 1B / 86,400 = 11 K writes/sec average, 40 K peak. Reads: 20B views / 86,400 = 230 K reads/sec, 1M peak.
- Bandwidth. Photo upload: 11 K × 2 MB = 22 GB/sec ingress. Photo serve: 230 K × 200 KB = 46 GB/sec egress. Bandwidth is the dominant cost — CDN is a hard requirement.
- Storage. 1B × 2 MB = 2 PB/day raw. With variants (thumbnails + multiple resolutions), 5 PB/day. 1.8 EB/year at one year of retention.
- Shape conclusion. Object store (S3-shaped) for the bytes. Metadata in a sharded RDBMS. Heavy CDN front-end so the 46 GB/sec read traffic doesn't reach origin. Image processing in async workers (resize, compress, thumbnail extract).
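A sketch of the bandwidth and storage arithmetic for the photo example, using the inputs from the bullets and decimal units (1 MB = 10^6 bytes). The 2.5× variants factor is inferred from the 2 PB to 5 PB step in the text.

```python
SECONDS_PER_DAY = 86_400
MB, KB = 10**6, 10**3

uploads_per_day = 500_000_000 * 2     # 2 uploads per user per day
views_per_day = uploads_per_day * 20  # each photo seen by 20 followers

ingress_bytes_per_s = uploads_per_day * 2 * MB / SECONDS_PER_DAY
egress_bytes_per_s = views_per_day * 200 * KB / SECONDS_PER_DAY

raw_pb_per_day = uploads_per_day * 2 * MB / 1e15
with_variants_pb_per_day = raw_pb_per_day * 2.5   # thumbnails, resolutions
eb_per_year = with_variants_pb_per_day * 365 / 1_000

print(f"ingress {ingress_bytes_per_s / 1e9:.1f} GB/s")
print(f"egress  {egress_bytes_per_s / 1e9:.1f} GB/s")
print(f"storage {with_variants_pb_per_day:.1f} PB/day, {eb_per_year:.2f} EB/year")
```

Unrounded, ingress is closer to 23 GB/s than 22; the difference comes from rounding 11.6 K writes/sec down to 11 K and does not change the conclusion.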
- Inputs. 12M DAU. 50 messages sent + 500 messages received per user per day. Each message averages 100 bytes (text + metadata).
- QPS. Sends: 600M / 86,400 = 7 K sends/sec average, 25 K peak. Receives via fan-out: 6B / 86,400 = 70 K deliveries/sec, 250 K peak.
- Connections. All 12M users hold a persistent WebSocket while active. Realistic concurrent peak: 4M. At ~20 K connections per WebSocket front, that's 200 front-end servers.
- Storage. 600M messages × 100 bytes = 60 GB/day. Replicated and indexed: ~250 GB/day. With infinite retention: 90 TB/year. Very modest for the QPS load.
- Shape conclusion. The cost is in fan-out and persistent connections, not in storage. Sharded WebSocket layer, Redis pub/sub for cross-shard fan-out, eventually-consistent reads from a sharded RDBMS for history. The hard part is presence (who's online) and history sync after a reconnect, not the messages themselves.
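The chat example reduced to the numbers that drive its shape: fan-out ratio, delivery rate, and front-end count. The inputs are from the bullets; the variable names are illustrative.

```python
import math

SECONDS_PER_DAY = 86_400

dau = 12_000_000
sends_per_day = dau * 50        # messages sent
deliveries_per_day = dau * 500  # messages received via fan-out

fan_out = deliveries_per_day / sends_per_day  # deliveries per message sent
send_qps = sends_per_day / SECONDS_PER_DAY
delivery_qps = deliveries_per_day / SECONDS_PER_DAY

concurrent_connections = 4_000_000  # realistic concurrent peak
connections_per_front = 20_000
front_ends = math.ceil(concurrent_connections / connections_per_front)

raw_gb_per_day = sends_per_day * 100 / 1e9  # 100 bytes per message

print(f"fan-out {fan_out:.0f}x, {send_qps:,.0f} sends/s, {delivery_qps:,.0f} deliveries/s")
print(f"{front_ends} WebSocket front-ends, {raw_gb_per_day:.0f} GB/day raw")
```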
What candidates get wrong
- Skipping the assumptions. "5K QPS" with no stated DAU and actions-per-user can't be checked. Lead with the inputs so the interviewer can argue with a number instead of with your conclusion. The conclusion is the easy part; the inputs are the conversation.
- Forgetting replication and indexes. A 10M-row table at 1 KB per row is 10 GB on disk only if there's one copy and zero indexes. With 3× replication and two indexes (each ~30% of base table size) the real number is closer to 50 GB. Multiply by 5, not by 1.
- Sizing for average instead of peak. You don't size instance count for the daily average. Size it for the busiest minute of the busiest day, plus 30% slack. That 30% keeps utilisation below the point where bursts turn into queues; Little's Law only describes averages and says nothing about variance.
- Using "10 ms" as if it means anything. On the classic latency numbers, 10 ms is one spinning-disk seek, about twenty same-DC round trips, or dozens of SSD random reads. Say which one you mean. The latency-numbers table above is the reference scale; use it.
- Skipping bandwidth. Bandwidth quietly kills high-traffic services. A 100 KB response at 50K QPS is 5 GB/sec, or 40 Gbps. That is more than a common 10 or 25 Gbps NIC can carry, which forces architectural choices (CDN, edge caching, shrinking the response). Run the multiplication.
- Designing for ten years from now. 10× current peak is a sensible ceiling. Past that, you'll redesign before you ever get there. Twitter's 2010 architecture wouldn't survive 2026; 2026's architecture would have been over-engineered in 2010. Pick a horizon and live in it.
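Two of the checks from the list above, sketched as functions: the on-disk size of a replicated, indexed table, and the bandwidth a response size implies. The function names and the 1 KB = 1,000 bytes convention are assumptions.

```python
def on_disk_bytes(
    rows: int,
    row_bytes: int,
    replicas: int = 3,
    indexes: int = 2,
    index_fraction: float = 0.3,
) -> float:
    """Table size with each index costing a fraction of the base table."""
    return rows * row_bytes * (1 + indexes * index_fraction) * replicas


def bandwidth_gbps(response_bytes: int, qps: int) -> float:
    """Egress in gigabits per second for a given response size and rate."""
    return response_bytes * qps * 8 / 1e9


table = on_disk_bytes(10_000_000, 1_000)  # about 48 GB, not 10 GB
egress = bandwidth_gbps(100_000, 50_000)  # 40 Gbps
print(f"table ~ {table / 1e9:.0f} GB, egress = {egress:.0f} Gbps")
```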
Little's Law, the escape hatch
When the math gets tangled, Little's Law bails you out. concurrency = throughput × latency. Name any two and the third writes itself. Most of the derivations on this page are Little's Law in disguise.
In practice: "100K QPS, each request takes 50 ms, so I need at least 5,000 concurrent slots." Whether those slots are threads, goroutines, async tasks, database connections, or HTTP/2 streams is the next conversation. The number itself is fixed by the equation.
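Little's Law solved for each of its three terms, as a sketch. The 200-connection pool with 20 ms queries is a made-up second example, not from the text.

```python
def concurrency(throughput_per_s: float, latency_s: float) -> float:
    """Average requests in flight: L = lambda * W."""
    return throughput_per_s * latency_s


def throughput(concurrent: float, latency_s: float) -> float:
    """Sustainable requests per second for a fixed number of slots."""
    return concurrent / latency_s


def latency(concurrent: float, throughput_per_s: float) -> float:
    """Average time in system implied by slots and request rate."""
    return concurrent / throughput_per_s


slots = concurrency(100_000, 0.050)  # 100K QPS at 50 ms: about 5,000 slots
pool_qps = throughput(200, 0.020)    # 200 connections at 20 ms: about 10,000 QPS
print(f"{slots:.0f} concurrent slots, pool tops out near {pool_qps:.0f} QPS")
```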
Related on Semicolony
- Latency vs throughput — Little's Law in full, with a calculator.
- Performance vs scalability — the per-system version of the same trade-off.
- Availability patterns — the math behind nines.
- System-design playbook — 19 worked problems applying these numbers in full.
- Performance methods (RED, USE) — what to measure once you've sized.
- Paper — The Tail at Scale on why P99 matters more than the median.
- Jeff Dean's original latency-numbers gist
- Simon Eskildsen's napkin-math repo — the canonical practice exercises.