Capacity planning


Napkin math, fully shown.

Every system-design round eventually asks you to estimate. DAU to QPS. QPS to storage. Storage to shards. Shards to instance count. The math is simple. The reps are what make it automatic. Keep this page open as a reference card: powers of two, latency numbers drawn to scale, the five derivations, a live worksheet, and three worked examples.

Powers of two — the table you carry

Memorise five numbers. 2¹⁰ = 1 K, 2²⁰ = 1 M, 2³⁰ = 1 B, 2³² = 4 B, 2⁴⁰ = 1 T. Everything else follows. The pattern: every 10 extra bits is roughly three orders of magnitude, because 1024 ≈ 1000.

Power Exact Approximation

2¹⁰ 1,024 1 K (one thousand)

2¹⁶ 65,536 65 K

2²⁰ 1,048,576 1 M (one million)

2³⁰ 1,073,741,824 1 B (one billion)

2³² 4,294,967,296 4 B (unsigned 32-bit int range)

2⁴⁰ 1,099,511,627,776 1 T (one trillion)

2⁵⁰ 1,125,899,906,842,624 1 P (one quadrillion)

Practical use: a 64-bit signed ID holds 2⁶³ values, about 9.2 quintillion, effectively infinite for any system you'll ever ship. A 32-bit signed ID holds 2³¹, about 2.1 billion, a ceiling Facebook engineered around by moving to 64-bit user IDs. When you're picking between int and bigint for a primary key, this table tells you whether you'll regret it in five years.
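The "will I regret it in five years" question can be checked directly. This is a minimal sketch, assuming IDs are handed out sequentially at a constant rate. The helper `years_until_exhausted` and the 100 inserts/sec rate are hypothetical illustrations, not measurements.

```python
# Sketch: how long until a sequential primary key runs out?
# Assumes a constant insert rate; the 100/sec below is a hypothetical input.
SECONDS_PER_YEAR = 365 * 86_400

INT32_MAX = 2**31 - 1   # signed 32-bit: about 2.1 billion
INT64_MAX = 2**63 - 1   # signed 64-bit: about 9.2 quintillion

def years_until_exhausted(max_id: int, inserts_per_sec: float) -> float:
    """Years of headroom if every insert consumes one new ID."""
    return max_id / (inserts_per_sec * SECONDS_PER_YEAR)

print(f"int:    {years_until_exhausted(INT32_MAX, 100):.2f} years")
print(f"bigint: {years_until_exhausted(INT64_MAX, 100):.3g} years")
```

At a sustained 100 inserts per second, a signed int lasts well under a year (about 0.68), while bigint lasts billions of years.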

Latency numbers, smallest to largest

Jeff Dean's "numbers every programmer should know," listed in ascending order so you can see the orders-of-magnitude gaps. Memorise the gaps, not the exact numbers.

L1 cache reference

0.5 ns

CPU on-die SRAM. A few cycles on most modern chips.

Branch mispredict penalty

5 ns

~5 ns on Skylake-class. Why hot loops care about predictability.

L2 cache reference

7 ns

Private to each core on most modern chips; bigger and slower than L1, still fast.

Mutex lock/unlock

25 ns

Uncontended; under contention it goes to syscalls and microseconds.

Main memory reference

100 ns

~200x L1. The whole point of caches.

Compress 1 KB with zippy

3 µs

Roughly. Zippy ~ Snappy ~ LZ4 family.

Send 1 KB over 1 Gbps network

10 µs

Same datacentre, intra-rack.

Read 4 KB from SSD

150 µs

NVMe is faster (~30 µs); SATA SSD ~150 µs. HDD is 100x worse.

Read 1 MB sequential from RAM

250 µs

Dean's figure (~4 GB/s); modern DRAM bandwidth is higher still. Bandwidth dominates over latency for sequential reads.

Round-trip same datacentre

500 µs

Includes network stack overhead, not just wire time.

Read 1 MB sequential from SSD

1.5 ms

~700 MB/s on a good SSD.

Disk seek (spinning HDD)

10 ms

The thing SSDs killed. Still common in cheap storage tiers.

Read 1 MB sequential from HDD

20 ms

HDDs are still fine for sequential — terrible for random.

Round-trip US east → US west

75 ms

Light in fibre needs ~21 ms each way (~42 ms round trip); the rest is indirect cable routes and router hops.

Round-trip US east → Europe

120 ms

Atlantic submarine cable + landside.

Round-trip US east → Asia

220 ms

Pacific submarine cable.

Four gaps worth memorising. RAM is ~200× L1. A random SSD read is ~1,500× a RAM reference. A same-DC round trip is ~3× an SSD read. A cross-continental round trip is ~150× a same-DC one. Any decision that moves a workload across one of these gaps will cost you in latency or buy you in throughput, usually both.
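The gaps fall straight out of the table. This small sketch recomputes them from the figures listed above (copied in by hand, in nanoseconds); the dictionary keys and the `gap` helper are illustrative names only.

```python
# Latency figures from the table above, in nanoseconds.
LATENCY_NS = {
    "l1_cache": 0.5,
    "main_memory": 100,
    "ssd_random_4k": 150_000,
    "same_dc_round_trip": 500_000,
    "us_east_to_west_round_trip": 75_000_000,
}

def gap(slower: str, faster: str) -> float:
    """How many times slower one operation is than another."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]

for slower, faster in [
    ("main_memory", "l1_cache"),
    ("ssd_random_4k", "main_memory"),
    ("same_dc_round_trip", "ssd_random_4k"),
    ("us_east_to_west_round_trip", "same_dc_round_trip"),
]:
    print(f"{slower} / {faster}: ~{gap(slower, faster):,.0f}x")
```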

The five derivations

01 · DAU → average QPS

avgQPS = DAU × actions_per_user_per_day / 86,400

Each daily-active user does some number of actions: page views, messages sent, likes. Multiply, then divide by seconds-in-a-day (86,400, close enough to 100 K to round when you need to). 10M DAU × 5 actions = 50M actions/day ≈ 580 QPS average.
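Derivation 01 as a function, as a minimal sketch (the name `average_qps` is illustrative). It also shows how far the "86,400 ≈ 100 K" shortcut drifts for the example in the text.

```python
SECONDS_PER_DAY = 86_400

def average_qps(dau: float, actions_per_user_per_day: float) -> float:
    """Derivation 01: spread a day's actions evenly over 86,400 seconds."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY

exact = average_qps(10_000_000, 5)
shortcut = 10_000_000 * 5 / 100_000   # the "86,400 is roughly 100 K" rounding
print(f"exact {exact:.0f} QPS, shortcut {shortcut:.0f} QPS")
```

The shortcut gives 500 QPS, about 14% low, which is usually fine for a first pass as long as you say you rounded.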
02 · Average → peak QPS

peakQPS = avgQPS × peak_multiplier (typically 3-5x)

Traffic is never uniform. Peak hour of peak day usually lands 3–5× the daily average — more if your product has a strong daily rhythm (commute apps spike at 8am and 6pm). Size compute for peak with headroom. Size storage for average.
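Derivation 02 is a single multiplication; the sketch below just sweeps the typical 3 to 5× range for the 10 M DAU example so you can quote a band instead of a point. The function name `peak_qps` is illustrative.

```python
def peak_qps(avg_qps: float, peak_multiplier: float) -> float:
    """Derivation 02: scale the daily average up to the busiest period."""
    if peak_multiplier < 1:
        raise ValueError("a peak multiplier below 1 would mean peak < average")
    return avg_qps * peak_multiplier

avg = 10_000_000 * 5 / 86_400   # ~579 QPS, from derivation 01
for multiplier in (3, 4, 5):    # the typical 3-5x range
    print(f"x{multiplier}: {peak_qps(avg, multiplier):,.0f} QPS")
```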
03 · QPS → storage growth

daily_storage = writes_per_day × avg_record_size × replication_factor × index_overhead

Every factor counts: the record itself (the value plus the metadata dark matter you never quite remember to count), replication (typically 3×), and index entries. A common mistake: thinking storage is "rows × size" when the real answer is closer to "rows × size × 3 × 1.3" for a replicated system with one secondary index.
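A minimal sketch of derivation 03, using the text's defaults of 3× replication and ~1.3× for one secondary index. The 50 M writes/day of 1 KB are hypothetical inputs chosen only to show the size of the "rows × size" error.

```python
def daily_storage_bytes(writes_per_day: float, avg_record_bytes: float,
                        replication_factor: float = 3.0,
                        index_overhead: float = 1.3) -> float:
    """Derivation 03. Defaults follow the text: 3x replication and
    ~1.3x for one secondary index."""
    return writes_per_day * avg_record_bytes * replication_factor * index_overhead

writes, record_bytes = 50_000_000, 1_000      # hypothetical: 50M writes/day of 1 KB
naive = writes * record_bytes                 # the "rows x size" mistake
realistic = daily_storage_bytes(writes, record_bytes)
print(f"naive {naive / 1e9:.0f} GB/day, realistic {realistic / 1e9:.0f} GB/day")
print(f"one year at the realistic rate: {realistic * 365 / 1e12:.1f} TB")
```

The naive figure is 50 GB/day; the replicated, indexed figure is about 195 GB/day, nearly 4× more.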
04 · QPS → instance count

instances = ceil(peakQPS / sustainable_QPS_per_instance)

Per-instance throughput depends on the workload (CPU-bound or IO-bound), but as a starting point: 100–500 QPS for a typical web service, 1000–2000 for a tight Go service, 10–50 for anything doing real work per request. Round up and add a 30% headroom buffer.
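Derivation 04 with the 30% buffer folded in, as a sketch. Applying the headroom to peak QPS before dividing is one reasonable choice; rounding the bare instance count up first and then adding 30% gives a similar answer.

```python
import math

def instance_count(peak_qps: float, qps_per_instance: float,
                   headroom: float = 0.30) -> int:
    """Derivation 04 with the 30% headroom buffer recommended above."""
    return math.ceil(peak_qps * (1 + headroom) / qps_per_instance)

print(instance_count(2_315, 200))        # a ~2.3 K peak QPS service at 200 QPS each
print(instance_count(145_000, 1_000))    # read tier of the Twitter-shaped example
```

That is 16 instances for the first case (12 without headroom) and 189 for the second.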
05 · Storage → shard count

shards = ceil(total_storage / storage_per_shard)

A single shard caps around 500 GB to 1 TB before rebalancing turns painful. 50 TB of data needs 50–100 shards. Pick the next power of two above the calculation (32, 64, 128…) because resharding by doubling is much cleaner than by some arbitrary factor.
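A sketch of derivation 05 including the power-of-two rounding. It uses Python's `int.bit_length()` to find the smallest power of two at or above the raw count; `shard_count` and `TB` are illustrative names (decimal terabytes).

```python
import math

TB = 10**12  # decimal terabyte

def shard_count(total_bytes: float, bytes_per_shard: float) -> int:
    """Derivation 05, rounded up to the next power of two so that
    resharding can later be done by doubling."""
    raw = max(1, math.ceil(total_bytes / bytes_per_shard))
    return 1 << (raw - 1).bit_length()   # smallest power of two >= raw

print(shard_count(50 * TB, 1 * TB))      # 50 raw -> 64
print(shard_count(50 * TB, TB // 2))     # 100 raw -> 128
```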

Try it — the live worksheet

Every assumption is an explicit input, so you can defend each one and let the math fall out. The figures below run the derivations for a 10 M DAU product.


Inputs

DAU 10.00 M users / day

Actions / user / day 5

Peak multiplier ×4

Avg record size (write) 1.5 KB

Avg response size 4.0 KB

Sustainable QPS / instance 200

Storage per shard cap 500 GB

Retention 365 days

Outputs

Average QPS 578.704

Peak QPS 2.3 K

Daily write volume 71.53 GB

Total storage (365d retention, 3× replication) 76.48 TB

Outbound bandwidth at peak 72.3 Mbps

App instances (peak QPS / per-instance) 12

Shards (single-copy storage of ~25.5 TB / shard cap; replicas not counted) 53

Every number is just multiplication and division. The interview skill is naming the inputs explicitly — "5 actions per user per day, peak 4× average" — so the room argues with the number instead of the math.
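The worksheet can be reproduced in a few lines. This is a sketch under assumptions inferred from its outputs, not its actual source: the KB/GB/TB figures appear to be 1024-based, every action is treated as one write, and shard count is computed from one copy of the data. The function `worksheet` is a hypothetical name.

```python
import math

SECONDS_PER_DAY = 86_400
KIB = 1024  # assumption: the worksheet's KB/GB/TB figures are 1024-based

def worksheet(dau, actions_per_user, peak_multiplier, record_kib,
              response_kib, qps_per_instance, shard_cap_gib,
              retention_days, replication=3):
    avg_qps = dau * actions_per_user / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_multiplier
    # Assumption: every action writes one record of record_kib.
    daily_gib = dau * actions_per_user * record_kib / KIB**2
    one_copy_gib = daily_gib * retention_days
    return {
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "daily_gib": daily_gib,
        "total_tib": one_copy_gib * replication / KIB,
        "egress_mibit_s": peak_qps * response_kib * 8 / KIB,
        "instances": math.ceil(peak_qps / qps_per_instance),
        # Inferred from the worksheet's output: shards sized on one copy.
        "shards": math.ceil(one_copy_gib / shard_cap_gib),
    }

result = worksheet(dau=10_000_000, actions_per_user=5, peak_multiplier=4,
                   record_kib=1.5, response_kib=4.0, qps_per_instance=200,
                   shard_cap_gib=500, retention_days=365)
for name, value in result.items():
    print(f"{name:>15}: {value:,.2f}")
```

If you size shards on replicated storage instead, the same inputs give 157 shards; say which convention you're using.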

Three worked examples

Three real product shapes, with the numbers run end to end. Same formulas. Wildly different system shapes once the input assumptions land.

Twitter-shaped — read-heavy social feed

Inputs. 250M DAU. Each user views a timeline 10× per day (2.5B reads); writes a tweet 0.2× per day (50M writes). Read:write ratio ≈ 50:1.
Average QPS. 2.5B reads / 86,400 ≈ 29 K reads/sec. 50M writes / 86,400 ≈ 580 writes/sec.
Peak QPS. Assume a 5× peak multiplier for a news-driven product (big events, breaking news): 145 K reads/sec peak, 3 K writes/sec peak.
Storage. 280-byte tweet + 200-byte metadata × 50M tweets/day = 24 GB/day. With 3× replication and an index, ~100 GB/day, ~36 TB/year. Modest.
Instance count. Read tier at 1000 QPS/instance: 145 K / 1000 = 145 instances minimum, 200 with headroom. Write tier sharded by user ID: ~20 instances.
Shape conclusion. Read-heavy means caching is the whole game. Memcache layer in front of the timeline; pre-computed timelines for active users; lazy fan-out for celebrities. Sharding is by user ID, not by tweet.
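A sketch that reruns the Twitter-shaped numbers without intermediate rounding. The 5× peak, the 3 × 1.3 storage multiplier, the 30% headroom and 1,000 QPS per read instance are the assumptions stated above, not measurements.

```python
import math

DAY = 86_400
DAU = 250_000_000
PEAK = 5            # assumed peak multiplier for a news-driven product
HEADROOM = 1.3      # 30% buffer

reads_per_day = DAU * 10          # timeline views
writes_per_day = DAU * 0.2        # tweets

peak_reads = reads_per_day / DAY * PEAK
peak_writes = writes_per_day / DAY * PEAK

raw_bytes_per_day = writes_per_day * (280 + 200)      # tweet text + metadata
stored_bytes_per_day = raw_bytes_per_day * 3 * 1.3    # 3x replication, ~1.3x index
read_instances = math.ceil(peak_reads * HEADROOM / 1000)  # 1000 QPS per instance

print(f"read:write  {reads_per_day / writes_per_day:.0f}:1")
print(f"peak        {peak_reads:,.0f} reads/s, {peak_writes:,.0f} writes/s")
print(f"storage     {raw_bytes_per_day / 1e9:.0f} GB/day raw, "
      f"{stored_bytes_per_day * 365 / 1e12:.0f} TB/year stored")
print(f"read tier   {read_instances} instances")
```

Unrounded, storage comes to about 34 TB/year; the ~36 TB above comes from rounding to 100 GB/day first. Either is fine for napkin math.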

Instagram-shaped — write-heavy photo upload

Inputs. 500M DAU. 2 photo uploads per user per day = 1B uploads/day. Average photo 2 MB processed → 200 KB delivered. Each photo viewed by 20 followers on average.
QPS. Writes: 1B / 86,400 = 11 K writes/sec average, 40 K peak. Reads: 20B views / 86,400 = 230 K reads/sec, 1M peak.
Bandwidth. Photo upload: 11 K × 2 MB = 22 GB/sec ingress. Photo serve: 230 K × 200 KB = 46 GB/sec egress. Bandwidth is the dominant cost — CDN is a hard requirement.
Storage. 1B × 2 MB = 2 PB/day raw. With variants (thumbnails + multiple resolutions), 5 PB/day. 1.8 EB/year at one year of retention.
Shape conclusion. Object store (S3-shaped) for the bytes. Metadata in a sharded RDBMS. Heavy CDN front-end so the 46 GB/sec read traffic doesn't reach origin. Image processing in async workers (resize, compress, thumbnail extract).
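The bandwidth line is the one that decides this design, so here it is as a sketch using the inputs above (daily averages, decimal units). Converting to Gbps makes the point that no single NIC is anywhere close.

```python
DAY = 86_400
uploads_per_day = 500_000_000 * 2            # 1B uploads/day
views_per_day = uploads_per_day * 20         # ~20 views per photo
UPLOAD_BYTES = 2_000_000                     # 2 MB original
SERVED_BYTES = 200_000                       # 200 KB delivered variant

ingress = uploads_per_day / DAY * UPLOAD_BYTES    # bytes/sec, daily average
egress = views_per_day / DAY * SERVED_BYTES

for name, bytes_per_sec in (("ingress", ingress), ("egress", egress)):
    print(f"{name}: {bytes_per_sec / 1e9:.1f} GB/s = "
          f"{bytes_per_sec * 8 / 1e9:.0f} Gbps")
```

Unrounded, ingress is about 23 GB/s (the 22 above comes from rounding to 11 K uploads/sec) and egress about 46 GB/s, roughly 370 Gbps, before any peak multiplier.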

Slack-shaped — bidirectional messaging with retention

Inputs. 12M DAU. 50 messages sent + 500 messages received per user per day. Each message averages 100 bytes (text + metadata).
QPS. Sends: 600M / 86,400 = 7 K sends/sec average, 25 K peak. Receives via fan-out: 6B / 86,400 = 70 K deliveries/sec, 250 K peak.
Connections. All 12M users hold a persistent WebSocket while active. Realistic concurrent peak: 4M. At ~20 K connections per WebSocket front-end server, that's 200 servers.
Storage. 600M messages × 100 bytes = 60 GB/day. Replicated and indexed: ~250 GB/day. With infinite retention: 90 TB/year. Very modest for the QPS load.
Shape conclusion. The cost is in fan-out and persistent connections, not in storage. Sharded WebSocket layer, Redis pub/sub for cross-shard fan-out, eventually-consistent reads from a sharded RDBMS for history. The hard part is presence (who's online) and history sync after a reconnect, not the messages themselves.
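A sketch of the Slack-shaped numbers. The 4M concurrent peak (about a third of DAU) and 20 K connections per server are the assumptions stated above; storage uses the same 3 × 1.3 replication-and-index multiplier as the derivations.

```python
import math

DAY = 86_400
DAU = 12_000_000
sends_per_day = DAU * 50
deliveries_per_day = DAU * 500        # each send fans out to ~10 recipients

avg_sends = sends_per_day / DAY
avg_deliveries = deliveries_per_day / DAY

CONCURRENT_PEAK = 4_000_000      # assumption from the text: ~1/3 of DAU online
CONNS_PER_SERVER = 20_000        # assumption from the text
front_ends = math.ceil(CONCURRENT_PEAK / CONNS_PER_SERVER)

# 100 bytes/message, 3x replication, ~1.3x index overhead, one year kept.
stored_per_year = sends_per_day * 100 * 3 * 1.3 * 365

print(f"{avg_sends:,.0f} sends/s, {avg_deliveries:,.0f} deliveries/s (average)")
print(f"{front_ends} WebSocket front-end servers")
print(f"{stored_per_year / 1e12:.0f} TB stored per year")
```

Unrounded storage is about 85 TB/year, in line with the ~90 TB above; the connection count, not the bytes, is what sizes the fleet.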

What candidates get wrong

Skipping the assumptions. "5K QPS" with no stated DAU and actions-per-user can't be checked. Lead with the inputs so the interviewer can argue with a number instead of with your conclusion. The conclusion is the easy part; the inputs are the conversation.
Forgetting replication and indexes. A 10M-row table at 1 KB per row is 10 GB on disk only if there's one copy and zero indexes. With 3× replication and two indexes (each ~30% of base table size) the real number is closer to 50 GB. Multiply by 5, not by 1.
Sizing for average instead of peak. You don't size instance count for the daily average. Size it for the busiest minute of the busiest day, plus 30% slack. That slack absorbs bursts: as utilisation approaches 100%, queueing delay climbs steeply.
Using "10 ms" as if it means anything. 10 ms could be one HDD seek, about twenty same-DC round trips, or dozens of random SSD reads in sequence. The latency-numbers table above is the reference scale: name the operation and check it against the table.
Skipping bandwidth. Bandwidth quietly kills high-traffic services. A 100 KB response at 50K QPS is 5 GB/s, or 40 Gbps, past the limit of a common 10 or 25 Gbps NIC, which forces architectural choices (CDN, edge caching, shrinking the response). Run the multiplication.
Designing for ten years from now. 10× current peak is a sensible ceiling. Past that, you'll redesign before you ever get there. Twitter's 2010 architecture wouldn't survive 2026; 2026's architecture would have been over-engineered in 2010. Pick a horizon and live in it.

Little's Law, the escape hatch

When the math gets tangled, Little's Law bails you out. concurrency = throughput × latency. Name any two and the third writes itself. Most of the derivations on this page are Little's Law in disguise.

In practice: "100K QPS, each request takes 50 ms, so I need at least 5,000 concurrent slots." Whether those slots are threads, goroutines, async tasks, database connections, or HTTP/2 streams is the next conversation. The number itself is fixed by the equation.
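Little's Law in its three arrangements, as a small sketch. The first call is the example from the text; the 200-connection pool is a hypothetical illustration of the rearranged form, where a fixed number of slots caps throughput.

```python
def concurrency(throughput_per_sec: float, latency_sec: float) -> float:
    """Little's Law, L = lambda * W: average number of requests in flight."""
    return throughput_per_sec * latency_sec

def throughput_ceiling(slots: float, latency_sec: float) -> float:
    """The same law rearranged: a fixed pool of slots caps throughput."""
    return slots / latency_sec

def latency_budget(slots: float, throughput_per_sec: float) -> float:
    """And once more: the average time each request may take."""
    return slots / throughput_per_sec

print(f"{concurrency(100_000, 0.050):,.0f} concurrent slots")
# Hypothetical: a pool of 200 database connections, 50 ms per query.
print(f"{throughput_ceiling(200, 0.050):,.0f} QPS ceiling")
```

Note that the law relates averages; it tells you the mean number of slots in use, so provision above it for bursts.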

Related on Semicolony

Latency vs throughput — Little's Law in full, with a calculator.
Performance vs scalability — the per-system version of the same trade-off.
Availability patterns — the math behind nines.
System-design playbook — 19 worked problems applying these numbers in full.
Performance methods (RED, USE) — what to measure once you've sized.
Paper — The Tail at Scale on why P99 matters more than the median.
Jeff Dean's original latency-numbers gist
Simon Eskildsen's napkin-math repo — the canonical practice exercises.
