Senior+

Study path / 09

System design

System design is the line between mid-level and senior. The interview is not "do you know X" but "can you design a system from scratch, in 45 minutes, that someone with operations experience would be willing to oncall for". This study path covers the twelve mental models that get reused on every design, then walks through a playbook of canonical questions — URL shortener, news feed, chat, rate limiter, notifications — at the depth and rigour a senior+ engineering loop expects.

New: System Design Roadmap → 15 stages, ~80 topics, with an interactive architecture diagram.
New: The five principles → Performance vs scalability, latency vs throughput, availability vs consistency, consistency patterns, availability patterns. Long-form deep dives with interactive calculators.
New: Napkin math → Powers of two, latency numbers with proportional bars, the five derivations every interview asks, a live worksheet, and three worked examples.

Design playbook

A playbook of canonical designs

Nineteen sub-pages — URL shortener, pastebin, distributed KV, news feed, chat, rate limiter, notifications, then on through ride matching, typeahead, web crawler, object storage, and up to Twitter, Instagram, and Netflix. Each one is a full walkthrough: clarifying questions, capacity math, API and schema, the architecture diagram, the failure modes interviewers love to ask, and the trade-offs that separate a passing design from a great one.

Twelve mental models

The set of intuitions reused on every design. Memorise and rehearse: most design rounds are won or lost on which of these the candidate reaches for first.

01 · Day-zero
Capacity math is the design

Pick the QPS, the data size, the read/write ratio, and the P99 latency budget before drawing the first box. The architecture falls out of the numbers; designs that skip the math collapse at the first ten-x.
02 · Day-zero
Little's law in your head

Concurrency = arrival rate × latency. 10K RPS at 50 ms = 500 in-flight requests. Every queue, thread pool, and connection pool is sized by this one equation.
03 · Day-zero
Storage decides the shape

OLTP, OLAP, KV, document, time-series, vector, graph. The pick determines the read patterns the design must support — and locks in the rewrite cost when it’s wrong.
04 · Practitioner
CAP is a business decision

Network partitions are physical reality, not a design choice. Whether you sacrifice consistency or availability is a product question — phrase it that way to the room.
05 · Practitioner
Read-heavy versus write-heavy

The first axis after capacity. Caching solves reads; nothing solves writes except sharding. Designs that conflate the two get review-bombed.
06 · Practitioner
Hot keys are the real failure

A 99/1 distribution looks fine on paper and ruins one node in production. Designs that survive the interview talk about hot-key sharding, jitter, and probabilistic counting.
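
One common reading of "hot-key sharding" is key salting: writes for a key known to be hot are spread over several sub-keys, and readers merge across them. The sketch below is a minimal illustration under stated assumptions; `NUM_SHARDS`, `SALT_BUCKETS`, the `celebrity:42` key, and all function names are hypothetical and not taken from any particular datastore.

```python
import hashlib
import random
from collections import Counter

NUM_SHARDS = 8      # hypothetical cluster size
SALT_BUCKETS = 4    # hypothetical fan-out for keys known to be hot


def shard_for(key: str) -> int:
    """Map a key to a shard with a stable hash (not Python's per-process hash())."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def salted_keys(key: str) -> list[str]:
    """All sub-keys a hot key is split into; a reader must merge across these."""
    return [f"{key}#{i}" for i in range(SALT_BUCKETS)]


def write_key(key: str, hot: bool, rng: random.Random) -> str:
    """Writers pick one random sub-key for hot keys, the plain key otherwise."""
    return rng.choice(salted_keys(key)) if hot else key


def simulate(writes: int, hot: bool) -> Counter:
    """Count how many writes for one popular key land on each shard."""
    rng = random.Random(0)
    load = Counter()
    for _ in range(writes):
        load[shard_for(write_key("celebrity:42", hot, rng))] += 1
    return load


print("unsalted:", dict(simulate(1000, hot=False)))
print("salted:  ", dict(simulate(1000, hot=True)))
```

Without salting, every write lands on a single shard. With salting, the writes spread over at most `SALT_BUCKETS` shards, and each read pays for it by fanning out to every sub-key.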
07 · Practitioner
Async needs a queue and a budget

Putting work on a queue is the easy half. The hard half is back-pressure: when the queue grows, what gets dropped, what gets retried, and who pays for the buffer.
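
The "what gets dropped" half can be made concrete with a bounded buffer. The sketch below shows one possible policy, rejecting the newest work when the queue is full (load shedding), so that the producer sees the pressure instead of the buffer growing without limit. The class name and counters are hypothetical; other policies (drop oldest, block the producer) are equally valid answers.

```python
import queue


class BoundedWorkQueue:
    """Fixed-size buffer that rejects new work when full instead of growing."""

    def __init__(self, max_depth: int):
        self._q = queue.Queue(maxsize=max_depth)
        self.accepted = 0
        self.rejected = 0

    def submit(self, job) -> bool:
        """Return True if the job was queued, False if it was shed."""
        try:
            self._q.put_nowait(job)
        except queue.Full:
            # The caller is expected to surface this (for example as an HTTP 429
            # or 503) so that clients retry with backoff.
            self.rejected += 1
            return False
        self.accepted += 1
        return True

    def drain(self, n: int) -> list:
        """Take up to n jobs off the queue, oldest first."""
        out = []
        for _ in range(n):
            try:
                out.append(self._q.get_nowait())
            except queue.Empty:
                break
        return out


q = BoundedWorkQueue(max_depth=3)
print([q.submit(i) for i in range(5)])  # the last two submissions are shed
print(q.drain(2), q.accepted, q.rejected)
```

The depth limit is the "budget": it caps both the memory spent on buffering and the worst-case time a job can wait before a worker picks it up.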
08 · Practitioner
Idempotency is an API property

Not a server-side trick. Stripe’s "Idempotency-Key" header design is the canonical reference; design idempotency in at the API layer or you’ll re-discover it in an outage.
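
A minimal sketch of what "idempotency at the API layer" can look like: the client sends a key with each request, and the server replays the stored response when it sees the key again. This is not Stripe's implementation; `PaymentService`, `create_charge`, and the response shape are hypothetical, the store is an in-memory dict, and the code is not safe under concurrent calls.

```python
from dataclasses import dataclass, field


@dataclass
class PaymentService:
    """Toy in-memory service; a real one would persist keys with a TTL."""

    _responses: dict = field(default_factory=dict)
    charges_made: int = 0

    def create_charge(self, idempotency_key: str, amount_cents: int) -> dict:
        cached = self._responses.get(idempotency_key)
        if cached is not None:
            # Same key, different payload: refuse instead of guessing intent.
            if cached["amount_cents"] != amount_cents:
                raise ValueError("idempotency key reused with a different request")
            return cached  # replay: no second side effect
        self.charges_made += 1
        response = {"charge_id": f"ch_{self.charges_made}", "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response


svc = PaymentService()
first = svc.create_charge("key-1", 500)
retry = svc.create_charge("key-1", 500)   # e.g. the client timed out and retried
print(first == retry, svc.charges_made)
```

Because the key is part of the request contract, a client can retry after a timeout without knowing whether the first attempt succeeded.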
09 · Operator
Consistency lives at read-time

Quorum reads, monotonic reads, read-your-writes, bounded staleness. Most "we need consistency" requirements boil down to one of these; pick the cheapest one that satisfies the product.
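
For the quorum case, the usual rule of thumb is that with N replicas, a read quorum of R and a write quorum of W overlap when R + W > N. The sketch below only checks that arithmetic; the function names are hypothetical, and overlap alone does not guarantee linearizable reads in a real system.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("quorum sizes must be between 1 and n")
    return r + w > n


def tolerated_failures(n: int, r: int, w: int) -> dict:
    """Replicas that can be down while reads and writes still reach their quorum."""
    return {"reads": n - r, "writes": n - w}


for r, w in [(1, 1), (2, 2), (1, 3), (3, 1)]:
    print(f"N=3 R={r} W={w} overlap={quorums_overlap(3, r, w)} "
          f"tolerates={tolerated_failures(3, r, w)}")
```

The trade-off is visible in the output: R=1, W=3 makes reads cheap but lets a single down replica block writes, while R=2, W=2 tolerates one failure on both paths.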
10 · Operator
Multi-region is a different system

Active-passive vs. active-active vs. anycast-write. RPO and RTO are not free; designs that "just add a region" without addressing write conflicts and replication lag fail at promo time.
11 · Operator
Cost shapes architecture

$/QPS, $/GB-month, $/egress-TB. A correct design that triples the cloud bill is a wrong design. Show the cost line on the whiteboard before the room asks.
12 · Operator
Operability is the third axis

Logs, metrics, traces, SLOs, runbooks, on-call rotation, deploy strategy. A design that nobody can run at 3 a.m. is the system everyone refuses to own.

Capacity math, on a napkin

Three numbers and an equation. Memorise these and the interviewer's next question almost always becomes "and what about the storage cost".

What	How	A worked example
Concurrency	arrival × latency (Little's law)	10K RPS × 50 ms = 500 in-flight
Storage / day	writes × bytes	1B writes × 1 KB = 1 TB / day
Storage / 10y	writes × bytes × 3650	~3.6 PB
Egress	reads × bytes	10K RPS × 4 KB × 86,400 ≈ 3.5 TB / day
Cache RAM	hot-set × bytes / hit-ratio	1M keys × 1 KB / 95% ≈ 1 GB
Servers needed	concurrency / per-server concurrency	500 / 100 = 5 + headroom

The five-number scratch pad: QPS, payload size, read/write ratio, P99 latency, and the data-volume horizon (1 day, 1 year, 10 years).
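
The rows of the table can be reproduced in a few lines. This is a sketch using decimal units (1 KB = 1,000 bytes), which is what the table's rounding implies; the function names are hypothetical, and the cache formula is the table's rule of thumb rather than a general law.

```python
import math

KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15  # decimal units, as the table rounds
SECONDS_PER_DAY = 86_400


def in_flight(rps: float, latency_ms: float) -> float:
    """Little's law: concurrency = arrival rate x latency."""
    return rps * latency_ms / 1000


def storage_per_day(writes_per_day: int, bytes_per_write: int) -> int:
    return writes_per_day * bytes_per_write


def egress_per_day(read_rps: int, bytes_per_read: int) -> int:
    return read_rps * bytes_per_read * SECONDS_PER_DAY


def cache_bytes(hot_keys: int, bytes_per_key: int, hit_ratio: float) -> float:
    """The table's rule of thumb: hot-set x bytes / hit-ratio."""
    return hot_keys * bytes_per_key / hit_ratio


def servers_needed(concurrency: float, per_server: int) -> int:
    """Minimum server count before headroom is added."""
    return math.ceil(concurrency / per_server)


per_day = storage_per_day(10**9, 1 * KB)
print("in-flight:   ", in_flight(10_000, 50))
print("storage/day: ", per_day / TB, "TB")
print("storage/10y: ", per_day * 3650 / PB, "PB")
print("egress/day:  ", egress_per_day(10_000, 4 * KB) / TB, "TB")
print("cache RAM:   ", round(cache_bytes(10**6, 1 * KB, 0.95) / GB, 2), "GB")
print("servers:     ", servers_needed(in_flight(10_000, 50), 100), "+ headroom")
```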

Napkin-math calculator (interactive on the original page); snapshot of the values shown:

Input	Value
Write QPS	10.0K
Payload	1.0 KB
R / W ratio	100:1
P99	50 ms
Replication	×3
Hot set	10%
Horizon	1 yr

Group	Output	Value
Throughput	Write QPS, peak	30.0K
Throughput	Read QPS, peak	3.0M
Throughput	Read concurrency (Little's)	150.0K
Throughput	Write concurrency	1.5K
Storage & cache	Raw / day	824.0 GB
Storage & cache	Raw / horizon	293.7 TB
Storage & cache	With ×3 replication	881.1 TB
Storage & cache	Hot-set cache (RAM)	29.4 TB
Storage & cache	Read egress / day	80.5 TB

Formulas. Concurrency = QPS × P99 (Little's law). Storage / day = QPS × 86 400 × payload. Peak factor 3× over avg. Hot-set is the RAM working set you'd need for that read percentage.
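
The calculator outputs listed above (824.0 GB per day and so on) can be reproduced from these formulas. The sketch below is an inference, not the page's actual code. It assumes binary units (1 KB = 1,024 bytes, with GB and TB meaning 2^30 and 2^40 bytes), a hot set taken as a fraction of the unreplicated horizon storage, and read egress computed from average rather than peak QPS. Under those assumptions the numbers match to the displayed precision. All names are hypothetical.

```python
from dataclasses import dataclass

KIB, GIB, TIB = 2**10, 2**30, 2**40   # assumption: the widget uses binary units
SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Inputs:
    write_qps: float = 10_000
    payload_kib: float = 1.0
    reads_per_write: float = 100
    p99_ms: float = 50
    replication: int = 3
    hot_set_fraction: float = 0.10
    horizon_days: int = 365
    peak_factor: float = 3.0


def napkin(i: Inputs) -> dict:
    read_qps = i.write_qps * i.reads_per_write
    payload = i.payload_kib * KIB
    raw_day = i.write_qps * SECONDS_PER_DAY * payload
    raw_horizon = raw_day * i.horizon_days
    return {
        "write_qps_peak": i.write_qps * i.peak_factor,
        "read_qps_peak": read_qps * i.peak_factor,
        # Little's law applied to peak QPS, with P99 as a conservative latency.
        "read_concurrency": read_qps * i.peak_factor * i.p99_ms / 1000,
        "write_concurrency": i.write_qps * i.peak_factor * i.p99_ms / 1000,
        "raw_per_day_gib": raw_day / GIB,
        "raw_horizon_tib": raw_horizon / TIB,
        "replicated_tib": raw_horizon * i.replication / TIB,
        "hot_set_tib": raw_horizon * i.hot_set_fraction / TIB,
        "read_egress_per_day_tib": read_qps * SECONDS_PER_DAY * payload / TIB,
    }


for name, value in napkin(Inputs()).items():
    print(f"{name:26s} {value:,.1f}")
```

Little's law is stated for mean latency, so plugging in P99 over-estimates concurrency; for sizing pools that is usually the safer direction.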

Latency budgets to keep in your head

A design's latency budget is its first hard constraint. Each layer eats into it; the exam-room mistake is to add two layers without checking the sum against the SLO.

Layer	Typical	How to shave it
L1 cache	~1 ns	data structure choice
RAM access	~80 ns	locality, NUMA-aware
SSD random read 4 KB	~10 µs (NVMe)	page cache, larger reads
Same-DC RTT	~0.5 ms	collocate, kernel bypass
Same-region RTT	~1–2 ms	regional pinning
Cross-region RTT (US east↔west)	~70 ms	read replicas, anycast
Intercontinental RTT (US↔EU)	~80–100 ms	edge cache, region failover
TLS handshake	~30 ms for a 1-RTT handshake (TLS 1.3); 0-RTT on resumed sessions (TLS 1.3 early data, QUIC)	session resumption, QUIC
DB query (indexed)	~1–5 ms	proper index, page cache hot
DB query (full-scan)	~100 ms+	kill the query, add an index
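
Checking the sum against the SLO is simple arithmetic once the hops are written down. The sketch below uses typical values from the table; the request path, the hop names, and the 50 ms SLO are hypothetical, and a sum of typical latencies is a floor, not a P99.

```python
def budget_report(hops: list[tuple[str, float]], slo_ms: float) -> dict:
    """Add up serial hop latencies (ms) and compare the total with the SLO."""
    total = sum(ms for _, ms in hops)
    return {
        "total_ms": total,
        "headroom_ms": slo_ms - total,
        "within_slo": total <= slo_ms,
    }


SLO_MS = 50.0  # hypothetical latency budget for one request

# Hypothetical path: edge -> service -> cache miss -> indexed DB query.
base_path = [
    ("edge to service (same-DC RTT)", 0.5),
    ("service to cache (same-DC RTT)", 0.5),
    ("service to DB (same-DC RTT)", 0.5),
    ("DB query (indexed)", 5.0),
]

# The same path after adding one synchronous cross-region call.
with_remote_call = base_path + [("cross-region RTT (US east to west)", 70.0)]

print(budget_report(base_path, SLO_MS))
print(budget_report(with_remote_call, SLO_MS))
```

The second report shows the failure mode the section describes: one extra layer, individually reasonable, takes the total past the budget.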

Books, courses, papers, talks

The reading list is short. Designing Data-Intensive Applications is the closest the industry has to a textbook on this material; everything else fills in a gap.

Kleppmann — Designing Data-Intensive Applications. The textbook on the trade-offs that show up in design rounds. Replication, partitioning, transactions, derived data, encoding. Read once, then re-read chapters 5, 6, 7, 9 before any loop.
Xu — System Design Interview, vol I & II. The closest thing to a question bank. Lighter on theory than DDIA but covers the canonical designs end-to-end with diagrams.
Henderson — Building Scalable Web Sites. Older but the chapters on capacity, hardware, and operability still read perfectly. Founders-of-Flickr-era engineering.
Beyer et al — Site Reliability Engineering & The Site Reliability Workbook. Free from Google. The SLO and error-budget chapters are required reading; they shift "operability" from vibe to vocabulary.
Newman — Building Microservices. Useful for the multi-service design questions. Pair the first two chapters with the "Microservices" episode of Software Engineering Radio.
Papers: Dynamo (DeCandia et al, 2007), Bigtable (Chang et al, 2006), GFS (Ghemawat et al, 2003), Spanner (Corbett et al, 2012), Cassandra (Lakshman & Malik, 2010), Kafka (Kreps et al, 2011), CRDTs (Shapiro et al, 2011). Read at least Dynamo and Spanner before your loop.
Talks: "Latency Numbers Every Programmer Should Know" (Norvig); "How to Design a System" (Hello Interview); "Mastering Chaos" (Evans, Netflix). Put the Norvig table on a flashcard and drill it to recall.
Channels: Hello Interview (interview-shaped walkthroughs of canonical designs); ByteByteGo (visual explainers); Mark Richards's Software Architecture Monday shorts.

Hands-on tools

System design rounds are a whiteboard exercise; the operational follow-ups are not. These are the tools to actually run the design once you've drawn it.

Load generation. k6 (with an arrival-rate executor), vegeta, wrk2 for open-loop load (the only correct kind for tail-latency work). locust if you need a Python harness with custom scenarios.
Tracing. OpenTelemetry SDKs in your language, plus a backend — Jaeger, Honeycomb, Tempo, or Datadog APM. Trace one request from edge to database before declaring the design "done".
Profilers. perf on Linux, pprof for Go, async-profiler for the JVM, Pyroscope or Parca for continuous profiling. Production profiles are the proof your design holds up.
Capacity sheets. A Google Sheet with the five napkin numbers (QPS, payload, read/write, P99, horizon) feeding instance counts and storage costs. The cheapest tool on this list and the most-used.
Cloud cost calculators. AWS Pricing Calculator, GCP Pricing Calculator, Azure TCO. A design that triples the bill is a wrong design — own the cost line in the room.
Diagrams. Excalidraw, tldraw, draw.io, Whimsical. The hand-drawn aesthetic of Excalidraw matches the way most interviewers expect a design round to look.

Eight common mistakes

Designing without numbers. "We'll shard the database" with no QPS or data-size estimate is the classic L4 tell. Numbers first, boxes second.
Skipping the read/write ratio. A 100:1 read-heavy design and a 1:100 write-heavy design look identical on the whiteboard and require completely different infrastructure. Ask early.
Treating "we'll add a cache" as a design. Caches add freshness bugs, eviction policies, warm-up problems, and one more thing on call. They're a tool, not a victory lap.
Saying "use Kafka" without a back-pressure plan. Async work needs a queue and an answer to "what happens when the queue is full". The second half is non-negotiable.
Treating CAP as a tech choice. "CP versus AP" is a product question — a payment system answers differently than a feed. Phrase the trade-off in the language of the user.
"It scales horizontally" without showing how. Stateless services scale horizontally for free; everything else needs sharding strategy, replication topology, and a hot-key story.
Ignoring multi-region until the interviewer asks. If the SLO is 99.99%, the budget is under an hour of downtime a year, which a single region rarely delivers; raise it before they do. Anycast, active-passive, and active-active are three different designs.
Burying operability. Logs, metrics, traces, SLOs, and the on-call story land late in candidate designs and early in interviewer scoring rubrics. Move them up.
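
The 99.99% point in the list above comes down to a small amount of arithmetic. The sketch below converts an availability target into a downtime budget and combines components in series and in parallel. It assumes failures are independent, which real regions only approximate, and the 99.9% per-region figure is a hypothetical input, not a provider's published number.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability: float) -> float:
    """Downtime budget implied by an availability target."""
    return (1 - availability) * MINUTES_PER_YEAR


def in_series(*parts: float) -> float:
    """Every component must be up (a request passes through all of them)."""
    return math.prod(parts)


def in_parallel(*parts: float) -> float:
    """At least one component must be up (redundant regions), assuming independence."""
    return 1 - math.prod(1 - a for a in parts)


print(f"99.99% target: {downtime_minutes_per_year(0.9999):.2f} min/year")
print(f"one region at 99.9%: {downtime_minutes_per_year(0.999):.1f} min/year")
print(f"two regions at 99.9% in parallel: {in_parallel(0.999, 0.999):.6f}")
print(f"two 99.99% services in series: {in_series(0.9999, 0.9999):.6f}")
```

Redundancy raises availability and dependency chains lower it, so every synchronous dependency added to the request path spends part of the same budget.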

A 45-minute interview pacing

What separates "passing" from "great" is finishing on time. Use these checkpoints; if you're behind, cut breadth, not depth.

Minute	What you should be doing	What it looks like on the board
0–5	Clarifying questions, requirements	Functional reqs · non-functional · scope
5–10	Capacity math	QPS · payload · storage · P99 · horizon
10–15	API and data model	Endpoints · request/response · schema
15–25	High-level architecture	The boxes-and-arrows diagram
25–35	Deep dive on the hard part	Sharding · consistency · hot keys · async
35–42	Failure modes & operability	What dies · what oncall sees · SLOs
42–45	Trade-offs & what's next	What you'd change at 10× scale

Adjacent paths

Distributed systems. The theoretical backing — replication, consensus, time, partial failure. Read this in parallel; system design without it is just box-drawing.
Computer networking. The wire under every diagram. TCP, TLS, DNS, BGP, QUIC. Most "why is this slow" answers live here.
Databases. Storage choice is half the design. B-trees vs. LSM, OLTP vs. OLAP, the planner, the page cache.
API design. The user-facing surface of the system you're designing. REST, gRPC, idempotency, versioning, pagination.

Continue

Open the design playbook

Nineteen canonical designs, each at interview-grade depth — clarifying questions, capacity math, API, schema, the architecture, the failure modes, the trade-offs.
