System design
System design is the line between mid-level and senior. The interview is not "do you know X" but "can you design a system from scratch, in 45 minutes, that someone with operations experience would be willing to be on call for". This study path covers the twelve mental models that get reused on every design, then walks through a playbook of canonical questions (URL shortener, news feed, chat, rate limiter, notifications) at the depth and rigour a senior+ engineering loop expects.
Twelve mental models
The set of intuitions reused on every design. Memorise and rehearse: most design rounds are won or lost on which of these the candidate reaches for first.
- 01 · Day-zero
Capacity math is the design
Pick the QPS, the data size, the read/write ratio, and the P99 latency budget before drawing the first box. The architecture falls out of the numbers; designs that skip the math collapse at the first ten-x.
- 02 · Day-zero
Little's law in your head
Concurrency = arrival rate × latency. 10K RPS at 50 ms = 500 in-flight requests. Every queue, thread pool, and connection pool is sized by this one equation.
- 03 · Day-zero
Storage decides the shape
OLTP, OLAP, KV, document, time-series, vector, graph. The pick determines the read patterns the design must support — and locks in the rewrite cost when it’s wrong.
- 04 · Practitioner
CAP is a business decision
Network partitions are physical reality, not a design choice. Whether you sacrifice consistency or availability during a partition is a product question; phrase it that way to the room.
- 05 · Practitioner
Read-heavy versus write-heavy
The first axis after capacity. Caching and read replicas solve reads; only sharding solves writes. Designs that conflate the two get review-bombed.
- 06 · Practitioner
Hot keys are the real failure
A 99/1 skew (1% of keys drawing 99% of the traffic) looks fine on paper and ruins one node in production. Designs that survive the interview talk about hot-key sharding, jitter, and probabilistic counting.
- 07 · Practitioner
Async needs a queue and a budget
Putting work on a queue is the easy half. The hard half is back-pressure: when the queue grows, what gets dropped, what gets retried, and who pays for the buffer.
- 08 · Practitioner
Idempotency is an API property
Not a server-side trick. Stripe’s "Idempotency-Key" header design is the canonical reference; design idempotency in at the API layer or you’ll rediscover it in an outage.
- 09 · Operator
Consistency lives at read-time
Quorum reads, monotonic reads, read-your-writes, bounded staleness. Most "we need consistency" requirements melt down to one of these — pick the cheapest one that satisfies the product.
- 10 · Operator
Multi-region is a different system
Active-passive vs. active-active vs. anycast-write. RPO and RTO are not free; designs that "just add a region" without addressing write conflicts and replication lag fail at promo time.
- 11 · Operator
Cost shapes architecture
$/QPS, $/GB-month, $/egress-TB. A correct design that triples the cloud bill is a wrong design. Show the cost line on the whiteboard before the room asks.
- 12 · Operator
Operability is the third axis
Logs, metrics, traces, SLOs, runbooks, on-call rotation, deploy strategy. A design that nobody can run at 3 a.m. is the system everyone refuses to own.
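Model 05's "caching solves reads" falls out of one line of arithmetic. A minimal sketch, assuming a look-aside cache where every miss and every write reaches the primary database; the function name `db_load` and the traffic numbers are illustrative.

```python
def db_load(reads_per_s: float, writes_per_s: float, hit_ratio: float) -> float:
    """QPS that reaches the primary database behind a look-aside cache.

    Cache hits absorb reads; every miss and every write still hits the database.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return reads_per_s * (1.0 - hit_ratio) + writes_per_s


# Same 10K QPS in total, mirrored read/write ratios (99:1 vs 1:99), same 95% hit ratio.
read_heavy = db_load(reads_per_s=9_900, writes_per_s=100, hit_ratio=0.95)
write_heavy = db_load(reads_per_s=100, writes_per_s=9_900, hit_ratio=0.95)
print(round(read_heavy), round(write_heavy))  # 595 9905
```

The cache cuts database load roughly 17× for the read-heavy mix and by less than 1% for the write-heavy one; the remaining write traffic is what sharding has to absorb.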
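For model 06, one common form of hot-key sharding is key salting: spread one hot key's writes across N sub-keys and sum them on read. A minimal sketch of a hypothetical salted counter; the in-memory dict stands in for a real sharded KV store, and `fanout=8` is an arbitrary assumption.

```python
import random
from collections import defaultdict


class SaltedCounter:
    """Counter that splits each key into `fanout` sub-keys ("key#0" .. "key#N-1").

    Each write lands on a random sub-key, so if sub-keys hash to different shards,
    one hot key's write load spreads across up to `fanout` nodes.
    Reads pay for this by summing every sub-key.
    """

    def __init__(self, fanout: int = 8) -> None:
        self.fanout = fanout
        self.store: dict[str, int] = defaultdict(int)  # stand-in for a sharded KV store

    def _sub_key(self, key: str, salt: int) -> str:
        return f"{key}#{salt}"

    def incr(self, key: str, by: int = 1) -> None:
        salt = random.randrange(self.fanout)
        self.store[self._sub_key(key, salt)] += by

    def get(self, key: str) -> int:
        return sum(self.store.get(self._sub_key(key, s), 0) for s in range(self.fanout))


counter = SaltedCounter(fanout=8)
for _ in range(10_000):
    counter.incr("celebrity_post:likes")
print(counter.get("celebrity_post:likes"))  # 10000
```

The trade-off is explicit: writes get up to N× headroom while reads fan out N ways, so salt only the keys you have measured as hot.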
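Model 07's back-pressure question becomes concrete once the queue has a size. A minimal sketch using Python's standard `queue.Queue` with a fixed `maxsize`: when the buffer is full, `put_nowait` raises `queue.Full` and the producer has to choose, here by shedding the job and counting it. The `submit` helper and the drop-on-full policy are illustrative assumptions; blocking the producer or retrying with backoff are equally valid answers.

```python
import queue

jobs: queue.Queue = queue.Queue(maxsize=100)  # the buffer budget: at most 100 pending jobs
dropped = 0


def submit(job: int) -> bool:
    """Enqueue without blocking; shed load when the buffer is full."""
    global dropped
    try:
        jobs.put_nowait(job)
        return True
    except queue.Full:
        dropped += 1  # in production: emit a metric, return 429/503, or retry later
        return False


# A burst of 250 jobs arrives before any consumer runs.
accepted = sum(submit(i) for i in range(250))
print(accepted, dropped, jobs.qsize())  # 100 150 100
```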
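The core of model 08's Idempotency-Key pattern fits in a few lines: the server stores the first response for each client-supplied key and replays it on retries. A minimal in-memory sketch; `IdempotentHandler`, `charge`, and the dict-backed store are hypothetical. It also rejects a reused key with different parameters, a common safeguard. A real service needs a durable store, key expiry, and a lock so two concurrent requests with the same key don't both execute.

```python
import hashlib
import json
from typing import Callable


class IdempotentHandler:
    """Wraps a side-effecting handler so retries with the same key replay the first result."""

    def __init__(self, handler: Callable[[dict], dict]) -> None:
        self.handler = handler
        self.seen: dict[str, tuple[str, dict]] = {}  # key -> (request fingerprint, response)

    @staticmethod
    def _fingerprint(params: dict) -> str:
        return hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()

    def handle(self, idempotency_key: str, params: dict) -> dict:
        fingerprint = self._fingerprint(params)
        if idempotency_key in self.seen:
            stored_fingerprint, response = self.seen[idempotency_key]
            if stored_fingerprint != fingerprint:
                raise ValueError("idempotency key reused with different parameters")
            return response  # a retry: replay the stored result, do not execute again
        response = self.handler(params)
        self.seen[idempotency_key] = (fingerprint, response)
        return response


charges: list[dict] = []


def charge(params: dict) -> dict:
    """Hypothetical side-effecting operation, e.g. charging a card."""
    charges.append(params)
    return {"charge_id": len(charges), "amount": params["amount"]}


api = IdempotentHandler(charge)
first = api.handle("key-123", {"amount": 500})
retry = api.handle("key-123", {"amount": 500})  # client retried after a timeout
print(first == retry, len(charges))  # True 1
```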
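Model 09's quorum reads rest on one inequality: with N replicas, a read quorum of R and a write quorum of W are guaranteed to share a replica when R + W > N, so a strict-quorum read sees the latest acknowledged write. Sloppy quorums with hinted handoff, as in Dynamo, relax this guarantee. A small sketch that checks the inequality and confirms it by brute force; the (N, R, W) configurations are examples only.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Strict-quorum condition: every R-replica read intersects every W-replica write."""
    return r + w > n


def min_overlap(n: int, r: int, w: int) -> int:
    """Brute force: smallest intersection between any read set and any write set."""
    replicas = range(n)
    return min(
        len(set(read) & set(write))
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (5, 2, 3)]:
    print(n, r, w, quorums_overlap(n, r, w), min_overlap(n, r, w))
# 3 2 2 True 1
# 3 1 3 True 1
# 3 1 1 False 0
# 5 2 3 False 0
```

(5, 2, 3) is the case worth saying out loud: it looks like a majority write, but 2 + 3 is not greater than 5, so a read can miss the latest write.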
Capacity math, on a napkin
Three numbers and an equation. Memorise these and the interviewer's next question almost always becomes "and what about the storage cost".
| What | How | A worked example |
|---|---|---|
| Concurrency | arrival × latency (Little's law) | 10K RPS × 50 ms = 500 in-flight |
| Storage / day | writes/day × bytes | 1B writes × 1 KB = 1 TB / day |
| Storage / 10y | writes/day × bytes × 3650 | 1 TB × 3650 ≈ 3.65 PB (one copy, before replication) |
| Egress / day | reads/s × bytes × 86,400 s | 10K RPS × 4 KB × 86,400 ≈ 3.5 TB / day |
| Cache RAM | hot-set keys × bytes (hot set sized for the target hit ratio) | 1M keys × 1 KB ≈ 1 GB + per-key overhead |
| Servers needed | concurrency / per-server concurrency | 500 / 100 = 5 + headroom |
The five-number scratch pad: QPS, payload size, read/write ratio, P99 latency, and the data-volume horizon (1 day, 1 year, 10 years).
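The table's arithmetic as a reusable scratch pad. A minimal sketch, assuming decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and single-copy storage (multiply by the replication factor for real disk); the function names and the 50% server headroom are assumptions for illustration.

```python
import math

KB, TB, PB = 10**3, 10**12, 10**15  # decimal units
SECONDS_PER_DAY = 86_400


def concurrency(rps: float, latency_s: float) -> float:
    """Little's law: requests in flight = arrival rate x time in system."""
    return rps * latency_s


def storage_per_day(writes_per_day: float, bytes_per_write: float) -> float:
    return writes_per_day * bytes_per_write


def egress_per_day(reads_per_s: float, bytes_per_read: float) -> float:
    return reads_per_s * bytes_per_read * SECONDS_PER_DAY


def servers_needed(in_flight: float, per_server: float, headroom: float = 0.5) -> int:
    """Servers to hold the in-flight load, padded by a headroom fraction (assumed 50%)."""
    return math.ceil(in_flight / per_server * (1 + headroom))


in_flight = concurrency(10_000, 0.050)
daily = storage_per_day(1e9, 1 * KB)
print(in_flight)                                      # 500.0
print(daily / TB, daily * 3650 / PB)                  # 1.0 3.65
print(round(egress_per_day(10_000, 4 * KB) / TB, 2))  # 3.46
print(servers_needed(in_flight, 100))                 # 8
```

The point of the headroom parameter is that the table's "5 + headroom" turns into an explicit number you can defend, rather than a hand-wave.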
Latency budgets to keep in your head
A design's latency budget is its first hard constraint. Each layer eats into it; the exam-room mistake is to add two layers without checking the sum against the SLO.
| Layer | Typical | How to shave it |
|---|---|---|
| L1 cache | ~1 ns | data structure choice |
| RAM access | ~80 ns | locality, NUMA-aware |
| SSD random read 4 KB | ~10–100 µs (NVMe) | page cache, larger reads |
| Same-DC RTT | ~0.5 ms | colocate, kernel bypass |
| Same-region RTT | ~1–2 ms | regional pinning |
| Cross-region RTT (US east↔west) | ~70 ms | read replicas, anycast |
| Intercontinental RTT (US↔EU) | ~80–100 ms | edge cache, region failover |
| TLS handshake | 1 RTT (TLS 1.3) or 2 RTT (TLS 1.2), after TCP's 1 RTT | session resumption / 0-RTT, QUIC (merges the transport and TLS handshakes) |
| DB query (indexed) | ~1–5 ms | proper index, page cache hot |
| DB query (full-scan) | ~100 ms+ | kill the query, add an index |
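Checking a request path against its SLO is one subtraction per layer. A minimal sketch that sums per-layer costs for a hypothetical cold request from the US to an EU service, using values from the table (90 ms as the US↔EU midpoint, 5 ms for the indexed query), against an assumed 200 ms P99 SLO. Summing typical values understates the P99, since tails do not add linearly, so treat the total as a floor.

```python
# Hypothetical cold request path, values in milliseconds.
path_ms = {
    "TCP + TLS 1.3 handshake (2 RTT at ~90 ms)": 2 * 90,
    "request RTT (US<->EU)": 90,
    "service hop (same-DC RTT)": 0.5,
    "indexed DB query": 5,
}

SLO_P99_MS = 200

total = sum(path_ms.values())
print(f"total {total} ms, remaining {SLO_P99_MS - total} ms")
for layer, ms in sorted(path_ms.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ms:7.1f} ms  {layer}")
```

The cold cross-continent path is over budget before the database even runs; terminating connections at an edge close to the user and resuming sessions are what bring the handshake RTTs back under the SLO.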
Books, courses, papers, talks
The reading list is short. Designing Data-Intensive Applications is the closest the industry has to a textbook on this material; everything else fills in a gap.
- Kleppmann — Designing Data-Intensive Applications. The textbook on the trade-offs that show up in design rounds. Replication, partitioning, transactions, derived data, encoding. Read once, then re-read chapters 5, 6, 7, 9 before any loop.
- Xu — System Design Interview, vol I & II. The closest thing to a question bank. Lighter on theory than DDIA but covers the canonical designs end-to-end with diagrams.
- Henderson — Building Scalable Web Sites. Older but the chapters on capacity, hardware, and operability still read perfectly. Founders-of-Flickr-era engineering.
- Beyer et al — Site Reliability Engineering & The Site Reliability Workbook. Free from Google. The SLO and error-budget chapters are required reading; they shift "operability" from vibe to vocabulary.
- Newman — Building Microservices. Useful for the multi-service design questions. Pair the first two chapters with the "Microservices" episode of Software Engineering Radio.
- Papers: Dynamo (DeCandia et al, 2007), Bigtable (Chang et al, 2006), GFS (Ghemawat et al, 2003), Spanner (Corbett et al, 2012), Cassandra (Lakshman & Malik, 2010), Kafka (Kreps et al, 2011), CRDTs (Shapiro et al, 2011). Read at least Dynamo and Spanner before your loop.
- Talks: "How to Design a System" (Hello Interview); "Mastering Chaos: A Netflix Guide to Microservices" (Josh Evans, Netflix, QCon 2016). Alongside them, the "Latency Numbers Every Programmer Should Know" table, popularized by Jeff Dean and Peter Norvig: keep it on a flashcard, drilled to recall.
- Channels: Hello Interview (interview-shaped walkthroughs of canonical designs); ByteByteGo (visual explainers); Mark Richards's Software Architecture Monday shorts.
Hands-on tools
System design rounds are a whiteboard exercise; the operational follow-ups are not. These are the tools to actually run the design once you've drawn it.
- Load generation. k6 (with an arrival-rate executor), vegeta, or wrk2 for open-loop load, the only correct kind for tail-latency work; locust if you need Python for custom scenarios, though it drives closed-loop load.
- Tracing. OpenTelemetry SDKs in your language, plus a backend such as Jaeger, Honeycomb, Tempo, or Datadog APM. Trace one request from edge to database before declaring the design "done".
- Profilers. perf on Linux, pprof for Go, async-profiler for the JVM, Pyroscope or Parca for continuous profiling. Production profiles are the proof your design holds up.
- Capacity sheets. A Google Sheet with the five napkin numbers (QPS, payload, read/write, P99, horizon) feeding instance counts and storage costs. The cheapest tool on this list and the most-used.
- Cloud cost calculators. AWS Pricing Calculator, GCP Pricing Calculator, Azure TCO. A design that triples the bill is a wrong design — own the cost line in the room.
- Diagrams. Excalidraw, tldraw, draw.io, Whimsical. The hand-drawn aesthetic of Excalidraw matches the way most interviewers expect a design round to look.
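Why open-loop load matters for tail latency: a closed-loop client waits for each response before sending the next, so when the server stalls it simply stops sending and the stall shows up as one slow sample. This is the coordinated-omission problem wrk2 was built to avoid. A self-contained simulation, not a load tool: one FIFO server with a 1 ms service time and a single 1-second pause (think GC), measured both ways at the same healthy rate of about 100 RPS. All parameters are made up for illustration.

```python
def after_pause(t: float, pause_at: float, pause_s: float) -> float:
    """A request that would start during the pause waits until the pause ends."""
    return pause_at + pause_s if pause_at <= t < pause_at + pause_s else t


def closed_loop(duration_s, service_s, think_s, pause_at, pause_s):
    """One client that waits for each response (plus think time) before sending again."""
    t, latencies = 0.0, []
    while t < duration_s:
        done = after_pause(t, pause_at, pause_s) + service_s
        latencies.append(done - t)
        t = done + think_s
    return latencies


def open_loop(duration_s, rate_per_s, service_s, pause_at, pause_s):
    """Requests sent on a fixed schedule; latency measured from the intended send time."""
    server_free, latencies = 0.0, []
    for i in range(int(duration_s * rate_per_s)):
        intended = i / rate_per_s
        start = after_pause(max(intended, server_free), pause_at, pause_s)
        server_free = start + service_s  # single FIFO server
        latencies.append(server_free - intended)
    return latencies


def p99(samples):
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]


# 1 ms service time, ~100 RPS offered load, one 1-second pause at t = 5 s, 10 s run.
closed_lat = closed_loop(10.0, 0.001, 0.009, 5.0, 1.0)
open_lat = open_loop(10.0, 100, 0.001, 5.0, 1.0)
print(f"closed-loop p99: {p99(closed_lat) * 1000:.1f} ms over {len(closed_lat)} requests")
print(f"open-loop   p99: {p99(open_lat) * 1000:.1f} ms over {len(open_lat)} requests")
```

The closed-loop client records the pause once, so its P99 stays at about the 1 ms service time; the open-loop schedule keeps arriving through the pause, roughly 100 requests queue behind it, and the P99 lands in the hundreds of milliseconds. That gap is why tail numbers from closed-loop tools understate what users see.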
Eight common mistakes
- Designing without numbers. "We'll shard the database" with no QPS or data-size estimate is the classic L4 tell. Numbers first, boxes second.
- Skipping the read/write ratio. A 100:1 read-heavy design and a 1:100 write-heavy design look identical on the whiteboard and require completely different infrastructure. Ask early.
- Treating "we'll add a cache" as a design. Caches add freshness bugs, eviction policies, warm-up problems, and one more thing on call. They're a tool, not a victory lap.
- Saying "use Kafka" without a back-pressure plan. Async work needs a queue and an answer to "what happens when the queue is full". The second half is non-negotiable.
- Treating CAP as a tech choice. "CP versus AP" is a product question — a payment system answers differently than a feed. Phrase the trade-off in the language of the user.
- "It scales horizontally" without showing how. Stateless services scale horizontally for free; everything else needs sharding strategy, replication topology, and a hot-key story.
- Ignoring multi-region until the interviewer asks. If the SLO is 99.99%, a single region is hard to defend: one regional outage of about an hour consumes the whole year's error budget. Raise it before they do. Anycast, active-passive, and active-active are three different designs.
- Burying operability. Logs, metrics, traces, SLOs, and the on-call story land late in candidate designs and early in interviewer scoring rubrics. Move them up.
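Availability targets translate directly into downtime budgets, which is why the number of nines changes the design conversation. A small sketch converting an availability SLO into allowed minutes of full downtime per year and per 30-day window, assuming a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200


def downtime_budget_minutes(slo: float, window_minutes: int) -> float:
    """Minutes of full unavailability an availability SLO allows over a window."""
    return (1.0 - slo) * window_minutes


for label, slo in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    yearly = downtime_budget_minutes(slo, MINUTES_PER_YEAR)
    monthly = downtime_budget_minutes(slo, MINUTES_PER_30_DAYS)
    print(f"{label:>8}: {yearly:6.1f} min/year, {monthly:5.1f} min per 30 days")
```

At 99.99% the budget is about 52.6 minutes a year and 4.3 minutes per 30 days, so a single hour-long regional incident spends it all.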
A 45-minute interview pacing
What separates "passing" from "great" is finishing on time. Use these checkpoints; if you're behind, cut deeper, not wider.
| Minute | What you should be doing | What it looks like on the board |
|---|---|---|
| 0–5 | Clarifying questions, requirements | Functional reqs · non-functional · scope |
| 5–10 | Capacity math | QPS · payload · storage · P99 · horizon |
| 10–15 | API and data model | Endpoints · request/response · schema |
| 15–25 | High-level architecture | The boxes-and-arrows diagram |
| 25–35 | Deep dive on the hard part | Sharding · consistency · hot keys · async |
| 35–42 | Failure modes & operability | What dies · what oncall sees · SLOs |
| 42–45 | Trade-offs & what's next | What you'd change at 10× scale |
Adjacent paths
- Distributed systems. The theoretical backing — replication, consensus, time, partial failure. Read this in parallel; system design without it is just box-drawing.
- Computer networking. The wire under every diagram. TCP, TLS, DNS, BGP, QUIC. Most "why is this slow" answers live here.
- Databases. Storage choice is half the design. B-trees vs. LSM, OLTP vs. OLAP, the planner, the page cache.
- API design. The user-facing surface of the system you're designing. REST, gRPC, idempotency, versioning, pagination.