Senior+
Study path / 09

System design

System design is the line between mid-level and senior. The interview is not "do you know X" but "can you design a system from scratch, in 45 minutes, that someone with operations experience would be willing to be on call for". This study path covers the twelve mental models that get reused on every design, then walks through a playbook of canonical questions (URL shortener, news feed, chat, rate limiter, notifications) at the depth and rigour a senior+ engineering loop expects.

  • System Design Roadmap. 15 stages, ~80 topics, with an interactive architecture diagram.
  • The five principles. Performance vs scalability, latency vs throughput, availability vs consistency, consistency patterns, availability patterns. Long-form deep dives with interactive calculators.
  • Napkin math. Powers of two, latency numbers with proportional bars, the five derivations every interview asks, a live worksheet, and three worked examples.

Twelve mental models

The set of intuitions reused on every design. Memorise and rehearse: most design rounds are won or lost on which of these the candidate reaches for first.

  1. 01 · Day-zero

    Capacity math is the design

    Pick the QPS, the data size, the read/write ratio, and the P99 latency budget before drawing the first box. The architecture falls out of the numbers; designs that skip the math collapse at the first ten-x.

  2. 02 · Day-zero

    Little's law in your head

    Concurrency = arrival rate × latency. 10K RPS at 50 ms = 500 in-flight requests. Every queue, thread pool, and connection pool is sized by this one equation.

  3. 03 · Day-zero

    Storage decides the shape

    OLTP, OLAP, KV, document, time-series, vector, graph. The pick determines the read patterns the design must support — and locks in the rewrite cost when it’s wrong.

  4. 04 · Practitioner

    CAP is a business decision

    Network partitions are physical reality, not a design choice. Whether you sacrifice consistency or availability is a product question — phrase it that way to the room.

  5. 05 · Practitioner

    Read-heavy versus write-heavy

    The first axis after capacity. Caching solves reads; nothing solves writes except sharding. Designs that conflate the two get review-bombed.

  6. 06 · Practitioner

    Hot keys are the real failure

    A 99/1 distribution looks fine on paper and ruins one node in production. Designs that survive the interview talk about hot-key sharding, jitter, and probabilistic counting.

  7. 07 · Practitioner

    Async needs a queue and a budget

    Putting work on a queue is the easy half. The hard half is back-pressure: when the queue grows, what gets dropped, what gets retried, and who pays for the buffer.

  8. 08 · Practitioner

    Idempotency is an API property

    Not a server-side trick. Stripe’s "Idempotency-Key" header design is the canonical reference; design idempotency in at the API layer or you’ll re-discover it in an outage.

  9. 09 · Operator

    Consistency lives at read-time

    Quorum reads, monotonic reads, read-your-writes, bounded staleness. Most "we need consistency" requirements boil down to one of these; pick the cheapest one that satisfies the product.

  10. 10 · Operator

    Multi-region is a different system

    Active-passive vs. active-active vs. anycast-write. RPO and RTO are not free; designs that "just add a region" without addressing write conflicts and replication lag fail at promo time.

  11. 11 · Operator

    Cost shapes architecture

    $/QPS, $/GB-month, $/egress-TB. A correct design that triples the cloud bill is a wrong design. Show the cost line on the whiteboard before the room asks.

  12. 12 · Operator

    Operability is the third axis

    Logs, metrics, traces, SLOs, runbooks, on-call rotation, deploy strategy. A design that nobody can run at 3 a.m. is the system everyone refuses to own.
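
Model 08 (idempotency as an API property) is easier to discuss with something concrete. The following is a minimal in-memory sketch, assuming a hypothetical `PaymentAPI.charge` handler and a plain dictionary as the key store; it is not Stripe's implementation. A real service would keep keys in a durable store with an expiry and would also have to handle two concurrent requests carrying the same key.

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """Raised when a key is reused with a different request body."""


class PaymentAPI:
    """Hypothetical handler that deduplicates client retries by idempotency key."""

    def __init__(self):
        self._seen = {}    # key -> (request fingerprint, stored response)
        self.charges = []  # side effects that were actually performed

    @staticmethod
    def _fingerprint(body: dict) -> str:
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(body, sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()

    def charge(self, idempotency_key: str, body: dict) -> dict:
        fingerprint = self._fingerprint(body)
        if idempotency_key in self._seen:
            stored_fingerprint, response = self._seen[idempotency_key]
            if stored_fingerprint != fingerprint:
                raise IdempotencyConflict(idempotency_key)
            return response  # replay: return the stored result, no second charge
        self.charges.append(body)
        response = {"charge_id": len(self.charges), "amount": body["amount"]}
        self._seen[idempotency_key] = (fingerprint, response)
        return response


api = PaymentAPI()
first = api.charge("key-1", {"amount": 100})
retry = api.charge("key-1", {"amount": 100})  # e.g. the client timed out and retried
print(first == retry, len(api.charges))
```

The design point is that the client chooses the key, so a retry after a timeout is safe even when the first attempt actually succeeded.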

Capacity math, on a napkin

Three numbers and an equation. Memorise these and the interviewer's next question almost always becomes "and what about the storage cost".

| What | How | A worked example |
| --- | --- | --- |
| Concurrency | arrival × latency (Little's law) | 10K RPS × 50 ms = 500 in-flight |
| Storage / day | writes × bytes | 1B writes × 1 KB = 1 TB / day |
| Storage / 10y | writes × bytes × 3650 | ~3.6 PB |
| Egress | reads × bytes | 10K RPS × 4 KB × 86,400 ≈ 3.5 TB / day |
| Cache RAM | hot-set × bytes / hit-ratio | 1M keys × 1 KB / 95% ≈ 1 GB |
| Servers needed | concurrency / per-server concurrency | 500 / 100 = 5 + headroom |

The five-number scratch pad: QPS, payload size, read/write ratio, P99 latency, and the data-volume horizon (1 day, 1 year, 10 years).
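
The rows of the table can be checked in a few lines. This is a small sketch that uses decimal units (1 KB = 1,000 bytes) and the same example inputs as the table; the variable and function names are illustrative only.

```python
import math

SECONDS_PER_DAY = 86_400
KB, GB, TB, PB = 1e3, 1e9, 1e12, 1e15  # decimal units, as in the table


def in_flight(rps: float, latency_ms: float) -> float:
    """Little's law: requests in flight = arrival rate x latency."""
    return rps * latency_ms / 1000


concurrency = in_flight(10_000, 50)                 # 10K RPS at 50 ms
storage_per_day = 1e9 * 1 * KB                      # 1B writes x 1 KB
storage_10y = storage_per_day * 3650                # ten years of days
egress_per_day = 10_000 * 4 * KB * SECONDS_PER_DAY  # 10K RPS x 4 KB x seconds/day
cache_ram = 1e6 * 1 * KB / 0.95                     # hot set divided by hit ratio
servers = math.ceil(concurrency / 100)              # before adding headroom

print(f"in-flight requests: {concurrency:.0f}")
print(f"storage per day:    {storage_per_day / TB:.2f} TB")
print(f"storage, 10 years:  {storage_10y / PB:.2f} PB")
print(f"egress per day:     {egress_per_day / TB:.2f} TB")
print(f"cache RAM:          {cache_ram / GB:.2f} GB")
print(f"servers:            {servers} + headroom")
```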

Interactive calculator: napkin math (preset values shown).

Presets: QPS 10.0K · payload 1.0 KB · read/write 100:1 · P99 50 ms · replication ×3 · hot set 10% · horizon 1 yr

| Throughput | Value |
| --- | --- |
| Write QPS, peak | 30.0K |
| Read QPS, peak | 3.0M |
| Read concurrency (Little's) | 150.0K |
| Write concurrency | 1.5K |

| Storage & cache | Value |
| --- | --- |
| Raw / day | 824.0 GB |
| Raw / horizon | 293.7 TB |
| With ×3 replication | 881.1 TB |
| Hot-set cache (RAM) | 29.4 TB |
| Read egress / day | 80.5 TB |

Formulas. Concurrency = QPS × P99 (Little's law). Storage / day = QPS × 86,400 × payload. Peak factor 3× over avg. Hot-set is the RAM working set you'd need for that read percentage.
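
The formulas above can be sketched as a small function. Several things here are assumptions inferred from the displayed results rather than stated by the page: the meaning of each preset value (average write QPS, payload, reads per write, P99, replication factor, hot-set fraction, horizon), the use of binary units (1 KB = 1,024 bytes, 1 GB = 2^30 bytes), and the choice to size the hot set from the unreplicated data. Under those assumptions the preset figures come out as shown.

```python
from dataclasses import dataclass

GIB, TIB = 2**30, 2**40
SECONDS_PER_DAY = 86_400


@dataclass
class Inputs:
    # Field names and their mapping to the sliders are assumptions.
    avg_write_qps: int = 10_000
    payload_bytes: int = 1024
    reads_per_write: int = 100
    p99_ms: int = 50
    replication: int = 3
    hot_fraction: float = 0.10
    horizon_days: int = 365
    peak_factor: int = 3


def napkin(i: Inputs) -> dict:
    write_peak = i.avg_write_qps * i.peak_factor
    read_peak = write_peak * i.reads_per_write
    raw_day = i.avg_write_qps * SECONDS_PER_DAY * i.payload_bytes
    raw_horizon = raw_day * i.horizon_days
    read_egress_day = (i.avg_write_qps * i.reads_per_write
                       * SECONDS_PER_DAY * i.payload_bytes)
    return {
        "write_qps_peak": write_peak,
        "read_qps_peak": read_peak,
        # Little's law, using P99 as a conservative latency figure.
        "read_concurrency": read_peak * i.p99_ms / 1000,
        "write_concurrency": write_peak * i.p99_ms / 1000,
        "raw_per_day_gib": raw_day / GIB,
        "raw_horizon_tib": raw_horizon / TIB,
        "replicated_tib": raw_horizon * i.replication / TIB,
        "hot_set_tib": raw_horizon * i.hot_fraction / TIB,
        "read_egress_day_tib": read_egress_day / TIB,
    }


for name, value in napkin(Inputs()).items():
    print(f"{name:22s} {value:,.1f}")
```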

Latency budgets to keep in your head

A design's latency budget is its first hard constraint. Each layer eats into it; the exam-room mistake is to add two layers without checking the sum against the SLO.

| Layer | Typical | How to shave it |
| --- | --- | --- |
| L1 cache | ~1 ns | data structure choice |
| RAM access | ~80 ns | locality, NUMA-aware |
| SSD random read 4 KB | ~10 µs (NVMe) | page cache, larger reads |
| Same-DC RTT | ~0.5 ms | collocate, kernel bypass |
| Same-region RTT | ~1–2 ms | regional pinning |
| Cross-region RTT (US east↔west) | ~70 ms | read replicas, anycast |
| Intercontinental RTT (US↔EU) | ~80–100 ms | edge cache, region failover |
| TLS handshake | ~30 ms over 1-RTT; 0-RTT with QUIC | session resumption, QUIC |
| DB query (indexed) | ~1–5 ms | proper index, page cache hot |
| DB query (full-scan) | ~100 ms+ | kill the query, add an index |
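
Checking the sum against the SLO takes only a few lines. The sketch below assumes a hypothetical request path and a hypothetical 100 ms budget, using typical figures from the table; summing typical values is optimistic for a P99 target, so treat the result as a lower bound.

```python
BUDGET_MS = 100.0  # hypothetical end-to-end latency budget

# Hypothetical request path, with typical figures taken from the table.
path = [
    ("TLS handshake (new connection)", 30.0),
    ("same-region RTT", 2.0),
    ("same-DC RTT x 2 (two service hops)", 2 * 0.5),
    ("DB query (indexed)", 5.0),
]


def total_ms(layers) -> float:
    """Sum the per-layer latency of a serial request path."""
    return sum(ms for _, ms in layers)


def remaining_ms(layers, budget_ms: float) -> float:
    """Budget left after every layer has taken its share (negative = blown)."""
    return budget_ms - total_ms(layers)


print(f"baseline: {total_ms(path):.1f} ms used, "
      f"{remaining_ms(path, BUDGET_MS):.1f} ms left")

# Adding one cross-region read looks harmless on the diagram.
with_remote_read = path + [("cross-region RTT (US east-west)", 70.0)]
print(f"with remote read: {total_ms(with_remote_read):.1f} ms used, "
      f"{remaining_ms(with_remote_read, BUDGET_MS):.1f} ms left")
```

In this example the baseline path fits comfortably, and the single added cross-region hop is what pushes it over the budget.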

Books, courses, papers, talks

The reading list is short. Designing Data-Intensive Applications is the closest the industry has to a textbook on this material; everything else fills in a gap.

  • Kleppmann — Designing Data-Intensive Applications. The textbook on the trade-offs that show up in design rounds. Replication, partitioning, transactions, derived data, encoding. Read once, then re-read chapters 5, 6, 7, 9 before any loop.
  • Xu — System Design Interview, vol I & II. The closest thing to a question bank. Lighter on theory than DDIA but covers the canonical designs end-to-end with diagrams.
  • Henderson — Building Scalable Web Sites. Older but the chapters on capacity, hardware, and operability still read perfectly. Founders-of-Flickr-era engineering.
  • Beyer et al — Site Reliability Engineering & The Site Reliability Workbook. Free from Google. The SLO and error-budget chapters are required reading; they shift "operability" from vibe to vocabulary.
  • Newman — Building Microservices. Useful for the multi-service design questions. Pair the first two chapters with the "Microservices" episode of Software Engineering Radio.
  • Papers: Dynamo (DeCandia et al, 2007), Bigtable (Chang et al, 2006), GFS (Ghemawat et al, 2003), Spanner (Corbett et al, 2012), Cassandra (Lakshman & Malik, 2010), Kafka (Kreps et al, 2011), CRDTs (Shapiro et al, 2011). Read at least Dynamo and Spanner before your loop.
  • Talks: "Latency Numbers Every Programmer Should Know" (the table popularised by Jeff Dean, after Norvig); "How to Design a System" (Hello Interview); "Mastering Chaos" (Josh Evans, Netflix). Put the latency table on a flashcard and drill it to recall.
  • Channels: Hello Interview (interview-shaped walkthroughs of canonical designs); ByteByteGo (visual explainers); Mark Richards's Software Architecture Monday shorts.

Hands-on tools

System design rounds are a whiteboard exercise; the operational follow-ups are not. These are the tools to actually run the design once you've drawn it.

  • Load generation. vegeta and wrk2 for open-loop, constant-rate load (the only correct kind for tail-latency work); k6 when configured with an arrival-rate executor. locust if you need a Python harness with custom scenarios.
  • Tracing. OpenTelemetry SDKs in your language, plus a backend — Jaeger, Honeycomb, Tempo, or Datadog APM. Trace one request from edge to database before declaring the design "done".
  • Profilers. perf on Linux, pprof for Go, async-profiler for the JVM, Pyroscope or Parca for continuous profiling. Production profiles are the proof your design holds up.
  • Capacity sheets. A Google Sheet with the five napkin numbers (QPS, payload, read/write, P99, horizon) feeding instance counts and storage costs. The cheapest tool on this list and the most-used.
  • Cloud cost calculators. AWS Pricing Calculator, GCP Pricing Calculator, Azure TCO. A design that triples the bill is a wrong design — own the cost line in the room.
  • Diagrams. Excalidraw, tldraw, draw.io, Whimsical. The hand-drawn aesthetic of Excalidraw matches the way most interviewers expect a design round to look.
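
Why open-loop load matters for tail latency can be shown with a toy simulation. The sketch below is a deterministic model, not how k6, vegeta, or wrk2 are implemented: a single server takes 1 ms per request but stalls between t = 200 ms and t = 700 ms, and both clients target one request every 10 ms for one second. All names and numbers are hypothetical.

```python
import statistics

STALL_START_MS, STALL_END_MS, SERVICE_MS = 200, 700, 1


def service_done(start_ms: int) -> int:
    """Completion time for a request the server picks up at start_ms."""
    if STALL_START_MS <= start_ms < STALL_END_MS:
        start_ms = STALL_END_MS  # the server is frozen until the stall ends
    return start_ms + SERVICE_MS


def closed_loop(duration_ms: int, interval_ms: int) -> list:
    """The client waits for each response before sending the next request."""
    latencies, now = [], 0
    while now < duration_ms:
        done = service_done(now)
        latencies.append(done - now)
        now = max(now + interval_ms, done)  # a slow response delays the next send
    return latencies


def open_loop(duration_ms: int, interval_ms: int) -> list:
    """Requests arrive on schedule regardless of how the server is doing."""
    latencies, server_free = [], 0
    for sent in range(0, duration_ms, interval_ms):
        start = max(sent, server_free)  # FIFO queue in front of one server
        done = service_done(start)
        server_free = done
        latencies.append(done - sent)   # measured from the scheduled send time
    return latencies


for name, run in (("closed", closed_loop), ("open", open_loop)):
    lat = run(1000, 10)
    slow = sum(1 for ms in lat if ms > 100)
    print(f"{name:6s} requests={len(lat):3d} slow(>100 ms)={slow:2d} "
          f"median={statistics.median(lat)} ms")
```

The closed-loop client records a single slow request because it stops sending while it waits, so the stall is almost invisible in its numbers. The open-loop client keeps the arrival rate fixed and records every request that queued behind the stall.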

Eight common mistakes

  • Designing without numbers. "We'll shard the database" with no QPS or data-size estimate is the classic L4 tell. Numbers first, boxes second.
  • Skipping the read/write ratio. A 100:1 read-heavy design and a 1:100 write-heavy design look identical on the whiteboard and require completely different infrastructure. Ask early.
  • Treating "we'll add a cache" as a design. Caches add freshness bugs, eviction policies, warm-up problems, and one more thing on call. They're a tool, not a victory lap.
  • Saying "use Kafka" without a back-pressure plan. Async work needs a queue and an answer to "what happens when the queue is full". The second half is non-negotiable.
  • Treating CAP as a tech choice. "CP versus AP" is a product question — a payment system answers differently than a feed. Phrase the trade-off in the language of the user.
  • "It scales horizontally" without showing how. Stateless services scale horizontally for free; everything else needs sharding strategy, replication topology, and a hot-key story.
  • Ignoring multi-region until the interviewer asks. If the SLO is 99.99% (about 52 minutes of downtime a year), a single region is very hard to defend; raise it before they do. Anycast, active-passive, and active-active are three different designs.
  • Burying operability. Logs, metrics, traces, SLOs, and the on-call story land late in candidate designs and early in interviewer scoring rubrics. Move them up.
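
The back-pressure point in the list above can be sketched with a bounded queue. This is one possible policy (reject the newest work when the buffer is full) built on the standard-library `queue.Queue`; the `BoundedWorkQueue` class is a hypothetical name, and a production system would more likely surface the rejection as an HTTP 429 or 503 with a retry hint.

```python
import queue


class BoundedWorkQueue:
    """Hypothetical work queue that sheds load instead of growing without bound."""

    def __init__(self, capacity: int):
        self._q = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def submit(self, job) -> bool:
        """Enqueue without blocking; return False when the buffer is full."""
        try:
            self._q.put_nowait(job)
            return True
        except queue.Full:
            self.rejected += 1  # the caller decides: retry later, drop, or fail fast
            return False

    def drain(self, n: int) -> list:
        """Simulate a worker taking up to n jobs off the queue."""
        jobs = []
        for _ in range(n):
            try:
                jobs.append(self._q.get_nowait())
            except queue.Empty:
                break
        return jobs


work = BoundedWorkQueue(capacity=3)
accepted = [work.submit(job) for job in range(5)]  # a burst of five, room for three
print(accepted, "rejected:", work.rejected)
print("worker took:", work.drain(2))
print("accepted after drain:", work.submit(99))
```

The capacity is the budget: it bounds both memory and the worst-case queueing delay, and the rejection count is the number to put on a dashboard.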

A 45-minute interview pacing

What separates "passing" from "great" is finishing on time. Use these checkpoints; if you're behind, drop breadth and keep the deep dive.

| Minute | What you should be doing | What it looks like on the board |
| --- | --- | --- |
| 0–5 | Clarifying questions, requirements | Functional reqs · non-functional · scope |
| 5–10 | Capacity math | QPS · payload · storage · P99 · horizon |
| 10–15 | API and data model | Endpoints · request/response · schema |
| 15–25 | High-level architecture | The boxes-and-arrows diagram |
| 25–35 | Deep dive on the hard part | Sharding · consistency · hot keys · async |
| 35–42 | Failure modes & operability | What dies · what oncall sees · SLOs |
| 42–45 | Trade-offs & what's next | What you'd change at 10× scale |

Adjacent paths

  • Distributed systems. The theoretical backing — replication, consensus, time, partial failure. Read this in parallel; system design without it is just box-drawing.
  • Computer networking. The wire under every diagram. TCP, TLS, DNS, BGP, QUIC. Most "why is this slow" answers live here.
  • Databases. Storage choice is half the design. B-trees vs. LSM, OLTP vs. OLAP, the planner, the page cache.
  • API design. The user-facing surface of the system you're designing. REST, gRPC, idempotency, versioning, pagination.