The design framework — System Design Handbook

A blank whiteboard is not a problem. It is six problems, in order. The framework is the order.

Knowing the components isn't the hard part. Every candidate who's read a system design book knows what Cassandra is. The hard part is knowing which component to reach for, in what order, and what trade-offs that choice forces on everything downstream. This framework is a structured way through a 45-minute interview — not a checklist to follow robotically, but a forcing function that keeps you moving through the right questions even when you don't know the answer to every one of them.

Six steps in order. The first five are roughly equal in time. The sixth is whatever's left, and it's where most of the signal an interviewer cares about gets generated.

Scope — five minutes that save you twenty

The most common failure mode in a design interview is building the right system for the wrong problem. You spend 40 minutes on an elegant distributed database, then the interviewer says "I actually meant the mobile-only version with offline-first sync" and the session is lost.

Spend five minutes (not ten) on four things:

Functional requirements: the specific user-visible actions the system performs. "Design a URL shortener" is insufficient; "users create short URLs that redirect, with click analytics, a 30-day default expiry, and optional custom aliases" is what you need.
Non-functional requirements: latency target (P50/P99), availability (99.9% vs 99.99% is a real difference), durability, consistency model, single-region or global.
Out of scope: say it out loud. "Not designing the analytics dashboard, not building the billing system, not handling video encoding." This buys you time and signals discipline.
Read/write ratio: one number that answers a third of your design. A URL shortener is 1000:1 read-heavy. Twitter's feed is 100:1. Stripe payments are close to 1:1.
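To make the gap between 99.9% and 99.99% concrete, the arithmetic can be sketched as below. This is a minimal illustration, not part of the original framework; the function name and the 365-day year are assumptions made for the example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored for simplicity


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Downtime budget per year implied by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.1f} min/year ({minutes / 60:.2f} h)")
```

Under these assumptions, 99.9% allows roughly 8.8 hours of downtime a year, while 99.99% allows under an hour. That factor of ten is what moves a design from "a single region with failover" toward something more elaborate.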

Estimates — numbers before diagrams

Estimates before diagrams, always. The numbers you put on the side of the board get referenced for the rest of the session. "We said 700k QPS, so a relational primary won't survive it" is a much more authoritative statement than "Cassandra seems right here." The estimates give you the leverage to make opinionated decisions.

Quantity	How to estimate	Why it matters
DAU	Stated, or a fraction of MAU (typically 20-30%).	Sets the scale of everything downstream.
Read QPS	DAU × reads-per-user-per-day ÷ 86,400.	Decides if you need caching, replicas, or CDN.
Write QPS	Same, with writes. Often 1-2 orders of magnitude lower.	Decides if you need queues, sharding, or async processing.
Storage / day	Writes × payload size.	Decides if it fits in one DB or needs blob storage.
Storage / 5 yr	Daily × 365 × 5, with growth multiplier.	Decides sharding strategy and retention policy.
Egress bandwidth	Read QPS × payload size.	Decides CDN vs origin and rough cloud bill.

Numbers worth keeping in your head: tweet body ≈ 200 B, photo thumbnail ≈ 200 kB, short video ≈ 5 MB, JSON event ≈ 1 kB. Latency constants: same-AZ round trip ≈ 1 ms, same-region ≈ 5 ms, cross-region ≈ 80 ms, transatlantic ≈ 130 ms. Having these ready means you can do the mental arithmetic during the interview rather than stalling to think.
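The formulas in the table can be written down as a small calculator. The sketch below is illustrative only: the function name, the 10M DAU figure, the per-user rates, and the 500 B record size are all assumptions chosen to resemble the URL shortener example, and the results are daily averages rather than peaks.

```python
SECONDS_PER_DAY = 86_400


def estimate(dau, reads_per_user_per_day, writes_per_user_per_day,
             payload_bytes, years=5, growth_multiplier=1.0):
    """Apply the table's formulas. All rates are daily averages, not peaks."""
    writes_per_day = dau * writes_per_user_per_day
    read_qps = dau * reads_per_user_per_day / SECONDS_PER_DAY
    write_qps = writes_per_day / SECONDS_PER_DAY
    storage_per_day = writes_per_day * payload_bytes
    return {
        "read_qps": read_qps,
        "write_qps": write_qps,
        "storage_bytes_per_day": storage_per_day,
        "storage_bytes_total": storage_per_day * 365 * years * growth_multiplier,
        "egress_bytes_per_sec": read_qps * payload_bytes,
    }


# Hypothetical inputs: 10M DAU, 1000:1 read/write ratio, 500 B per record.
result = estimate(dau=10_000_000, reads_per_user_per_day=100,
                  writes_per_user_per_day=0.1, payload_bytes=500)
for name, value in result.items():
    print(f"{name}: {value:,.0f}")
```

With these made-up inputs the sketch gives roughly 11.6k read QPS, about 12 write QPS, around 500 MB of new data per day, a little under 1 TB over five years, and close to 6 MB/s of egress. Numbers at that scale already suggest a cache in front of a modest database rather than an aggressive sharding scheme.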

API contracts — force the hard questions early

Write three or four endpoints — that's enough to pin down the system boundary. The shape of the API drives the data model and forces several decisions that would otherwise stay vague until they cause problems.

POST   /shorten
       body: { url: string, custom_alias?: string, ttl_days?: number }
       returns: { short_code: string, expires_at: ISO8601 }

GET    /:code
       returns: 302 → original_url
                404 if expired or unknown

GET    /:code/stats
       returns: { clicks: int, last_click_at: ISO8601, top_referrers: [...] }

Three decisions to make explicit while writing the API: protocol (REST for public resource-oriented APIs, gRPC for low-latency internal services, WebSocket for bidirectional), idempotency (every write needs an idempotency key — state it here before the interviewer asks what happens on retry), and pagination (cursor for feeds and infinite scroll, offset for jump-to-page — cursor is almost always correct). These small decisions signal production experience and prevent the interviewer from having to drag them out of you later.
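The idempotency point above can be illustrated with a short sketch. Everything here is hypothetical: the class and method names are invented for the example, plain dictionaries stand in for a real datastore, and a production version would need the key lookup and the write to happen atomically.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class ShortenResponse:
    short_code: str
    original_url: str


class ShortenService:
    """In-memory stand-in for POST /shorten with an idempotency key."""

    def __init__(self):
        self._responses_by_key = {}  # idempotency key -> stored response
        self._urls = {}              # short_code -> original_url

    def shorten(self, idempotency_key: str, url: str) -> ShortenResponse:
        # A retry with the same key gets the original response back,
        # so no second short code is ever created for it.
        previous = self._responses_by_key.get(idempotency_key)
        if previous is not None:
            return previous

        short_code = secrets.token_urlsafe(5)  # 7 URL-safe characters
        while short_code in self._urls:        # regenerate on the rare collision
            short_code = secrets.token_urlsafe(5)

        self._urls[short_code] = url
        response = ShortenResponse(short_code, url)
        self._responses_by_key[idempotency_key] = response
        return response


service = ShortenService()
first = service.shorten("key-1", "https://example.com/a")
retry = service.shorten("key-1", "https://example.com/a")
print(first == retry)  # the retry did not create a second short code
```

The design choice worth saying out loud is that the client generates the key, so a timeout followed by a retry is safe even when the first request actually succeeded.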

Data model — the decision that constrains everything else

Storage is the decision that's hardest to undo. Get the wrong database engine and you're looking at a multi-quarter migration. The categories below aren't a complete taxonomy; they're the decision tree most interview discussions need.

SQL

Default. Joins, transactions, mature operators. Pick when relationships matter, when consistency matters, when the team already knows it. PostgreSQL, MySQL.

Document

Pick when the schema is wide and shallow per object, denormalised reads dominate, and you'll never join. MongoDB, DynamoDB document mode.

Key-value

Pick when access is purely "give me the value for this key" and you need scale. DynamoDB, Cassandra, Redis as a primary store.

Wide-column

Pick for time-series, fan-out feeds, append-mostly heavy writes with predictable read paths. Cassandra, ScyllaDB, Bigtable.

Graph

Pick when relationships are the query — friend-of-friend, fraud rings, knowledge graphs. Neo4j, Amazon Neptune.

Search index

Pick when full-text or faceted search is core to the product. Elasticsearch, OpenSearch, usually as a secondary store fed by CDC.

Sketch the schema: primary key, one or two indexes, an example row. Write the partition key separately — it's how the data shards, and it's the constraint that limits future scale. Ask yourself: "what's the most common read pattern?" The partition key should match it. For the URL shortener: (short_code PK, original_url, created_at, expires_at) partitioned by hash(short_code).
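One way to picture "partitioned by hash(short_code)" is the sketch below. It is an illustration under assumptions: the shard count of 16, the record class, and the use of SHA-256 with a modulo are choices made for the example, not something the framework prescribes. A stable hash is used because Python's built-in hash() for strings varies between processes.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

NUM_SHARDS = 16  # hypothetical shard count


@dataclass(frozen=True)
class UrlRecord:
    short_code: str       # primary key and partition key
    original_url: str
    created_at: datetime
    expires_at: datetime


def shard_for(short_code: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a short code to a shard using a hash that is stable across processes."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Example row with the 30-day default expiry from the scope step.
now = datetime.now(timezone.utc)
row = UrlRecord("aZ3kQ9x", "https://example.com/some/long/path",
                now, now + timedelta(days=30))
print(row.short_code, "-> shard", shard_for(row.short_code))
```

The redirect path (GET /:code) only ever needs one key, so every read lands on exactly one shard. The cost of plain modulo sharding is that changing the shard count remaps most keys; consistent hashing is the usual alternative to mention if the interviewer asks about resharding.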

High-level design — five to seven boxes

Draw the boxes before you draw the arrows. Five to seven boxes is the right density: fewer is too vague to probe, more becomes hard to follow. Cover the request path end to end (client, DNS, CDN, load balancer, API gateway, service, cache, database, queue, workers), grouping related components into a single box where needed to stay within that count. The goal is a diagram the interviewer can ask "what happens when this fails?" about.

Edge: DNS for routing, CDN for static assets and cacheable responses, anycast for global traffic shaping.
Ingress: L7 load balancer terminates TLS, handles auth, dispatches to services. WAF lives here for public-facing systems.
Compute: Stateless service containers behind the LB. Auto-scaling on CPU + RPS. Sized so any one can die without notice.
State: Primary database (with replicas), distributed cache (Redis), blob store (S3) for big objects, search index for query workloads not suited to the primary store.
Async path: Queue (SQS, Kafka) for fan-out work, workers consuming the queue, dead-letter queue for failures.
Observability: Metrics (Prometheus / Cloud Monitoring), logs (centralised), distributed traces (OpenTelemetry). State this last but never skip it.
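The async path in the list above (queue, workers, dead-letter queue) can be sketched in a few lines. This is a toy model under stated assumptions: in-memory deques stand in for SQS or Kafka, the retry limit of 3 is arbitrary, and the function and handler names are invented for the example. Managed queues usually provide retry counting and dead-lettering as configuration rather than application code.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry limit


def drain(queue, dead_letters, handler, max_attempts=MAX_ATTEMPTS):
    """Process (message, attempts) pairs; park repeated failures in the DLQ."""
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                dead_letters.append(message)       # give up, keep for inspection
            else:
                queue.append((message, attempts))  # requeue for another try


processed = []


def record_click(message):
    """Hypothetical worker: fails on one poison message, succeeds otherwise."""
    if message == "malformed-event":
        raise ValueError("cannot parse event")
    processed.append(message)


work = deque([("click:abc", 0), ("malformed-event", 0), ("click:xyz", 0)])
dead = deque()
drain(work, dead, record_click)
print(processed, list(dead))
```

The point of the dead-letter queue is that one poison message cannot block the rest of the backlog: it is retried a bounded number of times and then set aside.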

The deep dive — where the real signal is

The interviewer will pick a box and ask one of three questions: "scale this," "what happens when this fails?", or "what changes if QPS goes 100×?" These aren't trick questions — they're the evaluation. The deep dive is where candidates separate, and it runs until time is up.

Bottlenecks first. Walk the diagram and name the bottleneck at each box. The DB is usually first; the LB is rarely a real bottleneck; the queue might be. State each one and what you'd do about it (cache, shard, read replicas, partition the queue, move the path to async processing).

Failure modes second. For each component: what happens when it goes down or slows down? For the cache, that's stampede + cold start. For the DB, that's failover lag. For the queue, that's backlog and DLQ overflow. State the mitigation for each.
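One common mitigation for the cache stampede mentioned above is to let only one caller per key rebuild a missing entry while the others wait (often called single-flight or request coalescing). The sketch below is a minimal in-process version under assumptions: the class name is invented, there is no TTL or eviction, and the per-key lock table grows without bound. A distributed cache would need a distributed lock or a similar mechanism.

```python
import threading
import time


class SingleFlightCache:
    """Cache where concurrent misses on one key trigger a single load."""

    def __init__(self, loader):
        self._loader = loader
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def get(self, key):
        if key in self._values:          # fast path: already cached
            return self._values[key]
        with self._guard:
            lock = self._key_locks.setdefault(key, threading.Lock())
        with lock:
            if key not in self._values:  # re-check: another thread may have loaded it
                self._values[key] = self._loader(key)
            return self._values[key]


load_calls = []


def slow_db_lookup(key):
    """Hypothetical expensive query standing in for the primary database."""
    time.sleep(0.05)
    load_calls.append(key)
    return key.upper()


cache = SingleFlightCache(slow_db_lookup)
results = []
threads = [threading.Thread(target=lambda: results.append(cache.get("abc")))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(load_calls), results)
```

Eight concurrent readers hit a cold key, but the database sees one query. Without the per-key lock, each of them would have gone to the database at the same moment, which is exactly the stampede.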

Trade-offs third. CAP picks (CP for payments, AP for social), consistency model (strong vs eventual vs causal), durability vs latency (sync replication vs async), single-region vs multi-region. State the choice and why.

The trade-off cheatsheet

You want	You lose
Strong consistency	Latency on the slow path; availability during partitions
Low latency	Either consistency or freshness — you cache something stale
High availability	Strong consistency; usually some operational complexity
Cheap reads	Expensive writes — every cache, every replica is paid for at write time
Cheap writes	Expensive reads — denormalised, fan-out, scatter-gather
Multi-region	Cross-region latency on writes; double the cost; conflict handling

What to do when you get stuck

Every candidate gets stuck. The ones who recover well have a few default moves:

Start with scope, every time. Five minutes, not ten. The interviewer's patience is finite and scope is where you build the vocabulary for the rest of the session.
Estimates before diagrams. The numbers in the margin get referenced constantly. They're your authority to make opinionated choices.
Talk while drawing. Silence reads as confusion even when it isn't. Narrate what you're drawing and why.
End with "what I'd improve." Always, without exception. Every real production system is unfinished, and the interviewer knows it.
If stuck, walk the request path. Trace the request from client to database and back. The bottleneck is almost always between two components you've already drawn.
