A blank whiteboard is not a problem. It is six problems, in order. The framework is the order.
Knowing the components isn't the hard part. Every candidate who's read a system design book knows what Cassandra is. The hard part is knowing which component to reach for, in what order, and what trade-offs that choice forces on everything downstream. This framework is a structured way through a 45-minute interview — not a checklist to follow robotically, but a forcing function that keeps you moving through the right questions even when you don't know the answer to every one of them.
Scope — five minutes that save you twenty
The most common failure mode in a design interview is building the right system for the wrong problem. You spend 40 minutes on an elegant distributed database, then the interviewer says "I actually meant the mobile-only version with offline-first sync" and the session is lost.
Spend five minutes (not ten) on four things. First: functional requirements, meaning the specific user-visible actions the system performs. "Design a URL shortener" is insufficient; "users create short URLs that redirect, with click analytics, a 30-day default expiry, and optional custom aliases" is what you need. Second: non-functionals — latency target (P50/P99), availability (99.9% vs 99.99% is a real difference), durability, consistency model, single-region or global. Third: what's explicitly out of scope — say it out loud. "Not designing the analytics dashboard, not building the billing system, not handling video encoding." This buys you time and signals discipline. Fourth: the read/write ratio. One number that answers a third of your design. URL shortener is 1000:1 read-heavy. Twitter's feed is 100:1. Stripe payments are close to 1:1.
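To make the "99.9% vs 99.99% is a real difference" point concrete, here is a minimal sketch that converts an availability target into a yearly downtime budget. The function name is illustrative, and the calculation assumes a 365-day year and treats all downtime as equal.

```python
# Downtime allowed per year for a given availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes per year the system may be down and still meet the target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for target in (99.9, 99.99):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.1f} min/year ({minutes / 60:.1f} hours)")
```

Three nines allows roughly 526 minutes of downtime a year (under nine hours); four nines allows roughly 53 minutes. That order-of-magnitude gap is why the target is worth pinning down during scoping.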
Estimates — numbers before diagrams
Estimates before diagrams, always. The numbers you put on the side of the board get referenced for the rest of the session. "We said 700k QPS, so a relational primary won't survive it" is a much more authoritative statement than "Cassandra seems right here." The estimates give you the leverage to make opinionated decisions.
| Quantity | How to estimate | Why it matters |
|---|---|---|
| DAU | Stated, or a fraction of MAU (typically 20-30%). | Sets the scale of everything downstream. |
| Read QPS | DAU × reads-per-user-per-day ÷ 86,400. | Decides if you need caching, replicas, or CDN. |
| Write QPS | Same, with writes. Often 1-2 orders of magnitude lower. | Decides if you need queues, sharding, or async processing. |
| Storage / day | Writes × payload size. | Decides if it fits in one DB or needs blob storage. |
| Storage / 5 yr | Daily × 365 × 5, with growth multiplier. | Decides sharding strategy and retention policy. |
| Egress bandwidth | Read QPS × payload size. | Decides CDN vs origin and rough cloud bill. |
Numbers worth keeping in your head: tweet body ≈ 200 B, photo thumbnail ≈ 200 kB, short video ≈ 5 MB, JSON event ≈ 1 kB. Latency constants: same-AZ round trip ≈ 1 ms, same-region ≈ 5 ms, cross-region ≈ 80 ms, transatlantic ≈ 130 ms. Having these ready means you can do the mental arithmetic during the interview rather than stalling to think.
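The formulas in the table can be sketched as a small calculator. The workload below (10M DAU, 100 reads and 0.1 writes per user per day, 500-byte payloads) is a made-up example rather than a figure from any real system, and the `Workload` and `estimate` names are illustrative. All results are daily averages; peak traffic will be higher.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    dau: int
    reads_per_user_per_day: float
    writes_per_user_per_day: float
    write_payload_bytes: int
    read_payload_bytes: int


def estimate(w: Workload, years: int = 5, growth_multiplier: float = 1.0) -> dict:
    """Apply the table's formulas; every value is an average, not a peak."""
    read_qps = w.dau * w.reads_per_user_per_day / SECONDS_PER_DAY
    write_qps = w.dau * w.writes_per_user_per_day / SECONDS_PER_DAY
    storage_per_day = w.dau * w.writes_per_user_per_day * w.write_payload_bytes
    return {
        "read_qps": read_qps,
        "write_qps": write_qps,
        "storage_per_day_bytes": storage_per_day,
        "storage_total_bytes": storage_per_day * 365 * years * growth_multiplier,
        "egress_bytes_per_sec": read_qps * w.read_payload_bytes,
    }


# Hypothetical read-heavy workload with a 1000:1 read/write ratio.
example = Workload(
    dau=10_000_000,
    reads_per_user_per_day=100,
    writes_per_user_per_day=0.1,
    write_payload_bytes=500,
    read_payload_bytes=500,
)
for name, value in estimate(example).items():
    print(f"{name}: {value:,.2f}")
```

With these assumed inputs the averages come out to about 11.6k read QPS, about 12 write QPS, 500 MB of new data per day, and a little over 900 GB across five years with no growth multiplier.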
API contracts — force the hard questions early
Write three or four endpoints — that's enough to pin down the system boundary. The shape of the API drives the data model and forces several decisions that would otherwise stay vague until they cause problems.
    POST /shorten
      body:    { url: string, custom_alias?: string, ttl_days?: number }
      returns: { short_code: string, expires_at: ISO8601 }

    GET /:code
      returns: 302 → original_url
               404 if expired or unknown

    GET /:code/stats
      returns: { clicks: int, last_click_at: ISO8601, top_referrers: [...] }
Three decisions to make explicit while writing the API: protocol (REST for public resource-oriented APIs, gRPC for low-latency internal services, WebSocket for bidirectional), idempotency (every write needs an idempotency key — state it here before the interviewer asks what happens on retry), and pagination (cursor for feeds and infinite scroll, offset for jump-to-page — cursor is almost always correct). These small decisions signal production experience and prevent the interviewer from having to drag them out of you later.
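The idempotency point can be illustrated with a minimal sketch of a handler for `POST /shorten`. It assumes the client sends an idempotency key with each request. The `ShortenService` class and its in-memory dictionaries are hypothetical stand-ins for a real service and database.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ShortenService:
    # In-memory stand-ins for real storage; for illustration only.
    links: dict = field(default_factory=dict)      # short_code -> url
    seen_keys: dict = field(default_factory=dict)  # idempotency_key -> response

    def shorten(self, idempotency_key: str, url: str) -> dict:
        # A retry with the same key returns the stored response
        # instead of creating a second short link.
        if idempotency_key in self.seen_keys:
            return self.seen_keys[idempotency_key]
        short_code = secrets.token_urlsafe(5)
        self.links[short_code] = url
        response = {"short_code": short_code}
        self.seen_keys[idempotency_key] = response
        return response


service = ShortenService()
first = service.shorten("req-1", "https://example.com/a")
retry = service.shorten("req-1", "https://example.com/a")
print(first == retry, len(service.links))
```

In a real deployment the lookup and the insert would need to be atomic, for example through a unique constraint on the key. This sketch skips that because it runs in a single thread.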
Data model — the decision that constrains everything else
Storage is the decision that's hardest to undo. Get the wrong database engine and you're looking at a multi-quarter migration. The cards below aren't a complete taxonomy — they're the decision tree most interview discussions need.
- Relational: Default. Joins, transactions, mature operators. Pick when relationships matter, when consistency matters, when the team already knows it. PostgreSQL, MySQL.
- Document: Pick when the schema is wide and shallow per object, denormalised reads dominate, and you'll never join. MongoDB, DynamoDB document mode.
- Key-value: Pick when access is purely "give me the value for this key" and you need scale. DynamoDB, Cassandra, Redis as a primary store.
- Wide-column: Pick for time-series, fan-out feeds, append-mostly heavy writes with predictable read paths. Cassandra, ScyllaDB, Bigtable.
- Graph: Pick when relationships are the query: friend-of-friend, fraud rings, knowledge graphs. Neo4j, Amazon Neptune.
- Search: Pick when full-text or faceted search is core to the product. Elasticsearch, OpenSearch, usually as a secondary store fed by CDC.
Sketch the schema: primary key, one or two indexes, an example row. Write the partition key separately — it's how the data shards, and it's the constraint that limits future scale. Ask yourself: "what's the most common read pattern?" The partition key should match it. For the URL shortener: (short_code PK, original_url, created_at, expires_at) partitioned by hash(short_code).
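The URL shortener schema and its `hash(short_code)` partitioning can be sketched as follows. The shard count of 16, the SHA-256 choice, and the `shard_for` helper are assumptions made for illustration; the text only specifies the columns and that the partition key is a hash of `short_code`.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

NUM_SHARDS = 16  # hypothetical shard count


@dataclass(frozen=True)
class ShortUrl:
    short_code: str  # primary key and partition key
    original_url: str
    created_at: datetime
    expires_at: datetime


def shard_for(short_code: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a short_code to a shard with a stable hash.

    Python's built-in hash() is salted per process, so it is not
    suitable for routing that must agree across machines.
    """
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


now = datetime.now(timezone.utc)
row = ShortUrl("abc123", "https://example.com/long/path", now, now + timedelta(days=30))
print(row.short_code, "-> shard", shard_for(row.short_code))
```

Plain modulo hashing remaps most keys when the shard count changes. Consistent hashing is a common way to limit that movement, at the cost of a more involved routing layer.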
High-level design — five to seven boxes
Draw the boxes before you draw the arrows. Five to seven boxes is the right density: fewer is too vague to probe, more becomes hard to follow. Cover the request path end to end, choosing from client, DNS, CDN, load balancer, API gateway, service, cache, database, queue, and workers, and merging adjacent ones where the detail doesn't matter. The goal is a diagram the interviewer can ask "what happens when this fails?" about.
- Edge: DNS for routing, CDN for static assets and cacheable responses, anycast for global traffic shaping.
- Ingress: L7 load balancer terminates TLS, handles auth, dispatches to services. WAF lives here for public-facing systems.
- Compute: Stateless service containers behind the LB. Auto-scaling on CPU + RPS. Sized so any one can die without notice.
- State: Primary database (with replicas), distributed cache (Redis), blob store (S3) for big objects, search index for query workloads not suited to the primary store.
- Async path: Queue (SQS, Kafka) for fan-out work, workers consuming the queue, dead-letter queue for failures.
- Observability: Metrics (Prometheus / Cloud Monitoring), logs (centralised), distributed traces (OpenTelemetry). State this last but never skip it.
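One way the compute and state layers interact on the redirect path is the cache-aside pattern, sketched below. This is an assumed design rather than one the text prescribes. The `RedirectService` class is hypothetical, and plain dictionaries stand in for the distributed cache and the primary database.

```python
from typing import Optional


class RedirectService:
    def __init__(self, database: dict):
        self.database = database  # stand-in for the primary database
        self.cache: dict = {}     # stand-in for a distributed cache
        self.db_reads = 0         # counts how often the cache missed

    def resolve(self, short_code: str) -> Optional[str]:
        # 1. Try the cache first.
        url = self.cache.get(short_code)
        if url is not None:
            return url
        # 2. On a miss, fall back to the database.
        self.db_reads += 1
        url = self.database.get(short_code)
        # 3. Populate the cache so the next read is served from memory.
        if url is not None:
            self.cache[short_code] = url
        return url


service = RedirectService({"abc123": "https://example.com/long/path"})
service.resolve("abc123")  # miss: reads the database
service.resolve("abc123")  # hit: served from the cache
print(service.db_reads)
```

A production version would also put a TTL on cache entries, no longer than the link's `expires_at`, so that expired links stop resolving.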
The deep dive — where the real signal is
The interviewer will pick a box and ask one of three questions: "scale this," "what happens when this fails?", or "what changes if QPS goes 100×?" These aren't trick questions — they're the evaluation. The deep dive is where candidates separate, and it runs until time is up.
The trade-off cheatsheet
| You want | You lose |
|---|---|
| Strong consistency | Latency on the slow path; availability during partitions |
| Low latency | Either consistency or freshness — you cache something stale |
| High availability | Strong consistency; usually some operational complexity |
| Cheap reads | Expensive writes — every cache, every replica is paid for at write time |
| Cheap writes | Expensive reads — denormalised, fan-out, scatter-gather |
| Multi-region | Cross-region latency on writes; double the cost; conflict handling |
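The "cheap reads" and "cheap writes" rows can be illustrated with two toy feed implementations. This is a minimal sketch with a hypothetical follower graph and in-memory storage. It ignores timestamps and ordering, which a real feed would need.

```python
from collections import defaultdict

# Hypothetical social graph.
followers = {"alice": ["bob", "carol"], "dave": ["bob"]}    # author -> followers
following = {"bob": ["alice", "dave"], "carol": ["alice"]}  # reader -> authors

# Fan-out on write: each post is copied to every follower's inbox.
# Writes cost O(followers); reads are a single lookup.
inboxes = defaultdict(list)


def post_fanout_on_write(author: str, text: str) -> None:
    for follower in followers.get(author, []):
        inboxes[follower].append((author, text))


def read_fanout_on_write(reader: str) -> list:
    return list(inboxes.get(reader, []))


# Fan-out on read: each post is stored once.
# Writes are O(1); reads gather from every followed author.
posts_by_author = defaultdict(list)


def post_fanout_on_read(author: str, text: str) -> None:
    posts_by_author[author].append((author, text))


def read_fanout_on_read(reader: str) -> list:
    feed = []
    for author in following.get(reader, []):
        feed.extend(posts_by_author.get(author, []))
    return feed


for author, text in [("alice", "hi"), ("dave", "yo")]:
    post_fanout_on_write(author, text)
    post_fanout_on_read(author, text)

print(sorted(read_fanout_on_write("bob")) == sorted(read_fanout_on_read("bob")))
```

Both approaches return the same feed. The difference is where the work lands: at write time in the first, at read time in the second.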
What to do when you get stuck
Every candidate gets stuck. The ones who recover well fall back on the same few habits:
- Start with scope, every time. Five minutes, not ten. The interviewer's patience is finite and scope is where you build the vocabulary for the rest of the session.
- Estimates before diagrams. The numbers in the margin get referenced constantly. They're your authority to make opinionated choices.
- Talk while drawing. Silence reads as confusion even when it isn't. Narrate what you're drawing and why.
- End with "what I'd improve." Always, without exception. Every real production system is unfinished, and the interviewer knows it.
- If stuck, walk the request path. Trace the request from client to database and back. The bottleneck is almost always between two components you've already drawn.