The playbook
Nineteen canonical system-design questions, each walked through end-to-end at the depth senior+ loops expect. Same shape every time: clarifying questions, capacity math, API and data model, high-level architecture, the deep dive on the hard part, failure modes, cost and operability, trade-offs. The goal isn't memorisation: the patterns repeat, and once you've done four of these, the fifth is a remix.
The 19 designs
Each page takes roughly 25–35 minutes of careful reading.
- 01. URL shortener. The canonical first design. Counter vs. hash vs. Snowflake IDs, base62, cache strategy, eviction, the analytics pipeline.
- 02. Pastebin. Object storage + edge cache + retention. Tiered storage, content-addressed dedupe, the abuse-handling pipeline.
- 03. Distributed key-value store. Dynamo-style. Consistent hashing, vector clocks, sloppy quorum, hinted handoff, anti-entropy.
- 04. News feed. Pull vs. push vs. hybrid. Fan-out on write for the famous, on read for the rest. Ranking, dedupe, freshness.
- 05. Chat / DM. WebSocket fleet, message store, presence, delivery semantics, group fan-out.
- 06. Distributed rate limiter. Token bucket vs. sliding window. Lua-on-Redis, the local-cache-with-sync pattern, hot-key sharding.
- 07. Notification system. Push, email, SMS: fan-out, dedupe, retry, regulated delivery, the privacy story.
- 08. Ride matching (Uber-shape). Geospatial indexing: geohash, S2, quadtree. Driver-rider matching, ETAs, surge pricing, the trip lifecycle, multi-region.
- 09. Search autocomplete / typeahead. Prefix tries, query-log ranking, top-k per prefix, the offline freshness pipeline, edge caching, personalisation hooks.
- 10. Web crawler. The URL frontier, politeness, dedup (Bloom + canonical URL), DNS caching, content storage tier, the re-crawl scheduler.
- 11. Top-k / trending. Count-min sketch, heavy hitters, decay windows, two-tier (real-time + batch) aggregation. The "trending tweets" pipeline.
- 12. Distributed scheduler (cron at scale). Single-leader vs. sharded scheduling, durable schedule store, fire-once semantics, missed-tick handling, at-least-once vs. at-most-once.
- 13. Event ingestion (ad-click counter). Kafka tier, idempotency keys, columnar OLAP rollups, fraud filter, exactly-once accounting, billing-grade correctness.
- 14. Object storage (S3-shape). Bucket + key namespace, erasure coding vs. replication, multi-part upload, the eventual-vs-strong read-after-write story, eleven-nines durability.
- 15. Twitter. The 250M-DAU read-heavy feed. Timeline fan-out, the celebrity problem, ranking, search. The canonical hybrid push/pull design.
- 16. Instagram. Write-heavy photo upload at scale. Direct-to-S3 uploads, the variant-encoding pipeline, the CDN strategy that absorbs ~95% of egress.
- 17. Mint (account aggregation). Scheduled sync against fragile partner APIs. Idempotent retries, per-institution rate limits, credential encryption, categorisation.
- 18. Netflix. Video streaming at planet scale. Adaptive bitrate, the Open Connect private CDN, per-shot encoding, recommendations.
- 19. Scale to millions on AWS. The layer-by-layer evolution: one EC2 → ALB + ASG → microservices + sharded data → multi-region. What breaks first at each stage.
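As a taste of the first design, the counter-plus-base62 option from the URL shortener can be sketched in a few lines. This is a minimal illustration, not the page's own implementation: it assumes a hypothetical integer ID handed out by some counter service, and the alphabet ordering and the function names `encode_base62` / `decode_base62` are illustrative choices.

```python
import string

# Assumed alphabet ordering: digits, then lowercase, then uppercase (62 symbols).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Turn a non-negative integer ID into a short base62 code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62: recover the integer ID from a code."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Seven base62 characters cover 62^7 (about 3.5 trillion) distinct IDs, which is the kind of number the capacity-math step is meant to check against expected write volume.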
How each page is structured
The same eight-section template. Drilling on one design teaches the shape; reading five teaches the patterns that repeat.
- Clarifying questions. The first five minutes of any real interview. What's in scope, what isn't, the user volumes, the SLOs, the data lifetime.
- Capacity math. QPS, payload size, read/write ratio, P99 latency, storage horizon. The five napkin numbers that decide every other choice.
- API and data model. Endpoints with their request/response. The schema with field types and indexes. The shape of the wire bytes.
- High-level architecture. The boxes-and-arrows diagram with each layer's role. Where reads go. Where writes go. What's stateless.
- Deep dive on the hard part. The interesting choice — sharding strategy, hot-key handling, replication topology, async work, the consistency story.
- Failure modes. What dies, what oncall sees, what the runbook does. The failure cases interviewers love to test.
- Cost & operability. The dollar line per million requests, per TB stored. What gets paged, what's a ticket, what's a bug.
- Trade-offs & what's next. What you'd change at 10× scale, what you'd change for a different multi-region setup, what the next layer down would add.
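The capacity-math step above can be sketched as a small calculator. It is a rough illustration under stated assumptions: the `Workload` fields, the `napkin` function, and the default peak-to-average factor of 2 are all hypothetical names and values chosen for this example. P99 latency is a target you set rather than something you derive, so it is not computed here.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    """Hypothetical inputs gathered during the clarifying-questions step."""
    daily_active_users: int
    writes_per_user_per_day: float
    read_write_ratio: float    # reads per write
    payload_bytes: int         # average stored size of one write
    retention_days: int        # storage horizon
    peak_factor: float = 2.0   # assumed peak-to-average multiplier


def napkin(w: Workload) -> dict:
    """Derive average/peak QPS and raw storage from the workload inputs."""
    writes_per_day = w.daily_active_users * w.writes_per_user_per_day
    avg_write_qps = writes_per_day / SECONDS_PER_DAY
    avg_read_qps = avg_write_qps * w.read_write_ratio
    # Raw bytes only: no replication, indexes, or compression.
    storage_bytes = writes_per_day * w.payload_bytes * w.retention_days
    return {
        "avg_write_qps": avg_write_qps,
        "avg_read_qps": avg_read_qps,
        "peak_read_qps": avg_read_qps * w.peak_factor,
        "storage_tb": storage_bytes / 1e12,
    }
```

For example, 8.64M daily users writing once a day at a 100:1 read/write ratio, with 500-byte payloads kept for a year, works out to 100 write QPS, 10,000 average read QPS (20,000 at the assumed peak), and roughly 1.58 TB of raw storage.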