15 stages · 91 topics · 45 core

System Design Roadmap

From "what's a load balancer" to "I can design Twitter."

Fifteen stages, ~80 topics, every one linked to a Semicolony deep dive, simulator, or worked-problem playbook entry. The architecture diagram below is the map; each stage card down the page lights up the part of the diagram it owns. Read top to bottom, or jump to wherever you're stuck.

Depth

Showing 79 of 91 topics across 15 of 15 stages.

01 app · db-primary

Trade-off vocabulary

The three pairs of words every interview uses.

Senior interviews are conducted in trade-off language. Before any architecture sketch, you should be able to use these six words precisely: performance, scalability, latency, throughput, availability, consistency. Most candidates blur them.

Core

Performance vs scalability

Performance is how fast one request gets through. Scalability is how the system holds up when you add a hundred more. They look the same when load is low and diverge sharply once you hit a bottleneck.

Performance vs scalability

Handbook decisions External Lethain: architecting systems for scale

Core

Latency vs throughput

Latency is the time one operation takes. Throughput is how many you can do per second. Little's Law ties them: concurrency = throughput × latency. Bumping one almost always costs the other.

Latency vs throughput

Sim Big-O race Paper Tail at scale External Little's Law

Core

Availability vs consistency

You cannot have both during a network partition. CAP says you pick. PACELC extends the question to the no-partition case (where the trade is latency vs consistency). Most real systems settle for "mostly consistent, occasionally stale" with a hard limit on staleness.

Availability vs consistency

CAP & PACELC deep dive Sim CAP theorem Paper Building on quicksand

Consistency patterns

Weak, eventual, session, strong, linearisable. Five strength bands every database picks from. Postgres, DynamoDB, Spanner, Cassandra all live in different regions of the spectrum.

Consistency patterns

Isolation levels Paper Calvin

Availability patterns

Failover (cold, warm, hot), replication shapes, and the math behind "five nines." Plus why redundancy past a point gives you less availability, not more.

Availability patterns

Replication Failure detectors

02 client · dns · cdn

The edge — DNS, CDN, anycast

Everything before your origin server gets to think.

Most "fast" sites are not fast at the origin. They are fast because three-quarters of every page is served from a cache that sits within a few milliseconds of the user. Before you talk about anything happening at your app servers, know what is happening at the edge.

Core

DNS resolution

Names become IPs through a chain of cached lookups. Every request starts here. TTL is the dial that trades freshness for query volume.

How DNS works

Sim DNS resolution simulator DNS deep dive External Julia Evans: implement DNS in a weekend

Core

CDN — push vs pull

A network of edge caches that sit close to users. Pull-CDNs fetch on first miss and cache; push-CDNs are pre-populated by the origin. Most sites use pull because it requires no extra plumbing.

How CDNs work

Sim CDN cache simulator Anycast routing External Cloudflare: what is a CDN

Anycast & BGP

A single IP advertised by many points of presence. ISPs route to the topologically nearest one. The reason your CDN feels fast.

BGP & internet routing

Sim Anycast routing simulator Routing protocols External Cloudflare anycast explained

03 lb · app

Load balancing & reverse proxies

How the request gets to an app server that's actually free.

Once a request crosses your edge, a load balancer picks which app instance handles it. The L4 vs L7 distinction, active-active vs active-passive, and the algorithms (round-robin, least-connections, consistent hash) are the bread and butter of every system-design discussion.

Core

Layer-4 vs layer-7 load balancing

L4 picks a backend based on TCP/UDP headers (fast, opaque to the request). L7 reads the HTTP path / headers and routes intelligently (slower, smarter). Both are used together in practice.

Load balancing

LB deep dive Sim Load-balancer simulator Paper Maglev

Active-active vs active-passive

Active-active runs N healthy load balancers in parallel, each taking traffic. Active-passive keeps a hot standby that takes over when the primary fails. The trade is cost vs failover speed.

LB topologies

Failover modes

Core

Reverse proxy

Nginx, HAProxy, Envoy. A proxy in front of your origin that handles TLS, compression, caching, rate limiting, and routing. Often colocated with the load balancer; logically a different layer.

Reverse proxy

Service mesh sidecars

Consistent hashing for sticky routing

When the request needs to land on the same backend repeatedly (sessions, sharded caches, WebSocket connections), consistent hashing keeps key movement to K/N when you add or remove a node.

Sim Consistent hashing simulator

Sharding strategies

04 app

The application tier

Monoliths, microservices, and what service discovery is for.

Most systems start as a monolith and split into services when team size makes a single deploy painful. The interesting parts are not "should we use microservices" (the question is usually misframed) but how services find each other, how they share schemas, and how they handle partial failure.

Core

Monolith vs microservices

A monolith is one deployable. Microservices are many. The trade is communication cost + operational surface vs deploy velocity + ownership clarity. The right shape depends on team size, not on system size.

Decision: monolith or microservices

External Martin Fowler: microservice premium

Core

Service discovery

How does service A find a healthy instance of service B? Client-side (Consul, Eureka) puts the lookup in the caller. Server-side (an LB) puts it in the proxy. Kubernetes uses both.

Service discovery

Kubernetes networking External Consul docs

API gateway

A single entry point that handles routing, auth, rate limiting, request shaping, and protocol translation. Sits between clients and the service mesh.

API gateway

API protocols

Stateless services

If a service holds no per-user state, any instance can handle any request, which makes horizontal scaling trivial. The state has to live somewhere; push it to the data tier or to an external session store.

System-design playbook

05 db-primary · db-replica

The data layer — relational

Postgres, MySQL, the parts every senior engineer should know.

A single relational database can take a system surprisingly far. Senior engineers know how far before reaching for sharding or NoSQL. ACID, replication shapes, MVCC, isolation levels, and the page cache are the vocabulary the data-layer discussion runs on.

Core

ACID & isolation levels

Read uncommitted, read committed, repeatable read, snapshot, serializable. Each protects you from one more class of anomaly at one more class of cost.

Isolation levels

Sim Isolation-levels simulator Paper Critique of ANSI SQL isolation levels

Core

MVCC

Multi-version concurrency control. Readers see a snapshot; writers create a new version. The key to non-blocking reads in Postgres, Oracle, and most modern engines.

MVCC

Vacuum & compaction

Core

WAL — write-ahead logging

Append the change to a sequential log before touching the data pages. The reason your database survives a power loss without corruption.

WAL

How WAL works Sim WAL + crash recovery Paper ARIES

B-tree & LSM-tree

The two storage shapes every modern engine picks between. B-trees for read-heavy with mixed updates (Postgres, MySQL). LSM-trees for write-heavy with periodic compaction (Cassandra, RocksDB).

B-tree

Sim B-tree simulator Sim LSM-tree simulator LSM-tree Paper LSM tree paper

Core

Replication — primary-replica

One node accepts writes, N replicas serve reads. Async replication is fast but can return stale reads; sync is consistent but blocks on a slow replica. Most systems run async with read-your-writes pinning.

Replication

Sim Read-write quorum

Federation

Split one database into multiple by function — users in one, products in another. Often the first scaling step before sharding, since each database stays simple.

Federation vs sharding

06 db-primary

The data layer — NoSQL shapes

KV, document, wide-column, graph — when each one fits.

NoSQL is not one thing. It is four shapes, each tuned for a different access pattern. The right way to pick is: write down your reads (the queries you actually need), then pick the shape that serves them with one round trip.

Core

Key-value stores

Redis, Memcached, DynamoDB. Look up by exact key. The simplest shape; the highest throughput per node. Used for session stores, leaderboards, rate-limiter counters.

How Redis works

Sim Redis operations simulator Playbook KV store, designed Paper Dynamo

Document stores

MongoDB, Couchbase, DynamoDB-with-secondary-indexes. JSON documents keyed by ID with optional indexes on inner fields. Good for catalogs, user profiles, content systems.

Choosing a database

External MongoDB modeling guide

Wide-column stores

Cassandra, ScyllaDB, Bigtable. Rows have a partition key + a sort key, with columns added dynamically. Tuned for very high write throughput and time-series.

Paper Bigtable

Time-series storage

Core

SQL or NoSQL?

Start with SQL. Move to NoSQL when one specific access pattern needs throughput or scale that SQL cannot give, and you can accept the loss of joins / strong consistency on that workload. Mixing both in one system is common.

Choosing a database

Decision: SQL or NoSQL

07 db-primary · db-replica

Sharding & denormalization

When one box stops being enough.

Sharding is the move every senior interview eventually reaches for. The hard part is not splitting a table; it is picking a shard key that survives growth, avoids hot spots, and tolerates re-sharding. Denormalization is the smaller cousin: duplicate data to make the read fast.

Core

Sharding strategies

Hash partitioning spreads load evenly but breaks range scans. Range partitioning supports scans but invites hot shards. Geographic / tenant-based partitioning works when the access pattern aligns naturally.

Sharding deep dive

Sim Database sharding simulator

Core

Consistent hashing

A hash ring where adding or removing a node moves only K/N keys instead of nearly all of them. Used by Dynamo, Cassandra, and most modern KV stores.

Sim Consistent hashing

Paper Dynamo

Hot keys & celebrity fan-out

When one key gets 100x the traffic of the average, your evenly-sharded system grinds on that one shard. Mitigations: caching layer in front, per-key replication, or fan-out the writes ahead of time.

Playbook News feed playbook

Denormalization

Duplicate fields across tables so the read does not need a join. Write becomes more work; read becomes one query. Trade explicit consistency for read latency.

Denormalization

08 client · cdn · app · cache

Caching, end to end

Five layers. Each one is the right answer to a different question.

Every fast system caches at four or five layers. Knowing where to cache, what invalidation strategy fits, and what happens when the cache goes cold under a thundering herd is what the caching discussion is really about.

Core

Client-side cache

HTTP cache headers, service workers, the browser memory + disk caches. The fastest cache is the one that never sends a request.

Caching strategies

External MDN: HTTP caching

Core

Edge / CDN cache

Pull-CDNs cache responses keyed by URL + Vary headers. Cache-Control max-age, stale-while-revalidate, and CDN-specific TTL knobs are the dials.

CDN

Core

Application-level cache

Redis or Memcached sitting between your service and the database. Cache-aside (read-through with explicit fill), write-through (write hits cache and DB), and write-behind (cache absorbs writes, batches to DB) are the three patterns.

Caching

Sim Distributed cache simulator Sim Cache eviction Sim LRU cache

Database query cache

Materialised views, the buffer pool / page cache, and result caches. The cheapest one to forget about because the database has been doing it for you.

Page cache

Eviction policies — LRU, LFU, FIFO, TinyLFU

When the cache is full, who goes? LRU is the standard. LFU is better for stable hot-key sets. TinyLFU + Window-TinyLFU dominate modern caches (Caffeine, Redis 4+).

Sim Cache eviction simulator

Cache invalidation

TTL-based is simple. Event-driven (pub/sub on writes) is stronger but couples your services. Write-through gives you no staleness; cache-aside accepts brief staleness in exchange for simplicity.

Cache invalidation decision

09 queue · worker

Asynchronism — queues, workers, back-pressure

When the work doesn't need to finish before the response.

Anything that can be done after the response should be. Email, image processing, indexing, analytics, anything with a "this will arrive within five minutes" SLA — it goes through a queue. The interesting bits are at-least-once vs exactly-once, ordering guarantees, and how the queue itself doesn't collapse under load.

Core

Message queues vs task queues

A message queue is dumb pipes (RabbitMQ, SQS); a task queue layers application semantics on top (Celery, Sidekiq, Resque). Pick the simpler one that meets your needs.

Message queues

Kafka

Core

Kafka — log-as-database

Partitioned, append-only, consumer-tracked log. The substrate under most modern event-driven systems. Different shape from a queue: messages are replayable, partitions enforce per-key ordering.

Kafka deep dive

Playbook Event ingestion playbook External Kafka docs

Core

Delivery semantics

At-most-once (drop on failure), at-least-once (retry until acked, may duplicate), exactly-once (the holy grail, expensive). Exactly-once usually means at-least-once + idempotent consumers.

Idempotence

2PC & sagas

Back-pressure

What happens when consumers can't keep up? Buffering is the wrong answer. Slow producers, drop work intentionally, or shed load — anything except letting queues grow without bound.

Back-pressure & retries

Sim Retry strategy simulator

Dead letter queues

When a message has retried too many times, move it to a separate queue for human inspection. Without this, a single poison message can crash all consumers.

Back-pressure & retries

10 app · lb

Communication protocols

TCP, UDP, HTTP, gRPC, WebSocket — which to pick.

Every system-design discussion eventually asks "what protocol do these services speak?". The answer depends on call shape (single request, streaming, bidirectional), latency budget, payload size, and operational tooling.

Core

TCP

Ordered, reliable, congestion-controlled bytes. The default. The cost is three handshakes and head-of-line blocking.

TCP deep dive

Sim TCP handshake Sim TCP congestion TCP guide

UDP

Fire-and-forget datagrams. No ordering, no retries, no congestion control. The right answer for DNS, real-time gaming, video, anything where late is worse than missing.

UDP deep dive

Sim TCP vs UDP simulator

Core

HTTP/1.1, 2, 3

HTTP/1.1 = one request per connection (kept alive). HTTP/2 = multiplexed streams over one TCP connection, head-of-line blocking at the TCP layer. HTTP/3 = QUIC over UDP, fuses TCP+TLS, no HoL blocking.

HTTP

Sim HTTP/2 streams Sim HTTP/3 + QUIC QUIC

Core

REST vs gRPC

REST is HTTP/JSON: human-readable, debuggable, slow. gRPC is HTTP/2 + Protobuf: schema-first, fast, harder to debug at the wire. Most companies use REST at the public edge, gRPC between internal services.

Sim gRPC vs REST

REST gRPC

WebSocket & SSE

Persistent bidirectional connection (WebSocket) or server-to-client push (SSE). The right answer for chat, presence, live dashboards, anything pushed from server.

WebSockets

WebSockets & SSE Realtime communication

11 app · db-primary

Capacity planning — the napkin math

Powers of two, latency numbers, five-minute back-of-envelope.

Every system-design round eventually asks you to estimate. DAU to QPS, QPS to storage, storage to shards, shards to instance count. The math is simple; the practice is what makes it second-nature in the room.

Core

Powers of two

Internalise: 2^10 = 1K, 2^20 = 1M, 2^30 = 1B, 2^40 = 1T. Five orders of magnitude in four numbers.

Napkin math

Core

Latency numbers every engineer knows

L1: 0.5 ns. RAM: 100 ns. SSD: 100 µs. Network round-trip same DC: 500 µs. Cross-continental: 100 ms. The factor-of-1000 gaps are what make the difference between a snappy system and a slow one.

Performance methods

External Jeff Dean: numbers every programmer should know

Core

DAU → QPS conversion

A common trick: DAU × actions-per-user / 86400 = average QPS. Multiply by 3-5x for peak. A 10M-DAU service doing 5 actions/user averages ~600 QPS; peaks near 3000.

Napkin math

Storage estimation

Average record size × records-per-day × days-of-retention × replication-factor × index-overhead. The replication and index multipliers are where candidates usually forget.

Napkin math

12 lb · app · queue

Failure modes

Partial failure, retries, idempotence, circuit breakers.

The point where backend engineering stops being about correctness and starts being about correctness under failure. The network is not reliable, latency is not zero, machines fail. Every senior loop reaches into this stage.

Core

Idempotence

A request is idempotent if doing it twice has the same effect as doing it once. Every safe retry strategy starts here.

Idempotence

Core

Retries with exponential backoff + jitter

Naive retry storms when the original cause is overload. Exponential backoff + jitter spreads retries out so the dependency recovers instead of drowning further.

Sim Retry strategy

Back-pressure & retries External AWS: exponential backoff + jitter

Core

Circuit breakers

Three states: closed (normal), open (fail fast), half-open (probe to see if it's back). Stops a sick dependency from taking down the caller.

Sim Circuit breaker

External Hystrix wiki

Timeouts & bulkheads

Every network call must have a timeout. Bulkheads isolate failure domains so one slow dependency cannot exhaust the calling service's thread pool.

Back-pressure & retries

Failure detectors & leases

How a distributed system decides a node is dead. Naive timeouts cause split brain; gossip + suspicion levels (SWIM, phi-accrual) plus leases is the production answer.

Failure detectors

Sim Split-brain simulator Leases Gossip

13 app · lb

Security at system scale

TLS, OAuth, secrets, common attacks.

Security questions in system-design interviews are usually not deep cryptography. They are "how does service A authenticate to service B", "where are the secrets", and "what happens if a token leaks". Cover those and you're competitive.

Core

TLS — what it actually does

Encryption in transit + identity verification. Not the same as encryption at rest. mTLS extends both directions, the standard between internal services.

How HTTPS works

TLS deep dive

Core

OAuth 2.0 + OIDC

Authorization (OAuth) is "what can you do". Authentication (OIDC, layered on top) is "who are you". Three legs (auth code with PKCE) for browsers, two legs (client credentials) for service-to-service.

OAuth

OIDC Auth deep dive

JWT — the gotchas

A signed token that carries claims. The pitfalls: no built-in revocation, easy to mis-configure (alg=none), short-lived access tokens + long-lived refresh tokens is the standard pattern.

Sim JWT lifecycle

External OWASP JWT cheat sheet

Rate limiting

Token bucket, leaky bucket, fixed window, sliding window. The four shapes under the same load. Apply at edge, gateway, and per-service for defence in depth.

Sim Rate limiter

Playbook Rate limiter playbook

14 observability

Observability

Logs, metrics, traces — and the methodologies (USE, RED) that organise them.

Three signals (logs, metrics, traces) plus two methodologies (USE for resource saturation, RED for request health). Cover most of what an on-call rotation actually needs. The harder skill is not collecting the data; it is knowing what to put on the dashboard so the right thing is obvious during an incident.

Core

The three signals

Logs are events. Metrics are aggregated numbers. Traces tie a request to its constituent calls. Together they give the three windows into what your system is actually doing.

Observability — methods

Core

RED method

Rate, Errors, Duration. The three numbers every request-driven service tracks. The minimum dashboard.

RED method

USE method

Utilisation, Saturation, Errors. The resource-side complement to RED. For every machine, every queue, every connection pool.

USE method

External Brendan Gregg: USE method

SLO, SLA, error budgets

The numerical contract with users. Pick latency + availability targets, measure honestly, and let the error budget govern release pace.

SLO basics

External Google SRE: SLOs

15 client · cdn · lb · app · cache · db-primary · db-replica · queue · worker

Putting it together — worked problems

Fourteen canonical interview problems, end to end.

Reading about components and designing with them are different skills. The way to bridge the gap is to work the canonical problems out loud and defend every choice. The book to read first is Designing Data-Intensive Applications. The problems below are the laps every senior candidate practises.

Core

The 6-step framework

Scope → estimate → API → data model → high-level design → deep dive. The repeatable 45-minute pass you can do in your sleep after enough practice.

Playbook System design playbook

Interview hub System-design drill

Core

Design Pastebin

The "design a URL shortener with a twist" problem — text snippets, expiry, view tracking. Tests cache strategy + KV store basics.

Playbook Pastebin

Core

Design a URL shortener

The classic. Hashing, KV store, cache strategy. The deep-dive usually goes to hot-key handling or analytics.

Playbook URL shortener

Core

Design a news feed (Twitter / Facebook)

Fan-out on write vs fan-out on read vs hybrid. The celebrity problem breaks the naive version of either. Tests sharding + caching + queue strategy.

Playbook News feed

Core

Design a chat (WhatsApp / Slack)

WebSocket connection pool, message ordering, group-message fan-out, presence. Tests realtime communication + storage.

Playbook Chat

Core

Design object storage (S3, Dropbox)

Eleven nines of durability at exabyte scale. Erasure coding, metadata + data planes split, multipart upload, repair scanners.

Playbook Object storage

Core

Design a rate limiter

Token bucket implementation in Redis. Distributed, per-user, per-endpoint. Tests Redis + algorithm + edge-case handling.

Playbook Rate limiter

Sim Rate-limiter simulator

Design a typeahead / autocomplete

Trie data structure, n-gram indexing, ranking. Tests data-structure choice + caching + write throughput.

Playbook Typeahead

Sim Trie autocomplete

Design a web crawler

BFS frontier, deduplication, robots.txt, distributed work queue. Tests queueing + storage + politeness.

Playbook Web crawler

Design Uber / ride matching

Geo-spatial indexing (geohash, quadtree, S2), dispatch algorithm, hot regions. Tests spatial data + matching.

Playbook Ride matching

Design event ingestion (Kafka-shaped)

A pipe that swallows millions of events per second and lets multiple consumers read at their own pace. Tests partitioning + back-pressure + retention.

Playbook Event ingestion

Design notifications

Per-user per-channel preferences, delivery retries, dedupe, batching. Tests queueing + idempotence + state.

Playbook Notifications

Core

Design Twitter

The 250M-DAU read-heavy feed. Timeline fan-out, the celebrity problem, ranking, search. The canonical hybrid push/pull design.

Playbook Twitter

Design Instagram

Write-heavy photo upload. Direct-to-S3 uploads, the variant-encoding pipeline, the CDN strategy that absorbs ~95% of egress.

Playbook Instagram

Design Netflix

Video streaming at planet scale. Adaptive bitrate, the Open Connect private CDN, per-shot encoding, recommendations.

Playbook Netflix

Core

Scale to millions on AWS

The layer-by-layer evolution: one EC2 → ALB + ASG → microservices + sharded data → multi-region. What breaks first at each stage.

Playbook Scale to millions on AWS

Practise the worked problems

Fourteen end-to-end design problems: Twitter, Uber, Dropbox, KV stores, rate limiters. Each with capacity math, API sketch, architecture, deep-dive trade-offs.

Open the playbook

Drill

Run the 45-minute round

Interactive seven-phase timer: clarify, capacity, API, architecture, deep dive, trade-offs, wrap-up. The actual interview pacing.

Start a round

Push

Run the simulators

49 interactive simulators: Raft, CAP, sharding, rate limiting, cache eviction, container layers. All in the browser.

Browse simulators

Decide

The decision handbook

When to shard, when to introduce a queue, when to denormalise, how to estimate cost. Twelve opinionated rules.

Open the handbook

Read

Annotated papers

Spanner, Dynamo, Bigtable, Raft, Paxos, GFS, MapReduce, CRDTs. The papers under the playbook problems.

Browse papers

Sideways

The backend roadmap

The other roadmap on this site. Covers the wider backend curriculum (networking, OS, languages, security) that system design sits on top of.

Open the roadmap

Found this useful?