15 stages · 91 topics · 45 core
System Design Roadmap

From "what's a load balancer" to "I can design Twitter."

Fifteen stages, 91 topics, every one linked to a Semicolony deep dive, simulator, or worked-problem playbook entry. The architecture diagram below is the map; each stage card down the page lights up the part of the diagram it owns. Read top to bottom, or jump to wherever you're stuck.

A SYSTEM, IN TWELVE REGIONS
[Architecture diagram; regions as labelled]
Client: browser / app
DNS: name → IP
CDN: edge cache
Load Balancer: L4 / L7 + WAF
App tier
Observability: logs · metrics · traces (RED · USE · SLO)
Cache: Redis / Memcached
DB primary: writes + sync reads
DB replicas (N): async read replicas, replicated from the primary
Queue / Kafka: async jobs · events
Workers (N): background processing
Object store: media · backups · cold data
Search index: Elasticsearch / OpenSearch


01 app · db-primary

Trade-off vocabulary

The three pairs of words every interview uses.

Senior interviews are conducted in trade-off language. Before any architecture sketch, you should be able to use these six words precisely: performance, scalability, latency, throughput, availability, consistency. Most candidates blur them.

02 client · dns · cdn

The edge — DNS, CDN, anycast

Everything before your origin server gets to think.

Most "fast" sites are not fast at the origin. They are fast because three-quarters of every page is served from a cache that sits within a few milliseconds of the user. Before you talk about anything happening at your app servers, know what is happening at the edge.

03 lb · app

Load balancing & reverse proxies

How the request gets to an app server that's actually free.

Once a request crosses your edge, a load balancer picks which app instance handles it. The L4 vs L7 distinction, active-active vs active-passive, and the algorithms (round-robin, least-connections, consistent hash) are the bread and butter of every system-design discussion.
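
To make two of the algorithms named above concrete, here is a minimal sketch of round-robin and least-connections selection. The class and method names are illustrative assumptions, not any particular load balancer's API, and real implementations also handle health checks and weights.

```python
import itertools


class RoundRobin:
    """Hand out backends in a fixed rotation, ignoring current load."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}  # backend -> open requests

    def pick(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when the request finishes so the count stays accurate.
        self.active[backend] -= 1
```

Round-robin needs no shared state beyond a cursor; least-connections adapts to slow requests but has to track when each one completes.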

Core

Reverse proxy

Nginx, HAProxy, Envoy. A proxy in front of your origin that handles TLS, compression, caching, rate limiting, and routing. Often colocated with the load balancer; logically a different layer.

Reverse proxy
04 app

The application tier

Monoliths, microservices, and what service discovery is for.

Most systems start as a monolith and split into services when team size makes a single deploy painful. The interesting parts are not "should we use microservices" (the question is usually misframed) but how services find each other, how they share schemas, and how they handle partial failure.

05 db-primary · db-replica

The data layer — relational

Postgres, MySQL, the parts every senior engineer should know.

A single relational database can take a system surprisingly far. Senior engineers know how far before reaching for sharding or NoSQL. ACID, replication shapes, MVCC, isolation levels, and the page cache are the vocabulary the data-layer discussion runs on.

Core

MVCC

Multi-version concurrency control. Readers see a snapshot; writers create a new version. The key to non-blocking reads in Postgres, Oracle, and most modern engines.

MVCC
Core

Replication — primary-replica

One node accepts writes, N replicas serve reads. Async replication is fast but can return stale reads; sync is consistent but blocks on a slow replica. Many systems run async with read-your-writes pinning.

Replication
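
One way to read "read-your-writes pinning" is: after a user writes, send that user's reads to the primary for a short window. The sketch below assumes a fixed pin window chosen to exceed typical replica lag; `ReadRouter`, `pin_seconds`, and the injected `clock` are hypothetical names for illustration.

```python
import time


class ReadRouter:
    """Route reads to a replica unless the user wrote recently."""

    def __init__(self, pin_seconds=5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds  # assumed upper bound on replica lag
        self.clock = clock
        self._last_write = {}           # user_id -> time of last write

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def target_for_read(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "primary"  # a replica may not have this user's write yet
        return "replica"
```

Only the writing user is pinned, so other users keep reading from replicas and may briefly see stale data.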
06 db-primary

The data layer — NoSQL shapes

KV, document, wide-column, graph — when each one fits.

NoSQL is not one thing. It is four shapes, each tuned for a different access pattern. The right way to pick is: write down your reads (the queries you actually need), then pick the shape that serves them with one round trip.

Core

SQL or NoSQL?

Start with SQL. Move to NoSQL when one specific access pattern needs throughput or scale that SQL cannot give, and you can accept the loss of joins / strong consistency on that workload. Mixing both in one system is common.

Choosing a database
07 db-primary · db-replica

Sharding & denormalization

When one box stops being enough.

Sharding is the move every senior interview eventually reaches for. The hard part is not splitting a table; it is picking a shard key that survives growth, avoids hot spots, and tolerates re-sharding. Denormalization is the smaller cousin: duplicate data to make the read fast.

Core

Sharding strategies

Hash partitioning spreads load evenly but breaks range scans. Range partitioning supports scans but invites hot shards. Geographic / tenant-based partitioning works when the access pattern aligns naturally.

Sharding deep dive
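
The difference between hash and range partitioning shows up in a few lines. This is a sketch under assumed values: `NUM_SHARDS` and the `RANGE_BOUNDS` split points are made up for illustration.

```python
import bisect
import hashlib

NUM_SHARDS = 4                 # hypothetical shard count
RANGE_BOUNDS = ["g", "n", "t"]  # hypothetical split points between 4 shards


def hash_shard(key: str) -> int:
    """Hash partitioning: even spread, but neighbouring keys scatter."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def range_shard(key: str) -> int:
    """Range partitioning: sorted keys stay together, so scans hit few shards."""
    return bisect.bisect_right(RANGE_BOUNDS, key)
```

With these bounds, `range_shard("alice")` and `range_shard("bob")` land on the same shard, which is what makes a range scan cheap and also what lets a popular key range become a hot shard.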
Core

Consistent hashing

A hash ring where adding or removing a node moves only about K/N keys (K keys, N nodes) instead of nearly all of them, as plain hash-mod-N would. Used by Dynamo, Cassandra, and many other distributed KV stores.

Sim Consistent hashing
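
A minimal hash ring can be sketched as a sorted list of points with a binary search. The virtual-node count and the `HashRing` name are assumptions for illustration; production rings also handle replication and node weights.

```python
import bisect
import hashlib


def _point(value: str) -> int:
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node gets `vnodes` points on the ring to even out its share.
        self._ring = sorted(
            (_point(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _point(key)) % len(self._points)
        return self._ring[idx][1]
```

Building a second ring with one extra node and comparing `node_for` across a set of keys shows the property: every key that changes owner moves to the new node, and the rest stay put.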
08 client · cdn · app · cache

Caching, end to end

Five layers. Each one is the right answer to a different question.

Every fast system caches at four or five layers. Knowing where to cache, what invalidation strategy fits, and what happens when the cache goes cold under a thundering herd is what the caching discussion is really about.
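
One common defence against a thundering herd on a cold key is to let a single caller load the value while the others wait for it. The sketch below is an in-process illustration only: `SingleFlightCache` and `loader` are hypothetical names, and it omits TTLs and invalidation.

```python
import threading


class SingleFlightCache:
    def __init__(self, loader):
        self._loader = loader            # hypothetical slow origin fetch
        self._values = {}
        self._locks = {}
        self._guard = threading.Lock()   # protects the per-key lock table

    def get(self, key):
        if key in self._values:          # fast path: cache hit
            return self._values[key]
        with self._guard:                # one lock object per key
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            if key not in self._values:  # re-check: another thread may have filled it
                self._values[key] = self._loader(key)
            return self._values[key]
```

However many threads miss on the same key at once, the loader runs once for that key; the rest block briefly and then read the cached value.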

Core

Edge / CDN cache

Pull-CDNs cache responses keyed by URL + Vary headers. Cache-Control max-age, stale-while-revalidate, and CDN-specific TTL knobs are the dials.

CDN
09 queue · worker

Asynchronism — queues, workers, back-pressure

When the work doesn't need to finish before the response.

Anything that can be done after the response should be. Email, image processing, indexing, analytics, anything with a "this will arrive within five minutes" SLA: it goes through a queue. The interesting bits are at-least-once vs exactly-once, ordering guarantees, and how to keep the queue itself from collapsing under load.

Core

Message queues vs task queues

A message queue is a dumb pipe (RabbitMQ, SQS); a task queue layers application semantics on top (Celery, Sidekiq, Resque). Pick the simpler one that meets your needs.

Message queues
Core

Delivery semantics

At-most-once (drop on failure), at-least-once (retry until acked, may duplicate), exactly-once (the holy grail, expensive). Exactly-once usually means at-least-once + idempotent consumers.

Idempotence
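
"At-least-once + idempotent consumers" can be sketched as a consumer that remembers which message IDs it has already handled. This assumes every message carries a unique ID; the in-memory set stands in for what would need to be a durable store, and `IdempotentConsumer` is a hypothetical name.

```python
class IdempotentConsumer:
    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # assumption: a durable store in a real system

    def handle(self, message_id, payload):
        """Return True if processed, False if skipped as a duplicate."""
        if message_id in self._seen:
            return False                # redelivery: effect already applied
        self._handler(payload)
        self._seen.add(message_id)      # mark only after the handler succeeds
        return True
```

If the process can crash between the handler and the mark, the side effect and the mark need to be committed together (for example in one database transaction) for the guarantee to hold.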
10 app · lb

Communication protocols

TCP, UDP, HTTP, gRPC, WebSocket — which to pick.

Every system-design discussion eventually asks "what protocol do these services speak?". The answer depends on call shape (single request, streaming, bidirectional), latency budget, payload size, and operational tooling.

Core

HTTP/1.1, 2, 3

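HTTP/1.1 = one request at a time per connection (kept alive between requests). HTTP/2 = multiplexed streams over one TCP connection, still subject to head-of-line blocking at the TCP layer. HTTP/3 = HTTP over QUIC, which runs on UDP and folds the transport and TLS 1.3 handshakes together; streams are independent, so a lost packet on one no longer stalls the others.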

HTTP
Core

REST vs gRPC

REST is HTTP/JSON: human-readable, debuggable, slower (text parsing, larger payloads). gRPC is HTTP/2 + Protobuf: schema-first, fast, harder to debug at the wire. Most companies use REST at the public edge, gRPC between internal services.

Sim gRPC vs REST
11 app · db-primary

Capacity planning — the napkin math

Powers of two, latency numbers, five-minute back-of-envelope.

Every system-design round eventually asks you to estimate. DAU to QPS, QPS to storage, storage to shards, shards to instance count. The math is simple; the practice is what makes it second-nature in the room.

Core

Powers of two

Internalise: 2^10 ≈ 1K, 2^20 ≈ 1M, 2^30 ≈ 1B, 2^40 ≈ 1T. Four numbers that take you from a thousand to a trillion.

Napkin math
Core

DAU → QPS conversion

A common trick: DAU × actions-per-user / 86400 = average QPS. Multiply by 3-5x for peak. A 10M-DAU service doing 5 actions/user averages ~580 QPS; at a 5x peak factor it peaks near 2,900.

Napkin math
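
The DAU-to-QPS conversion is small enough to write down as two functions. The function names and the default peak factor of 5 are assumptions for illustration; the formula is the one stated in the card above.

```python
SECONDS_PER_DAY = 86_400


def average_qps(dau: int, actions_per_user: float) -> float:
    """Average queries per second over a full day."""
    return dau * actions_per_user / SECONDS_PER_DAY


def peak_qps(dau: int, actions_per_user: float, peak_factor: float = 5.0) -> float:
    """Average QPS scaled by an assumed peak-to-average ratio (3-5x is typical)."""
    return average_qps(dau, actions_per_user) * peak_factor
```

For 10M DAU and 5 actions per user, `average_qps` gives roughly 579 and `peak_qps` roughly 2,894 at a 5x factor.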
12 lb · app · queue

Failure modes

Partial failure, retries, idempotence, circuit breakers.

The point where backend engineering stops being about correctness and starts being about correctness under failure. The network is not reliable, latency is not zero, machines fail. Every senior loop reaches into this stage.
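
A circuit breaker, one of the patterns this stage covers, can be sketched as a small state machine: closed, open (fail fast), and half-open (one trial call). The thresholds, the `CircuitBreaker` name, and the injected `clock` are assumptions for illustration.

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to stay open before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0                      # success closes the circuit
        self.opened_at = None
        return result
```

While the circuit is open the downstream service receives no traffic at all, which gives it room to recover instead of being hammered by retries.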

Core

Idempotence

A request is idempotent if doing it twice has the same effect as doing it once. Every safe retry strategy starts here.

Idempotence
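
A retry helper illustrates why idempotence comes first: a call that timed out may already have taken effect, so repeating it is only safe if the second execution changes nothing. The sketch below uses exponential backoff with full jitter; `retry`, its defaults, and the choice to retry only on `ConnectionError` are assumptions for illustration.

```python
import random
import time


def retry(operation, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation until it succeeds or attempts run out.

    Only safe when operation is idempotent: a failed call may have
    succeeded on the server before the retry is sent.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff with full jitter to avoid synchronized retries.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```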
13 app · lb

Security at system scale

TLS, OAuth, secrets, common attacks.

Security questions in system-design interviews are usually not deep cryptography. They are "how does service A authenticate to service B", "where are the secrets", and "what happens if a token leaks". Cover those and you're competitive.

Core

TLS — what it actually does

Encryption in transit + identity verification. Not the same as encryption at rest. mTLS verifies identity in both directions (client and server each present a certificate) and is the usual standard between internal services.

How HTTPS works
Core

OAuth 2.0 + OIDC

Authorization (OAuth) is "what can you do". Authentication (OIDC, layered on top) is "who are you". Three legs (auth code with PKCE) for browsers, two legs (client credentials) for service-to-service.

OAuth
14 observability

Observability

Logs, metrics, traces — and the methodologies (USE, RED) that organise them.

Three signals (logs, metrics, traces) plus two methodologies (USE for resource saturation, RED for request health) cover most of what an on-call rotation actually needs. The harder skill is not collecting the data; it is knowing what to put on the dashboard so the right thing is obvious during an incident.

Core

The three signals

Logs are events. Metrics are aggregated numbers. Traces tie a request to its constituent calls. Together they give the three windows into what your system is actually doing.

Observability — methods
Core

RED method

Rate, Errors, Duration. The three numbers every request-driven service tracks. The minimum dashboard.

RED method
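
The three RED numbers can be computed from a window of request records. This is a sketch under assumptions: the `Request` record shape is hypothetical, and p99 (nearest-rank) is used here as the duration figure, though teams also track p50 or p95.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    ok: bool


def red_summary(requests, window_seconds):
    """Rate, error ratio, and p99 duration for one time window."""
    if not requests:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p99_ms": 0.0}
    durations = sorted(r.duration_ms for r in requests)
    errors = sum(1 for r in requests if not r.ok)
    rank = (99 * len(durations) + 99) // 100  # ceil(0.99 * n), integer math
    return {
        "rate_per_s": len(requests) / window_seconds,
        "error_ratio": errors / len(requests),
        "p99_ms": durations[rank - 1],
    }
```

In practice a metrics system computes these from counters and histograms rather than raw records, but the definitions are the same.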
15 client · cdn · lb · app · cache · db-primary · db-replica · queue · worker

Putting it together — worked problems

Fourteen canonical interview problems, end to end.

Reading about components and designing with them are different skills. The way to bridge the gap is to work the canonical problems out loud and defend every choice. The book to read first is Designing Data-Intensive Applications. The problems below are the laps every senior candidate practises.

Core

Design Pastebin

The "design a URL shortener with a twist" problem — text snippets, expiry, view tracking. Tests cache strategy + KV store basics.

Playbook Pastebin
Core

Design a URL shortener

The classic. Hashing, KV store, cache strategy. The deep-dive usually goes to hot-key handling or analytics.

Playbook URL shortener
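
One common way to produce the short code is to base62-encode a numeric ID. The sketch below assumes IDs come from a counter or sequence rather than from hashing the URL; the alphabet order is an arbitrary choice.

```python
import string

# 0-9, a-z, A-Z: 62 URL-safe symbols (ordering is an arbitrary assumption).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(n: int) -> str:
    """Turn a non-negative integer ID into a base62 short code."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode(code: str) -> int:
    """Recover the integer ID from a short code."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Seven characters cover 62^7 (about 3.5 trillion) IDs, which is why codes of that length come up so often in this problem.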
Core

Design a news feed (Twitter / Facebook)

Fan-out on write vs fan-out on read vs hybrid. The celebrity problem breaks the naive version of either. Tests sharding + caching + queue strategy.

Playbook News feed
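
The hybrid approach can be sketched in memory: push posts to followers' inboxes for ordinary authors, and pull celebrity posts at read time. Everything here is illustrative, including the `Feed` class and the tiny `CELEBRITY_THRESHOLD`; a real system would use a far larger cutoff and persistent storage.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical; real cutoffs are orders of magnitude larger


class Feed:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.inbox = defaultdict(list)     # user -> posts pushed at write time
        self.outbox = defaultdict(list)    # author -> their own posts
        self._clock = 0                    # logical timestamp for ordering

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def _is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, text):
        self._clock += 1
        item = (self._clock, author, text)
        self.outbox[author].append(item)
        if not self._is_celebrity(author):
            for user in self.followers[author]:  # fan-out on write
                self.inbox[user].append(item)

    def timeline(self, user):
        items = set(self.inbox[user])
        for author in self.following[user]:      # fan-out on read
            if self._is_celebrity(author):
                items.update(self.outbox[author])
        return [text for _, _, text in sorted(items, reverse=True)]
```

A celebrity post costs one write regardless of follower count, at the price of a merge on every read by their followers.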
Core

Design a chat (WhatsApp / Slack)

WebSocket connection pool, message ordering, group-message fan-out, presence. Tests realtime communication + storage.

Playbook Chat
Core

Design object storage (S3, Dropbox)

Eleven nines of durability at exabyte scale. Erasure coding, metadata + data planes split, multipart upload, repair scanners.

Playbook Object storage
Core

Design Twitter

The 250M-DAU read-heavy feed. Timeline fan-out, the celebrity problem, ranking, search. The canonical hybrid push/pull design.

Playbook Twitter
Core

Scale to millions on AWS

The layer-by-layer evolution: one EC2 → ALB + ASG → microservices + sharded data → multi-region. What breaks first at each stage.

Playbook Scale to millions on AWS
