01 / 19
Playbook / 01

URL shortener

The canonical first system-design question. The mechanics are simple enough that a strong candidate finishes it cleanly in 35 minutes; the depth comes from the trade-offs around ID generation, the cache strategy, and the analytics pipeline. Walk this once, and the patterns it teaches — capacity math, an ID-generation choice, a read-heavy cache, sharding the write path — show up in most of the eighteen designs that follow.


1 · Clarifying questions

Five minutes. The right questions narrow the design enough to make the math possible and the architecture defensible.

How long are the short codes?Pick a length first. 7 base62 characters → ~3.5 trillion codes; enough for a Bitly-class system.
Are codes user-customisable?Custom aliases ("/my-talk") add a uniqueness check on the hot path. Assume opt-in for paid tiers; default is system-generated.
How long do URLs live?Forever, with optional expiry. Forever changes the storage horizon by orders of magnitude — confirm.
What's the read/write ratio?~100:1 in practice. Most URLs get clicked many times; a small fraction get clicked millions of times.
Do we need analytics?Yes — click counts, geo, referrer. Async pipeline; not on the redirect critical path.
Latency budget for the redirect?P99 ≤ 50 ms end-to-end. The redirect is the user-visible product surface.
Authentication?Out of scope for the design round; assume an existing identity service for the create-link API.
Multi-region?Yes for reads (low-latency anycast). Single-region writes initially, with active-active later.

2 · Capacity math, on a napkin

Before any boxes. The numbers that follow drive every later decision — the schema choice, the cache size, the shard count, the cost line.

NumberCalculationResult
Writes / daygiven100M
Reads / day100× writes (100:1)10B
Write QPS (avg / peak)100M / 86,400 × 3~1.2K / ~3.5K
Read QPS (avg / peak)10B / 86,400 × 3~120K / ~350K
Concurrency, readsLittle: 350K × 0.05 s~17,500 in-flight
Storage / rowid + URL + meta~500 B
Storage / year100M × 500 B × 365~18 TB
Storage / 10 yr×10~180 TB
Cache (hot 10%)10% × 100M × 500 B / yr~5 GB / yr → ~50 GB at 10 yr
Egress, redirects120K avg × ~512 B headers~5 TB / day

The numbers tell the story before the architecture does. 10B reads/day is "this is a hot, read-heavy KV system". 100M writes/day is "single writer per shard is fine; we don't need a write-side coordinator yet".

Interactive · URL shortener — adjust and watch Drag the sliders, watch the math.
Presets
1.2K
0.5 KB
100:1
50 ms
×3
10%
1 yr
Throughput
Write QPS, peak3.6K
Read QPS, peak360.0K
Read concurrency (Little's)18.0K
Write concurrency180
Storage & cache
Raw / day49.4 GB
Raw / horizon17.6 TB
With ×3 replication52.9 TB
Hot-set cache (RAM)1.8 TB
Read egress / day4.8 TB
Formulas. Concurrency = QPS × P99 (Little's law). Storage / day = QPS × 86 400 × payload. Peak factor 3× over avg. Hot-set is the RAM working set you'd need for that read percentage.

3 · API and data model

The wire surface and the row layout. Both are deliberately small.

Endpoints

POST /v1/links # create
{"url":"https://example.com/long/path", "alias":"my-talk"} # alias optional
→ 201 {"code":"aB3x9Q2", "short_url":"https://tk.sh/aB3x9Q2"}

GET /:code # redirect (the hot path)
→ 301 Location: https://example.com/long/path
 Cache-Control: public, max-age=300

GET /v1/links/:code/stats # async-served from analytics store
→ 200 {"clicks":1024,"top_country":"US","top_referrer":"twitter.com"}

DELETE /v1/links/:code # owner only
→ 204

Schema

links
 code VARCHAR(8) PRIMARY KEY -- base62, 7 chars + sentinel
 url TEXT NOT NULL -- target, max 2 KB
 owner_id BIGINT NULL -- FK to users
 alias BOOLEAN NOT NULL -- custom vs system-generated
 created_at TIMESTAMP NOT NULL
 expires_at TIMESTAMP NULL
 deleted BOOLEAN NOT NULL DEFAULT FALSE

 INDEX (owner_id, created_at) -- "my links"
 INDEX (expires_at) WHERE expires_at IS NOT NULL -- expiry sweep

clicks_daily -- analytics, separate store
 code VARCHAR(8)
 date DATE
 country CHAR(2)
 referrer TEXT
 count BIGINT
 PRIMARY KEY (code, date, country, referrer)

The redirect path reads only `code → url`. Everything else lives off the hot path. The clicks table is in a columnar store (ClickHouse / BigQuery) — never queried during a redirect.

4 · High-level architecture

The boxes-and-arrows. The redirect path is hot and read-heavy; the create path is cold and write-heavy. They scale independently.

Three layers. Edge is anycast cache (Cloudflare / Fastly). Region is stateless services. Storage is sharded KV for the hot path, columnar for analytics. The async lane carries click events into ClickHouse via Kafka.

5 · The hard part — generating short codes

The interesting choice. There are four serious answers; each has a failure mode worth knowing.

StrategyHow it worksStrengthsWeaknesses
Counter Auto-increment integer, base62-encoded Trivially unique. Compact. No collision check. Single bottleneck unless sharded. Reveals volume to anyone enumerating codes.
Hash SHA-256(url + salt), take first 7 base62 chars Stateless. Same input → same output (good for dedupe, bad for privacy). Collision rate at 7 chars × 100M URLs ≈ 1 in 35K. Need a check-and-retry path on collision.
Snowflake 64-bit ID: 41 b timestamp + 10 b machine + 12 b seq Distributed, no coordination. Time-ordered (cache-friendly). Clock skew breaks uniqueness. 64 b → 11 base62 chars (longer than ideal).
Centralised range allocator Each writer claims a 10K-id range from Zookeeper / etcd Compact codes. Survives skew. Any writer can fill in independently. Coordinator is a SPOF if not done right. Range exhaustion is a separate failure mode.

The recommendation for a fresh design: centralised range allocator. Each redirect-create pod claims a range of 10K IDs; allocates from the in-memory counter; refills when low. Survives the coordinator dying (the in-memory range survives), survives clock skew, and produces compact 7-character codes. The coordinator can be Zookeeper, etcd, or a single Postgres row with a SELECT FOR UPDATE.

Why not hash-only. Hashing the URL feels elegant and avoids coordination, but it makes one URL always map to one code — which means the second user shortening the same URL gets the same code, which leaks information about who else shortened it. Hash + a per-user salt fixes this but reintroduces the storage cost. A counter-derived ID is simpler.

6 · Base62 encoding, in three lines

Convert a 64-bit integer into a 7- to 11-character string over the alphabet [0-9A-Za-z]. The implementation is short enough to fit on the whiteboard.

const ALPHA = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

function encode(n) {
 let out = "";
 do {
 out = ALPHA[n % 62] + out;
 n = Math.floor(n / 62);
 } while (n > 0);
 return out;
}

function decode(s) {
 let n = 0;
 for (const c of s) n = n * 62 + ALPHA.indexOf(c);
 return n;
}

7 characters of base62 is 627 ≈ 3.5 trillion. At 100M new codes/day, that's 96 years before exhaustion. Plenty.

7 · The cache strategy

Reads are 100× writes; the cache earns its keep. Three layers, each catching a different traffic shape.

  • Edge cache (CDN). 60-second TTL on the redirect response. Catches the burst when a popular link goes viral. Cloudflare / Fastly with anycast. ~80% of read traffic terminates here without ever hitting our infrastructure.
  • Hot KV cache (Redis cluster). Cache-aside on the redirect path. ~50 GB sharded across 32 nodes; LRU eviction; 95%+ hit rate on the surviving 20% of reads. P99 reach: ~3 ms.
  • Local in-process cache. A 10K-entry LRU per pod, 30-second TTL, captures the very-hot tail. Avoids round-tripping to Redis for the top 0.1% of links that account for 30% of clicks.
The cache invariant nobody mentions. When a link is updated or deleted, the edge cache, the Redis cache, and every pod's local cache must learn. Use short TTLs (60 s edge, 5 min Redis, 30 s local) plus a Pub/Sub fan-out on the update path. "Eventual" is fine for an analytics product; for a delete, longer invalidation windows are a privacy bug.

8 · Sharding the storage

32 shards by hash(code). Each shard sits on a 3-node replica set with one writer and two read replicas. The shard count is chosen to give headroom for 10× growth without resharding — at 100M writes/day per shard, we have plenty of room.

DecisionChoiceWhy
Shard keyhash(code)Even distribution; no hot shards. created_at would create a hotspot on the newest shard.
Shard count32 (~32 = 25)Power-of-two simplifies routing. Headroom for 10× growth before a reshard.
Replication3-way per shardOne writer + two readers in the same region. Async cross-region replicas for read fanout.
Read consistencyEventual; fall through to primary on cache missClick traffic doesn't need linearizability; the create path does, and reads it from primary.
EngineDynamoDB or Cassandra (managed if available)The workload is pure KV. Don't pay relational overhead for nothing.

9 · Failure modes & runbook

What dies, what oncall sees, what the runbook says. The interview-room version of the operational story.

FailureSymptomMitigation
One redirect pod crash-loopsP99 redirect latency rises ~5%K8s restarts; shed traffic via readiness probe; alert at 3 minutes.
Redis shard offline20× rise in DB load on its key rangeLocal in-process cache absorbs ~30% of impact; circuit-break to direct-DB; auto-failover Redis (RDB snapshot replica).
One DB shard offline1/32 of redirects 5xxReplica failover (~30 s with auto-promote); CDN serves stale-on-error for 5 min while replica catches up.
ID-allocator unreachableCreate-link API errors outEach pod has 10K-ID buffer; tolerates ~10 minutes of outage. Page on buffer < 1K.
Region offline1/3 of read traffic diesAnycast routes around it within 30 s; RTO ≤ 5 min for full failover via DNS.
Hot key (one URL goes viral)Specific shard at 100% CPUEdge cache absorbs > 90%; local pod cache the rest. If still hot, dynamically promote to the in-process cache for 1 hour.
Click flood (DDoS on one URL)Edge cache hit rate at 99%, origin still saturatingWAF rate-limit per source IP; absolute limit per code at 10K rps. Above that, return cached value with no-counting.

10 · Cost & SLOs

A ballpark in 2026 cloud prices. Numbers are right to within 2× — that's enough to justify decisions in a design round.

LineEstimateNote
CDN egress~$1.5K / month~5 TB / day × $0.01 / GB at scale
Compute (200 redirect + 50 create pods)~$3K / month2 vCPU × 4 GB; managed K8s
Redis cluster (50 GB × 3 replicas)~$2K / month32 × m6g.large equivalent
DynamoDB (180 TB, 350K avg reads)~$8K / monthOn-demand pricing; reserved drops 40%
ClickHouse (analytics)~$1K / month3-node, columnar, ingest from Kafka
Total~$15K / monthPer million MAU: ~$50/M-MAU

SLOs

  • Redirect availability: 99.99%. Edge + multi-region make this achievable. Per-quarter error budget: 12 minutes.
  • Redirect P99 latency: 50 ms. Measured at the edge; alert at P99 > 80 ms for 5 minutes.
  • Create availability: 99.9%. Tighter budget than redirect because writes are rarer. Per-quarter: ~2 hours.
  • Analytics freshness: ≤ 60 s. Click counts are eventually consistent; the dashboard makes that explicit.

11 · Trade-offs & "what would you change at 10×"

The interviewer's last 5 minutes. These are the answers that turn a competent design into a compelling one.

If…Then…
Writes 10×Reshard to 64. Add a write-side allocator with longer ranges. Move from on-demand DynamoDB to provisioned.
Reads 10×Edge cache absorbs most of it. Add a regional read-only Redis pool per region. Bump local in-process cache to 100K entries.
Strict SLA on freshness (e.g. delete must propagate < 1 s)Replace TTL-based invalidation with a Pub/Sub fan-out from the create / delete path; explicit purge to edge.
Multi-region writesChoose: active-passive (simpler, RPO ≥ 30 s) or active-active with last-writer-wins (riskier, requires CRDT-style conflict resolution).
GDPR / data residencyPin shards by region. The link table grows tags by region; the edge layer routes to the right region by URL prefix.
"What would a more senior answer add?"Add a phishing/malware filter on create. Cache poisoning for blocked domains. The honest next-tier answer is the moderation pipeline — that's the part most candidates skip.

Defending the decisions — interview pushback

The URL shortener looks small enough that interviewers feel free to probe every choice. Six exchanges recur — have one-sentence defenses ready.

  1. "Why 7-character Base62?" 62⁷ ≈ 3.5 trillion keys — enough headroom for any realistic scale, and the key stays short. Don't overthink the key space unless the interviewer raises it.
  2. "Why 302 and not 301?" 301 is cached by the browser permanently. Every click after the first bypasses your redirect service entirely — analytics disappear. Use 302 unless you're explicitly trading analytics for browser cache efficiency.
  3. "Why lazy Redis population?" A URL that's never visited doesn't need a cache entry. Populate Redis on first read, not at creation. Keeps the cache hot with actually-popular URLs rather than everything ever created.
  4. "Why CDN in front?" Redirect service and Redis cannot absorb a viral spike — 10M requests in 60 seconds from a celebrity tweet will overwhelm any backend. CDN with Cache-Control: public, max-age=300 is the only mitigation that scales.
  5. "Why Kafka for analytics, not synchronous writes?" Redirect p99 budget is < 10 ms. Writing a click event synchronously adds latency and creates a dependency that can fail. Fire-and-forget to Kafka; the analytics consumer catches up in its own time.
  6. "Why Postgres and not a distributed store?" 100M URLs ≈ 50 GB on a single RDS instance with room to spare. Don't reach for a distributed store until you have a problem that requires one — and at this scale, you don't.

Further reading

  • Bitly Engineering — "Skiplist-based Redirect Database". The closest public writeup of a real URL shortener at scale. Skim for storage-engine choices.
  • Stripe — "Idempotency Keys at Stripe". Not URL-shortener-specific, but the canonical reference for designing idempotent create APIs.
  • Snowflake (Twitter, 2010). The original distributed-ID paper. Worth reading once before defending Snowflake in a design round.
  • Marc Brooker — "Exponential Backoff and Jitter". The retry-with-jitter pattern that keeps the create path from synchronising under load.
  • Uber Engineering — "Schemaless". A different shape (Uber's KV-on-MySQL) but useful as a counter-example to "just use Cassandra".
  • Adjacent: Consistent hashing. The shard router under the hood.
  • Adjacent: Distributed IDs. A standalone walkthrough of Snowflake, ULID, and friends.
  • Adjacent: DNS. Anycast routing — how the edge POPs find the user.
Found this useful?