01 / 19

Playbook / 01

URL shortener

The canonical first system-design question. The mechanics are simple enough that a strong candidate finishes it cleanly in 35 minutes; the depth comes from the trade-offs around ID generation, the cache strategy, and the analytics pipeline. Walk this once, and the patterns it teaches — capacity math, an ID-generation choice, a read-heavy cache, sharding the write path — show up in most of the eighteen designs that follow.

1 · Clarifying questions

Five minutes. The right questions narrow the design enough to make the math possible and the architecture defensible.

How long are the short codes?	Pick a length first. 7 base62 characters → ~3.5 trillion codes; enough for a Bitly-class system.
Are codes user-customisable?	Custom aliases ("/my-talk") add a uniqueness check on the hot path. Assume opt-in for paid tiers; default is system-generated.
How long do URLs live?	Forever, with optional expiry. Forever changes the storage horizon by orders of magnitude — confirm.
What's the read/write ratio?	~100:1 in practice. Most URLs get clicked many times; a small fraction get clicked millions of times.
Do we need analytics?	Yes — click counts, geo, referrer. Async pipeline; not on the redirect critical path.
Latency budget for the redirect?	P99 ≤ 50 ms end-to-end. The redirect is the user-visible product surface.
Authentication?	Out of scope for the design round; assume an existing identity service for the create-link API.
Multi-region?	Yes for reads (low-latency anycast). Single-region writes initially, with active-active later.

2 · Capacity math, on a napkin

Before any boxes. The numbers that follow drive every later decision — the schema choice, the cache size, the shard count, the cost line.

Number	Calculation	Result
Writes / day	given	100M
Reads / day	100× writes (100:1)	10B
Write QPS (avg / peak)	100M / 86,400 × 3	~1.2K / ~3.5K
Read QPS (avg / peak)	10B / 86,400 × 3	~120K / ~350K
Concurrency, reads	Little: 350K × 0.05 s	~17,500 in-flight
Storage / row	id + URL + meta	~500 B
Storage / year	100M × 500 B × 365	~18 TB
Storage / 10 yr	×10	~180 TB
Cache (hot 10%)	10% × 100M × 500 B / yr	~5 GB / yr → ~50 GB at 10 yr
Egress, redirects	120K avg × ~512 B headers	~5 TB / day

The numbers tell the story before the architecture does. 10B reads/day is "this is a hot, read-heavy KV system". 100M writes/day is "single writer per shard is fine; we don't need a write-side coordinator yet".

Interactive · URL shortener — adjust and watch Drag the sliders, watch the math.

Presets

Write QPS 1.2K

Payload (KB) 0.5 KB

R / W ratio 100:1

P99 (ms) 50 ms

Replication ×3

Hot set % 10%

Horizon 1 yr

Throughput

Write QPS, peak3.6K

Read QPS, peak360.0K

Read concurrency (Little's)18.0K

Write concurrency180

Storage & cache

Raw / day49.4 GB

Raw / horizon17.6 TB

With ×3 replication52.9 TB

Hot-set cache (RAM)1.8 TB

Read egress / day4.8 TB

Formulas. Concurrency = QPS × P99 (Little's law). Storage / day = QPS × 86 400 × payload. Peak factor 3× over avg. Hot-set is the RAM working set you'd need for that read percentage.

3 · API and data model

The wire surface and the row layout. Both are deliberately small.

Endpoints

POST /v1/links # create
{"url":"https://example.com/long/path", "alias":"my-talk"} # alias optional
→ 201 {"code":"aB3x9Q2", "short_url":"https://tk.sh/aB3x9Q2"}

GET /:code # redirect (the hot path)
→ 301 Location: https://example.com/long/path
 Cache-Control: public, max-age=300

GET /v1/links/:code/stats # async-served from analytics store
→ 200 {"clicks":1024,"top_country":"US","top_referrer":"twitter.com"}

DELETE /v1/links/:code # owner only
→ 204

Schema

links
 code VARCHAR(8) PRIMARY KEY -- base62, 7 chars + sentinel
 url TEXT NOT NULL -- target, max 2 KB
 owner_id BIGINT NULL -- FK to users
 alias BOOLEAN NOT NULL -- custom vs system-generated
 created_at TIMESTAMP NOT NULL
 expires_at TIMESTAMP NULL
 deleted BOOLEAN NOT NULL DEFAULT FALSE

 INDEX (owner_id, created_at) -- "my links"
 INDEX (expires_at) WHERE expires_at IS NOT NULL -- expiry sweep

clicks_daily -- analytics, separate store
 code VARCHAR(8)
 date DATE
 country CHAR(2)
 referrer TEXT
 count BIGINT
 PRIMARY KEY (code, date, country, referrer)

The redirect path reads only `code → url`. Everything else lives off the hot path. The clicks table is in a columnar store (ClickHouse / BigQuery) — never queried during a redirect.

4 · High-level architecture

The boxes-and-arrows. The redirect path is hot and read-heavy; the create path is cold and write-heavy. They scale independently.

Three layers. Edge is anycast cache (Cloudflare / Fastly). Region is stateless services. Storage is sharded KV for the hot path, columnar for analytics. The async lane carries click events into ClickHouse via Kafka.

5 · The hard part — generating short codes

The interesting choice. There are four serious answers; each has a failure mode worth knowing.

Strategy	How it works	Strengths	Weaknesses
Counter	Auto-increment integer, base62-encoded	Trivially unique. Compact. No collision check.	Single bottleneck unless sharded. Reveals volume to anyone enumerating codes.
Hash	SHA-256(url + salt), take first 7 base62 chars	Stateless. Same input → same output (good for dedupe, bad for privacy).	Collision rate at 7 chars × 100M URLs ≈ 1 in 35K. Need a check-and-retry path on collision.
Snowflake	64-bit ID: 41 b timestamp + 10 b machine + 12 b seq	Distributed, no coordination. Time-ordered (cache-friendly).	Clock skew breaks uniqueness. 64 b → 11 base62 chars (longer than ideal).
Centralised range allocator	Each writer claims a 10K-id range from Zookeeper / etcd	Compact codes. Survives skew. Any writer can fill in independently.	Coordinator is a SPOF if not done right. Range exhaustion is a separate failure mode.

The recommendation for a fresh design: centralised range allocator. Each redirect-create pod claims a range of 10K IDs; allocates from the in-memory counter; refills when low. Survives the coordinator dying (the in-memory range survives), survives clock skew, and produces compact 7-character codes. The coordinator can be Zookeeper, etcd, or a single Postgres row with a SELECT FOR UPDATE.

Why not hash-only. Hashing the URL feels elegant and avoids coordination, but it makes one URL always map to one code — which means the second user shortening the same URL gets the same code, which leaks information about who else shortened it. Hash + a per-user salt fixes this but reintroduces the storage cost. A counter-derived ID is simpler.

6 · Base62 encoding, in three lines

Convert a 64-bit integer into a 7- to 11-character string over the alphabet [0-9A-Za-z]. The implementation is short enough to fit on the whiteboard.

const ALPHA = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

function encode(n) {
 let out = "";
 do {
 out = ALPHA[n % 62] + out;
 n = Math.floor(n / 62);
 } while (n > 0);
 return out;
}

function decode(s) {
 let n = 0;
 for (const c of s) n = n * 62 + ALPHA.indexOf(c);
 return n;
}

7 characters of base62 is 62⁷ ≈ 3.5 trillion. At 100M new codes/day, that's 96 years before exhaustion. Plenty.

7 · The cache strategy

Reads are 100× writes; the cache earns its keep. Three layers, each catching a different traffic shape.

Edge cache (CDN). 60-second TTL on the redirect response. Catches the burst when a popular link goes viral. Cloudflare / Fastly with anycast. ~80% of read traffic terminates here without ever hitting our infrastructure.
Hot KV cache (Redis cluster). Cache-aside on the redirect path. ~50 GB sharded across 32 nodes; LRU eviction; 95%+ hit rate on the surviving 20% of reads. P99 reach: ~3 ms.
Local in-process cache. A 10K-entry LRU per pod, 30-second TTL, captures the very-hot tail. Avoids round-tripping to Redis for the top 0.1% of links that account for 30% of clicks.

The cache invariant nobody mentions. When a link is updated or deleted, the edge cache, the Redis cache, and every pod's local cache must learn. Use short TTLs (60 s edge, 5 min Redis, 30 s local) plus a Pub/Sub fan-out on the update path. "Eventual" is fine for an analytics product; for a delete, longer invalidation windows are a privacy bug.

8 · Sharding the storage

32 shards by hash(code). Each shard sits on a 3-node replica set with one writer and two read replicas. The shard count is chosen to give headroom for 10× growth without resharding — at 100M writes/day per shard, we have plenty of room.

Decision	Choice	Why
Shard key	hash(code)	Even distribution; no hot shards. `created_at` would create a hotspot on the newest shard.
Shard count	32 (~32 = 2⁵)	Power-of-two simplifies routing. Headroom for 10× growth before a reshard.
Replication	3-way per shard	One writer + two readers in the same region. Async cross-region replicas for read fanout.
Read consistency	Eventual; fall through to primary on cache miss	Click traffic doesn't need linearizability; the create path does, and reads it from primary.
Engine	DynamoDB or Cassandra (managed if available)	The workload is pure KV. Don't pay relational overhead for nothing.

9 · Failure modes & runbook

What dies, what oncall sees, what the runbook says. The interview-room version of the operational story.

Failure	Symptom	Mitigation
One redirect pod crash-loops	P99 redirect latency rises ~5%	K8s restarts; shed traffic via readiness probe; alert at 3 minutes.
Redis shard offline	20× rise in DB load on its key range	Local in-process cache absorbs ~30% of impact; circuit-break to direct-DB; auto-failover Redis (RDB snapshot replica).
One DB shard offline	1/32 of redirects 5xx	Replica failover (~30 s with auto-promote); CDN serves stale-on-error for 5 min while replica catches up.
ID-allocator unreachable	Create-link API errors out	Each pod has 10K-ID buffer; tolerates ~10 minutes of outage. Page on buffer < 1K.
Region offline	1/3 of read traffic dies	Anycast routes around it within 30 s; RTO ≤ 5 min for full failover via DNS.
Hot key (one URL goes viral)	Specific shard at 100% CPU	Edge cache absorbs > 90%; local pod cache the rest. If still hot, dynamically promote to the in-process cache for 1 hour.
Click flood (DDoS on one URL)	Edge cache hit rate at 99%, origin still saturating	WAF rate-limit per source IP; absolute limit per code at 10K rps. Above that, return cached value with no-counting.

10 · Cost & SLOs

A ballpark in 2026 cloud prices. Numbers are right to within 2× — that's enough to justify decisions in a design round.

Line	Estimate	Note
CDN egress	~$1.5K / month	~5 TB / day × $0.01 / GB at scale
Compute (200 redirect + 50 create pods)	~$3K / month	2 vCPU × 4 GB; managed K8s
Redis cluster (50 GB × 3 replicas)	~$2K / month	32 × m6g.large equivalent
DynamoDB (180 TB, 350K avg reads)	~$8K / month	On-demand pricing; reserved drops 40%
ClickHouse (analytics)	~$1K / month	3-node, columnar, ingest from Kafka
Total	~$15K / month	Per million MAU: ~$50/M-MAU

SLOs

Redirect availability: 99.99%. Edge + multi-region make this achievable. Per-quarter error budget: 12 minutes.
Redirect P99 latency: 50 ms. Measured at the edge; alert at P99 > 80 ms for 5 minutes.
Create availability: 99.9%. Tighter budget than redirect because writes are rarer. Per-quarter: ~2 hours.
Analytics freshness: ≤ 60 s. Click counts are eventually consistent; the dashboard makes that explicit.

11 · Trade-offs & "what would you change at 10×"

The interviewer's last 5 minutes. These are the answers that turn a competent design into a compelling one.

If…	Then…
Writes 10×	Reshard to 64. Add a write-side allocator with longer ranges. Move from on-demand DynamoDB to provisioned.
Reads 10×	Edge cache absorbs most of it. Add a regional read-only Redis pool per region. Bump local in-process cache to 100K entries.
Strict SLA on freshness (e.g. delete must propagate < 1 s)	Replace TTL-based invalidation with a Pub/Sub fan-out from the create / delete path; explicit purge to edge.
Multi-region writes	Choose: active-passive (simpler, RPO ≥ 30 s) or active-active with last-writer-wins (riskier, requires CRDT-style conflict resolution).
GDPR / data residency	Pin shards by region. The link table grows tags by region; the edge layer routes to the right region by URL prefix.
"What would a more senior answer add?"	Add a phishing/malware filter on create. Cache poisoning for blocked domains. The honest next-tier answer is the moderation pipeline — that's the part most candidates skip.

Defending the decisions — interview pushback

The URL shortener looks small enough that interviewers feel free to probe every choice. Six exchanges recur — have one-sentence defenses ready.

"Why 7-character Base62?" 62⁷ ≈ 3.5 trillion keys — enough headroom for any realistic scale, and the key stays short. Don't overthink the key space unless the interviewer raises it.
"Why 302 and not 301?" 301 is cached by the browser permanently. Every click after the first bypasses your redirect service entirely — analytics disappear. Use 302 unless you're explicitly trading analytics for browser cache efficiency.
"Why lazy Redis population?" A URL that's never visited doesn't need a cache entry. Populate Redis on first read, not at creation. Keeps the cache hot with actually-popular URLs rather than everything ever created.
"Why CDN in front?" Redirect service and Redis cannot absorb a viral spike — 10M requests in 60 seconds from a celebrity tweet will overwhelm any backend. CDN with Cache-Control: public, max-age=300 is the only mitigation that scales.
"Why Kafka for analytics, not synchronous writes?" Redirect p99 budget is < 10 ms. Writing a click event synchronously adds latency and creates a dependency that can fail. Fire-and-forget to Kafka; the analytics consumer catches up in its own time.
"Why Postgres and not a distributed store?" 100M URLs ≈ 50 GB on a single RDS instance with room to spare. Don't reach for a distributed store until you have a problem that requires one — and at this scale, you don't.

URL shortener

1 · Clarifying questions

2 · Capacity math, on a napkin

3 · API and data model

Endpoints

Schema

4 · High-level architecture

5 · The hard part — generating short codes

6 · Base62 encoding, in three lines

7 · The cache strategy

8 · Sharding the storage

9 · Failure modes & runbook

10 · Cost & SLOs

SLOs

11 · Trade-offs & "what would you change at 10×"

Defending the decisions — interview pushback

Further reading

Pastebin