URL shortener
The canonical first system-design question. The mechanics are simple enough that a strong candidate finishes it cleanly in 35 minutes; the depth comes from the trade-offs around ID generation, the cache strategy, and the analytics pipeline. Walk this once, and the patterns it teaches — capacity math, an ID-generation choice, a read-heavy cache, sharding the write path — show up in most of the eighteen designs that follow.
1 · Clarifying questions
Five minutes. The right questions narrow the design enough to make the math possible and the architecture defensible.
| How long are the short codes? | Pick a length first. 7 base62 characters → ~3.5 trillion codes; enough for a Bitly-class system. |
| Are codes user-customisable? | Custom aliases ("/my-talk") add a uniqueness check on the hot path. Assume opt-in for paid tiers; default is system-generated. |
| How long do URLs live? | Forever, with optional expiry. Forever changes the storage horizon by orders of magnitude — confirm. |
| What's the read/write ratio? | ~100:1 in practice. Most URLs get clicked many times; a small fraction get clicked millions of times. |
| Do we need analytics? | Yes — click counts, geo, referrer. Async pipeline; not on the redirect critical path. |
| Latency budget for the redirect? | P99 ≤ 50 ms end-to-end. The redirect is the user-visible product surface. |
| Authentication? | Out of scope for the design round; assume an existing identity service for the create-link API. |
| Multi-region? | Yes for reads (low-latency anycast). Single-region writes initially, with active-active later. |
2 · Capacity math, on a napkin
Before any boxes. The numbers that follow drive every later decision — the schema choice, the cache size, the shard count, the cost line.
| Number | Calculation | Result |
|---|---|---|
| Writes / day | given | 100M |
| Reads / day | 100× writes (100:1) | 10B |
| Write QPS (avg / peak) | 100M / 86,400 × 3 | ~1.2K / ~3.5K |
| Read QPS (avg / peak) | 10B / 86,400 × 3 | ~120K / ~350K |
| Concurrency, reads | Little: 350K × 0.05 s | ~17,500 in-flight |
| Storage / row | id + URL + meta | ~500 B |
| Storage / year | 100M × 500 B × 365 | ~18 TB |
| Storage / 10 yr | ×10 | ~180 TB |
| Cache (hot 10%) | 10% × 100M × 500 B / yr | ~5 GB / yr → ~50 GB at 10 yr |
| Egress, redirects | 120K avg × ~512 B headers | ~5 TB / day |
The numbers tell the story before the architecture does. 10B reads/day is "this is a hot, read-heavy KV system". 100M writes/day is "single writer per shard is fine; we don't need a write-side coordinator yet".
3 · API and data model
The wire surface and the row layout. Both are deliberately small.
Endpoints
POST /v1/links # create
{"url":"https://example.com/long/path", "alias":"my-talk"} # alias optional
→ 201 {"code":"aB3x9Q2", "short_url":"https://tk.sh/aB3x9Q2"}
GET /:code # redirect (the hot path)
→ 301 Location: https://example.com/long/path
Cache-Control: public, max-age=300
GET /v1/links/:code/stats # async-served from analytics store
→ 200 {"clicks":1024,"top_country":"US","top_referrer":"twitter.com"}
DELETE /v1/links/:code # owner only
→ 204Schema
links
code VARCHAR(8) PRIMARY KEY -- base62, 7 chars + sentinel
url TEXT NOT NULL -- target, max 2 KB
owner_id BIGINT NULL -- FK to users
alias BOOLEAN NOT NULL -- custom vs system-generated
created_at TIMESTAMP NOT NULL
expires_at TIMESTAMP NULL
deleted BOOLEAN NOT NULL DEFAULT FALSE
INDEX (owner_id, created_at) -- "my links"
INDEX (expires_at) WHERE expires_at IS NOT NULL -- expiry sweep
clicks_daily -- analytics, separate store
code VARCHAR(8)
date DATE
country CHAR(2)
referrer TEXT
count BIGINT
PRIMARY KEY (code, date, country, referrer)The redirect path reads only `code → url`. Everything else lives off the hot path. The clicks table is in a columnar store (ClickHouse / BigQuery) — never queried during a redirect.
4 · High-level architecture
The boxes-and-arrows. The redirect path is hot and read-heavy; the create path is cold and write-heavy. They scale independently.
Three layers. Edge is anycast cache (Cloudflare / Fastly). Region is stateless services. Storage is sharded KV for the hot path, columnar for analytics. The async lane carries click events into ClickHouse via Kafka.
5 · The hard part — generating short codes
The interesting choice. There are four serious answers; each has a failure mode worth knowing.
| Strategy | How it works | Strengths | Weaknesses |
|---|---|---|---|
| Counter | Auto-increment integer, base62-encoded | Trivially unique. Compact. No collision check. | Single bottleneck unless sharded. Reveals volume to anyone enumerating codes. |
| Hash | SHA-256(url + salt), take first 7 base62 chars | Stateless. Same input → same output (good for dedupe, bad for privacy). | Collision rate at 7 chars × 100M URLs ≈ 1 in 35K. Need a check-and-retry path on collision. |
| Snowflake | 64-bit ID: 41 b timestamp + 10 b machine + 12 b seq | Distributed, no coordination. Time-ordered (cache-friendly). | Clock skew breaks uniqueness. 64 b → 11 base62 chars (longer than ideal). |
| Centralised range allocator | Each writer claims a 10K-id range from Zookeeper / etcd | Compact codes. Survives skew. Any writer can fill in independently. | Coordinator is a SPOF if not done right. Range exhaustion is a separate failure mode. |
The recommendation for a fresh design: centralised range allocator. Each redirect-create pod claims a range of 10K IDs; allocates from the in-memory counter; refills when low. Survives the coordinator dying (the in-memory range survives), survives clock skew, and produces compact 7-character codes. The coordinator can be Zookeeper, etcd, or a single Postgres row with a SELECT FOR UPDATE.
6 · Base62 encoding, in three lines
Convert a 64-bit integer into a 7- to 11-character string over the alphabet
[0-9A-Za-z]. The implementation is short enough to fit on the whiteboard.
const ALPHA = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
function encode(n) {
let out = "";
do {
out = ALPHA[n % 62] + out;
n = Math.floor(n / 62);
} while (n > 0);
return out;
}
function decode(s) {
let n = 0;
for (const c of s) n = n * 62 + ALPHA.indexOf(c);
return n;
}7 characters of base62 is 627 ≈ 3.5 trillion. At 100M new codes/day, that's 96 years before exhaustion. Plenty.
7 · The cache strategy
Reads are 100× writes; the cache earns its keep. Three layers, each catching a different traffic shape.
- Edge cache (CDN). 60-second TTL on the redirect response. Catches the burst when a popular link goes viral. Cloudflare / Fastly with anycast. ~80% of read traffic terminates here without ever hitting our infrastructure.
- Hot KV cache (Redis cluster). Cache-aside on the redirect path. ~50 GB sharded across 32 nodes; LRU eviction; 95%+ hit rate on the surviving 20% of reads. P99 reach: ~3 ms.
- Local in-process cache. A 10K-entry LRU per pod, 30-second TTL, captures the very-hot tail. Avoids round-tripping to Redis for the top 0.1% of links that account for 30% of clicks.
8 · Sharding the storage
32 shards by hash(code). Each shard sits on a 3-node replica set with
one writer and two read replicas. The shard count is chosen to give headroom for 10×
growth without resharding — at 100M writes/day per shard, we have plenty of room.
| Decision | Choice | Why |
|---|---|---|
| Shard key | hash(code) | Even distribution; no hot shards. created_at would create a hotspot on the newest shard. |
| Shard count | 32 (~32 = 25) | Power-of-two simplifies routing. Headroom for 10× growth before a reshard. |
| Replication | 3-way per shard | One writer + two readers in the same region. Async cross-region replicas for read fanout. |
| Read consistency | Eventual; fall through to primary on cache miss | Click traffic doesn't need linearizability; the create path does, and reads it from primary. |
| Engine | DynamoDB or Cassandra (managed if available) | The workload is pure KV. Don't pay relational overhead for nothing. |
9 · Failure modes & runbook
What dies, what oncall sees, what the runbook says. The interview-room version of the operational story.
| Failure | Symptom | Mitigation |
|---|---|---|
| One redirect pod crash-loops | P99 redirect latency rises ~5% | K8s restarts; shed traffic via readiness probe; alert at 3 minutes. |
| Redis shard offline | 20× rise in DB load on its key range | Local in-process cache absorbs ~30% of impact; circuit-break to direct-DB; auto-failover Redis (RDB snapshot replica). |
| One DB shard offline | 1/32 of redirects 5xx | Replica failover (~30 s with auto-promote); CDN serves stale-on-error for 5 min while replica catches up. |
| ID-allocator unreachable | Create-link API errors out | Each pod has 10K-ID buffer; tolerates ~10 minutes of outage. Page on buffer < 1K. |
| Region offline | 1/3 of read traffic dies | Anycast routes around it within 30 s; RTO ≤ 5 min for full failover via DNS. |
| Hot key (one URL goes viral) | Specific shard at 100% CPU | Edge cache absorbs > 90%; local pod cache the rest. If still hot, dynamically promote to the in-process cache for 1 hour. |
| Click flood (DDoS on one URL) | Edge cache hit rate at 99%, origin still saturating | WAF rate-limit per source IP; absolute limit per code at 10K rps. Above that, return cached value with no-counting. |
10 · Cost & SLOs
A ballpark in 2026 cloud prices. Numbers are right to within 2× — that's enough to justify decisions in a design round.
| Line | Estimate | Note |
|---|---|---|
| CDN egress | ~$1.5K / month | ~5 TB / day × $0.01 / GB at scale |
| Compute (200 redirect + 50 create pods) | ~$3K / month | 2 vCPU × 4 GB; managed K8s |
| Redis cluster (50 GB × 3 replicas) | ~$2K / month | 32 × m6g.large equivalent |
| DynamoDB (180 TB, 350K avg reads) | ~$8K / month | On-demand pricing; reserved drops 40% |
| ClickHouse (analytics) | ~$1K / month | 3-node, columnar, ingest from Kafka |
| Total | ~$15K / month | Per million MAU: ~$50/M-MAU |
SLOs
- Redirect availability: 99.99%. Edge + multi-region make this achievable. Per-quarter error budget: 12 minutes.
- Redirect P99 latency: 50 ms. Measured at the edge; alert at P99 > 80 ms for 5 minutes.
- Create availability: 99.9%. Tighter budget than redirect because writes are rarer. Per-quarter: ~2 hours.
- Analytics freshness: ≤ 60 s. Click counts are eventually consistent; the dashboard makes that explicit.
11 · Trade-offs & "what would you change at 10×"
The interviewer's last 5 minutes. These are the answers that turn a competent design into a compelling one.
| If… | Then… |
|---|---|
| Writes 10× | Reshard to 64. Add a write-side allocator with longer ranges. Move from on-demand DynamoDB to provisioned. |
| Reads 10× | Edge cache absorbs most of it. Add a regional read-only Redis pool per region. Bump local in-process cache to 100K entries. |
| Strict SLA on freshness (e.g. delete must propagate < 1 s) | Replace TTL-based invalidation with a Pub/Sub fan-out from the create / delete path; explicit purge to edge. |
| Multi-region writes | Choose: active-passive (simpler, RPO ≥ 30 s) or active-active with last-writer-wins (riskier, requires CRDT-style conflict resolution). |
| GDPR / data residency | Pin shards by region. The link table grows tags by region; the edge layer routes to the right region by URL prefix. |
| "What would a more senior answer add?" | Add a phishing/malware filter on create. Cache poisoning for blocked domains. The honest next-tier answer is the moderation pipeline — that's the part most candidates skip. |
Defending the decisions — interview pushback
The URL shortener looks small enough that interviewers feel free to probe every choice. Six exchanges recur — have one-sentence defenses ready.
- "Why 7-character Base62?" 62⁷ ≈ 3.5 trillion keys — enough headroom for any realistic scale, and the key stays short. Don't overthink the key space unless the interviewer raises it.
- "Why 302 and not 301?" 301 is cached by the browser permanently. Every click after the first bypasses your redirect service entirely — analytics disappear. Use 302 unless you're explicitly trading analytics for browser cache efficiency.
- "Why lazy Redis population?" A URL that's never visited doesn't need a cache entry. Populate Redis on first read, not at creation. Keeps the cache hot with actually-popular URLs rather than everything ever created.
- "Why CDN in front?" Redirect service and Redis cannot absorb a viral spike — 10M requests in 60 seconds from a celebrity tweet will overwhelm any backend. CDN with
Cache-Control: public, max-age=300is the only mitigation that scales. - "Why Kafka for analytics, not synchronous writes?" Redirect p99 budget is < 10 ms. Writing a click event synchronously adds latency and creates a dependency that can fail. Fire-and-forget to Kafka; the analytics consumer catches up in its own time.
- "Why Postgres and not a distributed store?" 100M URLs ≈ 50 GB on a single RDS instance with room to spare. Don't reach for a distributed store until you have a problem that requires one — and at this scale, you don't.
Further reading
- Bitly Engineering — "Skiplist-based Redirect Database". The closest public writeup of a real URL shortener at scale. Skim for storage-engine choices.
- Stripe — "Idempotency Keys at Stripe". Not URL-shortener-specific, but the canonical reference for designing idempotent create APIs.
- Snowflake (Twitter, 2010). The original distributed-ID paper. Worth reading once before defending Snowflake in a design round.
- Marc Brooker — "Exponential Backoff and Jitter". The retry-with-jitter pattern that keeps the create path from synchronising under load.
- Uber Engineering — "Schemaless". A different shape (Uber's KV-on-MySQL) but useful as a counter-example to "just use Cassandra".
- Adjacent: Consistent hashing. The shard router under the hood.
- Adjacent: Distributed IDs. A standalone walkthrough of Snowflake, ULID, and friends.
- Adjacent: DNS. Anycast routing — how the edge POPs find the user.