Design Twitter
A 250M-DAU read-heavy social feed. The hard parts aren't the writes — tweeting is rare and cheap. The hard parts are the timeline (sub-100 ms for billions of users), the celebrity problem (one writer with 100M followers should not crash the pipeline), and ranking (chronological is the easy answer, "what you'd want to see" is the real one).
1 · Clarifying questions
| Functional scope? | Post a tweet (text + media), follow/unfollow, view home timeline, view user profile timeline, search. No DMs, no spaces, no ads. |
| Tweet size? | 280 characters of text, plus up to 4 images (or one video). Text is small; media dominates storage. |
| Scale? | 250M DAU. ~500M tweets/day (~5K writes/sec average, 25K peak). ~25B timeline reads/day (~290K reads/sec, 1.5M peak). Read:write ratio close to 50:1. |
| Latency? | Timeline load: P99 ≤ 200 ms. Tweet post: P99 ≤ 1 s (the user sees their own tweet immediately — fan-out can run async). |
| Ordering? | Mostly reverse-chronological with a ranking layer. Strict "ranked first, time second" is the default. |
| Consistency? | The user always sees their own tweet immediately (session consistency on writes). Other users see new tweets within seconds (eventual). |
| Multi-region? | Yes. Region-local writes with async cross-region replication. Reads served from the closest region. |
2 · Capacity math, on a napkin
| Number | Calculation | Result |
|---|---|---|
| DAU | given | 250M |
| Tweets/day | 250M × 2 per active user | ~500M |
| Tweet write QPS (avg / peak) | 500M / 86,400 × 5 | ~5K / ~25K |
| Timeline read QPS (avg / peak) | 250M × 100 reads / 86,400 × 5 | ~290K / ~1.5M |
| Read:write ratio | — | ~50:1 |
| Tweet payload (text + meta) | ~500 B (280 chars + ids + timestamps) | ~500 B |
| Daily text storage | 500M × 500 B × 3 (repl) | ~700 GB/day → ~250 TB/year |
| Media storage | 30% of tweets × 1.5 MB avg × 3 | ~700 TB/day raw → object storage tier |
| Follow graph edges | 250M × 250 avg follows | ~60B edges, ~5 TB |
| Timeline cache (in-memory) | 250M × 800 recent tweet ids × 8 B | ~1.5 TB across the timeline tier |
| Egress bandwidth (peak) | 1.5M reads × ~20 tweets × ~600 B avg | ~150 Gbps from origin (CDN absorbs most) |
The headline: read traffic is 50× write traffic, so the whole architecture is built to keep reads fast and cheap. Tweets are tiny; media is large; the follow graph is enormous; timelines live in RAM.
3 · API and data model
Public API
POST /v1/tweets # post a tweet
{ "text": "...", "media_ids": [...], "in_reply_to": null }
→ 201 { "tweet_id": "t_8aB3x9Q2", "created_at": "..." }
GET /v1/users/:id/timeline # home timeline (default cursor-paginated)
?cursor=t_8aB3x9Q2&limit=50
→ 200 { "tweets": [...], "next_cursor": "..." }
GET /v1/users/:id/tweets # profile timeline (user's own posts)
POST /v1/users/:id/follow # idempotent
DELETE /v1/users/:id/follow
GET /v1/search?q=...&type=top|latest # searchStorage
tweets -- Manhattan / sharded MySQL by user_id
tweet_id BIGINT PRIMARY KEY -- Snowflake; time-ordered
user_id BIGINT -- author; shard key
text VARCHAR(280)
media_ids JSONB
reply_to BIGINT NULL
retweet_of BIGINT NULL
created_at TIMESTAMP
INDEX (user_id, created_at)
follow_graph -- adjacency list, sharded by follower
follower_id BIGINT
followee_id BIGINT
created_at TIMESTAMP
PRIMARY KEY ((follower_id), followee_id)
home_timeline -- Redis sorted set per user
user:{id}:home → ZSET<tweet_id, score>
(kept to ~800 ids; trimmed on insert)
user_timeline -- Redis sorted set per author
user:{id}:tweets → ZSET<tweet_id, created_at>
(kept to ~3200 ids; older paged from MySQL)4 · High-level architecture
Tweet service writes the tweet to MySQL and emits a Kafka event. Fan-out workers consume the event, look up the author's followers from the graph store, and push the tweet_id onto each follower's home-timeline ZSET in Redis. Timeline service reads from Redis on demand. The dashed line is the lazy path for celebrities — see §5.
5 · The hard part — fan-out and the celebrity problem
This is the decision the room is testing. Three shapes; one wins.
Fan-out on read (pull)
At read time, for each user the timeline service fetches the latest tweets from every followee, merges them, and returns. Beautifully simple. Falls over the moment a user follows 1,000 people — that's 1,000 fan-out queries per timeline read, multiplied by 1.5M reads/sec.
Fan-out on write (push)
At write time, push the new tweet_id into every follower's pre-computed timeline. Reads are then a single Redis ZRANGE — sub-millisecond. The cost moves to the write path: average write fan-out is ~250 (median follower count). Still cheap, because writes are 50× rarer than reads. Until you hit a celebrity.
The celebrity problem
A user with 100M followers tweets once. Fan-out-on-write means 100M Redis writes for that one tweet. The fan-out worker pool stalls, every other user's tweet sits behind it in the queue, the timeline freshness SLO breaks for everyone.
The fix is a hybrid: fan-out-on-write for the long tail (≤100K followers), and fan-out-on-read for celebrities (>100K followers). At read time, the timeline service does the ZRANGE for the precomputed part and also reads the recent tweets directly from the celebrities the user follows, then merges. Slightly more expensive per read, but the load is bounded — a user follows at most a handful of celebrities, so the read-side merge is small.
# Fan-out worker (long tail)
on event tweet_posted(author_id, tweet_id):
followers = graph.get_followers(author_id)
if len(followers) <= CELEBRITY_THRESHOLD:
for f in followers:
redis.zadd(f"user:{f}:home", {tweet_id: tweet_id})
redis.zremrangebyrank(f"user:{f}:home", 0, -801) # trim to 800
else:
# Mark author as celebrity; skip push. Reads will pull.
redis.sadd("celebrities", author_id)
# Timeline read (hybrid merge)
def get_home(user_id, limit=50):
precomputed = redis.zrevrange(f"user:{user_id}:home", 0, limit*2)
celebs_followed = graph.get_followees(user_id) & redis.smembers("celebrities")
recent_celeb_tweets = []
for c in celebs_followed:
recent_celeb_tweets += redis.zrevrange(f"user:{c}:tweets", 0, 20)
merged = merge_by_score(precomputed, recent_celeb_tweets)
return hydrate(merged[:limit])Ranking, briefly
The score in the ZSET is not just timestamp. It's a weighted blend — recency, engagement signals from the author, the user's interaction history with this author, plus a content score from an offline ML pipeline. The ranking service writes the score into the ZSET on insert; reads return things already in the right order. The expensive part is the offline pipeline, not the read path.
6 · Search
Search at this volume is a different system — Earlybird (Twitter's in-house inverted-index engine, Lucene-derived) shards tweets by tweet_id and keeps the last 7 days of tweets in RAM across the cluster. Older tweets live on disk-backed shards, queried only when the user explicitly asks for older results.
- Indexer. Consumes the same Kafka tweet topic; writes one document per tweet to the appropriate shard.
- Query path. Scatter-gather across all shards in the cluster, top-k merge at the coordinator. With 7 days × ~5K tweets/sec = ~3B documents per shard set, each query touches ~50 shards.
- "Top" vs "Latest". Two ranking passes. Latest is sorted by timestamp; Top runs the same ranker as the home timeline against a wider candidate set.
7 · Failure modes & runbook
| Failure | Symptom | Mitigation |
|---|---|---|
| Celebrity threshold mistuned | Fan-out workers backlog grows; timeline freshness SLO breaks | Auto-tune threshold from observed worker queue depth. Pager fires at >90 s lag. |
| Hot user shard | One MySQL shard at 100% CPU; profile timeline reads slow | Split the shard. Read replicas absorb spike in the meantime. Aim for <500 GB per shard. |
| Redis node failure | ~1% of users see empty home timelines briefly | Redis cluster with replication; failover under 30 s. Cold-start rebuilds the ZSET from MySQL on demand. |
| Trending event (election, finals) | Write QPS spikes 10×; fan-out queue backs up | Auto-scale fan-out workers. Temporarily raise celebrity threshold to push more authors to lazy path. Defer ranking pass on the spike. |
| Region partition | Tweets posted in EU not visible in US for some users | Each region has its own MySQL primary; cross-region replication via Kafka. Tolerated lag in the seconds. |
| Search index corruption | Search returns missing results for a window | Replay the Kafka tweet topic into the rebuilding shard. SLO is "indexed within 30 s of post"; corruption recovery measured in hours not days. |
| Spam wave | Bot accounts post millions of tweets per minute | Pre-fan-out spam classifier in the tweet service. Suspect tweets bypass fan-out; held for review. Account-level rate limits at the API. |
8 · Cost & SLOs
| Line | Estimate | Note |
|---|---|---|
| Tweet + timeline services (2K pods) | ~$80K/month | Stateless; auto-scaled |
| Fan-out workers (~500 pods) | ~$20K/month | Bursty; sized for peak |
| MySQL (sharded, ~500 shards) | ~$200K/month | Tweets + reply graph |
| Redis timeline tier (~1.5 TB across cluster) | ~$80K/month | In-memory; the read path lives here |
| Graph store | ~$60K/month | 60B edges; sharded by follower |
| Search cluster (Earlybird-shape) | ~$150K/month | 7 days hot in RAM |
| Object storage + CDN (media) | ~$500K/month | Bandwidth dominates; CDN absorbs ~95% |
| Kafka (multi-region) | ~$80K/month | Tweet topic + fan-out + indexer + ranking |
SLOs
- Timeline read P99: 200 ms. Cache hit on Redis; cold reads slower but rare.
- Tweet post P99: 1 s. Including media upload; fan-out is async.
- Fan-out freshness P99: 5 s. Time from tweet posted to tweet visible in followers' timelines.
- Search freshness P99: 30 s. Time from tweet posted to indexed.
- Availability: 99.95%. ~22 minutes/month of allowable downtime, region-isolated.
9 · Trade-offs & "what would you change at 10×"
| If… | Then… |
|---|---|
| 10× users (2.5B DAU) | Region count grows from 4 to 10+. Timeline cache size becomes painful; consider per-region precompute with shorter retention (200 ids instead of 800). |
| Strict chronological order (no ranking) | Simpler architecture — the ZSET score is just timestamp. Engagement drops 20–30% in published research. |
| Realtime ranking (every read recomputes) | The ranking model has to run inline. Adds 30–50 ms to P99 timeline; lets you personalise on signals from the last second. |
| End-to-end encrypted DMs | Out of scope for tweets but the design is fundamentally different — encrypted at the device, server stores ciphertext only. Search becomes impossible without client-side indexes. |
| "Edit tweet" feature | Tweets become mutable. Fan-out workers re-emit edited versions; downstream caches need invalidation. Easier if you make edits a new tweet that displaces the old in the timeline. |
| "What would a more senior answer add?" | The trust + safety surface: spam classification on the tweet service hot path, the report-and-takedown pipeline, the misinformation flagging pipeline. Plus the data-engineering layer: a Kafka topic of every tweet, fanned out to data warehousing for analytics, search index, ranking model retraining. Treat the data pipeline as a first-class component, not an afterthought. |
Further reading
- Raffi Krikorian — "Timelines at Scale" (Twitter). The canonical 2013 talk that introduced the fan-out hybrid; still the clearest single source.
- Twitter Engineering — "The Infrastructure Behind Twitter". Series of blog posts on Manhattan (their storage system), Earlybird (search), and the timeline service.
- "Reducing search latency at Twitter" — TechCrunch / engineering blog. The Earlybird architecture in detail.
- Adjacent: News feed. The general fan-out pattern; this page is the Twitter-specific version.
- Adjacent: Top-k / trending. The trending-hashtags pipeline.
- Adjacent: Kafka. The event bus underneath the fan-out.
- Adjacent: Napkin math. The capacity math here in detail.