15 / 19
Playbook / 15

Design Twitter

A 250M-DAU read-heavy social feed. The hard parts aren't the writes — tweeting is rare and cheap. The hard parts are the timeline (sub-100 ms for billions of users), the celebrity problem (one writer with 100M followers should not crash the pipeline), and ranking (chronological is the easy answer, "what you'd want to see" is the real one).


1 · Clarifying questions

Functional scope?Post a tweet (text + media), follow/unfollow, view home timeline, view user profile timeline, search. No DMs, no spaces, no ads.
Tweet size?280 characters of text, plus up to 4 images (or one video). Text is small; media dominates storage.
Scale?250M DAU. ~500M tweets/day (~5K writes/sec average, 25K peak). ~25B timeline reads/day (~290K reads/sec, 1.5M peak). Read:write ratio close to 50:1.
Latency?Timeline load: P99 ≤ 200 ms. Tweet post: P99 ≤ 1 s (the user sees their own tweet immediately — fan-out can run async).
Ordering?Mostly reverse-chronological with a ranking layer. Strict "ranked first, time second" is the default.
Consistency?The user always sees their own tweet immediately (session consistency on writes). Other users see new tweets within seconds (eventual).
Multi-region?Yes. Region-local writes with async cross-region replication. Reads served from the closest region.

2 · Capacity math, on a napkin

NumberCalculationResult
DAUgiven250M
Tweets/day250M × 2 per active user~500M
Tweet write QPS (avg / peak)500M / 86,400 × 5~5K / ~25K
Timeline read QPS (avg / peak)250M × 100 reads / 86,400 × 5~290K / ~1.5M
Read:write ratio~50:1
Tweet payload (text + meta)~500 B (280 chars + ids + timestamps)~500 B
Daily text storage500M × 500 B × 3 (repl)~700 GB/day → ~250 TB/year
Media storage30% of tweets × 1.5 MB avg × 3~700 TB/day raw → object storage tier
Follow graph edges250M × 250 avg follows~60B edges, ~5 TB
Timeline cache (in-memory)250M × 800 recent tweet ids × 8 B~1.5 TB across the timeline tier
Egress bandwidth (peak)1.5M reads × ~20 tweets × ~600 B avg~150 Gbps from origin (CDN absorbs most)

The headline: read traffic is 50× write traffic, so the whole architecture is built to keep reads fast and cheap. Tweets are tiny; media is large; the follow graph is enormous; timelines live in RAM.

3 · API and data model

Public API

POST /v1/tweets                       # post a tweet
{ "text": "...", "media_ids": [...], "in_reply_to": null }
→ 201 { "tweet_id": "t_8aB3x9Q2", "created_at": "..." }

GET  /v1/users/:id/timeline            # home timeline (default cursor-paginated)
  ?cursor=t_8aB3x9Q2&limit=50
→ 200 { "tweets": [...], "next_cursor": "..." }

GET  /v1/users/:id/tweets              # profile timeline (user's own posts)
POST /v1/users/:id/follow              # idempotent
DELETE /v1/users/:id/follow

GET  /v1/search?q=...&type=top|latest  # search

Storage

tweets                              -- Manhattan / sharded MySQL by user_id
  tweet_id     BIGINT PRIMARY KEY  -- Snowflake; time-ordered
  user_id      BIGINT              -- author; shard key
  text         VARCHAR(280)
  media_ids    JSONB
  reply_to     BIGINT NULL
  retweet_of   BIGINT NULL
  created_at   TIMESTAMP
  INDEX (user_id, created_at)

follow_graph                        -- adjacency list, sharded by follower
  follower_id  BIGINT
  followee_id  BIGINT
  created_at   TIMESTAMP
  PRIMARY KEY ((follower_id), followee_id)

home_timeline                       -- Redis sorted set per user
  user:{id}:home → ZSET<tweet_id, score>
                   (kept to ~800 ids; trimmed on insert)

user_timeline                       -- Redis sorted set per author
  user:{id}:tweets → ZSET<tweet_id, created_at>
                     (kept to ~3200 ids; older paged from MySQL)

4 · High-level architecture

Tweet service writes the tweet to MySQL and emits a Kafka event. Fan-out workers consume the event, look up the author's followers from the graph store, and push the tweet_id onto each follower's home-timeline ZSET in Redis. Timeline service reads from Redis on demand. The dashed line is the lazy path for celebrities — see §5.

5 · The hard part — fan-out and the celebrity problem

This is the decision the room is testing. Three shapes; one wins.

Fan-out on read (pull)

At read time, for each user the timeline service fetches the latest tweets from every followee, merges them, and returns. Beautifully simple. Falls over the moment a user follows 1,000 people — that's 1,000 fan-out queries per timeline read, multiplied by 1.5M reads/sec.

Fan-out on write (push)

At write time, push the new tweet_id into every follower's pre-computed timeline. Reads are then a single Redis ZRANGE — sub-millisecond. The cost moves to the write path: average write fan-out is ~250 (median follower count). Still cheap, because writes are 50× rarer than reads. Until you hit a celebrity.

The celebrity problem

A user with 100M followers tweets once. Fan-out-on-write means 100M Redis writes for that one tweet. The fan-out worker pool stalls, every other user's tweet sits behind it in the queue, the timeline freshness SLO breaks for everyone.

The fix is a hybrid: fan-out-on-write for the long tail (≤100K followers), and fan-out-on-read for celebrities (>100K followers). At read time, the timeline service does the ZRANGE for the precomputed part and also reads the recent tweets directly from the celebrities the user follows, then merges. Slightly more expensive per read, but the load is bounded — a user follows at most a handful of celebrities, so the read-side merge is small.

# Fan-out worker (long tail)
on event tweet_posted(author_id, tweet_id):
  followers = graph.get_followers(author_id)
  if len(followers) <= CELEBRITY_THRESHOLD:
    for f in followers:
      redis.zadd(f"user:{f}:home", {tweet_id: tweet_id})
      redis.zremrangebyrank(f"user:{f}:home", 0, -801)  # trim to 800
  else:
    # Mark author as celebrity; skip push. Reads will pull.
    redis.sadd("celebrities", author_id)

# Timeline read (hybrid merge)
def get_home(user_id, limit=50):
  precomputed = redis.zrevrange(f"user:{user_id}:home", 0, limit*2)
  celebs_followed = graph.get_followees(user_id) & redis.smembers("celebrities")
  recent_celeb_tweets = []
  for c in celebs_followed:
    recent_celeb_tweets += redis.zrevrange(f"user:{c}:tweets", 0, 20)
  merged = merge_by_score(precomputed, recent_celeb_tweets)
  return hydrate(merged[:limit])

Ranking, briefly

The score in the ZSET is not just timestamp. It's a weighted blend — recency, engagement signals from the author, the user's interaction history with this author, plus a content score from an offline ML pipeline. The ranking service writes the score into the ZSET on insert; reads return things already in the right order. The expensive part is the offline pipeline, not the read path.

6 · Search

Search at this volume is a different system — Earlybird (Twitter's in-house inverted-index engine, Lucene-derived) shards tweets by tweet_id and keeps the last 7 days of tweets in RAM across the cluster. Older tweets live on disk-backed shards, queried only when the user explicitly asks for older results.

  • Indexer. Consumes the same Kafka tweet topic; writes one document per tweet to the appropriate shard.
  • Query path. Scatter-gather across all shards in the cluster, top-k merge at the coordinator. With 7 days × ~5K tweets/sec = ~3B documents per shard set, each query touches ~50 shards.
  • "Top" vs "Latest". Two ranking passes. Latest is sorted by timestamp; Top runs the same ranker as the home timeline against a wider candidate set.

7 · Failure modes & runbook

FailureSymptomMitigation
Celebrity threshold mistunedFan-out workers backlog grows; timeline freshness SLO breaksAuto-tune threshold from observed worker queue depth. Pager fires at >90 s lag.
Hot user shardOne MySQL shard at 100% CPU; profile timeline reads slowSplit the shard. Read replicas absorb spike in the meantime. Aim for <500 GB per shard.
Redis node failure~1% of users see empty home timelines brieflyRedis cluster with replication; failover under 30 s. Cold-start rebuilds the ZSET from MySQL on demand.
Trending event (election, finals)Write QPS spikes 10×; fan-out queue backs upAuto-scale fan-out workers. Temporarily raise celebrity threshold to push more authors to lazy path. Defer ranking pass on the spike.
Region partitionTweets posted in EU not visible in US for some usersEach region has its own MySQL primary; cross-region replication via Kafka. Tolerated lag in the seconds.
Search index corruptionSearch returns missing results for a windowReplay the Kafka tweet topic into the rebuilding shard. SLO is "indexed within 30 s of post"; corruption recovery measured in hours not days.
Spam waveBot accounts post millions of tweets per minutePre-fan-out spam classifier in the tweet service. Suspect tweets bypass fan-out; held for review. Account-level rate limits at the API.

8 · Cost & SLOs

LineEstimateNote
Tweet + timeline services (2K pods)~$80K/monthStateless; auto-scaled
Fan-out workers (~500 pods)~$20K/monthBursty; sized for peak
MySQL (sharded, ~500 shards)~$200K/monthTweets + reply graph
Redis timeline tier (~1.5 TB across cluster)~$80K/monthIn-memory; the read path lives here
Graph store~$60K/month60B edges; sharded by follower
Search cluster (Earlybird-shape)~$150K/month7 days hot in RAM
Object storage + CDN (media)~$500K/monthBandwidth dominates; CDN absorbs ~95%
Kafka (multi-region)~$80K/monthTweet topic + fan-out + indexer + ranking

SLOs

  • Timeline read P99: 200 ms. Cache hit on Redis; cold reads slower but rare.
  • Tweet post P99: 1 s. Including media upload; fan-out is async.
  • Fan-out freshness P99: 5 s. Time from tweet posted to tweet visible in followers' timelines.
  • Search freshness P99: 30 s. Time from tweet posted to indexed.
  • Availability: 99.95%. ~22 minutes/month of allowable downtime, region-isolated.

9 · Trade-offs & "what would you change at 10×"

If…Then…
10× users (2.5B DAU)Region count grows from 4 to 10+. Timeline cache size becomes painful; consider per-region precompute with shorter retention (200 ids instead of 800).
Strict chronological order (no ranking)Simpler architecture — the ZSET score is just timestamp. Engagement drops 20–30% in published research.
Realtime ranking (every read recomputes)The ranking model has to run inline. Adds 30–50 ms to P99 timeline; lets you personalise on signals from the last second.
End-to-end encrypted DMsOut of scope for tweets but the design is fundamentally different — encrypted at the device, server stores ciphertext only. Search becomes impossible without client-side indexes.
"Edit tweet" featureTweets become mutable. Fan-out workers re-emit edited versions; downstream caches need invalidation. Easier if you make edits a new tweet that displaces the old in the timeline.
"What would a more senior answer add?"The trust + safety surface: spam classification on the tweet service hot path, the report-and-takedown pipeline, the misinformation flagging pipeline. Plus the data-engineering layer: a Kafka topic of every tweet, fanned out to data warehousing for analytics, search index, ranking model retraining. Treat the data pipeline as a first-class component, not an afterthought.

Further reading

  • Raffi Krikorian — "Timelines at Scale" (Twitter). The canonical 2013 talk that introduced the fan-out hybrid; still the clearest single source.
  • Twitter Engineering — "The Infrastructure Behind Twitter". Series of blog posts on Manhattan (their storage system), Earlybird (search), and the timeline service.
  • "Reducing search latency at Twitter" — TechCrunch / engineering blog. The Earlybird architecture in detail.
  • Adjacent: News feed. The general fan-out pattern; this page is the Twitter-specific version.
  • Adjacent: Top-k / trending. The trending-hashtags pipeline.
  • Adjacent: Kafka. The event bus underneath the fan-out.
  • Adjacent: Napkin math. The capacity math here in detail.
Found this useful?