04 / 19
Playbook / 04

A news feed

Twitter, Threads, Instagram, LinkedIn — the same shape. A user follows other users; a new post fans out to those followers' feeds. The interesting choice is when that fan-out happens. Push at write time, pull at read time, or a hybrid keyed on who the author is. The celebrity problem — one user with 100M followers — collapses any pure approach. This is the design that teaches fan-out trade-offs better than any other in the playbook.


1 · Clarifying questions

User scale?500M MAU, 100M DAU. Average user follows 200 accounts; long tail follows 5,000.
Post scale?200M posts/day. Read > 50× write — feed is the dominant load.
Ranking?Chronological by default; ML-ranked for "For You" tab. Both surfaces matter; we'll design the chronological feed and note where ranking plugs in.
Posts include media?Yes — text + 0–4 images/videos. Media handled by separate object-storage subsystem; the feed only stores references.
Latency budget?P99 feed-read ≤ 200 ms. P99 post-create ≤ 500 ms (the fan-out hides behind it).
Freshness target?New post visible to followers in < 5 s P99 (chronological). Less critical for ML-ranked feeds.
Celebrity threshold?Anyone with > 1M followers. ~10K accounts. They get pull-on-read; everyone else gets push-on-write.
Multi-region?Yes. Active-active reads, region-pinned writes for now.

2 · Capacity math, on a napkin

NumberCalculationResult
Posts / daygiven200M
Post create QPS (avg / peak)200M / 86,400 × 3~2.3K / ~7K
Feed reads / day100M DAU × 30 sessions × 1 page~3B
Feed read QPS (peak)3B / 86,400 × 3~100K
Avg follower countgiven200
Naive fan-out write amplification200M × 20040B feed writes / day → ~460K QPS avg
Celebrity write amplification10K × 1M / 200M~5% of posts × 5,000× amp = swamps everything
Storage / post~1 KB metadata + media-pointer~1 KB
Storage / year (posts)200M × 1 KB × 365~73 TB
Feed cache (hot 10% users)10M users × 200 posts × 100 B~200 GB

The 40B writes/day is the headline number. Fan-out-on-write is feasible — but only for non-celebrities. The celebrity column is where the design breaks if you're not careful.

3 · API and data model

Endpoints

POST /v1/posts # create a post
{
 "text": "...",
 "media_ids": ["img-abc", "vid-def"] # already uploaded
}
→ 201 {"post_id":"p_aB3x9Q2", "created_at":"2026-05-09T..."}

GET /v1/feed?cursor=<opaque> # read feed (chronological)
→ 200 {
 "posts": [{ post_id, author_id, text, media, created_at, ... }, ...],
 "next_cursor": "<opaque>"
}

POST /v1/follow # follow another user
{"target_user_id": "u_bob"}
→ 204

DELETE /v1/follow/{target_user_id} # unfollow
→ 204

Schemas

posts -- the source of truth, sharded by post_id
 post_id BIGINT (Snowflake) PRIMARY KEY -- time-ordered ID
 author_id BIGINT NOT NULL
 text TEXT
 media JSONB -- references to media-svc
 created_at TIMESTAMP NOT NULL
 reply_to BIGINT NULL -- for threading
 deleted BOOLEAN NOT NULL DEFAULT FALSE
 INDEX (author_id, created_at) -- "user's posts"

follows -- sharded by follower_id
 follower_id BIGINT
 followee_id BIGINT
 created_at TIMESTAMP
 PRIMARY KEY (follower_id, followee_id)
 INDEX (followee_id, follower_id) -- reverse lookup for fan-out

feeds -- one row per user, chronological list
 user_id BIGINT PRIMARY KEY
 posts LIST<post_id> -- last 1,000; older trimmed
 last_synced TIMESTAMP

celebrity_posts -- the pull-side index for big accounts
 author_id BIGINT
 post_id BIGINT
 created_at TIMESTAMP
 PRIMARY KEY ((author_id), created_at DESC, post_id)
 -- per-author, time-ordered

4 · High-level architecture

Two paths through the system. Write path: post svc → posts KV → fan-out queue → workers → write to each follower's feed. Read path: feed svc → feed cache → merge with celebrity-pull results → optional ranker → return.

5 · The hard part — fan-out strategy

Three approaches. Each has a regime where it wins.

StrategyHow it worksWins whenLoses when
Push (fan-out on write) On post creation, write the post_id into every follower's feed list. Few followers per author, read-heavy. Celebrities — one post = millions of writes. Storage and tail-latency disaster.
Pull (fan-out on read) On feed read, query "give me posts from people I follow, ordered by time". Many followers per author, write-heavy. Read tail latency: scatter-gather across hundreds of accounts. Cold reads are expensive.
Hybrid Push for non-celebrities; pull for celebrities. Merge at read time. The real world: skewed follower distributions. More moving parts. Two code paths to maintain. Worth it.

The hybrid algorithm

On POST(author=A, post=p):
 if A.follower_count > CELEBRITY_THRESHOLD (1M):
 INSERT INTO celebrity_posts (author=A, post=p)
 else:
 publish (A, p) to fan-out queue
 workers consume → for each follower F of A:
 LPUSH feeds(F).posts = p (cap at 1000)

On READ_FEED(user=U):
 base = feeds(U).posts -- already-fan-out'd posts
 cels = []
 for each celebrity C that U follows:
 cels += celebrity_posts(C).slice(since=U.last_seen) -- pull
 merged = sort(base + cels) by created_at DESC
 optionally → ranker.rerank(merged)
 return first N
The threshold isn't a magic number. Celebrities are roughly the top 0.002% of users by follower count. Tune the threshold so the celebrity-pull cost equals the push-on-write cost — typically a few hundred thousand to a few million followers depending on read/write ratios. Track which side dominates and adjust quarterly.

6 · Caching

  • Post cache. The post body keyed by post_id in Redis. Hit rate > 99% during traffic peaks. ~24-hour TTL with explicit invalidation on edit/delete.
  • Feed cache. Each user's most-recent 200 posts pre-merged and ranked. Refreshed on read if stale (> 60 s old) or on push from fan-out workers. ~80% of feed reads serve from this cache without any database hit.
  • Celebrity cache. The last 200 posts per celebrity in a Redis sorted set. Read by every follower's feed-read; high TTL (5 min).
  • User-graph cache. "Who does U follow" — small, hot, and invalidated only on follow/unfollow. Often pinned in process-local cache for the top 1% of users.

7 · Ranking, dedupe, freshness

Ranking

Chronological is the default. The "For You" surface plugs in a ranker after the merge — typically a two-stage system:

  1. Candidate generation. The merged feed (fan-out + celebrity pull + injected ads + suggested follows) is the candidate pool. ~1,000 candidates.
  2. Scoring. A lightweight ML model scores each candidate (engagement probability, freshness decay, source diversity). Run inline at feed-read time.
  3. Heavy reranking. Top ~200 go through a heavier model (transformer) before being returned.

Dedupe

A retweet of a post the user has already seen is a dedupe target. Track a per-user "seen post" set (Bloom filter, 1% FP rate, 100 KB per user) — cheap and avoids the worst feed-thrash. Wipe weekly.

Freshness

Time-decay in the ranker: score = engagement / (age_minutes + 120)1.8. This is approximately the Hacker News rank formula. Tune the exponent so the feed isn't dominated by one-hour-old viral posts.

8 · Failure modes & runbook

FailureSymptomMitigation
Fan-out worker pool falls behindNew posts visible to followers minutes lateAuto-scale workers on queue depth; SLO alarm at > 30 s lag.
Celebrity post traffic spikeCelebrity cache shard at 100% CPUPromote the celebrity to a per-pod local cache; pre-fetch on follow event.
One feed shard down1/N of users see "feed unavailable"Replica failover; serve stale feed from previous-cursor cache for 5 min.
ML ranker down"For You" tab degrades to chronologicalGraceful degradation: chronological is always the floor. Alert; non-page severity.
Massive follow burst (a celebrity joined)Follow-graph DB writes saturated; fan-out queue growsRate-limit follow operations; pre-classify the new account as celebrity if follower count crosses threshold mid-flight.
Stale feeds across regionsUser in Region B doesn't see post made in Region A for > 30 sCross-region replication monitor; run probe writes; alert if cross-region lag > 5 s P99.
Bad post caused viral mass-blockSpike in delete + report; downstream caches stalePub/Sub fan-out to all caches on delete. Bloom filter for tombstones at the read path.

9 · Cost & SLOs

LineEstimateNote
Posts KV (sharded, 73 TB / yr → 700 TB / 10 yr × 3 replicas)~$50K / monthCassandra-class storage; tiered after 90 d
Feed KV (200 GB hot, 2 TB cold)~$10K / monthRedis cluster + RocksDB tier
Fan-out queue (Kafka, 460K msg/s avg)~$8K / month3 brokers per AZ × 3 AZs
Workers (200 pods)~$5K / monthAuto-scaling on queue depth
Feed-read svc (300 pods)~$8K / monthStateless, scales linearly
ML inference (CPU)~$15K / month1k QPS lighter ranker; heavier rerank for top engagement
Cross-region transfer~$10K / monthAsync replication
Total~$106K / month~$0.001 / DAU-month — order of magnitude only

SLOs

  • Feed read availability: 99.99%. Multi-region read; fall back to chronological if ranker fails.
  • Feed read P99: 200 ms. Cache-served target; degraded path is < 500 ms.
  • Post create availability: 99.95%. Tighter than feed read — losing a post is a product severity.
  • Post visibility latency: P99 < 5 s from create to visible in followers' feeds (chronological).

10 · Trade-offs & "what would you change at 10×"

If…Then…
10× users (5B MAU)Region-pin user data; multi-cluster Kafka; the fan-out worker pool becomes a data-plane in itself.
Pure-ML feed (no chronological)The fan-out queue becomes a candidate-event firehose. Per-user offline + online candidate generators. Ranker becomes the dominant cost.
Strict cross-region consistencyDifferent design — accept higher latency, route writes to a leader region. The trade-off is post-create latency for global ordering.
Real-time-ranked feed (no batching)Streaming ML pipeline with per-user vectors materialised on every write. Order-of-magnitude more inference cost.
Group / community feedsThe fan-out target list grows: "post → followers + group members". Materialise group-level feeds; same hybrid trick for big groups.
"What would a more senior answer add?"The integrity layer — spam classification, harassment auto-moderation, the regulatory pipeline (DSA in EU, KOSA in US). Most this designs skip this; it's a meaningful fraction of the engineering at any real social platform.

Defending the decisions — interview pushback

The feed question is really six choices in a trench coat. Each draws probing — have a one-sentence answer ready and the rest of the interview goes faster.

  1. "Fan-out on write for the common case — why?" Most users don't have 100k followers. For them, fan-out on write is simpler, faster to read, and the write amplification is manageable.
  2. "Fan-out on read for celebrities — at what threshold?" 100k to 1M followers is where the math breaks. Pick a number, commit to it, say you'd instrument and tune it. A "celebrity merger" pulls recent posts at feed-load and interleaves them with the pre-computed timeline.
  3. "Why cap the pre-computed timeline at ~800 entries?" A Redis sorted set with 800 posts per user is the hot cache. Older posts come from durable storage on demand. LRU eviction handles the rest. Holding everything is unnecessary; holding the hot prefix is enough.
  4. "Why cursor pagination, never offset?" Offset breaks when posts are inserted between page loads — the user sees duplicates or skips. Cursor is stable across concurrent writes because it's a monotonic key, not a position count.
  5. "Why eventually consistent counters?" Like counts are read by everyone, written by anyone, displayed within milliseconds, and absolutely cannot be exact. Redis HINCRBY for in-flight, periodic sync to durable storage. Nobody can tell 14,823 from 14,824.
  6. "Why is ranking separate from feed delivery?" Different SLOs, different failure modes. Feed delivery should not block on an ML model. Ranking is a candidate-set re-ordering step, not part of the retrieval path.

Further reading

  • Krikorian (Twitter, 2013) — "Timelines at Scale". The talk that introduced "fan-out on write with celebrity pull" to the wider industry. 30 minutes; required watching.
  • Pinterest Engineering — "Smart Feed: building scalable timelines". A cleaner public writeup of the merge-then-rank pipeline.
  • LinkedIn Engineering — "FollowFeed: LinkedIn's Feed Made Faster and Smarter". The infra view from a network with very different follow-distribution dynamics.
  • Meta Engineering — "Scaling the Pre-Computed Photo Feed Service". The Instagram side. Different content type, same pattern.
  • Stack — "How We Designed Our Feed and Reached 100M Users". Useful as a counterpoint at smaller scale.
  • Adjacent: URL shortener. Same hot-KV read pattern at smaller payload.
  • Adjacent: Replication. The cross-region story.
  • Adjacent: Distributed IDs. Snowflake gives time-ordered post IDs that simplify feed merging.
Found this useful?