A news feed
Twitter, Threads, Instagram, LinkedIn — the same shape. A user follows other users; a new post fans out to those followers' feeds. The interesting choice is when that fan-out happens. Push at write time, pull at read time, or a hybrid keyed on who the author is. The celebrity problem — one user with 100M followers — collapses any pure approach. This is the design that teaches fan-out trade-offs better than any other in the playbook.
1 · Clarifying questions
| User scale? | 500M MAU, 100M DAU. Average user follows 200 accounts; long tail follows 5,000. |
| Post scale? | 200M posts/day. Read > 50× write — feed is the dominant load. |
| Ranking? | Chronological by default; ML-ranked for "For You" tab. Both surfaces matter; we'll design the chronological feed and note where ranking plugs in. |
| Posts include media? | Yes — text + 0–4 images/videos. Media handled by separate object-storage subsystem; the feed only stores references. |
| Latency budget? | P99 feed-read ≤ 200 ms. P99 post-create ≤ 500 ms (the fan-out hides behind it). |
| Freshness target? | New post visible to followers in < 5 s P99 (chronological). Less critical for ML-ranked feeds. |
| Celebrity threshold? | Anyone with > 1M followers. ~10K accounts. They get pull-on-read; everyone else gets push-on-write. |
| Multi-region? | Yes. Active-active reads, region-pinned writes for now. |
2 · Capacity math, on a napkin
| Number | Calculation | Result |
|---|---|---|
| Posts / day | given | 200M |
| Post create QPS (avg / peak) | 200M / 86,400 × 3 | ~2.3K / ~7K |
| Feed reads / day | 100M DAU × 30 sessions × 1 page | ~3B |
| Feed read QPS (peak) | 3B / 86,400 × 3 | ~100K |
| Avg follower count | given | 200 |
| Naive fan-out write amplification | 200M × 200 | 40B feed writes / day → ~460K QPS avg |
| Celebrity write amplification | 10K × 1M / 200M | ~5% of posts × 5,000× amp = swamps everything |
| Storage / post | ~1 KB metadata + media-pointer | ~1 KB |
| Storage / year (posts) | 200M × 1 KB × 365 | ~73 TB |
| Feed cache (hot 10% users) | 10M users × 200 posts × 100 B | ~200 GB |
The 40B writes/day is the headline number. Fan-out-on-write is feasible — but only for non-celebrities. The celebrity column is where the design breaks if you're not careful.
3 · API and data model
Endpoints
POST /v1/posts # create a post
{
"text": "...",
"media_ids": ["img-abc", "vid-def"] # already uploaded
}
→ 201 {"post_id":"p_aB3x9Q2", "created_at":"2026-05-09T..."}
GET /v1/feed?cursor=<opaque> # read feed (chronological)
→ 200 {
"posts": [{ post_id, author_id, text, media, created_at, ... }, ...],
"next_cursor": "<opaque>"
}
POST /v1/follow # follow another user
{"target_user_id": "u_bob"}
→ 204
DELETE /v1/follow/{target_user_id} # unfollow
→ 204Schemas
posts -- the source of truth, sharded by post_id
post_id BIGINT (Snowflake) PRIMARY KEY -- time-ordered ID
author_id BIGINT NOT NULL
text TEXT
media JSONB -- references to media-svc
created_at TIMESTAMP NOT NULL
reply_to BIGINT NULL -- for threading
deleted BOOLEAN NOT NULL DEFAULT FALSE
INDEX (author_id, created_at) -- "user's posts"
follows -- sharded by follower_id
follower_id BIGINT
followee_id BIGINT
created_at TIMESTAMP
PRIMARY KEY (follower_id, followee_id)
INDEX (followee_id, follower_id) -- reverse lookup for fan-out
feeds -- one row per user, chronological list
user_id BIGINT PRIMARY KEY
posts LIST<post_id> -- last 1,000; older trimmed
last_synced TIMESTAMP
celebrity_posts -- the pull-side index for big accounts
author_id BIGINT
post_id BIGINT
created_at TIMESTAMP
PRIMARY KEY ((author_id), created_at DESC, post_id)
-- per-author, time-ordered4 · High-level architecture
Two paths through the system. Write path: post svc → posts KV → fan-out queue → workers → write to each follower's feed. Read path: feed svc → feed cache → merge with celebrity-pull results → optional ranker → return.
5 · The hard part — fan-out strategy
Three approaches. Each has a regime where it wins.
| Strategy | How it works | Wins when | Loses when |
|---|---|---|---|
| Push (fan-out on write) | On post creation, write the post_id into every follower's feed list. | Few followers per author, read-heavy. | Celebrities — one post = millions of writes. Storage and tail-latency disaster. |
| Pull (fan-out on read) | On feed read, query "give me posts from people I follow, ordered by time". | Many followers per author, write-heavy. | Read tail latency: scatter-gather across hundreds of accounts. Cold reads are expensive. |
| Hybrid | Push for non-celebrities; pull for celebrities. Merge at read time. | The real world: skewed follower distributions. | More moving parts. Two code paths to maintain. Worth it. |
The hybrid algorithm
On POST(author=A, post=p):
if A.follower_count > CELEBRITY_THRESHOLD (1M):
INSERT INTO celebrity_posts (author=A, post=p)
else:
publish (A, p) to fan-out queue
workers consume → for each follower F of A:
LPUSH feeds(F).posts = p (cap at 1000)
On READ_FEED(user=U):
base = feeds(U).posts -- already-fan-out'd posts
cels = []
for each celebrity C that U follows:
cels += celebrity_posts(C).slice(since=U.last_seen) -- pull
merged = sort(base + cels) by created_at DESC
optionally → ranker.rerank(merged)
return first N6 · Caching
- Post cache. The post body keyed by
post_idin Redis. Hit rate > 99% during traffic peaks. ~24-hour TTL with explicit invalidation on edit/delete. - Feed cache. Each user's most-recent 200 posts pre-merged and ranked. Refreshed on read if stale (> 60 s old) or on push from fan-out workers. ~80% of feed reads serve from this cache without any database hit.
- Celebrity cache. The last 200 posts per celebrity in a Redis sorted set. Read by every follower's feed-read; high TTL (5 min).
- User-graph cache. "Who does U follow" — small, hot, and invalidated only on follow/unfollow. Often pinned in process-local cache for the top 1% of users.
7 · Ranking, dedupe, freshness
Ranking
Chronological is the default. The "For You" surface plugs in a ranker after the merge — typically a two-stage system:
- Candidate generation. The merged feed (fan-out + celebrity pull + injected ads + suggested follows) is the candidate pool. ~1,000 candidates.
- Scoring. A lightweight ML model scores each candidate (engagement probability, freshness decay, source diversity). Run inline at feed-read time.
- Heavy reranking. Top ~200 go through a heavier model (transformer) before being returned.
Dedupe
A retweet of a post the user has already seen is a dedupe target. Track a per-user "seen post" set (Bloom filter, 1% FP rate, 100 KB per user) — cheap and avoids the worst feed-thrash. Wipe weekly.
Freshness
Time-decay in the ranker: score = engagement / (age_minutes + 120)1.8.
This is approximately the Hacker News rank formula. Tune the exponent so the feed
isn't dominated by one-hour-old viral posts.
8 · Failure modes & runbook
| Failure | Symptom | Mitigation |
|---|---|---|
| Fan-out worker pool falls behind | New posts visible to followers minutes late | Auto-scale workers on queue depth; SLO alarm at > 30 s lag. |
| Celebrity post traffic spike | Celebrity cache shard at 100% CPU | Promote the celebrity to a per-pod local cache; pre-fetch on follow event. |
| One feed shard down | 1/N of users see "feed unavailable" | Replica failover; serve stale feed from previous-cursor cache for 5 min. |
| ML ranker down | "For You" tab degrades to chronological | Graceful degradation: chronological is always the floor. Alert; non-page severity. |
| Massive follow burst (a celebrity joined) | Follow-graph DB writes saturated; fan-out queue grows | Rate-limit follow operations; pre-classify the new account as celebrity if follower count crosses threshold mid-flight. |
| Stale feeds across regions | User in Region B doesn't see post made in Region A for > 30 s | Cross-region replication monitor; run probe writes; alert if cross-region lag > 5 s P99. |
| Bad post caused viral mass-block | Spike in delete + report; downstream caches stale | Pub/Sub fan-out to all caches on delete. Bloom filter for tombstones at the read path. |
9 · Cost & SLOs
| Line | Estimate | Note |
|---|---|---|
| Posts KV (sharded, 73 TB / yr → 700 TB / 10 yr × 3 replicas) | ~$50K / month | Cassandra-class storage; tiered after 90 d |
| Feed KV (200 GB hot, 2 TB cold) | ~$10K / month | Redis cluster + RocksDB tier |
| Fan-out queue (Kafka, 460K msg/s avg) | ~$8K / month | 3 brokers per AZ × 3 AZs |
| Workers (200 pods) | ~$5K / month | Auto-scaling on queue depth |
| Feed-read svc (300 pods) | ~$8K / month | Stateless, scales linearly |
| ML inference (CPU) | ~$15K / month | 1k QPS lighter ranker; heavier rerank for top engagement |
| Cross-region transfer | ~$10K / month | Async replication |
| Total | ~$106K / month | ~$0.001 / DAU-month — order of magnitude only |
SLOs
- Feed read availability: 99.99%. Multi-region read; fall back to chronological if ranker fails.
- Feed read P99: 200 ms. Cache-served target; degraded path is < 500 ms.
- Post create availability: 99.95%. Tighter than feed read — losing a post is a product severity.
- Post visibility latency: P99 < 5 s from create to visible in followers' feeds (chronological).
10 · Trade-offs & "what would you change at 10×"
| If… | Then… |
|---|---|
| 10× users (5B MAU) | Region-pin user data; multi-cluster Kafka; the fan-out worker pool becomes a data-plane in itself. |
| Pure-ML feed (no chronological) | The fan-out queue becomes a candidate-event firehose. Per-user offline + online candidate generators. Ranker becomes the dominant cost. |
| Strict cross-region consistency | Different design — accept higher latency, route writes to a leader region. The trade-off is post-create latency for global ordering. |
| Real-time-ranked feed (no batching) | Streaming ML pipeline with per-user vectors materialised on every write. Order-of-magnitude more inference cost. |
| Group / community feeds | The fan-out target list grows: "post → followers + group members". Materialise group-level feeds; same hybrid trick for big groups. |
| "What would a more senior answer add?" | The integrity layer — spam classification, harassment auto-moderation, the regulatory pipeline (DSA in EU, KOSA in US). Most this designs skip this; it's a meaningful fraction of the engineering at any real social platform. |
Defending the decisions — interview pushback
The feed question is really six choices in a trench coat. Each draws probing — have a one-sentence answer ready and the rest of the interview goes faster.
- "Fan-out on write for the common case — why?" Most users don't have 100k followers. For them, fan-out on write is simpler, faster to read, and the write amplification is manageable.
- "Fan-out on read for celebrities — at what threshold?" 100k to 1M followers is where the math breaks. Pick a number, commit to it, say you'd instrument and tune it. A "celebrity merger" pulls recent posts at feed-load and interleaves them with the pre-computed timeline.
- "Why cap the pre-computed timeline at ~800 entries?" A Redis sorted set with 800 posts per user is the hot cache. Older posts come from durable storage on demand. LRU eviction handles the rest. Holding everything is unnecessary; holding the hot prefix is enough.
- "Why cursor pagination, never offset?" Offset breaks when posts are inserted between page loads — the user sees duplicates or skips. Cursor is stable across concurrent writes because it's a monotonic key, not a position count.
- "Why eventually consistent counters?" Like counts are read by everyone, written by anyone, displayed within milliseconds, and absolutely cannot be exact. Redis HINCRBY for in-flight, periodic sync to durable storage. Nobody can tell 14,823 from 14,824.
- "Why is ranking separate from feed delivery?" Different SLOs, different failure modes. Feed delivery should not block on an ML model. Ranking is a candidate-set re-ordering step, not part of the retrieval path.
Further reading
- Krikorian (Twitter, 2013) — "Timelines at Scale". The talk that introduced "fan-out on write with celebrity pull" to the wider industry. 30 minutes; required watching.
- Pinterest Engineering — "Smart Feed: building scalable timelines". A cleaner public writeup of the merge-then-rank pipeline.
- LinkedIn Engineering — "FollowFeed: LinkedIn's Feed Made Faster and Smarter". The infra view from a network with very different follow-distribution dynamics.
- Meta Engineering — "Scaling the Pre-Computed Photo Feed Service". The Instagram side. Different content type, same pattern.
- Stack — "How We Designed Our Feed and Reached 100M Users". Useful as a counterpoint at smaller scale.
- Adjacent: URL shortener. Same hot-KV read pattern at smaller payload.
- Adjacent: Replication. The cross-region story.
- Adjacent: Distributed IDs. Snowflake gives time-ordered post IDs that simplify feed merging.