04 / 19

Playbook / 04

A news feed

Twitter, Threads, Instagram, LinkedIn — the same shape. A user follows other users; a new post fans out to those followers' feeds. The interesting choice is when that fan-out happens. Push at write time, pull at read time, or a hybrid keyed on who the author is. The celebrity problem — one user with 100M followers — collapses any pure approach. This is the design that teaches fan-out trade-offs better than any other in the playbook.

1 · Clarifying questions

User scale?	500M MAU, 100M DAU. Average user follows 200 accounts; long tail follows 5,000.
Post scale?	200M posts/day. Read > 50× write — feed is the dominant load.
Ranking?	Chronological by default; ML-ranked for "For You" tab. Both surfaces matter; we'll design the chronological feed and note where ranking plugs in.
Posts include media?	Yes — text + 0–4 images/videos. Media handled by separate object-storage subsystem; the feed only stores references.
Latency budget?	P99 feed-read ≤ 200 ms. P99 post-create ≤ 500 ms (the fan-out hides behind it).
Freshness target?	New post visible to followers in < 5 s P99 (chronological). Less critical for ML-ranked feeds.
Celebrity threshold?	Anyone with > 1M followers. ~10K accounts. They get pull-on-read; everyone else gets push-on-write.
Multi-region?	Yes. Active-active reads, region-pinned writes for now.

2 · Capacity math, on a napkin

Number	Calculation	Result
Posts / day	given	200M
Post create QPS (avg / peak)	200M / 86,400 × 3	~2.3K / ~7K
Feed reads / day	100M DAU × 30 sessions × 1 page	~3B
Feed read QPS (peak)	3B / 86,400 × 3	~100K
Avg follower count	given	200
Naive fan-out write amplification	200M × 200	40B feed writes / day → ~460K QPS avg
Celebrity write amplification	10K × 1M / 200M	~5% of posts × 5,000× amp = swamps everything
Storage / post	~1 KB metadata + media-pointer	~1 KB
Storage / year (posts)	200M × 1 KB × 365	~73 TB
Feed cache (hot 10% users)	10M users × 200 posts × 100 B	~200 GB

The 40B writes/day is the headline number. Fan-out-on-write is feasible — but only for non-celebrities. The celebrity column is where the design breaks if you're not careful.

3 · API and data model

Endpoints

POST /v1/posts # create a post
{
 "text": "...",
 "media_ids": ["img-abc", "vid-def"] # already uploaded
}
→ 201 {"post_id":"p_aB3x9Q2", "created_at":"2026-05-09T..."}

GET /v1/feed?cursor=<opaque> # read feed (chronological)
→ 200 {
 "posts": [{ post_id, author_id, text, media, created_at, ... }, ...],
 "next_cursor": "<opaque>"
}

POST /v1/follow # follow another user
{"target_user_id": "u_bob"}
→ 204

DELETE /v1/follow/{target_user_id} # unfollow
→ 204

Schemas

posts -- the source of truth, sharded by post_id
 post_id BIGINT (Snowflake) PRIMARY KEY -- time-ordered ID
 author_id BIGINT NOT NULL
 text TEXT
 media JSONB -- references to media-svc
 created_at TIMESTAMP NOT NULL
 reply_to BIGINT NULL -- for threading
 deleted BOOLEAN NOT NULL DEFAULT FALSE
 INDEX (author_id, created_at) -- "user's posts"

follows -- sharded by follower_id
 follower_id BIGINT
 followee_id BIGINT
 created_at TIMESTAMP
 PRIMARY KEY (follower_id, followee_id)
 INDEX (followee_id, follower_id) -- reverse lookup for fan-out

feeds -- one row per user, chronological list
 user_id BIGINT PRIMARY KEY
 posts LIST<post_id> -- last 1,000; older trimmed
 last_synced TIMESTAMP

celebrity_posts -- the pull-side index for big accounts
 author_id BIGINT
 post_id BIGINT
 created_at TIMESTAMP
 PRIMARY KEY ((author_id), created_at DESC, post_id)
 -- per-author, time-ordered

4 · High-level architecture

Two paths through the system. Write path: post svc → posts KV → fan-out queue → workers → write to each follower's feed. Read path: feed svc → feed cache → merge with celebrity-pull results → optional ranker → return.

5 · The hard part — fan-out strategy

Three approaches. Each has a regime where it wins.

Strategy	How it works	Wins when	Loses when
Push (fan-out on write)	On post creation, write the post_id into every follower's feed list.	Few followers per author, read-heavy.	Celebrities — one post = millions of writes. Storage and tail-latency disaster.
Pull (fan-out on read)	On feed read, query "give me posts from people I follow, ordered by time".	Many followers per author, write-heavy.	Read tail latency: scatter-gather across hundreds of accounts. Cold reads are expensive.
Hybrid	Push for non-celebrities; pull for celebrities. Merge at read time.	The real world: skewed follower distributions.	More moving parts. Two code paths to maintain. Worth it.

The hybrid algorithm

On POST(author=A, post=p):
 if A.follower_count > CELEBRITY_THRESHOLD (1M):
 INSERT INTO celebrity_posts (author=A, post=p)
 else:
 publish (A, p) to fan-out queue
 workers consume → for each follower F of A:
 LPUSH feeds(F).posts = p (cap at 1000)

On READ_FEED(user=U):
 base = feeds(U).posts -- already-fan-out'd posts
 cels = []
 for each celebrity C that U follows:
 cels += celebrity_posts(C).slice(since=U.last_seen) -- pull
 merged = sort(base + cels) by created_at DESC
 optionally → ranker.rerank(merged)
 return first N

The threshold isn't a magic number. Celebrities are roughly the top 0.002% of users by follower count. Tune the threshold so the celebrity-pull cost equals the push-on-write cost — typically a few hundred thousand to a few million followers depending on read/write ratios. Track which side dominates and adjust quarterly.

6 · Caching

Post cache. The post body keyed by post_id in Redis. Hit rate > 99% during traffic peaks. ~24-hour TTL with explicit invalidation on edit/delete.
Feed cache. Each user's most-recent 200 posts pre-merged and ranked. Refreshed on read if stale (> 60 s old) or on push from fan-out workers. ~80% of feed reads serve from this cache without any database hit.
Celebrity cache. The last 200 posts per celebrity in a Redis sorted set. Read by every follower's feed-read; high TTL (5 min).
User-graph cache. "Who does U follow" — small, hot, and invalidated only on follow/unfollow. Often pinned in process-local cache for the top 1% of users.

7 · Ranking, dedupe, freshness

Ranking

Chronological is the default. The "For You" surface plugs in a ranker after the merge — typically a two-stage system:

Candidate generation. The merged feed (fan-out + celebrity pull + injected ads + suggested follows) is the candidate pool. ~1,000 candidates.
Scoring. A lightweight ML model scores each candidate (engagement probability, freshness decay, source diversity). Run inline at feed-read time.
Heavy reranking. Top ~200 go through a heavier model (transformer) before being returned.

Dedupe

A retweet of a post the user has already seen is a dedupe target. Track a per-user "seen post" set (Bloom filter, 1% FP rate, 100 KB per user) — cheap and avoids the worst feed-thrash. Wipe weekly.

Freshness

Time-decay in the ranker: score = engagement / (age_minutes + 120)^1.8. This is approximately the Hacker News rank formula. Tune the exponent so the feed isn't dominated by one-hour-old viral posts.

8 · Failure modes & runbook

Failure	Symptom	Mitigation
Fan-out worker pool falls behind	New posts visible to followers minutes late	Auto-scale workers on queue depth; SLO alarm at > 30 s lag.
Celebrity post traffic spike	Celebrity cache shard at 100% CPU	Promote the celebrity to a per-pod local cache; pre-fetch on follow event.
One feed shard down	1/N of users see "feed unavailable"	Replica failover; serve stale feed from previous-cursor cache for 5 min.
ML ranker down	"For You" tab degrades to chronological	Graceful degradation: chronological is always the floor. Alert; non-page severity.
Massive follow burst (a celebrity joined)	Follow-graph DB writes saturated; fan-out queue grows	Rate-limit follow operations; pre-classify the new account as celebrity if follower count crosses threshold mid-flight.
Stale feeds across regions	User in Region B doesn't see post made in Region A for > 30 s	Cross-region replication monitor; run probe writes; alert if cross-region lag > 5 s P99.
Bad post caused viral mass-block	Spike in delete + report; downstream caches stale	Pub/Sub fan-out to all caches on delete. Bloom filter for tombstones at the read path.

9 · Cost & SLOs

Line	Estimate	Note
Posts KV (sharded, 73 TB / yr → 700 TB / 10 yr × 3 replicas)	~$50K / month	Cassandra-class storage; tiered after 90 d
Feed KV (200 GB hot, 2 TB cold)	~$10K / month	Redis cluster + RocksDB tier
Fan-out queue (Kafka, 460K msg/s avg)	~$8K / month	3 brokers per AZ × 3 AZs
Workers (200 pods)	~$5K / month	Auto-scaling on queue depth
Feed-read svc (300 pods)	~$8K / month	Stateless, scales linearly
ML inference (CPU)	~$15K / month	1k QPS lighter ranker; heavier rerank for top engagement
Cross-region transfer	~$10K / month	Async replication
Total	~$106K / month	~$0.001 / DAU-month — order of magnitude only

SLOs

Feed read availability: 99.99%. Multi-region read; fall back to chronological if ranker fails.
Feed read P99: 200 ms. Cache-served target; degraded path is < 500 ms.
Post create availability: 99.95%. Tighter than feed read — losing a post is a product severity.
Post visibility latency: P99 < 5 s from create to visible in followers' feeds (chronological).

10 · Trade-offs & "what would you change at 10×"

If…	Then…
10× users (5B MAU)	Region-pin user data; multi-cluster Kafka; the fan-out worker pool becomes a data-plane in itself.
Pure-ML feed (no chronological)	The fan-out queue becomes a candidate-event firehose. Per-user offline + online candidate generators. Ranker becomes the dominant cost.
Strict cross-region consistency	Different design — accept higher latency, route writes to a leader region. The trade-off is post-create latency for global ordering.
Real-time-ranked feed (no batching)	Streaming ML pipeline with per-user vectors materialised on every write. Order-of-magnitude more inference cost.
Group / community feeds	The fan-out target list grows: "post → followers + group members". Materialise group-level feeds; same hybrid trick for big groups.
"What would a more senior answer add?"	The integrity layer — spam classification, harassment auto-moderation, the regulatory pipeline (DSA in EU, KOSA in US). Most this designs skip this; it's a meaningful fraction of the engineering at any real social platform.

Defending the decisions — interview pushback

The feed question is really six choices in a trench coat. Each draws probing — have a one-sentence answer ready and the rest of the interview goes faster.

"Fan-out on write for the common case — why?" Most users don't have 100k followers. For them, fan-out on write is simpler, faster to read, and the write amplification is manageable.
"Fan-out on read for celebrities — at what threshold?" 100k to 1M followers is where the math breaks. Pick a number, commit to it, say you'd instrument and tune it. A "celebrity merger" pulls recent posts at feed-load and interleaves them with the pre-computed timeline.
"Why cap the pre-computed timeline at ~800 entries?" A Redis sorted set with 800 posts per user is the hot cache. Older posts come from durable storage on demand. LRU eviction handles the rest. Holding everything is unnecessary; holding the hot prefix is enough.
"Why cursor pagination, never offset?" Offset breaks when posts are inserted between page loads — the user sees duplicates or skips. Cursor is stable across concurrent writes because it's a monotonic key, not a position count.
"Why eventually consistent counters?" Like counts are read by everyone, written by anyone, displayed within milliseconds, and absolutely cannot be exact. Redis HINCRBY for in-flight, periodic sync to durable storage. Nobody can tell 14,823 from 14,824.
"Why is ranking separate from feed delivery?" Different SLOs, different failure modes. Feed delivery should not block on an ML model. Ranking is a candidate-set re-ordering step, not part of the retrieval path.

A news feed

1 · Clarifying questions

2 · Capacity math, on a napkin

3 · API and data model

Endpoints

Schemas

4 · High-level architecture

5 · The hard part — fan-out strategy

The hybrid algorithm

6 · Caching

7 · Ranking, dedupe, freshness

Ranking

Dedupe

Freshness

8 · Failure modes & runbook

9 · Cost & SLOs

SLOs

10 · Trade-offs & "what would you change at 10×"

Defending the decisions — interview pushback

Further reading

Chat / DM