Pastebin
The same shape as URL shortener — small write API, hot read path — but the storage choice flips. Pastes are large blobs (10 KB to 2 MB), so the architecture moves from hot KV to object storage with edge caching. Walk this and you've seen both ends of the read-heavy spectrum, plus the retention and abuse-handling problems that come with user-generated content.
1 · Clarifying questions
| Question | Answer |
|---|---|
| What's the max paste size? | 2 MB. Hard cap for free tier; 10 MB for paid. The cap shapes everything from the schema to the upload path. |
| How long do pastes live? | Configurable per paste — 10 min, 1 hour, 1 day, 1 month, 1 year, forever. Forever costs real money in long-tail storage. |
| Privacy model? | Public (indexed, listable), unlisted (link-only, not listable), private (auth required). Three different access patterns, three different code paths. |
| What's the read/write ratio? | ~10:1 across all pastes; ~100:1 for the long tail of viral pastes. Most pastes get one or two reads; a few get millions. |
| Syntax highlighting? | Client-side only (Prism / highlight.js). Never store rendered HTML — that doubles storage and bakes in security risks. |
| Abuse handling? | Required. Malware scanning, CSAM detection, takedown workflow, regulator notice. Skipping this is what fails the interview round. |
| Latency budget? | P99 read ≤ 200 ms (most served from edge). P99 create ≤ 500 ms (it's an upload). |
| Multi-region? | Yes for reads (anycast edge). Single-region writes initially. |
2 · Capacity math, on a napkin
| Number | Calculation | Result |
|---|---|---|
| Writes / day | given | 10M |
| Reads / day | 10× writes | 100M |
| Write QPS (avg / peak) | 10M / 86,400; peak = 3 × avg | ~120 / ~350 |
| Read QPS (avg / peak) | 100M / 86,400; peak = 3 × avg | ~1.2K / ~3.5K (origin) |
| Average paste size | given | ~50 KB |
| Storage / day | 10M × 50 KB | ~500 GB |
| Storage / year | ×365 | ~180 TB |
| Storage / 10 yr (no expiry) | ×10 | ~1.8 PB |
| Storage / 10 yr (50% expire) | halved | ~900 TB |
| Hot working set | ~5% of pastes account for 90% of reads | ~9 TB |
| Egress / day | 100M × 50 KB × 0.2 (CDN miss) | ~1 TB / day from origin |
The takeaway: this is a storage problem, not a compute problem. Object storage (S3) at $0.023/GB-month holds 1.8 PB for ~$40K/month. Most of the bytes that get read leave through the CDN, whose egress is the next biggest cost line.
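The napkin table can be reproduced in a few lines. This is a sketch that uses only the table's "given" rows as inputs; decimal units (1 KB = 1,000 bytes) and the variable names are assumptions made here so the rounded results line up.

```python
# Napkin capacity math, using the "given" rows of the table as inputs.
# Assumption: decimal units (1 KB = 1,000 B), which matches the rounded results.
SECONDS_PER_DAY = 86_400
KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15

writes_per_day = 10_000_000
read_write_ratio = 10
avg_paste_bytes = 50 * KB
peak_factor = 3                       # peak QPS = 3 x average
cdn_miss_rate = 0.2                   # share of reads that reach origin
s3_standard_usd_per_gb_month = 0.023

reads_per_day = writes_per_day * read_write_ratio
write_qps_avg = writes_per_day / SECONDS_PER_DAY
read_qps_avg = reads_per_day / SECONDS_PER_DAY
storage_per_day = writes_per_day * avg_paste_bytes
storage_per_year = storage_per_day * 365
storage_10y = storage_per_year * 10
origin_egress_per_day = reads_per_day * avg_paste_bytes * cdn_miss_rate
monthly_cost_10y = storage_10y / GB * s3_standard_usd_per_gb_month

if __name__ == "__main__":
    print(f"write QPS avg/peak: {write_qps_avg:.0f} / {write_qps_avg * peak_factor:.0f}")
    print(f"read QPS avg/peak:  {read_qps_avg:.0f} / {read_qps_avg * peak_factor:.0f}")
    print(f"storage/day:        {storage_per_day / GB:.0f} GB")
    print(f"storage/10 yr:      {storage_10y / PB:.2f} PB")
    print(f"origin egress/day:  {origin_egress_per_day / TB:.1f} TB")
    print(f"S3 Standard cost:   ${monthly_cost_10y:,.0f} / month")
```

Changing `cdn_miss_rate` or `avg_paste_bytes` shows quickly which inputs the storage and egress lines are most sensitive to.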
3 · API and data model
Endpoints
    POST /v1/pastes                  # create
    {
      "content": "...",              # required, ≤ 2 MB
      "language": "python",          # optional, hint for client-side highlighter
      "privacy": "unlisted",         # public | unlisted | private
      "expires": "1d"                # 10m | 1h | 1d | 1m | 1y | never
    }
    → 201 {"id":"aB3x9Q2", "url":"https://pb.sh/aB3x9Q2", "expires_at":"2026-05-10T..."}

    GET /:id                         # read (the hot path)
    → 200 text/plain + the paste content
      Cache-Control: public, max-age=300, s-maxage=86400

    GET /:id/raw                     # read raw, for clients/tools
    → 200 text/plain

    POST /:id/report                 # abuse / takedown
    → 202

    DELETE /:id                      # owner only
    → 204

Schema
Two tables, plus the blob in object storage. The Postgres row is intentionally tiny — the paste body lives in S3 with the row pointing at the object key.
    -- pastes: metadata only, ~300 B / row
    CREATE TABLE pastes (
        id            VARCHAR(8)   PRIMARY KEY,              -- base62 ID
        blob_key      VARCHAR(128) NOT NULL,                 -- s3://pastes/<sha256-prefix>/<sha256>; longer than the 64-char hash alone
        size_bytes    INTEGER      NOT NULL,
        language      VARCHAR(32)  NULL,
        privacy       VARCHAR(16)  NOT NULL,                 -- public | unlisted | private
        owner_id      BIGINT       NULL,
        content_hash  CHAR(64)     NOT NULL,                 -- sha256 hex, used for dedupe
        created_at    TIMESTAMP    NOT NULL,
        expires_at    TIMESTAMP    NULL,
        deleted_at    TIMESTAMP    NULL,                     -- soft delete; reaper hard-deletes after grace
        flagged       BOOLEAN      NOT NULL DEFAULT FALSE    -- abuse review
    );
    CREATE INDEX ON pastes (expires_at) WHERE expires_at IS NOT NULL;   -- expiry sweep
    CREATE INDEX ON pastes (content_hash);                              -- dedupe lookup
    CREATE INDEX ON pastes (owner_id, created_at);                      -- "my pastes"
    CREATE INDEX ON pastes (created_at) WHERE privacy = 'public';       -- public listing
    paste_blobs -- S3 bucket, content-addressed
      Bucket layout:          pastes/<first-2-hex>/<sha256>
      Storage class:          Standard for hot, Infrequent Access after 30d, Glacier after 1y
      Server-side encryption: SSE-S3 (free) or SSE-KMS (compliance)
      Lifecycle policy:       tier-down on age, hard-delete on expiry

Content-addressed storage by SHA-256 means duplicate pastes, and there are many, share a blob. The pastes table holds N rows pointing to one S3 object. ~30% storage savings in practice.
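A minimal sketch of the content-addressed write, assuming the `pastes/<first-2-hex>/<sha256>` layout described above. `InMemoryBlobStore` is a hypothetical stand-in for the S3 bucket so the example runs without AWS; a real write path would issue a PUT to object storage instead of updating a dict.

```python
import hashlib


def blob_key(content: bytes) -> str:
    """Content-addressed object key: pastes/<first-2-hex>/<sha256>."""
    digest = hashlib.sha256(content).hexdigest()
    return f"pastes/{digest[:2]}/{digest}"


class InMemoryBlobStore:
    """Hypothetical stand-in for the S3 bucket: a dict keyed by object key."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}
        self.puts = 0  # number of writes that actually stored new bytes

    def put_if_absent(self, content: bytes) -> str:
        key = blob_key(content)
        # Identical content hashes to the same key, so a duplicate paste
        # reuses the existing blob and skips the write.
        if key not in self.objects:
            self.objects[key] = content
            self.puts += 1
        return key


if __name__ == "__main__":
    store = InMemoryBlobStore()
    first = store.put_if_absent(b"0 3 * * * /usr/local/bin/backup.sh\n")
    second = store.put_if_absent(b"0 3 * * * /usr/local/bin/backup.sh\n")
    print(first == second, store.puts)
```

Each paste still gets its own metadata row; only the `blob_key` is shared, which is why several rows can point at one object.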
4 · High-level architecture
The shape is read-side-heavy. The edge does most of the work; the origin handles creates and the long tail of cache misses.
The hot path on read is: edge → read svc → Redis (for the metadata) → S3 (for the blob). On a CDN hit, none of this fires. The create path is: create svc → scan svc (async, can fail-open or fail-closed depending on policy) → S3 + Postgres.
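The metadata step of the read path is a cache-aside lookup. The sketch below assumes a 5-minute TTL (the figure used for the Redis layer in the caching section) and replaces Redis and Postgres with an in-process dict and a callable; `PasteMeta`, `MetadataCache`, and `resolve_read` are hypothetical names, and returning 404 for an expired paste is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PasteMeta:
    blob_key: str
    privacy: str
    expires_at: Optional[float]  # Unix seconds; None means "never"


class MetadataCache:
    """Cache-aside: try the cache, fall back to the database, then populate."""

    def __init__(
        self,
        load_from_db: Callable[[str], Optional[PasteMeta]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, PasteMeta]] = {}
        self.db_lookups = 0

    def get(self, paste_id: str) -> Optional[PasteMeta]:
        now = self._clock()
        hit = self._entries.get(paste_id)
        if hit is not None and hit[0] > now:  # entry present and not yet stale
            return hit[1]
        self.db_lookups += 1
        meta = self._load_from_db(paste_id)
        if meta is not None:
            self._entries[paste_id] = (now + self._ttl, meta)
        return meta


def resolve_read(
    paste_id: str, cache: MetadataCache, now_unix: float
) -> tuple[int, Optional[str]]:
    """Return (status, blob_key); the blob itself is fetched from object storage."""
    meta = cache.get(paste_id)
    if meta is None:
        return 404, None
    if meta.expires_at is not None and meta.expires_at <= now_unix:
        return 404, None  # assumption: expired pastes look like missing ones
    return 200, meta.blob_key
```

Checking `expires_at` on every read, not only in the sweeper, keeps an expired paste from being served while its cache entry is still warm.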
5 · The hard part — storage tiering and the retention sweep
At 1.8 PB and growing, storage is the dominant cost. Three patterns earn their keep:
| Pattern | How | Savings |
|---|---|---|
| Lifecycle tiering | S3 lifecycle policy: Standard for 30 d → Infrequent Access for 1 y → Glacier Deep Archive after. | ~75% on the long tail. Glacier Deep Archive is $0.00099/GB-month vs $0.023 for Standard. |
| Content-addressed dedupe | Object key is SHA-256 of content. New paste with same content reuses the existing blob. | ~30% based on real-world dupes (cron snippets, error stacks, the same Stack Overflow answer pasted 8 times). |
| Compression at write | Server-side Zstd before S3 PUT. Decompress on read; CDN caches the compressed bytes. | ~60% on text. CPU cost is small at 120 RPS write. |
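The lifecycle tiering row can be written down as data. A minimal sketch, assuming the dict shape that S3's lifecycle configuration uses (`Rules`, `Filter`, `Status`, `Transitions`); the rule ID, the `pastes/` prefix, and reading "Infrequent Access for 1 y" as 365 days in that tier (day 395 overall) are assumptions. Nothing here calls AWS, and `storage_class_for_age` is a hypothetical helper for reasoning about where an object of a given age would sit.

```python
# Assumed rule ID and prefix; thresholds follow the tiering row above.
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "paste-tier-down",
            "Filter": {"Prefix": "pastes/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                # Assumption: 30 days in Standard + 365 days in Infrequent Access.
                {"Days": 395, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def storage_class_for_age(age_days: int, config: dict = LIFECYCLE_CONFIGURATION) -> str:
    """Return the storage class an object of this age would have reached."""
    current = "STANDARD"
    transitions = config["Rules"][0]["Transitions"]
    for transition in sorted(transitions, key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


if __name__ == "__main__":
    for age in (0, 30, 394, 395):
        print(age, storage_class_for_age(age))
```

Keeping the thresholds in one structure makes it easier to alarm on missing transitions, since the expected class for any object age can be computed and compared with what the bucket reports.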
The expiry sweep
Pastes with expires_at need to disappear on time. Three ways to do it, from worst to best:
- Scan the whole table at midnight. Hot, single-threaded, blocks the database. Don't.
- Index on expires_at, sweep every 15 minutes. Pull rows with expires_at < NOW(), delete from S3, soft-delete the row. The standard answer.
- Time-bucketed expiry queue. Route each paste to a Redis sorted set keyed by hour. The sweeper just pops the expired bucket. No range scans on the relational store. The next-tier answer when expiry is on the hot path.
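The time-bucketed queue can be sketched without Redis. The example below is an in-memory model of the idea, not the Redis implementation: a dict of hour buckets stands in for the per-hour sorted sets, and `ExpiryBuckets` and its method names are hypothetical.

```python
from collections import defaultdict

BUCKET_SECONDS = 3600  # one bucket per hour, as in the list above


class ExpiryBuckets:
    """In-memory model of per-hour expiry buckets."""

    def __init__(self) -> None:
        self._buckets: dict[int, set[str]] = defaultdict(set)

    def schedule(self, paste_id: str, expires_at: int) -> None:
        """File the paste under the hour that contains its expiry time."""
        self._buckets[expires_at // BUCKET_SECONDS].add(paste_id)

    def pop_expired(self, now: int) -> list[str]:
        """Remove and return IDs from every bucket that closed before `now`.

        A bucket is only popped once its whole hour has passed, so every ID
        returned is guaranteed to be expired.
        """
        current_bucket = now // BUCKET_SECONDS
        due = sorted(b for b in self._buckets if b < current_bucket)
        expired: list[str] = []
        for bucket in due:
            expired.extend(sorted(self._buckets.pop(bucket)))
        return expired


if __name__ == "__main__":
    queue = ExpiryBuckets()
    queue.schedule("aB3x9Q2", expires_at=3_599)
    queue.schedule("Zk81mPq", expires_at=3_600)
    print(queue.pop_expired(now=3_600))
```

The trade-off in this model is granularity: a paste can outlive its `expires_at` by up to one bucket width, so a tighter expiry target would need smaller buckets or a read-time expiry check.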
6 · Caching strategy
Three layers, the same as URL shortener but with a different ratio of work — the edge does much more here because the payload is bigger.
- Edge cache (CDN). 24-hour TTL on read responses. Catches ~85% of all reads. The viral-paste burst (someone tweets a paste link) lands here, not on origin.
- Redis (metadata only). 5-minute TTL on the id → blob_key, expiry, privacy tuple. ~99% hit rate; saves a Postgres lookup on every cache miss at the edge.
- S3 acts as its own cache. The If-None-Match / If-Modified-Since dance lets the read-svc revalidate cheaply. The blob bytes don't go through the read-svc; it returns a signed URL or a 302, depending on privacy.
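The conditional-request check behind that revalidation is small. A minimal sketch, assuming the blob's SHA-256 doubles as its ETag (an assumption; the source only says the hash is used for dedupe). `etag_for` and `revalidate` are hypothetical helpers, and the comparison is the weak comparison that HTTP specifies for `If-None-Match`.

```python
from typing import Optional


def etag_for(content_hash: str) -> str:
    """Assumption: the content hash is used as a strong ETag, quoted per HTTP."""
    return f'"{content_hash}"'


def revalidate(if_none_match: Optional[str], content_hash: str) -> int:
    """Return 304 if the caller's validator matches the current blob, else 200."""
    if if_none_match is None:
        return 200
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    if "*" in candidates:
        return 304  # "*" matches any current representation
    # Weak comparison: a W/ prefix is ignored when matching.
    normalized = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
    return 304 if etag_for(content_hash) in normalized else 200


if __name__ == "__main__":
    digest = "ab" * 32
    print(revalidate(f'"{digest}"', digest), revalidate('"stale"', digest))
```

Because blobs are content-addressed, a paste's bytes never change under a given hash, so a matching validator can be trusted without re-reading the object.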
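Cacheability follows the privacy level. Public: cached at the edge, using the Cache-Control: public, max-age=300, s-maxage=86400 header shown on GET /:id. Unlisted: Cache-Control: private, max-age=300, cached at the user agent only. Private: never cached at the edge; no-store and an auth-checked stream from origin. Mixing these up is the most common security bug in designs like this.

7 · Abuse, scanning, takedowns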
User-generated content invariably attracts abuse. This design treats abuse handling as first-class infrastructure rather than an afterthought.
| Concern | Tooling | Where it runs |
|---|---|---|
| Malware | ClamAV, VirusTotal API, Cloudflare Workers | Async after PUT, before publishing to CDN |
| CSAM / known-bad hash | PhotoDNA, NCMEC hash lists | Sync on create — fail-closed; legal requirement |
| Spam / phishing | URL classifier, domain reputation, ML model | Async, with confidence-thresholded auto-takedown |
| Copyright / DMCA | Manual review queue, takedown API | Human-in-loop, 24-hour SLA |
| Rate limit | Token bucket per IP + per user | API gateway, before reaching create svc |
A flagged paste is soft-deleted (paste returns 451 Unavailable for Legal Reasons), the blob is moved to a quarantine bucket with restricted IAM, and the event lands in a SIEM. Hard delete only after the appeal window (typically 30 days).
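The rate-limit row names a token bucket per IP and per user. A minimal single-process sketch follows; a gateway would normally keep this state in shared storage. `TokenBucket` and `PerKeyLimiter` are hypothetical names, the rate and capacity values are illustrative, and the clock is injectable so the behaviour can be checked deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Refills at `rate_per_sec` up to `capacity`; each request spends one token."""

    def __init__(
        self,
        rate_per_sec: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so a new client can burst
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Credit tokens for the time elapsed since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class PerKeyLimiter:
    """One bucket per key, for example "ip:203.0.113.7" or "user:42"."""

    def __init__(
        self,
        rate_per_sec: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = rate_per_sec
        self._capacity = capacity
        self._clock = clock
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, key: str) -> bool:
        bucket = self._buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[key] = bucket
        return bucket.allow()
```

A create request would be checked against both its IP key and its user key, and rejected with 429 if either bucket is empty.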
8 · Failure modes & runbook
| Failure | Symptom | Mitigation |
|---|---|---|
| S3 region outage | All reads on cache miss → 5xx | Cross-region replication; failover URL signing to the replica region (~5 min). |
| Postgres primary down | Create svc fails; reads survive (Redis + S3) | Read replica auto-promote (~30 s). Create svc returns 503 with Retry-After. |
| Redis cluster unhealthy | Postgres load 10× | Local in-process LRU absorbs ~50%; Postgres has read-replicas to soak the rest; circuit breaker drops to direct-DB read. |
| Scan svc down | Pastes pile up in pending state | Backlog the queue; the create svc returns 202 Accepted with the paste in "scanning" status. Fail-open after 1-hour timeout for low-risk content; fail-closed for known-bad indicators. |
| Viral paste DDoS | Edge stays healthy; cache misses for the one hot paste ID saturate origin | Edge cache absorbs > 95%. For the remainder, rate-limit per source IP and serve the cached value with stale-if-error. |
| Expiry sweep fell behind | Expired pastes accessible past their TTL | The CDN's TTL bounds the leak (max 24 h). Sweep dual-pass: hourly in-region + daily cross-region reconciliation. |
| Storage cost runaway | S3 bill 2× last month | Lifecycle policy alarms on missing transitions; weekly aged-paste audit; automatic Glacier transition for > 1-year unlisted. |
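The "Scan svc down" row describes a policy that is easy to get backwards, so it helps to write it as a pure function. A sketch under the table's stated rules (fail-open after a 1-hour timeout for low-risk content, fail-closed for known-bad indicators); `ScanDecision` and `decide_pending_paste` are hypothetical names, and how a "known-bad indicator" is detected is outside this example.

```python
from enum import Enum


class ScanDecision(Enum):
    PENDING = "pending"   # keep the paste in "scanning" status
    PUBLISH = "publish"   # fail-open: release without a scan verdict
    BLOCK = "block"       # fail-closed: never release without a verdict


def decide_pending_paste(
    waited_seconds: float,
    known_bad_indicator: bool,
    timeout_seconds: float = 3600.0,
) -> ScanDecision:
    """Decide what to do with a paste whose scan has not completed."""
    if known_bad_indicator:
        return ScanDecision.BLOCK       # checked first, so a timeout never overrides it
    if waited_seconds >= timeout_seconds:
        return ScanDecision.PUBLISH     # low-risk content is released after the timeout
    return ScanDecision.PENDING


if __name__ == "__main__":
    print(decide_pending_paste(120, known_bad_indicator=False))
    print(decide_pending_paste(4000, known_bad_indicator=False))
    print(decide_pending_paste(4000, known_bad_indicator=True))
```

Ordering the checks this way means the fail-closed rule always wins, which is the property a review of this runbook entry would look for.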
9 · Cost & SLOs
| Line | Estimate | Note |
|---|---|---|
| S3 storage (1.8 PB blended tiers) | ~$18K / month | 40% Standard / 40% IA / 20% Glacier |
| S3 requests + transfer | ~$2K / month | Mostly origin egress to CDN |
| CDN egress | ~$8K / month | ~150 TB / month at $0.005-$0.02 / GB blended |
| Postgres (managed, 2 TB) | ~$1.5K / month | 1 primary + 2 replicas |
| Redis (50 GB cluster) | ~$1K / month | 3-node managed |
| Compute (read 100 + create 30 + scan 20 pods) | ~$3K / month | Managed K8s |
| Scanning (PhotoDNA + ClamAV + VT API) | ~$2K / month | Per-scan cost is low at 120 RPS write |
| Total | ~$36K / month | ~$0.36 / 1K pastes lifetime |
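A quick check of the table, plus one way to sanity-check the storage line. The line items come from the table; the Infrequent Access and Glacier per-GB prices are assumed list prices not given in the source, and treating "Glacier" as a single tier is also an assumption. With these inputs the 40/40/20 mix comes out at roughly $27K before dedupe and roughly $19K after the ~30% dedupe savings quoted earlier; whether the table's ~$18K was derived this way is not stated.

```python
# Line items from the cost table, in thousands of USD per month.
LINE_ITEMS_KUSD = {
    "s3_storage": 18.0,
    "s3_requests_transfer": 2.0,
    "cdn_egress": 8.0,
    "postgres": 1.5,
    "redis": 1.0,
    "compute": 3.0,
    "scanning": 2.0,
}
total_kusd = sum(LINE_ITEMS_KUSD.values())

TOTAL_GB = 1_800_000  # 1.8 PB in decimal GB
TIER_MIX = {"standard": 0.40, "infrequent_access": 0.40, "glacier": 0.20}
USD_PER_GB_MONTH = {
    "standard": 0.023,            # from the capacity section
    "infrequent_access": 0.0125,  # assumed list price, not from the source
    "glacier": 0.0036,            # assumed list price, not from the source
}
DEDUPE_SAVINGS = 0.30  # the ~30% dedupe figure quoted earlier


def blended_storage_usd(total_gb: float, dedupe_savings: float = 0.0) -> float:
    """Monthly storage cost for the tier mix, after removing deduplicated bytes."""
    stored_gb = total_gb * (1.0 - dedupe_savings)
    return sum(stored_gb * share * USD_PER_GB_MONTH[tier] for tier, share in TIER_MIX.items())


if __name__ == "__main__":
    print(f"table total:    ~${total_kusd:.1f}K / month")
    print(f"blended, raw:   ~${blended_storage_usd(TOTAL_GB):,.0f} / month")
    print(f"blended, dedup: ~${blended_storage_usd(TOTAL_GB, DEDUPE_SAVINGS):,.0f} / month")
```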
SLOs
- Read availability: 99.99%. Edge + cross-region replication → 12 min/quarter budget.
- Create availability: 99.9%. Looser target, so a larger budget: ~2 hours/quarter. Postgres failover dominates.
- Read P99: 200 ms (cache hit) / 500 ms (origin). Track separately; the cache miss rate is the main lever.
- Expiry SLA: 99% within 30 minutes of expires_at. Sweep cadence + CDN TTL bounds.
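The availability budgets above follow from one line of arithmetic. A small sketch, assuming a 90-day quarter; `error_budget_minutes` is a hypothetical helper.

```python
def error_budget_minutes(slo: float, window_days: int = 90) -> float:
    """Allowed downtime, in minutes, for an availability SLO over the window.

    Assumption: a quarter is 90 days.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    return (1.0 - slo) * window_days * 24 * 60


if __name__ == "__main__":
    print(f"reads   (99.99%): {error_budget_minutes(0.9999):.1f} min/quarter")
    print(f"creates (99.9%):  {error_budget_minutes(0.999) / 60:.1f} h/quarter")
```

The two results are about 13 minutes and just over 2 hours, which is why a ~30-second Postgres failover fits comfortably in the create budget but a multi-minute S3 failover eats a large share of the read budget.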
10 · Trade-offs & "what would you change at 10×"
| If… | Then… |
|---|---|
| Writes 10× (100M / day) | Stronger compression (Zstd → Zstd-19); shard Postgres metadata; pre-compute base62 IDs in batch. |
| Reads 10× (1B / day) | Already mostly absorbed by edge; the lever is CDN coverage and longer TTLs for public pastes (24h → 7d). |
| Strict end-to-end encryption | Client-side encrypt, server stores ciphertext blob + non-secret nonce. Loses dedupe, loses scanning — make this a paid-tier opt-in only. |
| Global writes (multi-region active-active) | Region-prefixed IDs (us-aB3x9, eu-...) avoid a coordination dance. Cross-region conflicts are impossible because each region mints IDs only under its own prefix. |
| Versioned pastes | S3 versioning is free; index versions in Postgres with a version column. List endpoint surfaces history. |
| "What would a more senior answer add?" | The legal/compliance pipeline: DMCA queue at scale, regulatory reporting (NCMEC, GDPR Article 17 deletes), the audit trail. Most candidates skip this; it's the difference between "designs systems" and "owns systems". |
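The region-prefixed ID row can be illustrated with a per-region sequence encoded in base62. This is a sketch of one possible scheme, not the design's stated method: the alphabet order, the `base62` and `region_id` helpers, and the use of a sequence number are all assumptions. Sequential IDs are enumerable, which matters for unlisted pastes, so a real scheme would likely randomize or permute the suffix.

```python
import string

# Assumed alphabet order: digits, then lowercase, then uppercase (62 symbols).
ALPHABET = string.digits + string.ascii_letters


def base62(n: int) -> str:
    """Encode a non-negative integer in base62."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def region_id(region: str, sequence: int) -> str:
    """Build an ID such as "us-10" from a region code and a per-region sequence."""
    return f"{region}-{base62(sequence)}"


if __name__ == "__main__":
    print(region_id("us", 62), region_id("eu", 62))
```

Two regions can hand out the same sequence number without coordinating, because the prefix keeps the resulting IDs distinct.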
Further reading
- AWS — "Amazon S3 storage classes & lifecycle". The reference doc on tiered storage costs. Memorise the four standard tiers.
- Cloudflare — "Workers, R2 and the egress-cost model". Useful counter-design — R2 has no egress fees, which changes the optimal CDN architecture for pastebin-shaped workloads.
- Microsoft — "PhotoDNA Cloud Service". The standard CSAM-detection plumbing for any user-generated-content service.
- Backblaze — "How long do hard drives last". Tangential, but useful to internalise that "object storage is cheap" is not magic — it's deduplication, erasure coding, and the bathtub curve.
- Adjacent: URL shortener. Same shape, different storage choice. Read both back-to-back.
- Adjacent: CDNs. The edge layer that makes pastebin economically possible.
- Adjacent: Object storage. The S3-shaped substrate underneath.