
Caching, storing an answer so you don't compute it twice.

Phil Karlton, possibly: "There are only two hard things in computer science: cache invalidation and naming things." The attribution is folklore. The invalidation part is fair warning.

Parts 01–08 · Interactive: 4-tier playable · Prereq: HTTP / DNS

Why caching exists: closer memory is faster memory

Eight orders of magnitude from register to disk.

Caching stores a copy of expensive-to-compute data closer to where it's needed. Modern web stacks have four distinct caching tiers: browser cache (HTTP cache headers), CDN edge cache (Cloudflare, Fastly, Akamai), application cache (Redis, Memcached, Caffeine), and database cache (buffer pool, query plan cache). Each layer earns its keep at a different cost/distance trade-off.

Reading 1 MB sequentially from RAM takes ~250 microseconds. From a local SSD, ~1 ms. Across a gigabit network, ~10 ms. From a database two AZs away, ~30 ms. From the origin via a cold edge halfway around the world, ~200 ms. Caching is the engineering art of moving frequently asked questions to a closer, cheaper tier. The same principle plays out inside the CPU at nanosecond scale: see the memory hierarchy deep dive for the full eight orders of magnitude from register to disk, and how CPU caches work for the L1/L2/L3 organization that hides DRAM latency.

Every cache trades two costs against each other: memory (for the cached copy) and staleness (the copy might no longer match origin). The whole craft is choosing the right tradeoff for each kind of data.

SPEED

10× per tier.

Each tier closer to the user (origin, then app cache, then CDN, then browser) answers roughly an order of magnitude faster than the one behind it. A page load served entirely from the browser tier and one that walks all the way to origin can differ by 200×.

STALENESS

The cost of speed.

A cached value is correct at write time and approximate forever after. The cache's TTL is your contract with the user about how stale a value they may see. Tune that number consciously; the default is rarely right.

SCOPE

Public vs private.

A response with personal data must never end up in a shared cache such as a CDN, where another user's request could be served it. The Cache-Control header's public/private split exists for exactly this, and the most catastrophic caching incidents happen when it's wrong.
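A minimal Python sketch of that decision. The helper name and its three-way split (sensitive, personalized, shared) are illustrative assumptions; the directives it emits (public, private, max-age, no-store) are standard HTTP Cache-Control directives.

```python
def cache_control(personalized: bool, max_age: int, sensitive: bool = False) -> str:
    """Pick a Cache-Control value for a response (hypothetical helper).

    sensitive    -> no cache anywhere may store it
    personalized -> only the user's own browser may store it
    otherwise    -> browsers and shared caches (CDNs) may store it
    """
    if sensitive:
        return "no-store"
    if personalized:
        # "private" forbids shared caches such as CDNs from storing the response.
        return f"private, max-age={max_age}"
    return f"public, max-age={max_age}"


print(cache_control(personalized=True, max_age=60))     # private, max-age=60
print(cache_control(personalized=False, max_age=3600))  # public, max-age=3600
```

Defaulting the helper's callers to personalized=True for anything behind a login is the safer failure mode: a wrongly private response only costs a CDN miss, while a wrongly public one leaks data.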


The four caching tiers: browser, CDN, app, database

Each has its own keys, scope, and invalidation.

A web request can be answered at any of four cache tiers, ideally by the one nearest the user. Each runs a distinct cache, with distinct keys, scope, and invalidation. The simulator below lets you watch one request fall through them.

  1. browser

    In-process · per user

    The browser keeps a URL-keyed disk cache plus a memory cache for the current tab. Static assets with a long max-age don't even touch the network on return visits. The fastest tier, and entirely on the user's machine.

  2. CDN

    Edge PoP · per region

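    Cloudflare, Fastly, CloudFront, Akamai. Each PoP holds a regional cache; an edge worker may also run logic before forwarding. The cache key typically includes host, path, the query string (often filtered to selected parameters), and the values of any request headers named in the response's Vary header. Origin shielding can add a second cache layer between the edges and the origin.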

  3. app

    Application cache · Redis / Memcached

    In-memory, in-region. Holds the values the app would otherwise compute or query for, typically in Redis. Cache-aside (lazy loading) is the dominant pattern; write-through and write-behind exist for narrower cases.

  4. db

    Database buffer pool · Postgres / InnoDB

    Itself a cache, of disk pages in RAM. Modern databases are fast precisely because the working set lives in the buffer pool. Even your "uncached" queries are mostly served from memory: the B-tree index pages and hot data pages they touch are already in the buffer pool.
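The CDN cache key described above can be sketched in Python. This is a hypothetical helper, not any CDN's actual algorithm: it assumes the allow-list of meaningful query parameters is configured per route, and it appends one segment per request header named in Vary.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cdn_cache_key(url, request_headers, vary, allowed_params):
    """Build a CDN-style cache key (hypothetical helper).

    url             -- full request URL
    request_headers -- dict of request headers, lowercase names
    vary            -- header names listed in the response's Vary header
    allowed_params  -- query parameters that actually change the response
    """
    parts = urlsplit(url)
    # Drop tracking noise (utm_*, etc.) and sort, so ?a=1&b=2 and ?b=2&a=1 share a key.
    params = sorted((k, v) for k, v in parse_qsl(parts.query) if k in allowed_params)
    key = f"{parts.hostname}{parts.path}"
    if params:
        key += "?" + urlencode(params)
    # Every header named in Vary splits the cache: one entry per distinct value.
    for name in sorted(h.lower() for h in vary):
        key += f"|{name}={request_headers.get(name, '')}"
    return key


print(cdn_cache_key("https://example.com/img.png?utm_source=x&w=200",
                    {"accept-encoding": "gzip"}, ["Accept-Encoding"], {"w"}))
# example.com/img.png?w=200|accept-encoding=gzip
```

The same function also shows why Vary on a high-cardinality header such as User-Agent is expensive: every distinct value becomes a separate cache entry.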


One request, four cache decisions

Flip each tier to hit, miss, revalidate, or bypass.

Below: a single GET. Each tier has one of four states: hit, miss, 304 revalidate, or bypass. Click to flip them and watch the path the request walks, plus the total wall-clock time you save when caches do their job.

In the state shown, every tier misses and the database answers: 60.0 ms in total.

  Browser cache · in-process · 1.0 ms
  CDN edge · PoP · 18.0 ms
  App cache · Redis · origin region · 6.0 ms
  Database · origin · 35.0 ms

  HIT · Found locally; return immediately.
  MISS · Not here; consult the next tier.
  304 · "Probably still good": ask upstream with If-None-Match; serve the local copy on a 304 Not Modified.
  BYPASS · no-store (any tier) or private (shared tiers such as the CDN): skip this tier entirely.
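The simulator's arithmetic generalizes: with per-tier hit rates, the expected latency is the sum over tiers of (probability the request is answered there) × (cumulative latency to get there). A Python sketch using the hop latencies above; treating each tier's hit rate as conditional on the request reaching it, and ignoring 304 and bypass, are simplifying assumptions.

```python
# Per-hop latencies from the simulator above, in milliseconds.
# A request answered at tier i pays every hop up to and including tier i.
HOPS = [("browser", 1.0), ("cdn", 18.0), ("app", 6.0), ("db", 35.0)]


def expected_latency(hit_rates):
    """Expected wall-clock time for one request.

    hit_rates maps each cache tier to its hit rate given that the request
    reached it; the database (last tier) always answers."""
    expected = 0.0
    reach = 1.0    # probability the request gets as far as this tier
    elapsed = 0.0  # cumulative latency to reach this tier
    for i, (name, hop_ms) in enumerate(HOPS):
        elapsed += hop_ms
        p_hit = 1.0 if i == len(HOPS) - 1 else hit_rates.get(name, 0.0)
        expected += reach * p_hit * elapsed
        reach *= 1.0 - p_hit
    return expected


print(expected_latency({}))  # 60.0: every tier misses, as in the simulator
print(round(expected_latency({"browser": 0.5, "cdn": 0.8, "app": 0.9}), 2))  # 10.95
```

With those example hit rates the average request costs about 11 ms instead of 60 ms, even though only half of requests are answered by the browser.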

How the app talks to the cache

Cache-aside, write-through, write-behind, write-around.

The application tier has the most strategy choice. Four patterns are common, each with a different tradeoff between consistency, write latency, and the number of moving parts.

  1. cache-aside

    Lazy fill, app-managed

    Read: check the cache; on a miss, query the DB, populate the cache, and return. Write: update the DB, then delete the cache entry rather than updating it; two concurrent writers that both update the cache can leave the older value sitting there. Simple, dominant, and the right default unless you have a specific reason.

  2. write-through

    Cache and DB written together

    Write: update both cache and DB synchronously. Read: cache only (no DB on miss path). Strong cache freshness, but a slow DB makes every write slow, and the cache must hold the entire dataset.

  3. write-behind

    Cache first, DB later

    Write: update cache; queue an async write to DB via a message queue. Read: cache. Fastest writes; terrifying durability — a cache crash before the queue drains loses data. Used in very narrow, write-heavy, loss-tolerant workloads (analytics counters).

  4. write-around

    Skip the cache on write

    Write: DB only; cache untouched. Read: cache-aside. Useful when writes are rarely read again soon (logs, audit records) — you don't pollute the hot cache with cold data.
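A toy Python model of the four patterns. Two dicts stand in for the cache (Redis) and the database, and a deque stands in for the write-behind message queue; all names are hypothetical. Only the write paths differ; reads use the cache-aside lazy fill.

```python
from collections import deque


class Store:
    """Toy cache + database pair contrasting the four write strategies."""

    def __init__(self):
        self.cache = {}          # stands in for Redis
        self.db = {}             # stands in for the database
        self.pending = deque()   # stands in for the write-behind queue

    def read(self, key):
        """Cache-aside read: serve from cache, else load from DB and fill."""
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write_cache_aside(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)  # delete, don't update: next read refills

    def write_through(self, key, value):
        self.db[key] = value       # synchronous: write latency includes the DB
        self.cache[key] = value

    def write_behind(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))  # lost if we crash before flush()

    def flush(self):
        """Drain the queue into the DB (a background worker, in real life)."""
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value

    def write_around(self, key, value):
        # DB only. If the key happened to be cached, that entry stays stale
        # until TTL or eviction; many setups also delete it here.
        self.db[key] = value
```

Usage: after write_behind("b", 3) the value is readable from the cache immediately but absent from the database until flush() runs, which is exactly the durability gap described above.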


Cache invalidation: three ways to keep entries fresh

TTL, purge on write, or versioned URLs.

A cache entry can be wrong at any moment after it was written. Three mechanisms control how long it stays wrong, in increasing order of effort.

Time-based · TTL

"Wrong, eventually."

Set a max-age. After it expires, the entry is purged or revalidated. Simple and lazy, with staleness bounded by the TTL, and the only viable choice when you can't notify the cache of changes.
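A minimal TTL cache in Python, assuming lazy expiry on read (no background sweeper). The clock is injectable so expiry can be tested without sleeping; time.monotonic is the default because wall-clock time can jump.

```python
import time


class TTLCache:
    """Entries expire max_age seconds after they are set.

    Expired entries are dropped lazily, the next time someone reads them."""

    def __init__(self, max_age, clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.max_age)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: the caller refetches from origin
            return None
        return value


now = [0.0]                               # fake clock for the example
cache = TTLCache(10, clock=lambda: now[0])
cache.set("price", 42)
now[0] = 9.9
print(cache.get("price"))  # 42
now[0] = 10.0
print(cache.get("price"))  # None
```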

Event-based · purge

"Wrong, immediately."

Writes trigger a purge: Cloudflare's purge API, Fastly's Instant Purge, surrogate-key tagging. Fast, but every write path must know about every cache. Forget one and that path serves stale data until the TTL runs out, or forever if there isn't one.
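Surrogate-key tagging can be sketched as a reverse index from tag to cache keys, so one write purges every response built from the changed record. The class and method names are hypothetical; CDNs expose the same idea through a tags response header and a purge-by-tag API.

```python
from collections import defaultdict


class TaggedCache:
    """Cache whose entries carry tags so a single purge can hit many keys."""

    def __init__(self):
        self._data = {}
        self._keys_by_tag = defaultdict(set)  # tag -> keys that carry it

    def set(self, key, value, tags):
        self._data[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key):
        return self._data.get(key)

    def purge_tag(self, tag):
        """Drop every live entry tagged with `tag`; return how many went.

        Keys may linger in other tags' sets after a purge; that only causes
        an occasional extra purge later, never a stale read."""
        removed = 0
        for key in self._keys_by_tag.pop(tag, set()):
            if key in self._data:
                del self._data[key]
                removed += 1
        return removed


cache = TaggedCache()
cache.set("/product/42", "<html>42</html>", ["product-42", "catalog"])
cache.set("/product/7", "<html>7</html>", ["product-7", "catalog"])
print(cache.purge_tag("product-42"))  # 1: only pages built from product 42
```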

Versioned URLs · the third path

The trick that sidesteps both: version the URL. /app.v37.js is immutable forever, so cache it for a year. Deploy a new version and the URL changes to /app.v38.js. You never invalidate; you just stop asking for the old URL. Standard practice in modern front-end build pipelines.
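Build tools often derive the version from a hash of the file's contents rather than a counter like v37, so the URL changes exactly when the bytes do. A small Python sketch of that content-hash variant; the helper name and the eight-character SHA-256 prefix are arbitrary choices. The immutable directive (RFC 8246) tells browsers not to revalidate the file even on reload.

```python
import hashlib
from pathlib import PurePosixPath

# One year, and never revalidate: this URL's content can never change.
IMMUTABLE = "public, max-age=31536000, immutable"


def versioned_name(path, content):
    """Return e.g. 'static/app.<hash>.js'; any change to the bytes changes the name."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


print(versioned_name("static/app.js", b"console.log(1)"))
```

The HTML that references the file must itself be cached briefly (or revalidated), since it is the only place the new name is published.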


Cache stampedes, and how to prevent them

Jitter, single-flight, and stale-while-revalidate.

A popular cache key with a hard TTL expires at the same instant for everyone. The next thousand requests all miss simultaneously, hit the same database query, and bring the database down. The cache caused the outage. A circuit breaker in front of the origin keeps the blast radius contained.

Three defenses in widespread use. Jitter: add a random offset to each entry's TTL so keys filled at the same moment (after a deploy or a cache restart, say) expire over a few seconds instead of all at once. Single-flight: only one request per key may regenerate at a time; the rest wait for its result. Soft TTL / stale-while-revalidate: serve the stale value while one request regenerates it in the background, so readers never see the gap.
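Jitter and single-flight, sketched in Python with the standard threading module. The names are hypothetical; in Go, the golang.org/x/sync/singleflight package provides the same idea. Stale-while-revalidate is not shown; it combines a soft TTL with the same single-flight refresh running in the background.

```python
import random
import threading


def jittered_ttl(base_seconds, spread=0.1):
    """A 300 s TTL with spread=0.1 becomes a random value in 270..330 s."""
    return base_seconds * random.uniform(1 - spread, 1 + spread)


class SingleFlight:
    """At most one in-flight load per key; concurrent callers share its result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> {"done": Event, "value": ..., "error": ...}

    def do(self, key, load):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "value": None, "error": None}
                self._calls[key] = call
        if not leader:
            call["done"].wait()  # follower: block until the leader finishes
            if call["error"] is not None:
                raise call["error"]
            return call["value"]
        try:
            call["value"] = load()  # leader: the only caller that hits the database
        except Exception as exc:
            call["error"] = exc
            raise
        finally:
            with self._lock:
                del self._calls[key]  # the next miss starts a fresh load
            call["done"].set()
        return call["value"]
```

Wrapping the cache-miss path in SingleFlight().do(key, query_db) turns a thousand simultaneous misses into one database query and 999 waiters.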

The cache-eviction simulator shows how LRU, LFU, ARC, and TinyLFU compare under different access patterns — a layer above stampedes but in the same problem family.


Cache eviction: deciding what to drop when full

LRU, LFU, and the modern TinyLFU compromise.

Caches are bounded. When you add a new entry to a full cache, something must leave; the rule that decides what leaves is the eviction policy.

LRU

Least Recently Used

Drop whatever hasn't been accessed in the longest time. The default, the best-understood, and the right answer for typical access distributions. Vulnerable to single sequential scans wiping the working set.

LFU

Least Frequently Used

Drop whatever's been accessed the least. Better than LRU when access frequency varies wildly. Worse when the working set drifts — old popular keys hold space they don't deserve.

TinyLFU

The modern compromise

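Window-LRU front + frequency-based admission. Caffeine implements it as W-TinyLFU; Redis instead offers approximated LRU and LFU eviction policies. Caffeine's published benchmarks report hit rates close to the theoretical optimum across many workloads, with O(1) operations.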
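LRU is short enough to sketch with Python's OrderedDict, which remembers order and can move a key to the end in O(1). This is plain LRU, not TinyLFU; the usage lines show the scan weakness mentioned above. For memoizing a single function, the standard library's functools.lru_cache already does this.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # least recently used first, most recent last

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # a read counts as a use
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used


hot = LRUCache(3)
for k in ("a", "b", "c"):
    hot.put(k, k)
for i in range(3):  # one pass over cold keys...
    hot.put(f"scan{i}", i)
print([hot.get(k) for k in ("a", "b", "c")])  # [None, None, None]: working set gone
```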


Cache observability: measure, don't assume

Watch hit rate, eviction rate, and key cardinality.

A cache is only as useful as its hit rate. Three numbers should always be on a production cache dashboard:

  1. hit rate

    Hits ÷ (hits + misses)

    Lower than expected = wasted memory or wrong key design. Higher than 99% on a critical path = you can probably shrink the cache without losing performance.

  2. eviction rate

    Items pushed out per second

    A high churn rate means the cache is too small for the working set — every addition costs a removal. Either grow the cache or partition the keyspace so hot and cold lanes don't evict each other.

  3. key cardinality

    How many distinct keys

    An exploding key count is the smell of a key that includes a timestamp, a request ID, or anything else that should be normalized away. Each unique key pollutes the cache once and is never read again.
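A sketch of a cache wrapper that tracks the three numbers. Assumptions: FIFO eviction keeps it short, the eviction count is cumulative (divide by your scrape interval to get a rate), and cardinality uses an exact set, where a production system would more likely use a probabilistic counter such as HyperLogLog.

```python
class InstrumentedCache:
    """Bounded cache that counts hits, misses, evictions and distinct keys."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}
        self.hits = 0
        self.misses = 0
        self.evictions = 0
        self._seen_keys = set()  # exact; swap for HyperLogLog at scale

    def get(self, key):
        self._seen_keys.add(key)
        if key in self._data:
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self._seen_keys.add(key)
        if key not in self._data and len(self._data) >= self.capacity:
            del self._data[next(iter(self._data))]  # FIFO: drop oldest insertion
            self.evictions += 1
        self._data[key] = value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def key_cardinality(self):
        return len(self._seen_keys)
```

If key_cardinality() grows roughly as fast as the request count, look for a timestamp or request ID leaking into the key.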

What real sites run for caching

Three architectures, three tradeoffs.

Twitter — three-tier (browser + CDN + Redis). Twitter's media surface is cached at the browser (HTTP cache headers, ~24h), at Akamai's CDN (~1h), and at internal Redis fleets in front of the Manhattan storage layer. The internal Redis tier hits ~99.5% on average; cache misses fall through to Manhattan with single-digit-ms p99.

Wikipedia — Varnish + Apache traffic stack. Wikipedia's edge runs Varnish caches in front of the Apache backends; ~85% of all page views are served from Varnish without ever reaching the application server. Their published cache hit ratio per language wiki is in the 90%+ range.

GitHub — Memcached for sessions, Redis for queues, Fastly for static. GitHub uses a different cache per workload type: Memcached for HTTP session lookups (low-latency, multi-tenant), Redis for background-job queues and rate limiting (atomic operations), Fastly CDN for repository tarballs and Pages. The architectural lesson: there's no single cache; you pick the right one per access pattern.

The Caffeine library. For Java services, Ben Manes's Caffeine is the de facto in-process cache. It implements W-TinyLFU (Einziger 2017), beats Guava on hit rate at almost every workload, and ships with maximum-size, weight-based, and refresh-after-write policies. Used by Cassandra, Druid, and Linkerd, and supported as a cache provider in Spring Boot.



A closing note

Caching is a stack of identical decisions repeated at each tier: what to cache, with what key, for how long, and how to invalidate. Browser, CDN, app, database: the same four questions, scoped differently. Get them right and the system feels instant. Get them wrong and it goes catastrophically out of date in front of strangers. Tune the TTL consciously. Pick the write strategy by your read/write mix. Measure the hit rate.
