Handbook · Vol. IV · 2026 · Track II · Caching, layered · piece 2 of 2 · Deep dive


Advanced caching.

The failure modes the primer left out: thundering herds, stampede protection, request coalescing, jitter, warming, and the four kinds of invalidation that bite in production.


There are only two hard things in computer science, and cache invalidation is one of them. At scale it becomes a discipline of its own: equal parts protocol design, distributed systems, and operations.

Cache-aside, write-through, and eviction policies are the easy part. The hard parts arrive at scale: invalidation across many nodes, stampedes when a hot key expires, hierarchical caching across edge and origin, and coordination between replicas. Phil Karlton's line about the only two hard things in computer science was not a joke. With millions of cached objects spread across dozens of nodes, keeping stale data from poisoning the application is real engineering. This is the long form.

[Figure: Cache tiers. A request descends until it hits: browser (~99% hit, 0 ms) · CDN edge (~95%, 5 ms) · CDN regional (~70%, 20 ms) · app cache (in-process, 0.1 ms) · Redis (~85%, 0.5 ms) · database origin (~5 ms), the only true source. Everything above the database is a copy that may be wrong. Invalidation must propagate up the chain, and racing writes can re-poison upstream tiers from below.]
Five tiers of cache between the user and the truth. Each tier is faster than the next; each one can be wrong. Invalidation has to walk every layer.

Why cache invalidation is hard

Caching introduces a second source of truth. The database says the price is $29.99; the cache says $24.99 because it was warmed three minutes before the price change. Every user hitting the cache sees the wrong price. The naive fix — "delete the cache entry when the price changes" — works at small scale and falls apart at large scale.

Multiple cache layers
Browser cache, CDN edge (300+ PoPs), CDN regional, application cache, Redis, database query cache. Invalidating one layer doesn't invalidate the others. You have to walk all of them in order.
Race conditions
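Service A reads the price ($24.99). Service B updates the DB to $29.99 and invalidates the cache. Service A then writes its stale read ($24.99) back to the cache. The cache now serves the old price until the entry's TTL expires or the next legitimate update invalidates it. The fix is versioning on every write: the cache rejects any write older than the version it already holds.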
Partial invalidation
A product page is assembled from 15 cached fragments. When the price changes, you want to invalidate only the pricing fragment, not the entire page. Tag-based invalidation systems (Cloudflare Cache Tags, Varnish xkey) exist to solve this: each cached object carries tags, and one purge by tag removes every object that carries it.
The lurking long tail
A million browser caches around the world hold your old HTML. You cannot purge them. They expire when their max-age says so, full stop. Plan TTLs accordingly.
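
The versioning fix for the race condition above can be sketched in a few lines. This is a minimal in-memory stand-in, assuming every value read from the database carries a monotonically increasing version (a row version column, say); `VersionedCache`, `set_if_newer` and the tombstone marker are hypothetical names for illustration. The key detail is that invalidation leaves a tombstone carrying the writer's version instead of deleting the key, because a plain delete forgets the version and lets Service A's late write back in.

```python
import threading
from dataclasses import dataclass
from typing import Any, Dict, Optional

_TOMBSTONE = object()  # marks "invalidated at this version"


@dataclass
class _Entry:
    value: Any
    version: int


class VersionedCache:
    """Hypothetical in-memory cache that rejects writes older than what it holds."""

    def __init__(self) -> None:
        self._data: Dict[str, _Entry] = {}
        self._lock = threading.Lock()

    def set_if_newer(self, key: str, value: Any, version: int) -> bool:
        with self._lock:
            cur = self._data.get(key)
            if cur is not None:
                older = version < cur.version
                # An equal version is accepted only to refill a tombstone.
                same = version == cur.version and cur.value is not _TOMBSTONE
                if older or same:
                    return False
            self._data[key] = _Entry(value, version)
            return True

    def invalidate(self, key: str, version: int) -> bool:
        # Keep the writer's version instead of deleting the key, so a late
        # write-back of an older read is still recognised as stale.
        return self.set_if_newer(key, _TOMBSTONE, version)

    def get(self, key: str) -> Optional[Any]:
        with self._lock:
            cur = self._data.get(key)
            if cur is None or cur.value is _TOMBSTONE:
                return None  # a miss: the caller reads the DB and calls set_if_newer
            return cur.value
```

Replaying the race: after `invalidate("price:sku-1", 2)`, Service A's `set_if_newer("price:sku-1", 24.99, 1)` returns False, and the next reader's fill at version 2 succeeds. In Redis the compare-and-set would have to run atomically on the server (a Lua script is the usual route), and tombstones want a short TTL of their own so they do not pile up.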
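
The bookkeeping behind tag-based invalidation, mentioned above for partial invalidation, is small: each cached object records its tags, and a purge by tag removes every object carrying it. The sketch below is a hypothetical in-memory `TaggedCache`, not the Cloudflare or Varnish API; those products attach tags to responses and expose purge-by-tag as a separate operation.

```python
from typing import Any, Dict, Iterable, Optional, Set


class TaggedCache:
    """Hypothetical in-memory cache with purge-by-tag."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._keys_by_tag: Dict[str, Set[str]] = {}
        self._tags_by_key: Dict[str, Set[str]] = {}

    def set(self, key: str, value: Any, tags: Iterable[str]) -> None:
        self._unlink(key)  # forget tags from any earlier version of this key
        self._data[key] = value
        self._tags_by_key[key] = set(tags)
        for tag in self._tags_by_key[key]:
            self._keys_by_tag.setdefault(tag, set()).add(key)

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def purge_tag(self, tag: str) -> int:
        """Remove every object carrying `tag`; returns how many were removed."""
        keys = self._keys_by_tag.pop(tag, set())
        for key in keys:
            self._data.pop(key, None)
            self._unlink(key)
        return len(keys)

    def _unlink(self, key: str) -> None:
        # Drop this key from every tag index it appears in.
        for tag in self._tags_by_key.pop(key, set()):
            keys = self._keys_by_tag.get(tag)
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self._keys_by_tag[tag]
```

With fragments tagged by product and by kind ("sku-1", "pricing"), a price change purges only the fragments tagged "pricing" for that product's page, and the review fragment stays cached.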

Three invalidation strategies — and where each fits

TTL-based

"Trust but expire." Set a TTL; the cache discards the entry when it expires; next read repopulates from origin. Simple, hard to break, eventually consistent. Stale data window = TTL. Default for almost everything.

Explicit invalidation

Writers actively delete or update cache entries on change. Fast convergence (seconds) but ferociously hard to get right at scale — every code path that mutates state must remember to invalidate. Subtle bugs are forever.

Event-driven (CDC)

Database change feed (Debezium, MySQL binlog, Postgres logical replication) drives cache updates. The DB is the source of truth; cache eventually agrees. Cleanest design at scale; most operational complexity to set up.
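
The consumer side of CDC-driven caching can be sketched as a loop over change events. The `ChangeEvent` shape here is hypothetical and far simpler than a real Debezium envelope (only the c/u/d operation codes are borrowed from Debezium). The essential parts are keying the cache by table and primary key, and skipping events at or below the log position already applied, so events redelivered after a consumer restart cannot roll the cache backwards.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical, simplified change event; real Debezium envelopes carry much more."""
    table: str
    pk: str
    op: str                # "c" create, "u" update, "d" delete (codes borrowed from Debezium)
    after: Optional[dict]  # row image after the change; None for deletes
    lsn: int               # position of the change in the database log


class CdcCacheUpdater:
    def __init__(self) -> None:
        self.cache: Dict[str, Any] = {}
        self._applied: Dict[str, int] = {}  # key -> highest log position applied

    def apply(self, events: Iterable[ChangeEvent]) -> None:
        for ev in events:
            key = f"{ev.table}:{ev.pk}"
            # Skip anything at or below the position already applied for this
            # key, e.g. events redelivered after a consumer restart.
            if ev.lsn <= self._applied.get(key, -1):
                continue
            self._applied[key] = ev.lsn
            if ev.op == "d":
                self.cache.pop(key, None)
            else:
                self.cache[key] = ev.after  # upsert the new row image
```

Upserting the new row keeps the cache warm; deleting the key on every change and letting the next read repopulate it is the more conservative choice when most rows are never read.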

The cache stampede, and four ways to stop it

A popular cache key expires. A thousand requests arrive in the next millisecond. Each finds the cache empty. Each goes to the database. The database falls over from sudden 1000× load. The cache repopulates eventually, but the database may have died in the meantime. This is the cache stampede (also: dogpile, thundering herd on cache miss).

Per-key locking
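The first request to find the cache empty acquires a lock; the others wait and then read what it wrote. Only one DB query runs, and every caller gets the same answer. An in-process lock only coalesces requests within one app instance, so to protect the database across a fleet the lock has to live in a shared store such as Redis. Give the lock an expiry so a crashed holder cannot block everyone, and cap how long waiters wait.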
Probabilistic early expiration (XFetch)
Each read may decide to refresh slightly before the TTL expires, with a probability that rises as expiry approaches. Usually a single early request pays for the recompute while everyone else keeps getting hits. Vattani, Chierichetti and Lowenstein's 2015 VLDB paper, "Optimal Probabilistic Cache Stampede Prevention", is the formal version.
Stale-while-revalidate
The HTTP/CDN-native answer (the stale-while-revalidate Cache-Control extension, RFC 5861). The cache returns stale data immediately and fires off a background refresh. Users never see a slow response; the cache eventually agrees with the origin. Works at every layer of HTTP cache.
Soft TTLs
Two TTLs per entry: soft (after which the entry is "stale, please refresh") and hard (after which the entry is gone). Pairs naturally with stale-while-revalidate.
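
Per-key locking can be sketched with a store that offers an atomic "set if absent, with expiry". In Redis that is `SET lock:<key> <token> NX PX <ms>`; here `InMemoryLockStore` is a hypothetical stand-in so the example runs without a server, and `get_with_lock`, its timeouts and its fall-back-to-origin policy are illustrative choices, not a standard API. The random token ensures a holder only releases its own lock, and the lock's expiry keeps a crashed holder from blocking everyone.

```python
import threading
import time
import uuid
from typing import Any, Callable, Dict, Tuple


class InMemoryLockStore:
    """Hypothetical stand-in for Redis: acquire() mimics SET key token NX PX ttl_ms."""

    def __init__(self) -> None:
        self._locks: Dict[str, Tuple[str, float]] = {}  # key -> (token, expires_at)
        self._mu = threading.Lock()

    def acquire(self, key: str, token: str, ttl_ms: int) -> bool:
        now = time.monotonic()
        with self._mu:
            held = self._locks.get(key)
            if held is not None and held[1] > now:
                return False  # someone else holds an unexpired lock
            self._locks[key] = (token, now + ttl_ms / 1000)
            return True

    def release(self, key: str, token: str) -> None:
        # Delete only if we still own the lock. In Redis this check-and-delete
        # must be atomic, which usually means a small Lua script.
        with self._mu:
            held = self._locks.get(key)
            if held is not None and held[0] == token:
                del self._locks[key]


def get_with_lock(cache: Dict[str, Any], locks: InMemoryLockStore, key: str,
                  loader: Callable[[str], Any], lock_ttl_ms: int = 5000,
                  poll_s: float = 0.01, max_wait_s: float = 5.0) -> Any:
    deadline = time.monotonic() + max_wait_s
    while True:
        if key in cache:
            return cache[key]
        token = uuid.uuid4().hex
        if locks.acquire(f"lock:{key}", token, lock_ttl_ms):
            try:
                # Re-check: a previous holder may have filled the cache
                # between our miss and our acquire.
                if key not in cache:
                    cache[key] = loader(key)
                return cache[key]
            finally:
                locks.release(f"lock:{key}", token)
        if time.monotonic() > deadline:
            return loader(key)  # waited too long: go to the origin directly
        time.sleep(poll_s)
```

Twenty concurrent callers for the same cold key produce a single `loader` call; the rest poll until the value appears.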
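
The XFetch rule from the Vattani paper is a single comparison: recompute early when `now - delta * beta * log(rand()) >= expiry`, where `delta` is how long the last recompute took and `beta` (1 by default) tunes eagerness. Because `log(rand())` is never positive, the left side lands a random, exponentially distributed distance ahead of `now`, scaled by `delta`, so slow-to-compute keys refresh earlier. This in-memory sketch assumes injectable `clock` and `rand` functions for testing; the class name is hypothetical.

```python
import math
import random
import time
from typing import Any, Callable, Dict, Tuple


class XFetchCache:
    """Hypothetical cache using probabilistic early expiration (XFetch)."""

    def __init__(self, ttl: float, beta: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 rand: Callable[[], float] = lambda: 1.0 - random.random()) -> None:
        self.ttl = ttl
        self.beta = beta    # > 1 refreshes earlier, < 1 later
        self.clock = clock
        self.rand = rand    # must return values in (0, 1] so log() is defined
        self._data: Dict[str, Tuple[Any, float, float]] = {}  # value, delta, expiry

    def get(self, key: str, recompute: Callable[[str], Any]) -> Any:
        entry = self._data.get(key)
        if entry is not None:
            value, delta, expiry = entry
            # log(rand()) <= 0, so this pushes "now" forward by a random,
            # exponentially distributed amount scaled by delta * beta.
            if self.clock() - delta * self.beta * math.log(self.rand()) < expiry:
                return value
        start = self.clock()
        value = recompute(key)
        delta = self.clock() - start  # how long the recompute took
        self._data[key] = (value, delta, self.clock() + self.ttl)
        return value
```

Far from expiry almost every draw is a hit; as `now` approaches `expiry`, ever more ordinary draws tip over, so one request tends to refresh the key before the crowd sees a miss.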
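
Soft TTLs and stale-while-revalidate combine naturally in application code: past the soft TTL the cache still answers with the stale value and starts one background refresh; past the hard TTL it loads synchronously. The sketch assumes one thread per refresh and an injectable clock; `SoftTTLCache` is a hypothetical name. At the HTTP layer the same idea is a header, for example `Cache-Control: max-age=60, stale-while-revalidate=30` (values illustrative).

```python
import threading
import time
from typing import Any, Callable, Dict, Set, Tuple


class SoftTTLCache:
    """Hypothetical cache with a soft TTL (serve stale, refresh) and a hard TTL."""

    def __init__(self, soft_ttl: float, hard_ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if soft_ttl > hard_ttl:
            raise ValueError("soft_ttl must not exceed hard_ttl")
        self.soft_ttl, self.hard_ttl, self.clock = soft_ttl, hard_ttl, clock
        self._data: Dict[str, Tuple[Any, float, float]] = {}  # value, soft_exp, hard_exp
        self._refreshing: Set[str] = set()
        self._lock = threading.Lock()

    def get(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self.clock()
        with self._lock:
            entry = self._data.get(key)
            usable = entry is not None and now < entry[2]
            # At most one background refresh per key at a time.
            refresh = usable and now >= entry[1] and key not in self._refreshing
            if refresh:
                self._refreshing.add(key)
        if not usable:
            # Missing or past the hard TTL: load synchronously. Not coalesced
            # here; hot keys would pair this with per-key locking.
            value = loader(key)
            with self._lock:
                self._store(key, value)
            return value
        if refresh:
            threading.Thread(target=self._refresh, args=(key, loader), daemon=True).start()
        return entry[0]  # fresh, or stale but still within the hard TTL

    def _store(self, key: str, value: Any) -> None:
        now = self.clock()
        self._data[key] = (value, now + self.soft_ttl, now + self.hard_ttl)

    def _refresh(self, key: str, loader: Callable[[str], Any]) -> None:
        try:
            value = loader(key)
        except Exception:
            # Keep serving the stale value until the hard TTL; real code would log this.
            with self._lock:
                self._refreshing.discard(key)
            return
        with self._lock:
            self._store(key, value)
            self._refreshing.discard(key)
```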

Cache penetration and the negative-cache pattern

Stampede happens when many users want the same hot key. Penetration happens when users (or attackers) request keys that don't exist: /users/9999999, /users/-1, random nonsense. Each request misses the cache (correctly) and hits the database. The DB sees a flood of useless lookups.

Two defences. First, cache the negative answer: store "this key does not exist" in Redis with a short TTL (60-300 seconds). Second, put a Bloom filter in front of the lookup; if the filter says the key isn't there, skip both cache and DB. A Bloom filter never reports a key it holds as absent, and one sized for ten million keys at a 1% false-positive rate needs about 9.6 bits per key, roughly 12 MB. False positives only cost you the regular cache+DB lookup you would have done anyway.
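
The Bloom filter numbers come from the standard sizing formulas, m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions; for n = 10,000,000 and p = 0.01 they give about 95.9 million bits (roughly 12 MB) and k = 7. A minimal sketch, assuming SHA-256 from `hashlib` and the double-hashing trick (deriving all k positions from two hash values) instead of k independent hash functions; `BloomFilter` and `bloom_size` are hypothetical names.

```python
import hashlib
import math
from typing import Iterator, Tuple


def bloom_size(n: int, p: float) -> Tuple[int, int]:
    """Bits m and hash count k for n keys at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k


class BloomFilter:
    """Hypothetical minimal Bloom filter using double hashing over SHA-256."""

    def __init__(self, n: int, p: float) -> None:
        self.m, self.k = bloom_size(n, p)
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd so it is never zero
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means "probably present".
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(key))
```

A lookup would then short-circuit with `if not bf.might_contain(key): return None` before touching cache or DB. The filter has to be loaded with every existing key and updated on inserts; a plain Bloom filter cannot delete, so deleted keys linger as harmless false positives until the filter is rebuilt.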

Hot-key problems

One key gets 80% of traffic — a celebrity's profile, a viral product. Even a distributed cache becomes a bottleneck because all that traffic hits one Redis shard. Three mitigations:

Local cache in front of distributed cache
Each app instance keeps a tiny in-process LRU for the hottest few hundred keys. Caffeine, Guava, lru-cache. The hot key is served from in-process memory; only the cold long tail goes to Redis.
Key sharding
Store N copies of the hot key under suffixed names (profile:42:s0, profile:42:s1...) so they hash to different Redis shards. On read, pick a suffix at random. Every write and invalidation now costs N operations, but reads spread across up to N shards.
Consistent hashing with bounded loads
The newer flavour of consistent hashing (Mirrokni, Thorup, Zadimoghaddam, 2017) caps how much load any single shard can take, redirecting overflow to the next shard along the ring. HAProxy implements it as the hash-balance-factor option.
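
Key sharding for a hot key comes down to: write every copy, read one at random. `ShardedHotKey` is hypothetical and uses a dict where a Redis Cluster client would go, so shard placement itself is not shown. One Redis Cluster caveat: keep the suffix outside any hash tag, because a key such as `{profile:42}:s1` hashes only the part in braces, which would pin every copy to the same slot.

```python
import random
from typing import Any, Dict, List, Optional


class ShardedHotKey:
    """Hypothetical helper keeping N copies of a hot key (profile:42:s0, ...)."""

    def __init__(self, store: Dict[str, Any], copies: int = 8) -> None:
        self.store = store  # stands in for a Redis Cluster client
        self.copies = copies

    def _names(self, key: str) -> List[str]:
        return [f"{key}:s{i}" for i in range(self.copies)]

    def write(self, key: str, value: Any) -> None:
        for name in self._names(key):  # N writes per update
            self.store[name] = value

    def invalidate(self, key: str) -> None:
        for name in self._names(key):  # and N deletes per invalidation
            self.store.pop(name, None)

    def read(self, key: str) -> Optional[Any]:
        # Any copy will do; a random pick spreads reads across shards.
        return self.store.get(f"{key}:s{random.randrange(self.copies)}")
```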
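
Consistent hashing with bounded loads can be sketched by giving every shard a capacity of ceil((1 + epsilon) × average load) and walking clockwise past any shard already at capacity. This is a simplified, insert-only version of the paper's scheme: `BoundedLoadRing`, the `vnodes` count and SHA-256 as the ring hash are assumptions, and each `assign` stands for one unit of load (a key or a connection).

```python
import bisect
import hashlib
import math
from typing import Dict, List, Tuple


def _ring_hash(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")


class BoundedLoadRing:
    """Hypothetical insert-only consistent-hash ring with bounded loads."""

    def __init__(self, shards: List[str], epsilon: float = 0.25, vnodes: int = 50) -> None:
        self.epsilon = epsilon
        self.load: Dict[str, int] = {s: 0 for s in shards}
        # Each shard appears at `vnodes` points on the ring to smooth placement.
        self._ring: List[Tuple[int, str]] = sorted(
            (_ring_hash(f"{s}#{i}"), s) for s in shards for i in range(vnodes))
        self._points = [point for point, _ in self._ring]
        self.assigned: Dict[str, str] = {}

    def capacity(self) -> int:
        # ceil((1 + epsilon) * average load), counting the unit being placed.
        total = sum(self.load.values()) + 1
        return math.ceil((1 + self.epsilon) * total / len(self.load))

    def assign(self, key: str) -> str:
        if key in self.assigned:
            return self.assigned[key]
        cap = self.capacity()
        i = bisect.bisect(self._points, _ring_hash(key)) % len(self._ring)
        # Some shard is always below cap, so this walk terminates.
        while self.load[self._ring[i][1]] >= cap:
            i = (i + 1) % len(self._ring)  # full: overflow clockwise to the next point
        shard = self._ring[i][1]
        self.load[shard] += 1
        self.assigned[key] = shard
        return shard
```

With epsilon = 0.25, no shard ever carries more than 125% of the average (rounded up), however skewed the hashes are; the price is that some keys land away from their natural position on the ring.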

Multi-region and cache coherence

Once your cache is replicated across regions, every regional cache might disagree with every other. Strategies:

Independent regional caches
Each region has its own cache, populated from its regional read-replica. Simplest, eventually consistent, occasional stale reads across regions. Acceptable for most use cases.
Pub/sub invalidation fanout
Writes publish "invalidate key X" to a global topic. Each region's cache subscribes and purges. Window of inconsistency = pub/sub propagation latency, typically ~1 second.
Versioned reads
Every read carries the writer's version; the cache checks; if stale, fetches. Strongest consistency, highest cost. Used in Spanner-style systems.
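
Versioned reads can be sketched as a cache lookup with a minimum version: the caller passes the version its last write returned, and the cache answers only if its copy is at least that new. `VersionedReadCache` and the `origin` callback returning `(value, version)` are hypothetical; a real system also needs the origin itself to honour the minimum, for example by reading from the primary when a replica lags.

```python
from typing import Any, Callable, Dict, Tuple


class VersionedReadCache:
    """Hypothetical cache that serves a read only if its copy is new enough."""

    def __init__(self, origin: Callable[[str], Tuple[Any, int]]) -> None:
        self.origin = origin  # returns (value, version) from the source of truth
        self._data: Dict[str, Tuple[Any, int]] = {}

    def read(self, key: str, min_version: int = 0) -> Tuple[Any, int]:
        cached = self._data.get(key)
        if cached is not None and cached[1] >= min_version:
            return cached  # at least as new as the caller requires
        value, version = self.origin(key)  # missing or too old: ask the origin
        if cached is None or version > cached[1]:
            self._data[key] = (value, version)  # never replace with something older
        return value, version
```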

The hard cases

Cache pollution from a bad deploy. A buggy deploy writes wrong values to the cache. Rolling back the code doesn't fix the cache — the bad values are baked in for the TTL. Mitigation: cache keys should include a version namespace (v3:user:42) so a deploy can cut over to a fresh keyspace. Cheap insurance.
The cold start. A region failover, a Redis restart, a cluster resize — the cache is suddenly empty. The next minute of traffic all goes to the database. Mitigation: warm the cache before sending traffic — replay yesterday's hottest queries against the new cache before flipping the LB.
Negative caching of transient errors. If you cache "this key returned an error" without distinguishing transient (5xx) from terminal (404), a single brief outage leaves a poison pill in the cache for the TTL. Mitigation: never negative-cache 5xx; only negative-cache 404 / NotFound.
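
The negative-caching rule in the last case is easy to encode once the loader distinguishes terminal from transient failures. A minimal sketch, assuming the loader raises a hypothetical `NotFound` for missing keys and a hypothetical `TransientError` for timeouts and 5xx: only `NotFound` is cached, with its own short TTL, while transient errors propagate and leave nothing behind.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class NotFound(Exception):
    """Terminal: the key does not exist (a 404)."""


class TransientError(Exception):
    """Retryable: timeout, connection reset, 5xx."""


_MISSING = object()  # cached marker for "this key does not exist"


class NegativeCachingReader:
    """Hypothetical read-through cache that negative-caches NotFound only."""

    def __init__(self, loader: Callable[[str], Any], positive_ttl: float = 300.0,
                 negative_ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.loader = loader
        self.positive_ttl = positive_ttl
        self.negative_ttl = negative_ttl
        self.clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        now = self.clock()
        hit = self._data.get(key)
        if hit is not None and now < hit[1]:
            return None if hit[0] is _MISSING else hit[0]
        try:
            value = self.loader(key)
        except NotFound:
            # Terminal answer: remember it briefly so repeats skip the database.
            self._data[key] = (_MISSING, now + self.negative_ttl)
            return None
        # TransientError is deliberately not caught: nothing is cached, so the
        # next request retries the origin instead of reading a poison pill.
        self._data[key] = (value, now + self.positive_ttl)
        return value
```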
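
Warming before a cutover, as in the cold-start case above, can be as simple as ranking keys by yesterday's request count and loading the top of the list with bounded concurrency. This sketch assumes the access log has been reduced to one cache key per request and that `loader` reads from the database; `warm_cache`, `top_n` and `max_workers` are illustrative names and defaults. The worker cap matters: warming that floods the database recreates the cold-start problem it is meant to avoid.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable


def warm_cache(access_log: Iterable[str], cache: Dict[str, Any],
               loader: Callable[[str], Any], top_n: int = 1000,
               max_workers: int = 8) -> int:
    """Preload the top_n most-requested keys; returns how many were loaded."""
    hottest = [key for key, _ in Counter(access_log).most_common(top_n)]
    # A bounded pool caps how hard warming hits the database.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for key, value in zip(hottest, pool.map(loader, hottest)):
            cache[key] = value
    return len(hottest)
```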

Practical defaults

  1. TTL by default. Reach for explicit invalidation only when you have a clear data model and a discipline for it.
  2. Use stale-while-revalidate everywhere it's available — HTTP, CloudFront, Varnish, Caffeine. It eliminates a class of latency spikes for free.
  3. Defend against stampede on every hot key; even one unprotected hot key in a hundred can take down a database.
  4. Negative-cache misses with a short TTL (60-300s). Add a Bloom filter for known-finite keyspaces.
  5. Version your cache keyspace. Roll forward a version namespace on bad deploys instead of trying to purge millions of keys.
  6. For multi-region, accept eventual consistency at the cache layer unless the business model truly cannot tolerate it.
  7. Monitor: hit ratio, eviction rate, p99 latency, memory pressure. The sound of an unhealthy cache is "hit ratio dropping."