Caching strategies — System Design Handbook

A well-placed cache turns a 50ms disk-bound operation into a 0.1ms memory lookup. At scale it is not an optimisation — it is the line between systems that work and systems that fall over.

Caching is the highest-leverage way to cut latency and backend load. The mechanics are simple; the discipline is in knowing which layer to cache at, which pattern to use, what to evict, and how to read the hit ratio when something starts going wrong. Five parts: why cache, where to cache, how to keep it consistent, what to evict, and how to reason about hit ratio. By the end you can defend any caching decision in a review and tune a real Redis or Memcached deployment without surprise.

Each step down the hierarchy is roughly an order of magnitude slower. Caching's job is to stop requests from descending past Redis.

Why cache — the three reasons

Software caching applies the same idea CPU designers settled on decades ago: keep hot data in a faster tier than the source of truth. Every modern CPU has L1, L2, and L3 caches because main memory is too slow; every modern web service has Redis or Memcached because the database is too slow. The motivations divide cleanly into three.

Latency: A Redis lookup (~500 µs) versus a Postgres query (~5 ms) is a 10× difference. A page that issues 20 dependent queries goes from 100 ms to 10 ms, comfortably under the roughly 100 ms at which a response stops feeling instant.
Throughput: Serving 90% of requests from cache leaves the database with 10% of the original load. Redis handles 100,000+ ops/sec on a single instance; Postgres struggles past 5,000 simple SELECTs/sec on the same hardware.
Cost: A 16 GB Redis instance on AWS ElastiCache is roughly $90/month. The equivalent throughput from RDS would be 5–10× the price. Caching is the most cost-effective performance win available.

The hit / miss cycle

The effectiveness of a cache is summarised by one number — the hit ratio — and decomposed into three failure modes.

Cache-aside flow. The hit path is fast; the miss path costs a database round-trip plus a cache write.

Outcome	What happened	What to do
Hit	The key existed and had not expired.	Nothing — this is the goal.
Cold miss	Cache was empty (just started, or the key has never been requested).	Acceptable — every system pays this cost on warmup.
Capacity miss	The cache evicted this key earlier because memory ran out.	Either grow the cache or change the eviction policy.
TTL miss	The key expired before the next request.	Increase TTL, or refresh proactively if the key is hot.

A 95% hit ratio is the realistic target for read-heavy workloads (catalogs, profiles, configuration). Highly personalised data (per-user dashboards, real-time analytics) settles closer to 60–80%. Measuring hit ratio is non-negotiable. Redis exposes INFO stats with keyspace_hits and keyspace_misses; if your hit ratio drops below 80%, you have a key-design problem, your working set exceeds memory, or your TTLs are too aggressive — investigate before adding capacity.
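The two counters above are enough to compute the ratio by hand. A minimal sketch, assuming the raw text reply of INFO stats is already available as a string; parseHitRatio and effectiveLatencyMs are illustrative names, and the latency figures reuse the rough Redis and Postgres numbers quoted earlier on this page.

```javascript
// Illustrative helpers, not part of any Redis client library.
// `infoText` is assumed to be the raw reply of `INFO stats`.
function parseHitRatio(infoText) {
  const read = (field) => {
    const match = infoText.match(new RegExp(`^${field}:(\\d+)`, "m"));
    return match ? Number(match[1]) : 0;
  };
  const hits = read("keyspace_hits");
  const misses = read("keyspace_misses");
  const total = hits + misses;
  return total === 0 ? 0 : hits / total; // no traffic yet: report 0
}

// Average read latency under cache-aside: every request pays the cache
// lookup, and misses additionally pay the database round-trip.
function effectiveLatencyMs(hitRatio, cacheMs, dbMs) {
  return cacheMs + (1 - hitRatio) * dbMs;
}

const sampleInfo = "# Stats\r\nkeyspace_hits:9500\r\nkeyspace_misses:500\r\n";
const ratio = parseHitRatio(sampleInfo);
console.log(ratio);                             // 0.95
console.log(effectiveLatencyMs(ratio, 0.5, 5)); // roughly 0.75 ms
```

Note that the counters are cumulative since the server started (or since the last stats reset), so a dashboard would normally difference two samples rather than read the lifetime ratio.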

Rule of thumb: The first ten thousand requests of a new feature are always cold. Run a warm-up job after deploys for hot paths; otherwise users see a latency spike that isn't a real regression.

Where to cache — six layers

"The cache" is plural. A modern stack caches at six distinct layers, each with its own invalidation rules and failure modes.

1. Browser

HTTP Cache-Control, ETag, service workers. Free; survives reloads; per-user.

2. CDN edge

Cloudflare, Fastly, CloudFront. Sub-50ms globally, shared across users, immutable assets.

3. Reverse proxy

Nginx, Varnish. Caches origin responses on your own infrastructure. Cheaper than CDN per byte.

4. Application memory

In-process LRU (Node, Java, Go). Fastest possible — no network — but per-instance and dies on restart.
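An in-process LRU needs very little code. The sketch below is one possible shape, assuming a JavaScript runtime where Map preserves insertion order (which the language guarantees); LruCache is an illustrative name, and a production service would usually reach for a tested library instead.

```javascript
// Minimal in-process LRU built on Map, which iterates in insertion order.
// Illustrative only: no TTLs, no size accounting beyond entry count.
class LruCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = new Map();
  }
  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    this.entries.delete(key);      // re-insert to mark as most recently used
    this.entries.set(key, value);
    return value;
  }
  set(key, value) {
    this.entries.delete(key);      // refresh position if the key already exists
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value; // least recently used
      this.entries.delete(oldest);
    }
  }
}

const lru = new LruCache(2);
lru.set("a", 1);
lru.set("b", 2);
lru.get("a");              // "a" is now the most recently used
lru.set("c", 3);           // evicts "b"
console.log(lru.get("b")); // undefined
console.log(lru.get("a")); // 1
```

Because each application instance holds its own copy, two instances can briefly disagree about a value; that is the per-instance cost mentioned above.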

5. Distributed cache

Redis, Memcached. Shared across instances; survives app restarts; the workhorse of most stacks.

6. Database buffer pool

Postgres shared_buffers, MySQL InnoDB pool. Implicit — caches recently read pages — tune for your working set.

Every layer above the database absorbs traffic before it arrives. The classic mistake is to add Redis without first checking whether HTTP cache headers and a CDN would have done the job for free. The CDN guide covers layers 1–3 in depth; the rest of this page focuses on layers 4 and 5.

Update strategies — the four patterns

When the data behind your cache changes, what do you do? Four patterns dominate, and choosing among them is the call that defines how your cache behaves.

1. Cache-aside (lazy loading)

The application reads from the cache; on miss, queries the database, populates the cache, and returns. On write, invalidates (deletes) the cache entry — the next read will refetch.

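// Sketch: `cache` is assumed to be a Redis-style async client that stores
// strings, and `db.findUserById` / `db.updateUser` are hypothetical helpers.
async function getUser(id) {
  const key = `user:${id}`;
  const cached = await cache.get(key);
  if (cached !== null) return JSON.parse(cached);   // hit
  const user = await db.findUserById(id);           // miss → fetch
  if (user) await cache.setex(key, 300, JSON.stringify(user)); // 5-minute TTL
  return user;
}
async function updateUser(id, data) {
  await db.updateUser(id, data);                    // write the source of truth first
  await cache.del(`user:${id}`);                    // then invalidate
}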

Simple, memory-efficient (only requested keys are cached), and graceful under cache failure (just slower). Used by the overwhelming majority of web apps. The trade-off: a brief staleness window between the database commit and the cache delete; a stale read in that window is possible. For most user-facing data this is acceptable. For payments and account balances it is not.

2. Read-through

The application reads only from the cache; the cache itself fetches from the database on miss. From the application's perspective, the cache is the data store. Found in object-relational mappers (Hibernate's second-level cache) and in some managed services (DynamoDB Accelerator).
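The difference from cache-aside is who owns the miss path. A minimal sketch of that idea, assuming an in-memory store and a caller-supplied loader; ReadThroughCache and loadFromDb are hypothetical names and do not correspond to the Hibernate or DynamoDB Accelerator APIs.

```javascript
// Read-through sketch: callers only see `get`; the cache owns the loader.
class ReadThroughCache {
  constructor(loader, ttlMs) {
    this.loader = loader;          // async (key) => value, e.g. a database query
    this.ttlMs = ttlMs;
    this.entries = new Map();      // key -> { value, expiresAt }
  }
  async get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > now) return entry.value;  // hit
    const value = await this.loader(key);                    // miss: the cache fetches
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    return value;
  }
}

// Usage with a stand-in for the database.
let dbReads = 0;
const loadFromDb = async (key) => { dbReads += 1; return { key, name: "Ada" }; };
const profiles = new ReadThroughCache(loadFromDb, 300_000);

(async () => {
  await profiles.get("user:42");
  await profiles.get("user:42");
  console.log(dbReads); // 1
})();
```

The application code never mentions the database, which is the point of the pattern and also why the cache layer has to be available for reads to work at all.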

3. Write-through

Writes go to the cache and the database synchronously. Strong consistency — the cache is never stale. Higher write latency (two stores per write). The pattern of choice when reads vastly outnumber writes and stale data is unacceptable.
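As a rough sketch of the write path, assuming both stores expose Map-like get, set, and has methods (plain Maps stand in for the database and the cache here, and WriteThroughStore is a hypothetical name):

```javascript
// Write-through sketch: a write returns only after both stores are updated.
class WriteThroughStore {
  constructor(db, cache) {
    this.db = db;
    this.cache = cache;
  }
  async write(key, value) {
    await this.db.set(key, value);     // persist first; a failure throws here
    await this.cache.set(key, value);  // then update the cache in the same call
  }
  async read(key) {
    if (this.cache.has(key)) return this.cache.get(key);   // hit
    const value = await this.db.get(key);                  // miss: fall back
    if (value !== undefined) await this.cache.set(key, value);
    return value;
  }
}

const store = new WriteThroughStore(new Map(), new Map());
store.write("plan:42", "pro").then(async () => {
  console.log(await store.read("plan:42")); // "pro"
});
```

With real network stores the two updates are not atomic, so a crash between them can still leave the cache behind the database; this sketch does not attempt to close that gap.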

4. Write-behind (write-back)

Writes go to the cache only; a background process flushes batches to the database. Lowest write latency, highest throughput. Risky — a cache crash before flush loses data. Used in metrics pipelines, click counters, and low-stakes high-volume writes; rarely the right answer for transactional state.
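A click counter is the classic fit. The sketch below is one way to buffer increments in memory and flush them in batches; WriteBehindCounter and flushBatch are hypothetical names, and the buffer lives in process memory, so anything not yet flushed is lost if the process dies.

```javascript
// Write-behind sketch: writes land in memory and are flushed in batches.
class WriteBehindCounter {
  constructor(flushBatch) {
    this.flushBatch = flushBatch;   // async (entries) => void, e.g. a bulk upsert
    this.pending = new Map();       // key -> increment not yet persisted
  }
  increment(key, by = 1) {
    this.pending.set(key, (this.pending.get(key) ?? 0) + by); // no database I/O
  }
  async flush() {
    if (this.pending.size === 0) return;
    const batch = this.pending;
    this.pending = new Map();       // new writes accumulate while the flush runs
    try {
      await this.flushBatch([...batch.entries()]);
    } catch (err) {
      // Put the batch back so a failed flush does not drop counts.
      for (const [key, by] of batch) this.increment(key, by);
      throw err;
    }
  }
}

const clicks = new WriteBehindCounter(async (entries) => console.log(entries));
clicks.increment("page:home");
clicks.increment("page:home");
clicks.increment("page:about", 5);
clicks.flush(); // logs [ [ 'page:home', 2 ], [ 'page:about', 5 ] ]
```

In a running service flush would typically be called on a timer; the interval is the size of the data-loss window.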

Pattern	Read latency	Write latency	Consistency	Failure mode
Cache-aside	fast on hit, slow on miss	fast (write DB, delete cache)	brief staleness window	cache failure → graceful degradation
Read-through	same as cache-aside	fast (write DB)	same as cache-aside	cache layer must be available
Write-through	fast on hit	slow (write both)	strong	cache failure stalls writes
Write-behind	fast on hit	fastest	weak — flush window is data loss zone	cache crash loses unflushed writes

The double-write trap: If you update the database first and the cache second, a process crash between them leaves the cache stale. If you update the cache first and the database second, a database failure leaves the cache holding values that were never persisted. Either order is wrong. The correct pattern is invalidate-after-write: write the database, then delete the cache. The next read repopulates from the new database state.

Eviction — when the cache is full

Memory is finite. When the cache fills, it must drop something. The eviction policy decides which key dies; the choice changes the hit ratio dramatically.

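LRU: Least-recently-used. Drops the key that has gone untouched the longest. The default in Memcached and most application caches, and the usual choice for Redis when it is used as a cache (maxmemory-policy allkeys-lru; Redis itself defaults to noeviction and approximates LRU by sampling). Solid for general workloads.
LFU: Least-frequently-used. Drops the key with the lowest access count. Better than LRU when access is skewed (some keys are very hot, others are touched once and forgotten), because a burst of one-off keys cannot push out the hot ones.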
FIFO: First-in-first-out. Drops the oldest entry. Rarely the best choice — ignores access frequency entirely.
Random: Drops a random key. Surprisingly competitive on uniform workloads and trivially cheap to implement.
ARC: Adaptive Replacement Cache (IBM, 2003). Tracks both recency and frequency; rebalances dynamically. Used in ZFS. PostgreSQL's buffer manager uses a clock-sweep algorithm rather than ARC.
W-TinyLFU: Window TinyLFU (2017). Sliding-window approximate LFU, beats LRU and ARC on most modern workloads. Default in Caffeine (Java) and integrated into many production caches.
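The effect of the policy choice shows up even on a toy trace. The sketch below compares only LRU and FIFO, using a hand-made trace with one hot key; countHits is an illustrative helper and the trace is invented for the example, so the numbers say nothing about real workloads.

```javascript
// Counts hits for a given policy over an access trace.
// "lru" refreshes a key's position on every hit; "fifo" never does.
function countHits(trace, capacity, policy) {
  const entries = new Map(); // insertion order doubles as eviction order
  let hits = 0;
  for (const key of trace) {
    if (entries.has(key)) {
      hits += 1;
      if (policy === "lru") {
        entries.delete(key);   // move to the back of the eviction order
        entries.set(key, true);
      }
      continue;
    }
    if (entries.size >= capacity) {
      entries.delete(entries.keys().next().value); // evict the front of the order
    }
    entries.set(key, true);
  }
  return hits;
}

// One hot key ("h") interleaved with a stream of one-off keys.
const hotKeyTrace = ["h", "a", "h", "b", "h", "c", "h", "d", "h", "e", "h"];
console.log(countHits(hotKeyTrace, 2, "lru"));  // 5
console.log(countHits(hotKeyTrace, 2, "fifo")); // 3
```

FIFO keeps evicting the hot key simply because it was inserted first, which is the "ignores access frequency entirely" problem described in the list above.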

The Cache Eviction simulator compares these policies against the same access trace — drag the workload toward "Zipfian skew" (a few hot keys, many cold) and watch W-TinyLFU pull ahead.

Cache key design

Keys are an under-appreciated decision. A bad key scheme produces low hit ratios that no amount of capacity will fix.

Include version. When schema changes, bump the version prefix: user:v2:42. Old entries remain in the cache until evicted; new entries use the new schema. No flush required.
Normalise capitalisation. If your application sometimes queries email:Ada@x.com and sometimes email:ada@x.com, the cache treats them as different keys. Lowercase before lookup.
Avoid query-string keys. A cache key based on the raw query string is fragile: parameter order, encoding, and case all change the key. Build the key from the parsed values, not the URL.
Tag for bulk invalidation. Group related keys under a tag (posts:user:42) so invalidating a user's posts is one operation instead of N specific deletes. Redis has no wildcard delete, so in practice this means keeping a set of member keys per tag, or putting a per-tag version number in the key and bumping it.
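The first three rules can be sketched as small key-building helpers. The function names, the search: prefix, and the example URLs are all invented for illustration.

```javascript
// Hypothetical key helpers illustrating the rules above.
const SCHEMA_VERSION = "v2"; // bump when the cached shape changes

function userKey(id) {
  return `user:${SCHEMA_VERSION}:${id}`;
}

function emailKey(email) {
  return `email:${email.trim().toLowerCase()}`; // normalise before lookup
}

// Build a key from parsed parameters rather than the raw URL, so that
// parameter order and parameter-name case do not fragment the cache.
function searchKey(url) {
  const params = [...new URL(url).searchParams.entries()]
    .map(([name, value]) => [name.toLowerCase(), value])
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return "search:" + params.map(([n, v]) => `${n}=${encodeURIComponent(v)}`).join("&");
}

console.log(userKey(42));            // user:v2:42
console.log(emailKey("Ada@x.com"));  // email:ada@x.com
console.log(searchKey("https://example.com/s?Page=2&q=redis") ===
            searchKey("https://example.com/s?q=redis&page=2")); // true
```

Whether parameter values should also be lowercased depends on the endpoint, so the sketch leaves them untouched.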

The hard cases

Three failure modes deserve their own names because they keep showing up.

Thundering herd: A popular cached page expires. Every request that arrives in the same millisecond misses, hammers the database, and writes the same value back. The fix is a lock around the regeneration, or probabilistic early expiration so requests don't synchronise on the same TTL boundary.
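Within a single process, the "lock around the regeneration" can be as small as a map of in-flight promises. A minimal sketch under that assumption; singleFlight and renderPage are hypothetical names, and coordinating across several application instances would need a shared lock that this example does not provide.

```javascript
// Single-flight sketch: concurrent misses for the same key share one rebuild.
const inFlight = new Map(); // key -> Promise of the value being rebuilt

function singleFlight(key, rebuild) {
  const existing = inFlight.get(key);
  if (existing) return existing;            // join the rebuild already running
  const promise = Promise.resolve()
    .then(() => rebuild(key))
    .finally(() => inFlight.delete(key));   // allow a fresh rebuild next time
  inFlight.set(key, promise);
  return promise;
}

// 100 simultaneous misses trigger a single expensive rebuild.
let rebuilds = 0;
const renderPage = async () => { rebuilds += 1; return "<html></html>"; };
Promise.all(Array.from({ length: 100 }, () => singleFlight("page:home", renderPage)))
  .then(() => console.log(rebuilds)); // 1
```

If the rebuild fails, every waiting caller sees the same rejection and the entry is cleared, so the next request retries.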

Cache stampede: The cache itself dies (a Redis instance fails, a network partition isolates the cluster). Every request misses simultaneously and the database collapses. Mitigation: multi-tier caching (in-process LRU as a backstop), circuit breakers in the cache client, request coalescing.
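A circuit breaker in the cache client keeps requests from queueing behind a dead cache. The sketch below is one simple shape for it; CacheBreaker is a hypothetical name, the thresholds are placeholders to tune, and it only covers the client side, so the database still needs the other mitigations to survive the extra load.

```javascript
// Circuit-breaker sketch for a cache client.
class CacheBreaker {
  constructor(cacheGet, { maxFailures = 3, cooldownMs = 5000 } = {}) {
    this.cacheGet = cacheGet;     // async (key) => value or null
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openUntil = 0;           // timestamp until which the cache is skipped
  }
  async get(key, now = Date.now()) {
    if (now < this.openUntil) return null;   // open: treat as a miss, skip the call
    try {
      const value = await this.cacheGet(key);
      this.failures = 0;                     // any success resets the count
      return value;
    } catch {
      this.failures += 1;
      if (this.failures >= this.maxFailures) {
        this.openUntil = now + this.cooldownMs;
        this.failures = 0;
      }
      return null;                           // degrade to a miss instead of throwing
    }
  }
}

const guarded = new CacheBreaker(async () => { throw new Error("cache unreachable"); });
guarded.get("user:42").then((value) => console.log(value)); // null
```

Callers treat null as an ordinary miss, which is the "graceful degradation" behaviour cache-aside relies on.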

Cache penetration: An attacker requests keys that don't exist (user:99999999). Each miss hits the database and finds nothing, and the cache, by default, doesn't store the negative result. Subsequent identical requests miss again. Fix: cache the "not found" result with a short TTL; or use a Bloom filter to short-circuit known-absent keys before they reach the cache.
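Negative caching is the simpler of the two fixes. A minimal sketch, assuming an in-memory Map as the cache and a lookup function that resolves to null for absent rows; NOT_FOUND, lookupUser, and findInDb are hypothetical names, and the TTLs are placeholders.

```javascript
// Negative-caching sketch: remember "not found" for a short time.
const NOT_FOUND = Symbol("not-found");
const userCache = new Map(); // key -> { value, expiresAt }

async function lookupUser(id, findInDb, now = Date.now()) {
  const key = `user:${id}`;
  const entry = userCache.get(key);
  if (entry && entry.expiresAt > now) {
    return entry.value === NOT_FOUND ? null : entry.value;  // hit, possibly negative
  }
  const user = await findInDb(id);          // resolves to null when absent
  const ttlMs = user ? 300_000 : 30_000;    // keep negative entries short-lived
  userCache.set(key, { value: user ?? NOT_FOUND, expiresAt: now + ttlMs });
  return user;
}

let dbLookups = 0;
const findNobody = async () => { dbLookups += 1; return null; };
(async () => {
  await lookupUser(99999999, findNobody);
  await lookupUser(99999999, findNobody);
  console.log(dbLookups); // 1
})();
```

The short TTL matters: if the user is created a moment later, the stale "not found" answer disappears on its own within the negative TTL.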

Practical defaults

If you are starting from a blank deployment and need to ship today:

Use cache-aside as the default pattern. Reach for write-through only when staleness is unacceptable.
Use Redis for the distributed layer; Memcached only if you need raw throughput on a single use case and don't need Redis's data structures.
Set per-key TTLs rather than relying entirely on eviction. 5–60 minutes is the typical range; data with strong staleness requirements goes lower.
Use LRU as the eviction policy unless you have measured reason to switch.
Monitor hit ratio alongside latency. A latency improvement that drops hit ratio is suspicious.
Add negative caching for high-cardinality lookups (user-by-email) where misses are expected to be common.

And keep the working tools nearby: Cache Eviction for policy comparison, LRU Cache for pointer-walk visualisation, Caching how-it-works for the full reference.
