A well-placed cache turns a 50ms disk-bound operation into a 0.1ms memory lookup. At scale it is not an optimisation — it is the line between systems that work and systems that fall over.
Caching is the highest-leverage way to cut latency and backend load. The mechanics are simple; the discipline is in knowing which layer to cache at, which pattern to use, what to evict, and how to read the hit ratio when something starts going wrong. Five parts: why cache, where to cache, how to keep it consistent, what to evict, and how to reason about hit ratio. By the end you can defend any caching decision in a review and tune a real Redis or Memcached deployment without surprise.
Why cache — the three reasons
Software caching applies the same idea CPU designers settled on decades ago: keep hot data in a faster tier than the source of truth. Every modern CPU has L1, L2, and L3 caches because main memory is too slow; every modern web service has Redis or Memcached because the database is too slow. The motivations divide cleanly into three.
- Latency
- A Redis lookup (~500 µs) versus a Postgres query (~5 ms) is a 10× difference. A page that issues 20 dependent queries goes from 100 ms to 10 ms, comfortably under the roughly 100 ms threshold at which a response feels "instant".
- Throughput
- Serving 90% of requests from cache leaves the database with 10% of the original load. Redis handles 100,000+ ops/sec on a single instance; Postgres struggles past 5,000 simple SELECTs/sec on the same hardware.
- Cost
- A 16 GB Redis instance on AWS ElastiCache is roughly $90/month. The equivalent throughput from RDS would be 5–10× the price. Caching is the most cost-effective performance win available.
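The latency and throughput arithmetic above can be checked with a short sketch. The figures are the illustrative ones from the list; the function names are hypothetical, and the model assumes a miss costs only the database query (it ignores the failed cache lookup).

```javascript
// Average latency per lookup for a given hit ratio (0..1) and per-tier latencies in ms.
function effectiveLatencyMs(hitRatio, cacheMs, dbMs) {
  return hitRatio * cacheMs + (1 - hitRatio) * dbMs;
}

// Requests per second that still reach the database at a given hit ratio.
function dbLoad(requestsPerSec, hitRatio) {
  return requestsPerSec * (1 - hitRatio);
}

const cacheMs = 0.5; // ~500 µs Redis lookup
const dbMs = 5;      // ~5 ms Postgres query

console.log(20 * dbMs);    // 20 dependent queries, all from the database: 100 ms
console.log(20 * cacheMs); // the same 20 queries, all cache hits: 10 ms
console.log(effectiveLatencyMs(0.9, cacheMs, dbMs)); // roughly 0.95 ms per lookup at a 90% hit ratio
console.log(dbLoad(10000, 0.9));                     // roughly 1,000 of 10,000 req/s reach the database
```

The second function is the reason hit ratio matters so much: moving from 90% to 95% halves the load on the database.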
The hit / miss cycle
The effectiveness of a cache is summarised by one number — the hit ratio — and decomposed into three failure modes.
| Outcome | What happened | What to do |
|---|---|---|
| Hit | The key existed and had not expired. | Nothing — this is the goal. |
| Cold miss | Cache was empty (just started, or the key has never been requested). | Acceptable — every system pays this cost on warmup. |
| Capacity miss | The cache evicted this key earlier because memory ran out. | Either grow the cache or change the eviction policy. |
| TTL miss | The key expired before the next request. | Increase TTL, or refresh proactively if the key is hot. |
A 95% hit ratio is the realistic target for read-heavy workloads (catalogs, profiles, configuration). Highly personalised data (per-user dashboards, real-time analytics) settles closer to 60–80%. Measuring hit ratio is non-negotiable. Redis exposes INFO stats with keyspace_hits and keyspace_misses; if your hit ratio drops below 80%, you have a key-design problem, your working set exceeds memory, or your TTLs are too aggressive — investigate before adding capacity.
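As a minimal sketch, the hit ratio can be computed from the text that INFO stats returns. The function name and the sample numbers are made up for illustration; fetching the text from a live server is left to whichever client you use.

```javascript
// Parse the text returned by `INFO stats` and compute hits / (hits + misses).
function hitRatioFromInfo(infoText) {
  const fields = {};
  for (const line of infoText.split(/\r?\n/)) {
    if (line === "" || line.startsWith("#")) continue; // skip blanks and section headers
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    fields[line.slice(0, idx)] = line.slice(idx + 1);
  }
  const hits = Number(fields.keyspace_hits);
  const misses = Number(fields.keyspace_misses);
  const total = hits + misses;
  return total > 0 ? hits / total : null; // null: counters missing or no lookups yet
}

// Sample text in the INFO format; the numbers are invented.
const sample = "# Stats\r\nkeyspace_hits:9500\r\nkeyspace_misses:500\r\n";
console.log(hitRatioFromInfo(sample)); // 0.95
```

Both counters are cumulative since the server started (or since the stats were last reset), so a dashboard would normally plot the ratio of the deltas between two samples rather than the lifetime value.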
Where to cache — six layers
"The cache" is plural. A modern stack caches at six distinct layers, each with its own invalidation rules and failure modes.
1. Browser
HTTP Cache-Control, ETag, service workers. Free; survives reloads; per-user.
2. CDN edge
Cloudflare, Fastly, CloudFront. Sub-50ms globally, shared across users, immutable assets.
3. Reverse proxy
Nginx, Varnish. Caches origin responses on your own infrastructure. Cheaper than CDN per byte.
4. Application memory
In-process LRU (Node, Java, Go). Fastest possible — no network — but per-instance and dies on restart.
5. Distributed cache
Redis, Memcached. Shared across instances; survives app restarts; the workhorse of most stacks.
6. Database buffer pool
Postgres shared_buffers, MySQL InnoDB pool. Implicit — caches recently read pages — tune for your working set.
Every layer above the database absorbs traffic before it arrives. The classic mistake is to add Redis without first checking whether HTTP cache headers and a CDN would have done the job for free. CDN guide covers layers 1–3 in depth; the rest of this page focuses on layers 4 and 5.
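To make the "check HTTP cache headers first" advice concrete, here is a hedged sketch of choosing a Cache-Control value per response type. The function name, the path conventions (fingerprinted file names such as app.3f9a1c.js), and the max-age values are assumptions for illustration, not recommendations from a specific framework.

```javascript
// Pick a Cache-Control header value by asset type.
function cacheControlFor(path) {
  // Fingerprinted assets never change under the same URL, so cache them for a year.
  if (/\.[0-9a-f]{6,}\.(js|css|png|jpg|svg|woff2)$/.test(path)) {
    return "public, max-age=31536000, immutable";
  }
  // HTML may be stored but must be revalidated, so new deploys are picked up.
  if (path.endsWith(".html") || path.endsWith("/")) {
    return "no-cache";
  }
  // Everything else: brief browser caching, slightly longer in shared caches (CDN, proxy).
  return "public, max-age=60, s-maxage=300";
}

console.log(cacheControlFor("/static/app.3f9a1c.js"));
console.log(cacheControlFor("/index.html"));
console.log(cacheControlFor("/api/products"));
```

If headers like these already absorb most of the traffic at layers 1 to 3, the distributed cache only has to handle what is left.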
Update strategies — the four patterns
When the data behind your cache changes, what do you do? Four patterns dominate, and choosing among them is the call that defines how your cache behaves.
1. Cache-aside (lazy loading)
The application reads from the cache; on miss, queries the database, populates the cache, and returns. On write, invalidates (deletes) the cache entry — the next read will refetch.
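```javascript
// `cache` (a Redis-style client) and `db` are assumed to be async clients defined elsewhere.
async function getUser(id) {
  const key = `user:${id}`;
  const cached = await cache.get(key);
  if (cached !== null) return JSON.parse(cached);    // hit
  const user = await db.query(/* ... */);            // miss → fetch
  await cache.setex(key, 300, JSON.stringify(user)); // populate with a 300 s TTL
  return user;
}

async function updateUser(id, data) {
  await db.update(/* ... */);
  await cache.del(`user:${id}`);                     // invalidate
}
```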
Simple, memory-efficient (only requested keys are cached), and graceful under cache failure (just slower). Used by the overwhelming majority of web apps. The trade-off: a brief staleness window between the database commit and the cache delete; a stale read in that window is possible. For most user-facing data this is acceptable. For payments and account balances it is not.
2. Read-through
The application reads only from the cache; the cache itself fetches from the database on miss. From the application's perspective, the cache is the data store. Found in object-relational mappers (Hibernate's second-level cache) and in some managed services (DynamoDB Accelerator).
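A minimal in-memory sketch of the idea, assuming a synchronous loader function for brevity. The class name, the loader, and the TTL are hypothetical; real read-through layers such as the ones named above have their own APIs.

```javascript
// Read-through: callers only talk to the cache; the cache owns the loader.
class ReadThroughCache {
  constructor(loader, ttlMs) {
    this.loader = loader;     // function(key) -> value, e.g. a database query
    this.ttlMs = ttlMs;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > now) return entry.value; // hit
    const value = this.loader(key);                         // miss: the cache fetches
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    return value;
  }
}

let loads = 0; // counts how often the "database" is consulted
const users = new ReadThroughCache((key) => { loads += 1; return { key }; }, 300000);
users.get("user:42");
users.get("user:42");
console.log(loads); // 1: the second read was served from the cache
```

The difference from cache-aside is only where the miss logic lives: here the application never sees the loader.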
3. Write-through
Writes go to the cache and the database synchronously. Strong consistency — the cache is never stale. Higher write latency (two stores per write). The pattern of choice when reads vastly outnumber writes and stale data is unacceptable.
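The pattern can be sketched in a few lines, using a Map as a stand-in for the database. The class and variable names are hypothetical, and writing the database before the cache is a design choice of this sketch: if the database write throws, the cache is left untouched.

```javascript
// Write-through: every write updates the backing store and the cache in the same call.
class WriteThroughCache {
  constructor(store) {
    this.store = store;       // backing store exposing get(key) / set(key, value)
    this.entries = new Map(); // cached copies
  }

  set(key, value) {
    this.store.set(key, value);   // source of truth first
    this.entries.set(key, value); // then the cache, so reads never see an older value
  }

  get(key) {
    if (this.entries.has(key)) return this.entries.get(key); // hit
    const value = this.store.get(key);                        // miss: fall back to the store
    if (value !== undefined) this.entries.set(key, value);
    return value;
  }
}

const database = new Map(); // stand-in for a real database
const profiles = new WriteThroughCache(database);
profiles.set("user:42", { name: "Ada" });
console.log(database.get("user:42").name, profiles.get("user:42").name); // Ada Ada
```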
4. Write-behind (write-back)
Writes go to the cache only; a background process flushes batches to the database. Lowest write latency, highest throughput. Risky — a cache crash before flush loses data. Used in metrics pipelines, click counters, and low-stakes high-volume writes; rarely the right answer for transactional state.
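A click counter is the simplest case to sketch. Everything named here (the class, flushFn, the keys) is hypothetical; a real deployment would call flush on a timer and would need to handle a failed flush, which this sketch does not.

```javascript
// Write-behind: increments land in memory; flush() persists them as one batch.
class WriteBehindCounter {
  constructor(flushFn) {
    this.flushFn = flushFn;   // function(batch: Map<key, delta>) that persists a batch
    this.totals = new Map();  // values served to readers
    this.pending = new Map(); // deltas not yet persisted
  }

  increment(key, by = 1) {
    this.totals.set(key, (this.totals.get(key) || 0) + by);
    this.pending.set(key, (this.pending.get(key) || 0) + by);
  }

  get(key) {
    return this.totals.get(key) || 0;
  }

  flush() {
    if (this.pending.size === 0) return;
    const batch = this.pending;
    this.pending = new Map(); // new writes accumulate separately from the batch in flight
    this.flushFn(batch);
  }
}

const persisted = new Map(); // stand-in for the database
const clicks = new WriteBehindCounter((batch) => {
  for (const [key, delta] of batch) persisted.set(key, (persisted.get(key) || 0) + delta);
});

clicks.increment("page:home");
clicks.increment("page:home");
clicks.increment("page:about");
const beforeFlush = persisted.size; // 0: a crash at this point loses all three clicks
clicks.flush();
console.log(beforeFlush, persisted.get("page:home"), persisted.get("page:about")); // 0 2 1
```

Three writes became one batch of two rows, which is where the throughput gain comes from, and the gap before flush is the data-loss window described above.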
| Pattern | Read latency | Write latency | Consistency | Failure mode |
|---|---|---|---|---|
| Cache-aside | fast on hit, slow on miss | fast (write DB, delete cache) | brief staleness window | cache failure → graceful degradation |
| Read-through | same as cache-aside | fast (write DB) | same as cache-aside | cache layer must be available |
| Write-through | fast on hit | slow (write both) | strong | cache failure stalls writes |
| Write-behind | fast on hit | fastest | weak — flush window is data loss zone | cache crash loses unflushed writes |
Eviction — when the cache is full
Memory is finite. When the cache fills, it must drop something. The eviction policy decides which key dies; the choice changes the hit ratio dramatically.
- LRU
- Least-recently-used. Drops the key untouched longest. The usual choice in Redis (allkeys-lru, which must be set explicitly because the default maxmemory-policy is noeviction), Memcached, and most application caches. Solid for general workloads.
- LFU
- Least-frequently-used. Drops the key with the lowest hit count. Better than LRU when access patterns are bursty (some keys are very hot, others are touched once and forgotten).
- FIFO
- First-in-first-out. Drops the oldest entry. Rarely the best choice — ignores access frequency entirely.
- Random
- Drops a random key. Surprisingly competitive on uniform workloads and trivially cheap to implement.
- ARC
- Adaptive Replacement Cache (IBM, 2003). Tracks both recency and frequency; rebalances dynamically. Used in ZFS; PostgreSQL adopted it briefly in the 8.0 series before moving to a clock-sweep buffer manager.
- W-TinyLFU
- Window TinyLFU (2017). Sliding-window approximate LFU, beats LRU and ARC on most modern workloads. Default in Caffeine (Java) and integrated into many production caches.
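LRU, the baseline the other policies are measured against, fits in a few lines. This is a minimal sketch that leans on the insertion order of a JavaScript Map instead of an explicit doubly-linked list; the class name is hypothetical and the capacity is assumed to be at least 1.

```javascript
// LRU cache: the Map's iteration order doubles as the recency order (oldest first).
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined; // miss
    const value = this.map.get(key);
    this.map.delete(key);     // re-insert so the key becomes the most recently used
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) {
      this.map.delete(key);   // overwriting also refreshes recency
    } else if (this.map.size >= this.capacity) {
      const oldest = this.map.keys().next().value; // first key = least recently used
      this.map.delete(oldest);
    }
    this.map.set(key, value);
  }
}

const lru = new LRUCache(2);
lru.set("a", 1);
lru.set("b", 2);
lru.get("a");    // touch "a", so "b" is now the least recently used
lru.set("c", 3); // cache is full: "b" is evicted
console.log([...lru.map.keys()]); // [ 'a', 'c' ]
```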
The Cache Eviction simulator compares these policies against the same access trace — drag the workload toward "Zipfian skew" (a few hot keys, many cold) and watch W-TinyLFU pull ahead.
Cache key design
Keys are an under-appreciated decision. A bad key scheme produces low hit ratios that no amount of capacity will fix.
- Include version. When schema changes, bump the version prefix: user:v2:42. Old entries remain in the cache until evicted; new entries use the new schema. No flush required.
- Normalise capitalisation. If your application sometimes queries email:Ada@x.com and sometimes email:ada@x.com, the cache treats them as different keys. Lowercase before lookup.
- Avoid query-string keys. A cache key based on the raw query string is fragile: parameter order, encoding, and case all change the key. Build the key from the parsed values, not the URL.
- Tag for bulk invalidation. Group related keys with a tag (posts:user:42) so invalidating a user's posts is one wildcard delete instead of N specific keys.
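The first three rules can be sketched as small key-builder helpers. The function names, the KEY_VERSION constant, and the search: prefix are assumptions for illustration.

```javascript
const KEY_VERSION = "v2"; // bump when the cached schema changes

// Versioned key for a user record.
function userKey(id) {
  return `user:${KEY_VERSION}:${id}`;
}

// Normalise before building the key so "Ada@x.com" and "ada@x.com" share one entry.
function emailKey(email) {
  return `email:${KEY_VERSION}:${email.trim().toLowerCase()}`;
}

// Build the key from parsed parameters, sorted by name, rather than from the raw URL.
function searchKey(params) {
  const parts = Object.keys(params)
    .sort()
    .map((name) => `${name}=${encodeURIComponent(String(params[name]))}`);
  return `search:${KEY_VERSION}:${parts.join("&")}`;
}

console.log(userKey(42));                         // user:v2:42
console.log(emailKey("Ada@x.com"));               // email:v2:ada@x.com
console.log(searchKey({ q: "shoes", page: 2 }));  // search:v2:page=2&q=shoes
console.log(searchKey({ page: 2, q: "shoes" }));  // search:v2:page=2&q=shoes
```

Centralising key construction in helpers like these also means a version bump is a one-line change.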
The hard cases
Three failure modes deserve their own names because they keep showing up.
user:99999999). Each miss hits the database and finds nothing, and the cache, by default, doesn't store the negative result. Subsequent identical requests miss again. Fix: cache the "not found" result with a short TTL; or use a Bloom filter to short-circuit known-absent keys before they reach the cache.
Practical defaults
If you are starting from a blank deployment and need to ship today:
- Use cache-aside as the default pattern. Reach for write-through only when staleness is unacceptable.
- Use Redis for the distributed layer; Memcached only if you need raw throughput on a single use case and don't need its data structures.
- Set per-key TTLs rather than relying entirely on eviction. 5–60 minutes is the typical range; data with strong staleness requirements goes lower.
- Use LRU as the eviction policy unless you have measured reason to switch.
- Monitor hit ratio alongside latency. A latency improvement that drops hit ratio is suspicious.
- Add negative caching for high-cardinality lookups (user-by-email) where misses are expected to be common.
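Negative caching from the last item can be sketched as follows, using an in-memory Map and a synchronous database stand-in. The names (makeLookup, findInDb, NOT_FOUND) and both TTL values are hypothetical; with a real Redis the sentinel would be a reserved string value rather than a Symbol.

```javascript
const NOT_FOUND = Symbol("not-found"); // distinguishes "known absent" from "not cached"

function makeLookup(findInDb, { ttlMs = 300000, negativeTtlMs = 30000 } = {}) {
  const entries = new Map(); // key -> { value, expiresAt }
  return function lookup(key, now = Date.now()) {
    const entry = entries.get(key);
    if (entry && entry.expiresAt > now) {
      return entry.value === NOT_FOUND ? null : entry.value; // hit, possibly a negative one
    }
    const row = findInDb(key); // assumed to return null when the row does not exist
    const found = row !== null && row !== undefined;
    entries.set(key, {
      value: found ? row : NOT_FOUND,
      expiresAt: now + (found ? ttlMs : negativeTtlMs), // absent results expire sooner
    });
    return found ? row : null;
  };
}

let dbCalls = 0; // counts trips to the "database"
const byEmail = makeLookup((email) => {
  dbCalls += 1;
  return email === "ada@x.com" ? { id: 42 } : null;
});

byEmail("nobody@x.com");
byEmail("nobody@x.com");
console.log(dbCalls); // 1: the second miss was answered from the negative entry
```

The short negative TTL bounds how long a newly created record can stay invisible behind a cached "not found".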
And keep the working tools nearby: Cache Eviction for policy comparison, LRU Cache for pointer-walk visualisation, Caching how-it-works for the full reference.
- Semicolony: Cache Eviction simulator. LRU, LFU, ARC, TinyLFU compared on the same trace.
- Semicolony: LRU Cache simulator. Watch the doubly-linked list mutate in real time.
- Semicolony: Caching · how it works. The long-form reference covering every layer.
- Semicolony: Redis · how it works. Single-threaded event loop, persistence, cluster topology.
- Semicolony: CDN · how it works. Caching at layers 1–3 (browser, edge, reverse proxy).
- Paper: Megiddo & Modha · ARC: A Self-Tuning, Low Overhead Replacement Cache · 2003. The original ARC paper.
- Paper: Einziger, Friedman & Manes · TinyLFU: A Highly Efficient Cache Admission Policy · 2017. The W-TinyLFU foundation.
- Blog: Marc Brooker · Caches, Modes, and Unstable Systems · 2021. How a cache going cold pushes a backend into a metastable failure mode.
- Docs: Redis · Performance & Memory Optimization. Eviction tuning, persistence, key-space design.
- Docs: MDN · HTTP Caching. Comprehensive reference for browser and CDN cache headers.
- Book: Gregg · Systems Performance · 2nd ed. 2020. Chapter 7 covers the latency hierarchy and CPU caches.