The RED method
Three metrics per service: request rate, error rate, and the duration distribution. Tom Wilkie's pattern from his time at Weaveworks; now the default for the first dashboard you build on any new service. RED tells you whether a service is healthy from the request side; USE tells you whether the box is healthy from the resource side. Most production investigations need both.
Why three, and why these three
RED is small on purpose. The point is that three numbers per service, viewed together, cover almost every operational question someone might ask in a degraded-service moment. Building anything more elaborate first is usually premature, and a fleet where every service exposes a different ad-hoc set of metrics is a fleet nobody can reason about at three in the morning.
The three signals are not an arbitrary pick. They fall out of a simple model of what a service is: a thing that takes requests in, fails on some of them, and spends time on the rest. Rate measures how much work is arriving. Errors measures how much of that work is going wrong. Duration measures how long the work that succeeds is taking. There is no fourth thing a request can do — arrive, fail, or take time are the only three states a unit of work passes through from the caller's point of view. That is why the set feels complete in practice even though it has only three members: it covers the entire surface a client can observe.
Notice what RED deliberately leaves out. It says nothing about CPU, memory, disk, file descriptors, or any other resource inside the box. That omission is the whole design. RED is a request-side method; it describes the contract between a service and its callers, not the machinery that fulfils the contract. When a request is slow, RED tells you it is slow but not why; the "why" is a resource question, and resource questions belong to USE. Keeping the two views separate is what makes each one sharp. The moment you start cramming garbage-collection pauses and thread-pool depth into your RED dashboard, it stops being a clean statement of service health and becomes a grab-bag.
| Metric | Definition | Implementation |
|---|---|---|
| Rate | Requests per second handled by the service. Aggregate and broken out by endpoint, by status code class, by caller. | Prometheus rate(http_requests_total[1m]) |
| Errors | Number (or fraction) of requests that errored. Usually HTTP 5xx, gRPC non-OK, app-level error codes. | rate(http_requests_total{status=~"5.."}[1m]) |
| Duration | Latency distribution — at minimum P50, P95, P99. Histogram, not average. | histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) |
Duration is a distribution, not a number
Of the three signals, duration is the one teams get wrong most often, and the mistake is almost always the same: they track the average. The average response time is one of the most misleading numbers in operations. It hides the shape of the distribution, and the shape is the whole story. A service whose average is a comfortable 40 ms can still be timing out one request in five hundred, because the average happily absorbs a handful of ten-second responses among thousands of fast ones: ten 10-second responses among five thousand requests that otherwise take 20 ms still average out to about 40 ms. The user who hit the ten-second response does not feel an average. They feel ten seconds.
Percentiles fix this by describing the distribution at named points. P50, the median, is the experience of a typical request. P95 is the experience of the worst one in twenty. P99 is the worst one in a hundred. These are not academic distinctions. On a service handling a thousand requests a second, the P99 is ten slow requests every second, every second — a steady drip of unhappy users that an average will never show you. And because real production traffic is almost never symmetric, the median and the tail can live in completely different worlds: a median of 30 ms sitting under a P99 of 1.2 seconds is an entirely normal and entirely alarming shape.
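A small Python sketch makes the gap between the mean and the tail concrete. The sample is invented for illustration (940 requests at 30 ms, 45 at 200 ms, 15 at 1.2 s), and `nearest_rank` is a hypothetical helper that uses the simple nearest-rank definition of a percentile rather than whatever a given monitoring system computes.

```python
import statistics


def nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100) using integers only
    return ordered[max(rank, 1) - 1]


# Invented sample of 1,000 request latencies, in seconds.
latencies = [0.030] * 940 + [0.200] * 45 + [1.200] * 15

mean = statistics.fmean(latencies)
p50 = nearest_rank(latencies, 50)
p95 = nearest_rank(latencies, 95)
p99 = nearest_rank(latencies, 99)

print(f"mean={mean * 1000:.1f} ms  p50={p50 * 1000:.0f} ms  "
      f"p95={p95 * 1000:.0f} ms  p99={p99 * 1000:.0f} ms")
```

On this sample the mean lands near 55 ms, which looks healthy, while the P99 is 1.2 s. Only the percentile row tells you that fifteen requests in every thousand are waiting more than a second.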
This is also why duration must be collected as a histogram rather than a
pre-computed percentile. If each instance of a service exports its own P99, you
cannot average those P99s together to get the fleet P99 — percentiles do not
add up that way, and averaging them produces a number that is wrong in a
direction you cannot predict. A histogram, by contrast, is a set of bucket
counts, and bucket counts do add. You sum the histograms across every
instance and compute the quantile once, at query time, on the combined
distribution. That is exactly what histogram_quantile does in the
queries above, and it is the technical reason RED dashboards are built on
_bucket series rather than gauges that report a percentile directly.
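The claim that averaged percentiles are wrong in an unpredictable direction can be checked with a few lines of Python. This is a sketch on invented data: two instances, one busy and healthy, one lightly loaded with some slow requests. The bucket bounds, sample values, and helper names are all assumptions made for the example.

```python
from bisect import bisect_left

BOUNDS = [0.05, 0.5, 5.0]  # hypothetical bucket upper bounds, in seconds


def nearest_rank(values, pct):
    """Nearest-rank percentile over raw samples."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100)
    return ordered[max(rank, 1) - 1]


def bucket_counts(values, bounds=BOUNDS):
    """Per-bucket counts; the final slot is the overflow (+Inf) bucket."""
    counts = [0] * (len(bounds) + 1)
    for value in values:
        counts[bisect_left(bounds, value)] += 1  # first bound >= value, i.e. value <= le
    return counts


def compare(instance_a, instance_b):
    """Return (average of per-instance P99s, P99 of the combined traffic)."""
    averaged = (nearest_rank(instance_a, 99) + nearest_rank(instance_b, 99)) / 2
    fleet = nearest_rank(instance_a + instance_b, 99)
    return averaged, fleet


busy = [0.02] * 900                    # instance A: 900 fast requests
few_slow = [0.02] * 95 + [2.0] * 5     # instance B, scenario 1
many_slow = [0.02] * 80 + [2.0] * 20   # instance B, scenario 2

averaged_1, fleet_1 = compare(busy, few_slow)   # averaging overstates the fleet P99
averaged_2, fleet_2 = compare(busy, many_slow)  # averaging understates it

# Bucket counts, unlike percentiles, can simply be added across instances.
merged = [a + b for a, b in zip(bucket_counts(busy), bucket_counts(many_slow))]
```

In both scenarios the averaged figure is 1.01 s, yet the true fleet P99 is 20 ms in the first and 2 s in the second. The `merged` counts are identical to bucketing the combined traffic directly, which is why the quantile can safely be computed after the sum.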
The cost of this is that a histogram's accuracy is bounded by its bucket layout. A percentile is linearly interpolated within whichever bucket it falls into, so if your buckets jump from 100 ms straight to 1 second, a true P99 anywhere between those two values is reported as an interpolated guess that can be off by hundreds of milliseconds. The fix is to put bucket boundaries where you actually care about precision, usually clustered around your SLO threshold, and accept coarse resolution out in the far tail where the exact value matters less than the fact that it is large.
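To see how much the bucket layout matters, here is a simplified Python sketch of quantile estimation over cumulative buckets. It follows the linear-interpolation approach that the Prometheus documentation describes for classic histograms, but it is an illustration written for this page, not the Prometheus implementation; the bucket layouts and counts are invented.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative (upper_bound, count) pairs.

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # no finite upper edge to interpolate towards
            position = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * position
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# 1,000 requests: 950 finish within 100 ms, the other 50 all take about 150 ms.
coarse = [(0.1, 950), (1.0, 1000), (math.inf, 1000)]
fine = [(0.1, 950), (0.2, 1000), (0.5, 1000), (1.0, 1000), (math.inf, 1000)]

coarse_p99 = histogram_quantile(0.99, coarse)  # about 0.82 s
fine_p99 = histogram_quantile(0.99, fine)      # about 0.18 s
```

The underlying traffic is the same in both cases. With a single bucket spanning 100 ms to 1 s the estimate is roughly 0.82 s for requests that really took 0.15 s; adding a 200 ms boundary brings the estimate to about 0.18 s.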
RED and the four golden signals
Google's SRE book uses a slightly different list — the "four golden signals": latency, traffic, errors, saturation. Saturation is the extra one; it's roughly USE's saturation, brought up to the service level. In practice the choice doesn't matter much: start with RED, add saturation if your service has a clear queue (a thread pool, a connection pool, a buffer) where depth is a leading indicator.
| Signal | RED (Wilkie) | Four Golden Signals (Google SRE) |
|---|---|---|
| Rate | ✓ | Traffic |
| Errors | ✓ | Errors |
| Duration | ✓ | Latency |
| Saturation | — | ✓ |
Both methods make the same point: a handful of consistent metrics across every service is more useful than a deep custom set on a single service. The second any new service ships, it should expose those metrics; that's how a fleet stays comprehensible.
The honest way to think about the relationship is that RED is the three golden signals you can always measure from outside the service, and saturation is the fourth one you can only measure from inside it. Rate, errors, and duration are all visible at the edge — a load balancer, a sidecar, or the caller itself can record every one of them without knowing anything about the service's internals. Saturation is different; it asks "how full is the most constrained resource," which requires knowing what the resources are. That is why saturation tends to migrate over to the USE side of the wall in practice, and why RED stops at three. Add saturation back into a RED dashboard only when the service has one obvious queue whose depth predicts trouble before errors or latency move — a thread pool, a connection pool, a message backlog. If there is no single dominant queue, saturation is better left to a USE view of the host.
RED versus USE: two views of the same incident
RED and USE are often presented as competitors. They are not. They answer different questions about the same system, and a real investigation usually crosses from one to the other. RED is the request-side view: it watches the contract between a service and its callers and tells you whether that contract is being honoured. USE is the resource-side view: it watches the CPU, memory, disk, and network of the box and tells you whether any resource is the bottleneck. RED notices that requests are slow; USE explains that the run queue is deep because the CPU is pegged. One is the symptom, the other is the cause.
The practical workflow follows the arrow. When a page fires, you open the service's RED dashboard first, because the request side is where user impact lives and where the symptom is unambiguous. RED narrows the problem: is more work arriving, is more of it failing, or is the same work taking longer? If the answer is "the same traffic is suddenly slower," that is your cue to cross the wall and bring up USE on the host — now you are looking for the saturated resource that explains the extra latency. If the answer is "more requests are failing at flat traffic," you skip USE entirely and go to the dependency that the errors point at. RED tells you which door to walk through; USE is what is behind one of the doors.
| | RED | USE |
|---|---|---|
| Vantage point | Request side (the caller's view) | Resource side (the box's view) |
| Unit of analysis | A service or endpoint | A resource: CPU, memory, disk, NIC |
| Question it answers | Is the service honouring its contract? | Is any resource the bottleneck? |
| Signals | Rate, errors, duration | Utilisation, saturation, errors |
| Best for | Detecting user-facing impact | Explaining where the impact comes from |
Building the dashboard
A RED dashboard for a single service is six panels: the three signals, each in two views (overall and broken out by route). It takes about an hour to build and pays for itself the first time something goes wrong.
```promql
# Six standard panels for a Prometheus-backed RED dashboard.

# 1. Request rate by status class
#    (assumes a status_class label such as "2xx" or "5xx" is exported alongside status)
sum by (status_class) (rate(http_requests_total[1m]))

# 2. Error rate (fraction of requests answered with a 5xx)
sum(rate(http_requests_total{status=~"5.."}[1m]))
/
sum(rate(http_requests_total[1m]))

# 3. Latency P50, P95, P99
histogram_quantile(0.50, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# 4. Rate by route (top 5)
topk(5, sum by (route) (rate(http_requests_total[1m])))

# 5. Errors by route (top 5)
topk(5, sum by (route) (rate(http_requests_total{status=~"5.."}[1m])))

# 6. Latency by route (P99 only, top 5)
topk(5, histogram_quantile(0.99,
  sum by (le, route) (rate(http_request_duration_seconds_bucket[5m]))))
```
One thing worth being deliberate about: the histogram bucket layout. Wide
buckets (powers of two from 1 ms to 30 s) give cheap storage and acceptable
percentile accuracy. Narrow buckets give better accuracy but higher
cardinality. The Prometheus default — 0.005, 0.01, 0.025, ... 10
— is a reasonable starting point for human-facing APIs; service-to-service
workloads with sub-millisecond latency need a custom set.
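As a rough illustration of the "wide" layout, the following Python sketch generates powers-of-two bucket bounds from 1 ms up to the first bound that covers 30 s. `exponential_buckets` is a hypothetical helper written for this example; most client libraries ship their own equivalent, whose name and signature may differ.

```python
def exponential_buckets(start, factor, limit):
    """Upper bounds start, start*factor, ... ending at the first bound >= limit."""
    bounds = []
    bound = start
    while True:
        bounds.append(bound)
        if bound >= limit:
            return bounds
        bound *= factor


# Powers of two from 1 ms until 30 s is covered (bounds are in seconds).
wide = exponential_buckets(0.001, 2, 30.0)

print(len(wide), wide[0], wide[-1])
```

Sixteen boundaries span more than four orders of magnitude, which is what makes this layout cheap. The price is that each bucket is twice as wide as the one before it, so a percentile that falls in the upper buckets is only known to within a factor of two.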
A few layout choices repay the effort of getting them right early. Put the two
route-broken-out panels next to their aggregate counterpart so the eye can drop
from "the service is slow" to "this one route is slow" without scrolling. Use
topk on the per-route panels so a service with two hundred
endpoints does not draw two hundred lines; the five busiest, or the five
worst, are what you want during an incident, and a long tail of flat lines just
adds noise. Keep the time ranges on the rate panels short — a one-minute
rate() window reacts quickly enough to show a traffic spike as it
happens — while the duration panels can use a longer five-minute window,
because percentile estimates over a histogram are noisy and a longer window
smooths them without hiding a real shift.
Resist the urge to add panels. The discipline of RED is that the dashboard stays small enough to read in one glance under stress. Every extra panel is one more thing the oncall has to scan past at the worst possible moment. If a service needs a deeper view — a cache hit ratio, a queue depth, a downstream call breakdown — put it on a second, linked dashboard, and keep the RED page as the front door that everyone opens first.
Interpreting a RED dashboard
Four patterns cover most degraded-service moments. Recognising them on sight is most of what makes oncall fast.
| Pattern | What you see | Usual cause |
|---|---|---|
| Rate up, errors up, duration up | Traffic spike. Errors and tail latency follow. | Real traffic increase or retry storm. Check upstream first; back-pressure may be needed. |
| Rate flat, errors up, duration flat | Same volume of requests, suddenly more failing. | Downstream dependency degraded or returning bad data. Usually database or auth service. |
| Rate flat, errors flat, duration up | Same traffic, same success rate, just slower. | Resource saturation on the service host (CPU, GC, lock contention) or on a hidden dependency. Switch to USE for the host. |
| Rate down, errors down or up, duration flat | Less traffic reaching the service. | Something further upstream is dropping requests — load balancer, gateway, DNS. Check from the outside. |
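The table can be read as a small decision procedure. The Python sketch below encodes it directly; the 10% tolerance that separates "flat" from "up" or "down" is an arbitrary assumption for the example, and `classify` is a hypothetical helper, not part of any tool.

```python
def classify(rate_delta, error_delta, duration_delta, tolerance=0.10):
    """Map relative changes against baseline (+0.5 means 50% up) to a usual cause."""

    def direction(delta):
        if delta > tolerance:
            return "up"
        if delta < -tolerance:
            return "down"
        return "flat"

    rate = direction(rate_delta)
    errors = direction(error_delta)
    duration = direction(duration_delta)

    if (rate, errors, duration) == ("up", "up", "up"):
        return "traffic spike or retry storm: check upstream first"
    if (rate, errors, duration) == ("flat", "up", "flat"):
        return "downstream dependency degraded: follow the errors"
    if (rate, errors, duration) == ("flat", "flat", "up"):
        return "resource saturation: switch to USE on the host"
    if rate == "down" and duration == "flat":
        return "traffic dropped upstream: check load balancer, gateway, DNS"
    return "no standard pattern: investigate manually"


print(classify(rate_delta=0.02, error_delta=0.0, duration_delta=0.8))
```

A real incident rarely matches this cleanly, so the fallback branch matters as much as the four named ones. The value of the table is in narrowing where to look first, not in automating the diagnosis.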
RED in a service mesh
One of the practical reasons service meshes (Istio, Linkerd, Cilium) have gained traction is that they emit RED metrics automatically for every service, without the service team writing instrumentation. Every sidecar records rate, errors, and duration for inbound and outbound calls; the mesh control plane exposes the result as Prometheus metrics with consistent labels.
The trade-off is that the metrics come from the sidecar's perspective, not the application's. Latency measured at the sidecar includes the proxy hop on top of the time spent inside the application process, and it cannot separate the two; for most operational use that's fine, but when comparing to an SLO it's worth knowing whether you're measuring service-as-the-network-sees-it or service-as-the-process-sees-it.
RED for non-request workloads
RED is shaped around request/response services, but the same three columns work for other workload types with small adaptations:
| Workload | Rate | Errors | Duration |
|---|---|---|---|
| HTTP/gRPC service | Requests/sec | Failed responses | Per-request latency |
| Queue consumer | Messages consumed/sec | Failed processing | Processing time + queue wait |
| Batch job | Items processed/sec | Items failed | End-to-end batch duration |
| Streaming pipeline | Events/sec | Drops + dead-letter sends | End-to-end event latency |
| Database query path | Queries/sec | Query failures (timeouts, syntax) | Query duration histogram |
The pattern works because every system has work coming in, some of that work failing, and the rest taking some amount of time. Once you have those three views, most operational questions resolve to "which one moved, and when?"
Where RED goes wrong
RED is simple, which makes it easy to implement badly. Most of the failures are not in the idea but in the instrumentation, and they share a theme: a metric that looks fine on the dashboard but answers the wrong question or cannot be aggregated. Three traps account for nearly all of them.
Averaging the duration. This is the one that bites hardest. A team exports a mean response time, or worse, exports per-instance percentiles and then averages those across instances. The mean hides the tail, as the histogram section above showed; averaged percentiles are mathematically meaningless because percentiles do not combine by averaging. The symptom is a dashboard that stays green while users complain, because the number on the panel describes a request that almost nobody actually experienced. The fix is structural, not cosmetic: collect duration as a histogram, aggregate the bucket counts, and compute the quantile once over the combined distribution.
No error taxonomy. Counting "errors" as a single number treats a client sending a malformed request the same as the database being down. Those are completely different incidents — one is the caller's fault and needs no action, the other is yours and needs a page — and a single error counter folds them together. A useful error signal is broken out by class: 4xx (the caller did something wrong) kept separate from 5xx (the service did), and ideally tagged with an application-level reason. The classic false alarm is a deploy that ships a stricter validation rule, 4xx rate jumps, the error panel goes red, and the oncall burns twenty minutes before realising nothing is actually broken. Without a taxonomy, every error spike looks like an outage.
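A minimal Python sketch of the taxonomy, using the stricter-validation deploy as the scenario. The status mix is invented, and the class names `client_error` and `server_error` are assumptions for the example rather than a standard.

```python
from collections import Counter


def error_class(status_code):
    """Split HTTP statuses into who is at fault: the caller (4xx) or the service (5xx)."""
    if 500 <= status_code <= 599:
        return "server_error"
    if 400 <= status_code <= 499:
        return "client_error"
    return "ok"


# Invented traffic just after a deploy that tightened request validation.
statuses = [200] * 900 + [400] * 95 + [503] * 5

by_class = Counter(error_class(code) for code in statuses)

lumped_rate = (by_class["client_error"] + by_class["server_error"]) / len(statuses)
pageable_rate = by_class["server_error"] / len(statuses)

print(f"lumped={lumped_rate:.1%}  5xx only={pageable_rate:.1%}")
```

The lumped figure is 10% and looks like an outage; the 5xx-only figure is 0.5%. Keeping the class as a label lets the error panel and the alert key on the second number while the first stays available for debugging client behaviour.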
Cardinality. The labels that make RED metrics useful — route,
status, method, caller — are also what can blow up your metrics store. Every
distinct combination of label values is a separate time series, and a histogram
multiplies that by its bucket count. Put a high-cardinality value in a label —
a raw URL path with embedded IDs, a user identifier, a request ID — and you
create a near-infinite number of series, each carrying a full set of histogram
buckets. The store slows down, queries time out, and the bill climbs. The
discipline is to label only with bounded, low-cardinality dimensions: a
templated route (/users/:id, never /users/8412), a
status class, a method. Anything unbounded belongs in a trace or a log, not a
metric label. This is one of the clean dividing lines in the
logs, metrics, and traces
model: metrics are for bounded aggregate signals, traces and logs are for the
high-cardinality detail you reach for once a metric has told you where to look.
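The arithmetic behind the cardinality blow-up is easy to sketch in Python. Both helpers are hypothetical: `template_route` is a crude regex fallback for when the router cannot supply the matched route template (using the framework's own template is preferable where available), and `series_upper_bound` assumes a classic Prometheus histogram, where each label combination carries one series per finite bucket plus the +Inf bucket, `_sum`, and `_count`. The route and bucket counts are invented.

```python
import re

# Path segments that are purely numeric or look like a UUID are treated as IDs.
_ID_SEGMENT = re.compile(
    r"/(?:\d+|[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})(?=/|$)"
)


def template_route(path):
    """Collapse ID-like segments of a path (no query string) into a bounded label."""
    return _ID_SEGMENT.sub("/:id", path)


def series_upper_bound(routes, status_classes, methods, finite_buckets):
    """Worst-case series count for one histogram metric with these labels."""
    per_label_set = finite_buckets + 3  # finite buckets, +Inf bucket, _sum, _count
    return routes * status_classes * methods * per_label_set


templated = series_upper_bound(routes=50, status_classes=5, methods=4, finite_buckets=12)
raw_paths = series_upper_bound(routes=100_000, status_classes=5, methods=4, finite_buckets=12)

print(template_route("/users/8412/orders/77"), templated, raw_paths)
```

With fifty templated routes the ceiling is 15,000 series; with a hundred thousand distinct raw paths it is thirty million. Not every combination occurs in practice, so these are upper bounds, but the ratio between them is what decides whether the metrics store survives.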
From RED to SLOs and budgets
RED is also the raw material for service-level objectives. An SLO is just a target drawn on top of the RED signals: "99.9% of requests succeed" is an error-rate target, and "95% of requests complete under 200 ms" is a duration target read straight off the histogram. Because the duration signal is already a distribution, you can express latency objectives the honest way — as a percentile under a threshold — instead of the misleading "average under X" that an average-based metric forces on you. The error budget that falls out of an SLO is then a direct function of the error and duration panels you are already watching, which is why a well-built RED dashboard is most of the work of running SLOs. The threshold-setting, the burn-rate alerting, and the way percentile budgets compose across a request path are covered on the latency budgets and percentiles page; RED is where the numbers those budgets are built on come from.
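As a sketch of how the two SLO examples read off the RED signals, the Python below computes the remaining error budget from request and error counts and the share of requests under 200 ms from cumulative histogram buckets. The window counts, bucket boundaries, and function names are invented for illustration; burn-rate alerting is deliberately left out.

```python
INF = float("inf")


def error_budget_remaining(slo_target, total, bad):
    """Fraction of the error budget still unspent (negative once the SLO is blown)."""
    allowed_bad = (1 - slo_target) * total
    return 1 - bad / allowed_bad


def fraction_within(cumulative_buckets, threshold):
    """Share of requests at or under `threshold`, which must be a bucket boundary."""
    return cumulative_buckets[threshold] / cumulative_buckets[INF]


# Invented counts for one SLO window.
window_total = 1_000_000
window_5xx = 400
budget_left = error_budget_remaining(0.999, window_total, window_5xx)  # about 0.6

# Cumulative counts keyed by bucket upper bound, in seconds.
duration_buckets = {0.1: 900_000, 0.2: 960_000, 0.5: 995_000, INF: 1_000_000}
fast_enough = fraction_within(duration_buckets, 0.2)
latency_slo_met = fast_enough >= 0.95

print(f"budget left={budget_left:.0%}  under 200 ms={fast_enough:.1%}  met={latency_slo_met}")
```

`fraction_within` fails with a `KeyError` when the threshold is not a bucket boundary, which is a blunt way of making the earlier point: a latency objective can only be measured exactly if the histogram has a boundary at the threshold.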
Production checklist
- Every service exposes rate, errors, duration. Same metric names across the fleet (http_requests_total, http_request_duration_seconds_bucket). Consistency is what makes the fleet legible.
- Duration is always a histogram. Never a mean. P50, P95, P99 at minimum.
- Six-panel RED dashboard for each service. Three aggregate (rate, errors, P50/P95/P99), three broken out by route (rate, errors, P99).
- Alert on errors and duration, not rate. Rate spikes are normal; sustained error-rate or latency breaches are not.
- Pair with USE when you suspect the host. RED diagnoses the service; USE diagnoses the box underneath.
- Add saturation when there's an obvious queue. Thread-pool depth, connection-pool wait, message-queue backlog. Worth tracking when one exists.
- Histogram bucket layout fits the workload. Default Prometheus buckets work for human-facing APIs; service-to-service or sub-millisecond workloads need custom buckets.
- Mesh-emitted metrics are convenient but not authoritative. They measure at the sidecar; for SLO-grade tracking, instrument inside the application too.
Further reading
- Tom Wilkie — "The RED Method: Key Metrics for Microservices Architecture". The original write-up. Short; worth reading in full.
- Google SRE Book — "Monitoring Distributed Systems" (Chapter 6). The four golden signals as Google practises them. Free online.
- Prometheus documentation — Histograms and summaries. The bucket-design choice in detail.
- Grafana — RED dashboard templates. The standard six-panel layout, ready to import.
- Adjacent: The USE method. The box-level companion.
- Adjacent: Latency budgets & percentiles. The reasoning behind why duration is a distribution.