The methods
Eight methods, each built around a single layer of the stack. USE looks at boxes, RED at services, top-down at CPU cycles, latency budgets at the chain of hops a request walks through. The order doesn't matter much; the match between method and symptom does. Most of the time when performance work doesn't move the dial, the method was inspecting the wrong layer.
Which method when
Every method below works on a specific layer. Start from the symptom; the symptom tells you the layer; the layer tells you the method. The table below is the short form of that mapping.
The eight methods
Each method has the same shape — the layer it targets, the canonical reference (Gregg, Yasin, Williams, Kleppmann, Tene), the worked numbers, and the failure modes the method specifically catches that nothing else does.
- 01 · Request chain
Latency budgets & percentiles
Allocate the end-to-end deadline across the hops, then compare each hop's allocation against its measured P50/P95/P99/P99.9. Covers the percentile-of-N trap, coordinated omission, and the tail-at-scale numbers from Dean & Barroso.
- 02 · Hardware resources
The USE method
Brendan Gregg's resource checklist: utilisation, saturation, errors — checked once per CPU, disk, NIC, and lock. Designed to find the bottleneck in the box, not the request.
- 03 · Services & APIs
The RED method
Rate, errors, duration — measured per service. The complement to USE: USE asks "is the box healthy?", RED asks "is the service healthy?". Together they cover both halves.
- 04 · CPU pipeline
Top-down microarchitecture
Ahmad Yasin's framework for finding pipeline stalls. Splits pipeline slots into front-end bound, back-end bound, bad speculation, and retiring, and tells you which one to chase. Runs on Linux perf, for example through toplev.py from pmu-tools.
- 05 · Compute kernels
The roofline model
Plots attainable performance against arithmetic intensity (FLOPs per byte moved), capped by the compute roofline and the memory-bandwidth roofline. Answers "memory-bound or compute-bound?" in one chart, and shows the headroom you actually have.
- 06 · Capacity sizing
Queueing theory for engineers
Little's Law, M/M/1, M/M/c, Pollaczek–Khinchine. Enough theory to size thread pools, connection pools, and queues correctly — and to know why latency goes superlinear long before utilisation hits 100%.
- 07 · Production process
Profiling in production
pprof, async-profiler, perf, eBPF, continuous profiling. Capturing real cache misses, real GC pauses, real lock contention — without taking the service down or distorting what you measure.
- 08 · Synthetic load
Load testing without lying
Open-loop vs closed-loop generators. Coordinated omission, and why a constant-rate generator such as wrk2 (or k6 with an arrival-rate executor) reports the latency tail honestly while closed-loop setups in Locust or JMeter routinely hide it.
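The percentile-of-N trap mentioned under latency budgets (method 01) comes down to one line of arithmetic. The sketch below assumes each backend call is independent and exceeds its own P99 exactly 1% of the time; real systems have correlated slowness, so treat the output as a rough guide rather than a prediction. The function name `prob_any_slow` is made up for this illustration.

```python
def prob_any_slow(fanout: int, percentile: float = 0.99) -> float:
    """Chance that at least one of `fanout` calls is slower than its per-call percentile.

    Assumes the calls are statistically independent (an assumption for this
    sketch, not something a real system guarantees).
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if not 0.0 < percentile < 1.0:
        raise ValueError("percentile must be strictly between 0 and 1")
    # P(all calls are fast) = percentile ** fanout, so take the complement.
    return 1.0 - percentile ** fanout


if __name__ == "__main__":
    for n in (1, 10, 100):
        share = prob_any_slow(n)
        print(f"fan-out {n:>3}: {share:.1%} of requests wait on at least one call slower than P99")
```

With a fan-out of 100, roughly 63% of requests wait on at least one call slower than P99, which is the headline figure in Dean & Barroso's tail-at-scale argument. A per-hop P99 is therefore a poor budget for a request that fans out widely.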
Why methods, not just tools
There is a familiar failure mode in performance work: open the profiler, look at the flame graph, find a hot function, "optimise" it. The hot function was never the bottleneck. The real bottleneck was a lock further up the call stack, or a thread pool sized for half the load, or a cross-region RPC that ate the latency budget before any of this code ran.
A method is the discipline of asking the right question before reaching for a tool. USE, RED, and top-down each catch a class of bottleneck that the others are blind to. Latency budgets and queueing theory are how you reason about a system before it's even built. Profiling and load testing are how you confirm what the methods predicted. The tools change every few years; the methods are what carry across them.
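As a small illustration of reasoning about a system before it is built, the sketch below applies the textbook M/M/1 mean response time, W = S / (1 - ρ), and Little's Law, L = λW. The 10 ms service time is a hypothetical figure, and M/M/1 assumes Poisson arrivals, exponential service times, and a single server. Read the numbers as the shape of the curve, not as a forecast for any particular service.

```python
def mm1_response_time(service_time_s: float, utilisation: float) -> float:
    """Mean time in system (queueing + service) for an M/M/1 queue."""
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1) for a stable queue")
    return service_time_s / (1.0 - utilisation)


def littles_law_in_flight(arrival_rate_per_s: float, response_time_s: float) -> float:
    """Little's Law: mean number of requests in the system, L = lambda * W."""
    return arrival_rate_per_s * response_time_s


if __name__ == "__main__":
    SERVICE_TIME_S = 0.010  # hypothetical 10 ms mean service time
    for rho in (0.5, 0.8, 0.9, 0.95):
        arrival_rate = rho / SERVICE_TIME_S  # requests per second at this utilisation
        w = mm1_response_time(SERVICE_TIME_S, rho)
        in_flight = littles_law_in_flight(arrival_rate, w)
        print(f"rho={rho:.2f}  mean latency={w * 1000:6.1f} ms  mean in flight={in_flight:5.1f}")
```

Under these assumptions, going from 50% to 90% utilisation multiplies mean latency by five, and going from 90% to 95% doubles it again. The in-flight figure gives a lower bound on how many requests a pool or queue has to hold on average.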