The methods

Eight methods, each built around a single layer of the stack. USE looks at boxes, RED at services, top-down at CPU cycles, latency budgets at the chain of hops a request walks through. The order doesn't matter much; the match between method and symptom does. Most of the time when performance work doesn't move the dial, the method was inspecting the wrong layer.


Which method when

Every method below works on a specific layer. Start from the symptom; the symptom tells you the layer; the layer tells you the method. The table below is the short form of that mapping.

End-to-end request is too slow → Where is the time going in the chain?
One box is hot or unresponsive → Which hardware resource is saturated?
A service is slow or erroring → Rate, errors, or duration?
CPU at 100%, code is the bottleneck → Front-end stall, back-end stall, branch miss?
A compute kernel feels too slow → Memory-bound or compute-bound?
Sizing thread pools, queues, or connections → How does latency change with utilisation?
Need to find a real-world bottleneck → Capture production state with low overhead
Pre-launch validation under load → Reproduce real traffic without coordinated omission

The eight methods

Each method has the same shape — the layer it targets, the canonical reference (Gregg, Yasin, Williams, Kleppmann, Tene), the worked numbers, and the failure modes the method specifically catches that nothing else does.

  1. 01 · Request chain

    Latency budgets & percentiles

    Allocate the end-to-end deadline across each hop, then compare against P50/P95/P99/P99.9 of what each hop actually does. Covers the percentile-of-N trap, coordinated omission, and the tail-at-scale numbers from Dean & Barroso.

  2. 02 · Hardware resources

    The USE method

    Brendan Gregg's resource checklist: utilisation, saturation, errors — checked once per CPU, disk, NIC, and lock. Designed to find the bottleneck in the box, not the request.

  3. 03 · Services & APIs

    The RED method

    Rate, errors, duration — measured per service. The complement to USE: USE asks "is the box healthy?", RED asks "is the service healthy?". Together they cover both halves.

  4. 04 · CPU pipeline

    Top-down microarchitecture

    Ahmad Yasin's framework for finding pipeline stalls. Classifies pipeline slots as front-end bound, back-end bound, bad speculation, or retiring, and tells you which one to chase. Commonly run on top of perf, for example with toplev.py from pmu-tools.

  5. 05 · Compute kernels

    The roofline model

    Plots attainable performance against arithmetic intensity (FLOPs per byte of memory traffic), capped by a compute ceiling and a memory-bandwidth ceiling. Answers "memory-bound or compute-bound?" in one chart, and shows the headroom you actually have.

  6. 06 · Capacity sizing

    Queueing theory for engineers

    Little's Law, M/M/1, M/M/c, Pollaczek–Khinchine. Enough theory to size thread pools, connection pools, and queues correctly — and to know why latency goes superlinear long before utilisation hits 100%.

  7. 07 · Production process

    Profiling in production

    pprof, async-profiler, perf, eBPF, continuous profiling. Capturing real cache misses, real GC pauses, real lock contention — without taking the service down or distorting what you measure.

  8. 08 · Synthetic load

    Load testing without lying

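    Open-loop vs closed-loop generators. Coordinated omission, and why an open-loop generator such as wrk2 (or k6 with an arrival-rate executor) reports the latency tail honestly while closed-loop setups, the default in Locust and JMeter, routinely hide it.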
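Coordinated omission is easier to believe once you watch it happen. The sketch below is a toy simulation, not a real load generator: a single server that answers in 1 ms, a one-second stall at t = 5 s, and 1,000 requests scheduled 10 ms apart. All of these numbers, and the `simulate` and `percentile` helpers, are assumptions made up for the illustration. The only difference between the two runs is where the latency clock starts.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(len(ordered) * pct / 100)
    return ordered[max(rank, 1) - 1]

def simulate(open_loop, n=1000, interval_ms=10, service_ms=1,
             stall_start_ms=5000, stall_len_ms=1000):
    """Return per-request latencies (ms) for a toy single-server system."""
    stall_end_ms = stall_start_ms + stall_len_ms
    server_free = 0          # time at which the server can take the next request
    latencies = []
    for i in range(n):
        intended = i * interval_ms
        # Open loop: the clock starts at the scheduled send time, no matter what.
        # Closed loop: the generator waits for the previous response, and the
        # clock only starts when the request is actually sent.
        clock_start = intended if open_loop else max(intended, server_free)
        begin = max(clock_start, server_free)
        if stall_start_ms <= begin < stall_end_ms:
            begin = stall_end_ms      # the server is frozen until the stall ends
        done = begin + service_ms
        server_free = done
        latencies.append(done - clock_start)
    return latencies

closed = simulate(open_loop=False)
opened = simulate(open_loop=True)
print("closed-loop p99 (ms):", percentile(closed, 99))   # 1
print("open-loop   p99 (ms):", percentile(opened, 99))   # 911
```

Both runs hit the same stall. The closed-loop run records one slow request and 999 fast ones, so its p99 looks perfect; the open-loop run also counts every request that should have been sent during the stall and was left waiting.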

Why methods, not just tools

There is a familiar failure mode in performance work: open the profiler, look at the flame graph, find a hot function, "optimise" it. The hot function was never the bottleneck. The real bottleneck was a lock further up the call stack, or a thread pool sized for half the load, or a cross-region RPC that ate the latency budget before any of this code ran.
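The cross-region RPC case is the kind of thing a latency budget catches on paper. A minimal sketch, assuming a 200 ms deadline split across four hypothetical hops; the hop names and every number are invented for the example. The second helper is the fan-out arithmetic behind the tail-at-scale result: if 1% of calls to one backend are slow, a request that touches 100 backends is slow most of the time.

```python
def over_budget(budget_ms, measured_p99_ms):
    """Return {hop: overshoot_ms} for hops whose measured p99 exceeds the budget."""
    return {
        hop: measured_p99_ms[hop] - budget_ms[hop]
        for hop in budget_ms
        if measured_p99_ms[hop] > budget_ms[hop]
    }

def prob_any_slow(slow_fraction, fan_out):
    """Chance that at least one of `fan_out` independent calls lands in the slow tail."""
    return 1 - (1 - slow_fraction) ** fan_out

# Hypothetical allocation of a 200 ms end-to-end deadline.
budget = {"edge": 20, "auth": 30, "cross_region_rpc": 80, "db": 70}
# Hypothetical measured p99 per hop.
measured = {"edge": 12, "auth": 25, "cross_region_rpc": 140, "db": 60}

print(over_budget(budget, measured))          # {'cross_region_rpc': 60}
print(round(prob_any_slow(0.01, 100), 2))     # 0.63
```

Per-hop p99s do not simply add up to the end-to-end p99, so treat the budget as a planning tool rather than a prediction. The independence assumption in `prob_any_slow` is also a simplification.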

A method is the discipline of asking the right question before reaching for a tool. USE, RED, and top-down each catch a class of bottleneck that the others are blind to. Latency budgets and queueing theory are how you reason about a system before it's even built. Profiling and load testing are how you confirm what the methods predicted. The tools change every few years; the methods are what carry across them.
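Reasoning about a system before it is built can be as small as two formulas. The sketch below uses the textbook M/M/1 mean response time, W = S / (1 − ρ), and Little's Law, L = λW. It assumes Poisson arrivals, exponential service times, and a single server, which real services rarely match exactly; the 10 ms service time and 500 req/s arrival rate are made-up inputs.

```python
def mm1_response_time(service_time_s, utilisation):
    """Mean time in system for an M/M/1 queue: W = S / (1 - rho)."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    return service_time_s / (1 - utilisation)

def littles_law_in_system(arrival_rate_per_s, time_in_system_s):
    """Little's Law: mean number of requests in the system, L = lambda * W."""
    return arrival_rate_per_s * time_in_system_s

SERVICE_TIME_S = 0.010  # assumed 10 ms of work per request

for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    w_ms = mm1_response_time(SERVICE_TIME_S, rho) * 1000
    print(f"utilisation {rho:.2f} -> mean latency {w_ms:.0f} ms")

# Sizing: 500 req/s, each spending 50 ms in the system, keeps about 25 requests in flight.
print(round(littles_law_in_system(500, 0.050)))
```

Going from 50% to 90% utilisation multiplies mean latency by five in this model, and the last few percent cost far more. That is the curve to have in mind when sizing a pool for a target utilisation.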

The single useful rule. Match the method to the layer the symptom lives in. The box is hot → USE. The service is slow → RED. CPU is pinned and the work is real → top-down. A kernel is slower than the hardware should allow → roofline. The chain is too long → latency budgets. Sizing concurrency → queueing theory. Production reality → profiling and load testing.
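For the compute-kernel case, the roofline question ("memory-bound or compute-bound?") reduces to one comparison: attainable performance is the smaller of the compute peak and arithmetic intensity times memory bandwidth. A minimal sketch, assuming a hypothetical machine with a 1,000 GFLOP/s peak and 100 GB/s of memory bandwidth; the kernel is a double-precision AXPY (y = a·x + y), counted here as 2 FLOPs per 24 bytes of traffic.

```python
def attainable_gflops(intensity_flops_per_byte, peak_gflops, bandwidth_gb_s):
    """Roofline bound: min(compute ceiling, intensity * memory bandwidth)."""
    return min(peak_gflops, intensity_flops_per_byte * bandwidth_gb_s)

def classify(intensity_flops_per_byte, peak_gflops, bandwidth_gb_s):
    """Kernels left of the ridge point are memory-bound, the rest compute-bound."""
    ridge = peak_gflops / bandwidth_gb_s
    return "memory-bound" if intensity_flops_per_byte < ridge else "compute-bound"

PEAK_GFLOPS = 1000.0     # assumed compute ceiling
BANDWIDTH_GB_S = 100.0   # assumed memory bandwidth, so the ridge is 10 FLOPs/byte

# AXPY in float64: one multiply and one add per element,
# reading x and y and writing y back, 8 bytes each.
axpy_intensity = 2 / 24

print(classify(axpy_intensity, PEAK_GFLOPS, BANDWIDTH_GB_S))
print(round(attainable_gflops(axpy_intensity, PEAK_GFLOPS, BANDWIDTH_GB_S), 2))
```

On this assumed machine AXPY can reach only a small fraction of peak however well the arithmetic is tuned, so the headroom is in memory traffic, not in the instruction stream.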