The methods

Eight methods, each built around a single layer of the stack. USE looks at boxes, RED at services, top-down at CPU cycles, latency budgets at the chain of hops a request walks through. The order you read them in doesn't matter much; matching the method to the symptom does. When performance work doesn't move the dial, the usual cause is that the method was inspecting the wrong layer.


Which method when

Every method below works on a specific layer. Start from the symptom; the symptom tells you the layer; the layer tells you the method. The table below is the short form of that mapping.

Symptom → question to ask → method

End-to-end request is too slow → Where is the time going in the chain? → 01 · Latency budgets & percentiles
One box is hot or unresponsive → Which hardware resource is saturated? → 02 · The USE method
A service is slow or erroring → Rate, errors, or duration? → 03 · The RED method
CPU at 100%, code is the bottleneck → Front-end stall, back-end stall, branch miss? → 04 · Top-down microarchitecture or 07 · Profiling in production
A compute kernel feels too slow → Memory-bound or compute-bound? → 05 · The roofline model
Sizing thread pools, queues, or connections → How does latency change with utilisation? → 06 · Queueing theory for engineers
Need to find a real-world bottleneck → Capture production state with low overhead → 07 · Profiling in production
Pre-launch validation under load → Reproduce real traffic without coordinated omission → 08 · Load testing without lying
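The roofline row ("memory-bound or compute-bound?") comes down to one comparison: attainable throughput is the lesser of the machine's peak compute and the kernel's arithmetic intensity (FLOPs per byte moved) times memory bandwidth. A minimal sketch of that textbook bound follows. The peak and bandwidth figures are hypothetical placeholders, not measurements of any real machine; the vector-add intensity assumes float32 `a[i] = b[i] + c[i]` (1 FLOP per 12 bytes, ignoring caches) and the matmul intensity is simply an illustrative value.

```python
def attainable_gflops(intensity_flops_per_byte: float,
                      peak_gflops: float,
                      bandwidth_gb_per_s: float) -> float:
    """Roofline bound: min(peak compute, arithmetic intensity * memory bandwidth)."""
    return min(peak_gflops, intensity_flops_per_byte * bandwidth_gb_per_s)


def classify(intensity_flops_per_byte: float,
             peak_gflops: float,
             bandwidth_gb_per_s: float) -> str:
    """Left of the ridge point the bandwidth roof binds; at or right of it, the compute roof does."""
    ridge = peak_gflops / bandwidth_gb_per_s  # FLOPs/byte where the two roofs meet
    return "memory-bound" if intensity_flops_per_byte < ridge else "compute-bound"


# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth -> ridge at 10 FLOPs/byte.
PEAK, BW = 1000.0, 100.0

# Illustrative kernels; intensities are assumptions, not measurements.
for name, ai in [("float32 vector add", 1 / 12), ("dense matmul tile", 32.0)]:
    bound = attainable_gflops(ai, PEAK, BW)
    print(f"{name}: bound {bound:.1f} GFLOP/s, {classify(ai, PEAK, BW)}")
```

The point of the sketch is the ridge: below it, speeding up arithmetic does nothing, and the only lever is moving fewer bytes.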

The eight methods

Each method follows the same template: the layer it targets, the canonical reference (Gregg, Yasin, Williams, Kleppmann, Tene), the worked numbers, and the failure modes it catches that the other methods miss.

Why methods, not just tools

There is a familiar failure mode in performance work: open the profiler, look at the flame graph, find a hot function, "optimise" it. The hot function was never the bottleneck. The real bottleneck was a lock further up the call stack, or a thread pool sized for half the load, or a cross-region RPC that ate the latency budget before any of this code ran.

A method is the discipline of asking the right question before reaching for a tool. USE, RED, and top-down each catch a class of bottleneck that the others are blind to. Latency budgets and queueing theory are how you reason about a system before it's even built. Profiling and load testing are how you confirm what the methods predicted. The tools change every few years; the methods are what carry across them.
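As an example of reasoning about a system before it's built, here is a minimal queueing sketch using two textbook results. Little's law (L = λW) applied to the service stage gives how many workers are busy on average, which is the floor for a pool size. The single-server M/M/1 formula R = S / (1 − ρ), which assumes Poisson arrivals and exponential service times, shows how mean latency grows as utilisation approaches 100%. The traffic numbers and function names are invented for illustration.

```python
def littles_law(arrival_rate_per_s: float, time_in_stage_s: float) -> float:
    """Little's law: average number of items in a stage, L = lambda * W."""
    return arrival_rate_per_s * time_in_stage_s


def pool_utilisation(arrival_rate_per_s: float, service_time_s: float, workers: int) -> float:
    """Fraction of time each of `workers` is busy: lambda * S / c."""
    return littles_law(arrival_rate_per_s, service_time_s) / workers


def mm1_response_time(service_time_s: float, utilisation: float) -> float:
    """Mean time in system for a single-server M/M/1 queue: R = S / (1 - rho)."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1); the queue is unstable at or above 1")
    return service_time_s / (1 - utilisation)


# Invented traffic: 400 req/s, each request holding a worker for 25 ms.
busy = littles_law(400, 0.025)
print(f"workers busy on average: {busy:.1f}")
for workers in (10, 12, 16):
    print(f"pool of {workers}: {pool_utilisation(400, 0.025, workers):.0%} utilised")

# Single server with a 10 ms service time: mean latency vs utilisation.
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"utilisation {rho:.0%}: mean response {mm1_response_time(0.010, rho) * 1000:.0f} ms")
```

At 90% utilisation the M/M/1 mean response time is already ten times the service time. Multi-server queues (M/M/c) degrade more gracefully at the same utilisation, but they still blow up as it approaches 1, which is why a pool sized exactly to the Little's-law average is undersized.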

The single most useful rule: match the method to the layer the symptom lives at. The box is hot → USE. The service is slow → RED. CPU is pinned and the work is real → top-down. The chain is too long → latency budgets. Sizing concurrency → queueing theory. Production reality → profiling and load testing.

Start here

Latency budgets & percentiles

The interactive deep dive: drag the layer budgets, watch the deadline overflow, and walk through why P99-of-N isn't the system's P99 and why coordinated omission silently understates your reported tail latency.
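Two pieces of arithmetic behind that description, sketched under simple assumptions. First, assuming P99-of-N refers to a request that fans out to N independent backends and waits for all of them, the chance that at least one call is slower than its own P99 is 1 − 0.99^N, so healthy per-backend P99s say little about the request's tail. Second, coordinated omission arises when a closed-loop load generator waits for each response before sending the next request: the requests it should have sent during a stall are never measured. The backfill below follows the same idea as HdrHistogram's expected-interval correction; the function names and the sample run are hypothetical.

```python
import math


def p_any_call_exceeds_p99(n_calls: int, quantile: float = 0.99) -> float:
    """Chance that at least one of N independent calls is slower than its own P99: 1 - q^N."""
    return 1 - quantile ** n_calls


def backfill_omitted(latencies_ms: list[float], interval_ms: float) -> list[float]:
    """Add the samples a closed-loop load generator never sent while it was stalled.

    If a response took longer than the intended send interval, the requests that were
    due during the stall would have waited (latency - k * interval) for k = 1, 2, ...
    """
    corrected: list[float] = []
    for latency in latencies_ms:
        corrected.append(latency)
        missing = latency - interval_ms
        while missing >= interval_ms:
            corrected.append(missing)
            missing -= interval_ms
    return corrected


def nearest_rank_percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


for n in (1, 10, 100):
    print(f"fan-out {n}: {p_any_call_exceeds_p99(n):.1%} of requests hit at least one backend P99")

# Hypothetical run: one request intended every 10 ms, 99 fast responses and one 1-second stall.
measured = [1.0] * 99 + [1000.0]
print("raw P99 (ms):", nearest_rank_percentile(measured, 99))
print("corrected P99 (ms):", nearest_rank_percentile(backfill_omitted(measured, 10.0), 99))
```

With those numbers the raw P99 stays at 1 ms while the backfilled P99 is 990 ms: the single stall hid 99 requests that would each have waited a long time, so the uncorrected tail looks far healthier than the one users would have seen.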
