Back-pressure, retries, hedging, deadlines
The four primitives that turn an unreliable network of services into a system that holds up under load. They are usually taught one at a time, but they are always used together, and each one only does its job when the other three are in place. This is the topic on which system-design interviews are most often lost: candidates handle retries, gesture at back-pressure, and forget hedging and deadlines entirely.
Why one isn't enough
Each primitive solves a different failure mode. Skip any one and you reproduce that failure reliably under load.
| Primitive | Solves | Failure mode if skipped |
|---|---|---|
| Back-pressure | "My downstream is overloaded" | Cascading failure. The queue grows, latency rises, callers retry, the queue grows faster, the system collapses. |
| Retries | "This single request flaked" | 5xx spikes from transient errors that would have succeeded on the second try. |
| Hedging | "This one slow replica is killing my P99" | Tail latency dominated by the slowest 1% of replicas. A request that fans out is only as fast as its slowest dependency. |
| Deadlines | "Work that already missed its budget shouldn't keep running" | Wasted work. The system finishes a request the user already gave up on, while real traffic waits behind it. |
1 · Back-pressure
Back-pressure is the signal a downstream sends upstream to say "I can't handle more". Without it, work piles up at the slowest stage while the rest of the pipeline keeps producing.
The mechanism, in three flavours
| Mechanism | How | Example |
|---|---|---|
| Bounded queue + 503 | Queue has a max depth. When full, server returns 503 immediately. | Most HTTP services. The "shed early" pattern. |
| Reactive Streams / token credit | Consumer signals N more items it can handle; producer never sends more than that. | RSocket, gRPC streaming, Akka Streams. Inherently flow-controlled. |
| TCP-style window | Receiver advertises a window; sender stops at the window edge. | HTTP/2 flow control, QUIC, raw TCP. Hardware-cheap, well-understood. |
Diagram — back-pressure in action
The signal must propagate
A bounded queue at the producer is local back-pressure. Useful, but not enough. The full pattern pushes the signal upstream until it reaches a layer that can either delay (rate-limited client) or shed (drop the request).
- Service A → Service B → Service C. If C is overloaded, B should slow A, not absorb C's slowness silently.
- Caller-side timeout is not back-pressure. A 30-second timeout means the client gives up; the server still does the work. Real back-pressure stops the work.
- Reject early, reject loud. 503 with a Retry-After header is the polite back-pressure signal. It tells the caller's retry policy to wait the right amount.
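The "bounded queue + 503" flavour can be sketched in a few lines. This is a minimal illustration, not a real server: `BoundedIntake`, `Response`, the 202 status for accepted work, and the fixed `retry_after_s` value are all assumptions made for the example.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)


class BoundedIntake:
    """Hypothetical admission layer: accept work until the queue is full, then shed."""

    def __init__(self, max_depth: int, retry_after_s: int = 1):
        self._queue = queue.Queue(maxsize=max_depth)
        self._retry_after_s = retry_after_s

    def submit(self, request) -> Response:
        try:
            # put_nowait raises queue.Full instead of blocking the caller.
            self._queue.put_nowait(request)
        except queue.Full:
            # Shed early and loudly: the caller learns how long to back off.
            return Response(503, {"Retry-After": str(self._retry_after_s)})
        return Response(202)

    def take(self):
        """Called by a worker; removing an item frees one slot."""
        return self._queue.get_nowait()
```

The important property is that `submit` never blocks and never grows the queue past `max_depth`, so latency for accepted work stays bounded and rejected callers get an explicit signal instead of a slow timeout.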
2 · Retries — and the budget that keeps them safe
A retry policy without a budget is a denial-of-service tool aimed at your own infrastructure. The pattern that holds up:
- Exponential backoff with jitter. Wait random(0, base × 2^n) before retry n. Without jitter, retries synchronise on the next-second tick and re-saturate the server.
- Retry only idempotent operations. GET and PUT yes. POST only with an idempotency key. Read the original Stripe writeup; it's the canonical reference.
- Retry budget. A fraction (e.g., 10%) of recent successful requests. Once budget is exhausted, retries are dropped — the system gives up rather than amplifying.
- Don't retry 5xx blindly. Distinguish 503 (retryable) from 500 (probably a bug; retry once at most).
- The retry chain is cumulative. Three layers each making up to 3 attempts = up to 3 × 3 × 3 = 27 calls per logical request. Set the budget at the top of the chain; layers below should not retry independently.
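The budget and backoff rules above can be sketched together. This is one possible shape under stated assumptions, not a reference implementation: `RetryBudget`, `TransientError`, and `call_with_retries` are hypothetical names, the 10% ratio is expressed as "one retry per 10 successes" to keep the arithmetic in integers, and the cap on banked retries is an added assumption so that old successes cannot fund a late burst.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


class RetryBudget:
    """Allow roughly one retry per `successes_per_retry` recent successes."""

    def __init__(self, successes_per_retry: int = 10, max_retries_banked: int = 5):
        self._cost = successes_per_retry
        self._cap = successes_per_retry * max_retries_banked
        self._credits = 0

    def record_success(self) -> None:
        # Each success earns one credit, up to the cap.
        self._credits = min(self._cap, self._credits + 1)

    def try_spend(self) -> bool:
        # A retry costs `successes_per_retry` credits; without them, give up.
        if self._credits >= self._cost:
            self._credits -= self._cost
            return True
        return False


def call_with_retries(op, budget, max_attempts=3, base=0.1, cap=2.0, sleep=time.sleep):
    """Call op(); retry transient failures with full jitter while the budget allows."""
    for attempt in range(max_attempts):
        try:
            result = op()
        except TransientError:
            is_last = attempt == max_attempts - 1
            if is_last or not budget.try_spend():
                raise  # out of attempts or out of budget: fail cleanly
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
        else:
            budget.record_success()
            return result
```

One `RetryBudget` instance would be shared by all calls to the same downstream, so that a widespread failure drains the budget quickly and most callers stop retrying.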
Exponential backoff with jitter: four flavours
import random
# base: first delay in seconds; cap: upper bound on any delay; attempt: 0-based retry number
# Pure exponential. Bad: clients that failed together retry together.
sleep = min(cap, base * (2 ** attempt))
# Equal jitter. Better: half the delay is fixed, so some clustering remains.
half = min(cap, base * (2 ** attempt)) / 2
sleep = half + random.uniform(0, half)
# Full jitter. Best default: the AWS recommendation.
sleep = random.uniform(0, min(cap, base * (2 ** attempt)))
# Decorrelated jitter. Best when attempts are highly variable.
# Initialise sleep = base before the first retry, then carry it between attempts.
sleep = min(cap, random.uniform(base, sleep * 3))
3 · Hedging
Hedging sends the same request to a second replica after a short delay, returns whichever responds first, and cancels the loser. It cuts P99 sharply when the cause of P99 is one slow replica (cold cache, GC pause, network micro-blip).
The mechanic
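The mechanic can be sketched with asyncio. This is a minimal sketch under assumptions, not a production client: `hedged_call` is a hypothetical helper, and `primary` and `backup` stand for zero-argument coroutine functions that send the same idempotent request to two different replicas.

```python
import asyncio


async def hedged_call(primary, backup, hedge_delay_s: float):
    """Run primary(); if it is still pending after hedge_delay_s, race it against backup()."""
    first = asyncio.create_task(primary())
    done, _ = await asyncio.wait({first}, timeout=hedge_delay_s)
    if done:
        return first.result()  # fast path: the hedge never fires

    second = asyncio.create_task(backup())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # cancel the loser so it stops consuming capacity
    # Simplification: if the first task to finish failed, its exception is
    # raised here even though the other request might still have succeeded.
    return done.pop().result()
```

Cancelling the local task only helps if the cancellation reaches the replica; with a real RPC client that depends on the transport supporting cancellation.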
When hedging earns its keep
| Property | Why it matters |
|---|---|
| Idempotent | Both requests can run safely. Hedging non-idempotent operations is a duplicate-write bug. |
| Cancelable | The loser must be cancelable; otherwise hedging just doubles work. |
| P99 ≫ P50 | Hedging only helps when the slow tail is much slower than the median. If everything is uniformly slow, hedging doesn't help. |
| Hedge delay ≈ P95 latency | Don't hedge at P50: you'd fire on half of all requests and add roughly 50% to QPS. Hedging at P95 keeps the extra load near 5%. |
4 · Deadlines & deadline propagation
Every request has a deadline: a time after which the result is no longer useful. The deadline must travel down the call chain so every service knows how long it has left.
Why a per-hop timeout isn't enough
Service A has a 1 s timeout. A calls B with a 500 ms timeout, and B calls C with a 200 ms timeout. It looks safe. Now suppose a request reaches A with only 100 ms of its budget left: A still gives B the full 500 ms, so work can run for up to 400 ms after the user has given up.
Deadline as wall-clock, not duration
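Pass an absolute time (epoch ms), not "you have 200 ms". Each service subtracts its own current time from the deadline to answer "how much budget is left right now", without having to account for time already spent in queues or on the wire. The answer is only as accurate as clock synchronisation between hosts, so skew must stay small relative to the budget.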
# Caller (gateway)
deadline = now + 1000ms          # 1 s SLA
call B(req, deadline)
# Service B
remaining = deadline - now       # ms left when B starts
if remaining < 50:               # not enough budget for B's work
    return DeadlineExceeded      # short-circuit; don't even try
call C(req, deadline)            # propagate the same deadline
# Service C
remaining = deadline - now
if remaining < 5:
    return DeadlineExceeded
do work; return result
Deadline-aware shedding
A queued request with deadline already past is wasted work. Drop it on dequeue:
- Dispatcher shed. When pulling from the queue, check now > deadline - work_estimate. If true, fail it now without doing the work.
- Cancel on disconnect. If the client TCP connection closes (they gave up), abort downstream work.
- Mid-work check. Long-running work checks the deadline at coarse points (every 50 ms) and aborts early if exceeded.
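The dispatcher-shed rule can be sketched as a drain loop. This is an illustration under assumptions: `Job`, `drain`, and the per-job `work_estimate` field are hypothetical, and the clock is injectable so that the deadline and `now` are read from the same source.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    deadline: float       # absolute time in seconds, on the same clock as `now`
    work_estimate: float  # expected processing time in seconds


def drain(jobs: deque, run, now=time.monotonic):
    """Pop jobs in order; run those that can still finish in time, shed the rest."""
    shed = []
    while jobs:
        job = jobs.popleft()
        if now() > job.deadline - job.work_estimate:
            # The job would finish after its deadline: fail it without doing the work.
            shed.append(job.name)
            continue
        run(job)
    return shed
```

In a real service the shed branch would return a deadline-exceeded error to the caller; here it only records the name so the behaviour is easy to inspect.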
The four together — a worked example
A request enters the gateway. The gateway's deadline is now+500 ms. It calls service A; A calls B; B calls C. Here's how all four primitives work together under normal and degraded conditions.
| Step | Normal load | C is slow / overloaded |
|---|---|---|
| Gateway → A | deadline=500 ms; success at 50 ms | deadline=500 ms |
| A → B | deadline propagated; A's retry budget=10% | same; A may retry once if budget allows |
| B → C (call 1) | 20 ms response | 200 ms — slower than usual |
| B → C (hedge) | not fired (under P95) | at 100 ms B fires hedge to a different replica |
| Hedge result | — | second replica returns at 130 ms |
| Back-pressure signal from C | "can take 50 more" (credit) | 503 / shed; A's circuit breaker may open |
| A's retry behaviour | not needed | blocked by retry budget after first 503; system gives up cleanly |
| Gateway sees | success at 50 ms | success shortly after 130 ms if the hedge wins; if C sheds, 503 with Retry-After well inside the deadline → caller backs off |
The four together: under degraded conditions the system loses some requests cleanly, with the user seeing an honest 503, instead of cascading into a brown-out where every layer queues, every retry amplifies, every wasted request crowds out real work, and recovery takes tens of minutes.
Real-world references
| System | What they did | Lesson |
|---|---|---|
| Netflix Hystrix → resilience4j | Circuit breaker + thread-pool isolation + fallback. The per-dependency thread pool is a back-pressure mechanism. | Isolation per dependency; fail-fast over fail-late. |
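| Google's "Tail at Scale" paper | Hedging at scale; tied requests with cancellation; backup-but-cancel pattern. | In the paper's BigTable benchmark, hedging after a 10 ms delay cut 99.9th-percentile latency from 1,800 ms to 74 ms while sending about 2% more requests. |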
| AWS retry library | Token bucket per (region, service); decorrelated jitter; SDK-level enforcement. | Standardise retry across the SDK; don't let services reinvent it. |
| gRPC deadlines | The deadline travels with each call and cancellation is built in; forwarding it to child calls is automatic in some language implementations and explicit in others. | Use a runtime that does this for you. Manual deadline plumbing is error-prone. |
| Stripe API retries | Idempotency-Key header at the public API; behind it, a retry-safe execution log. | Idempotency must be designed into the API surface, not bolted on later. |
| Envoy / Linkerd | Service-mesh-level retry budgets and deadlines, applied uniformly. | Push these primitives into the mesh; application code handles the happy path. |
The interview answer that lands
In a system-design round, the question "what happens if the downstream is slow" is the cue. The candidate who passes ties the four together:
- Back-pressure first. "The downstream's bounded queue and 503 with Retry-After tell us early."
- Retries with budget. "Caller retries with full-jitter backoff, with retries capped at 10% of recent successful requests."
- Hedging on the tail. "Hedge requests to a second replica fire at P95 of the latency distribution; budget caps hedge rate at 5%."
- Deadlines wrap everything. "Per-request deadline propagates down the call chain. Mid-work checks abort when deadline is missed; queue drains drop expired work."
And the closer: "These four are why the system degrades cleanly instead of cascading into brown-out." That one sentence is what interviewers score against and what production teams actually rely on.
Further reading
- Dean & Barroso — "The Tail at Scale" (CACM 2013). The hedging paper. Required reading. ~6 pages; reads like a manual.
- Marc Brooker — "Exponential Backoff and Jitter" (AWS Architecture Blog). The retry-jitter math, with simulations. Essential.
- Ben Christensen — "Hystrix Tutorial" / Netflix Tech Blog. The circuit breaker + thread-pool isolation pattern.
- Marc Brooker — "Caution: Decreasing Variance Can Increase Demand". The subtle interaction between coalescing and back-pressure; counter-intuitive but matters at scale.
- gRPC docs — "Deadlines" and "Cancellation". The reference implementation for deadline propagation.
- Stripe Engineering — "Idempotency Keys at Stripe". The retry-safe API design pattern.
- Adjacent: Distributed rate limiter. The flip side — back-pressure as the abuse-handling tool.
- Adjacent: Circuit breaker. The pattern that fails-fast when retries become unsafe.
- Adjacent: Timeouts. The mechanism deadlines build on.