Back-pressure, retries, hedging, deadlines

The four primitives that turn an unreliable network of services into a system that holds up under load. They are usually taught one at a time, but in practice they are used together, and each one only does its job when the other three are in place. This is the topic on which system-design candidates most often lose points: they handle retries, gesture at back-pressure, and forget hedging and deadlines entirely.

Why one isn't enough

Each primitive solves a different failure mode. Skip any one and you reproduce that failure reliably under load.

Primitive	Solves	Failure mode if skipped
Back-pressure	"My downstream is overloaded"	Cascading failure. The queue grows, latency rises, callers retry, the queue grows faster, the system collapses.
Retries	"This single request flaked"	5xx spikes from transient errors that would have succeeded on the second try.
Hedging	"This one slow replica is killing my P99"	Tail latency dominated by the slowest 1% of responses. A request that fans out is only as fast as its slowest dependency.
Deadlines	"Work that already missed its budget shouldn't keep running"	Wasted work. The system finishes a request the user already gave up on, while real traffic waits behind it.

The pathological cycle. Retries without back-pressure cause retry storms. Back-pressure without retries causes lost work. Hedging without deadlines doubles your QPS forever. Deadlines without back-pressure mean every overloaded service still serves requests that already timed out. Each one needs the others to stay safe.

1 · Back-pressure

Back-pressure is the signal a downstream sends upstream to say "I can't handle more". Without it, work piles up at the slowest stage while the rest of the pipeline keeps producing.

The mechanism, in three flavours

Mechanism	How	Example
Bounded queue + 503	Queue has a max depth. When full, server returns 503 immediately.	Most HTTP services. The "shed early" pattern.
Reactive Streams / token credit	Consumer signals N more items it can handle; producer never sends more than that.	RSocket, gRPC streaming, Akka Streams. Inherently flow-controlled.
TCP-style window	Receiver advertises a window; sender stops at the window edge.	HTTP/2 flow control, QUIC, raw TCP. Hardware-cheap, well-understood.
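
The credit-based row in the table above can be sketched in a few lines. This is a single-process illustration of the idea only, not the RSocket, gRPC, or Akka Streams API; `CreditedProducer`, `request`, and `emit` are hypothetical names chosen for the example.

```python
from collections import deque


class CreditedProducer:
    """Producer that may only emit as many items as the consumer has requested."""

    def __init__(self, items):
        self._pending = deque(items)
        self._credit = 0  # items the consumer has said it can still accept

    def request(self, n):
        """Consumer grants n more credits (the "I can take n more" signal)."""
        if n <= 0:
            raise ValueError("credit must be positive")
        self._credit += n

    def emit(self):
        """Send items until credit or data runs out; never exceed the credit."""
        sent = []
        while self._credit > 0 and self._pending:
            sent.append(self._pending.popleft())
            self._credit -= 1
        return sent


producer = CreditedProducer(range(10))
first = producer.emit()    # no credit granted yet, so nothing is sent
producer.request(3)
second = producer.emit()   # exactly three items
producer.request(2)
third = producer.emit()    # two more
```

The producer cannot outrun the consumer because sending is gated on credit the consumer has explicitly granted, which is what makes this family of mechanisms flow-controlled by construction.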

Diagram — back-pressure in action

The signal must propagate

A bounded queue inside a single service is local back-pressure. Useful, but not enough. The full pattern pushes the signal upstream until it reaches a layer that can either delay (rate-limited client) or shed (drop the request).

Service A → Service B → Service C. If C is overloaded, B should slow A, not absorb C's slowness silently.
Caller-side timeout is not back-pressure. A 30-second timeout means the client gives up; the server still does the work. Real back-pressure stops the work.
Reject early, reject loud. 503 with a Retry-After header is the polite back-pressure signal. It tells the caller's retry policy to wait the right amount.
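
A minimal sketch of "bounded queue + 503 with Retry-After", assuming an in-process queue stands in for the server's work queue. `SheddingServer` and `submit` are hypothetical names, and returning 202 for accepted work is an assumption made for the example, not something the pattern requires.

```python
import queue


class SheddingServer:
    """Accepts work into a bounded queue; sheds with 503 when the queue is full."""

    def __init__(self, max_depth, retry_after_s=1):
        self._queue = queue.Queue(maxsize=max_depth)
        self._retry_after_s = retry_after_s

    def submit(self, request):
        """Return (status, headers). Never blocks: a full queue rejects at once."""
        try:
            self._queue.put_nowait(request)
        except queue.Full:
            # Shed early and tell the caller how long to wait before retrying.
            return 503, {"Retry-After": str(self._retry_after_s)}
        return 202, {}

    def depth(self):
        return self._queue.qsize()


server = SheddingServer(max_depth=2, retry_after_s=3)
responses = [server.submit(f"req-{i}") for i in range(4)]
# The first two requests are queued; the last two are rejected immediately.
```

The important property is that rejection costs almost nothing: the server does no work for a shed request, so the queue depth, and therefore the latency of accepted requests, stays bounded.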

2 · Retries — and the budget that keeps them safe

A retry policy without a budget is a denial-of-service tool aimed at your own infrastructure. The pattern that holds up:

Exponential backoff with jitter. Wait random(0, base × 2ⁿ) before retry n. Without jitter, clients that failed at the same moment retry at the same moment, and each synchronised wave re-saturates the server.
Retry only idempotent operations. GET and PUT yes. POST only with an idempotency key. Read the original Stripe writeup; it's the canonical reference.
Retry budget. A fraction (e.g., 10%) of recent successful requests. Once budget is exhausted, retries are dropped — the system gives up rather than amplifying.
Don't retry 5xx blindly. Distinguish 503 (retryable) from 500 (probably a bug; retry once at most).
The retry chain is cumulative. Three layers each making up to 3 attempts = up to 3 × 3 × 3 = 27 calls per logical request. Set the budget at the top of the chain; layers below should not retry independently.
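
The retry-budget rule can be sketched as a small counter. This is an illustration under simplifying assumptions: `RetryBudget` is a hypothetical class, counts are kept for the lifetime of the object rather than over a sliding window of recent traffic, and `min_retries` is an added floor so that a client with no history can still retry.

```python
class RetryBudget:
    """Allows retries only while they stay under a fraction of successful requests."""

    def __init__(self, ratio=0.10, min_retries=1):
        self.ratio = ratio
        self.min_retries = min_retries  # floor so a cold client can retry at all
        self.successes = 0
        self.retries = 0

    def record_success(self):
        self.successes += 1

    def try_acquire(self):
        """Return True and spend budget if one more retry is allowed."""
        allowed = max(self.min_retries, int(self.successes * self.ratio))
        if self.retries < allowed:
            self.retries += 1
            return True
        return False  # budget exhausted: give up instead of amplifying


budget = RetryBudget(ratio=0.10, min_retries=0)
for _ in range(50):
    budget.record_success()
granted = [budget.try_acquire() for _ in range(8)]  # five granted, three refused

# Retry amplification: attempts multiply across layers.
attempts_per_layer, layers = 3, 3
worst_case_calls = attempts_per_layer ** layers  # 27
```

With a 10% budget, total load is bounded at about 1.1× the successful traffic no matter how badly the downstream is failing, which is the difference between a retry policy and a retry storm.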

Exponential backoff with jitter — three flavours

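import random

base, cap = 0.1, 10.0    # seconds; illustrative values
attempt = 3              # zero-based retry number
prev = base              # previous delay, carried between attempts (decorrelated only)

# Pure exponential: bad, every client retries at the same instants
sleep = min(cap, base * 2 ** attempt)

# Equal jitter: better, but half of each delay is still deterministic
half = min(cap, base * 2 ** attempt) / 2
sleep = half + random.uniform(0, half)

# Full jitter: best default, the AWS recommendation
sleep = random.uniform(0, min(cap, base * 2 ** attempt))

# Decorrelated jitter: best when attempts are highly variable
sleep = min(cap, random.uniform(base, prev * 3))
prev = sleep             # feed this delay into the next attempt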

Marc Brooker's "Exponential Backoff and Jitter" is required reading. Skip it only if you've already learned that pure exponential backoff doesn't work. Most teams haven't.
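
Here is how full jitter fits into an actual retry loop. This is a minimal sketch: `call_with_retries` and `TransientError` are hypothetical names, and the `sleep` and `rng` parameters exist only so the loop can be exercised without real waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (for example a 503)."""


def call_with_retries(op, max_attempts=4, base=0.1, cap=2.0,
                      sleep=time.sleep, rng=random.uniform):
    """Run op(); on TransientError wait a full-jitter delay and try again."""
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            delay = rng(0, min(cap, base * 2 ** attempt))
            sleep(delay)


# Usage: an operation that fails twice, then succeeds.
calls = {"n": 0}
delays = []  # records the delays instead of sleeping


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("try again")
    return "ok"


result = call_with_retries(flaky, sleep=delays.append)
```

Only `TransientError` is retried; any other exception propagates immediately, which matches the rule of not retrying errors that are probably bugs.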

3 · Hedging

Hedging sends the same request to a second replica after a short delay, returns whichever responds first, and cancels the loser. It cuts P99 sharply when the cause of P99 is one slow replica (cold cache, GC pause, network micro-blip).

The mechanic
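
A minimal asyncio sketch of the mechanic described above, assuming each replica is exposed as a zero-argument coroutine function. `hedged_call`, `slow_replica`, and `fast_replica` are hypothetical names, and error handling is omitted: if the first task to finish raised, the exception propagates.

```python
import asyncio


async def hedged_call(replicas, hedge_delay):
    """Call replicas[0]; if it has not answered after hedge_delay seconds,
    also call replicas[1]. Return the first result and cancel the loser."""
    primary = asyncio.create_task(replicas[0]())
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        return primary.result()  # fast path: no hedge fired

    backup = asyncio.create_task(replicas[1]())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # the loser must not keep consuming resources
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


async def slow_replica():
    await asyncio.sleep(0.5)  # simulates a GC pause or a cold cache
    return "slow"


async def fast_replica():
    await asyncio.sleep(0.01)
    return "fast"


winner = asyncio.run(hedged_call([slow_replica, fast_replica], hedge_delay=0.05))
```

Cancellation here is cooperative and local to the event loop. Across a network, the cancel has to be sent to the losing replica and honoured there, otherwise the hedge still costs a full extra request.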

When hedging earns its keep

Property	Why it matters
Idempotent	Both requests can run safely. Hedging non-idempotent operations is a duplicate-write bug.
Cancelable	The loser must be cancelable; otherwise hedging just doubles work.
P99 ≫ P50	Hedging only helps when the slow tail is much slower than the median. If everything is uniformly slow, hedging doesn't help.
Hedge delay = P95 of the request	Don't hedge at P50: you'll fire on half of all requests and add roughly 50% to QPS. Hedge at the tail.

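Hedging without a budget = 2× QPS forever: when the whole backend slows down, every request crosses the hedge delay and every request hedges. Cap the hedge rate at some fraction (5–10%) of total RPS. Past that, drop extra hedges. Google's Tail at Scale paper is the foundational reference. It reports that deferring the hedge until the 95th-percentile latency limits the extra load to roughly 5% while substantially shortening the tail, and that in one BigTable benchmark a hedge sent after 10 ms cut 99.9th-percentile latency from 1,800 ms to 74 ms with only 2% more requests.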
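
The two knobs, hedge delay and hedge budget, can be sketched as follows. The helper names `hedge_delay_ms` and `may_hedge` are hypothetical, and the latency samples are made up for illustration.

```python
import statistics


def hedge_delay_ms(latencies_ms, percentile=95):
    """Pick the hedge delay as the given percentile of observed latencies."""
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return cuts[percentile - 1]


def may_hedge(hedges_sent, requests_sent, budget=0.05):
    """Allow one more hedge only while hedges stay within the budgeted fraction."""
    return hedges_sent + 1 <= budget * requests_sent


# 95 fast requests and 5 stragglers (illustrative data).
samples = [10.0] * 95 + [200.0] * 5
delay = hedge_delay_ms(samples)

# Fraction of requests slow enough to trigger a hedge at that delay.
hedge_fraction = sum(latency > delay for latency in samples) / len(samples)
```

With this sample, only the five stragglers outlast the delay, so hedging adds about 5% extra requests. The budget check is what keeps that true when the latency distribution shifts and many more requests start crossing the delay.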

4 · Deadlines & deadline propagation

Every request has a deadline: a time after which the result is no longer useful. The deadline must travel down the call chain so every service knows how long it has left.

Why a per-hop timeout isn't enough

Service A times out at 1 s. Service A calls B with a 500 ms timeout. B calls C with a 200 ms timeout. Looks safe. Now suppose A has already spent 900 ms (queueing, an earlier retry) when it calls B, so only 100 ms of the user's budget is left. B still gets its full 500 ms, so the work can run for up to 400 ms past the point the user gave up.
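
The arithmetic of that example, written out. `overrun_ms` is a hypothetical helper, and the numbers are the ones from the paragraph above.

```python
def overrun_ms(remaining_budget_ms, hop_timeout_ms):
    """How long a hop may keep working after the caller has given up."""
    return max(0, hop_timeout_ms - remaining_budget_ms)


# 100 ms of the user's budget is left when B is called with a fixed 500 ms timeout.
wasted = overrun_ms(remaining_budget_ms=100, hop_timeout_ms=500)  # 400

# With a propagated deadline, B's effective timeout is capped by what is left.
effective_timeout = min(500, 100)
wasted_with_deadline = overrun_ms(100, effective_timeout)  # 0
```

A fixed per-hop timeout only bounds the hop in isolation; capping it by the remaining budget is what bounds it relative to the user.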

Deadline as wall-clock, not duration

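Pass an absolute time (epoch ms), not "you have 200 ms". A relative budget goes stale the moment it is sent, because nobody subtracts the time spent on the wire and in queues. With an absolute deadline, each service can ask its own clock "how much budget is left right now". This assumes clocks are synchronised (for example via NTP) to well within the budget; any skew shows up directly as error in the remaining time.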

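import time

class DeadlineExceeded(Exception):
    pass

def now_ms():
    return time.time() * 1000            # wall-clock epoch ms

def do_work(req):
    return req                           # placeholder for C's real handler

# Caller (gateway)
def gateway(req):
    deadline = now_ms() + 1000           # 1 s SLA
    return service_b(req, deadline)

# Service B
def service_b(req, deadline):
    remaining = deadline - now_ms()      # ms left when B starts
    if remaining < 50:                   # not enough budget for B's work
        raise DeadlineExceeded           # short-circuit; don't even try
    return service_c(req, deadline)      # propagate the same deadline

# Service C
def service_c(req, deadline):
    remaining = deadline - now_ms()
    if remaining < 5:
        raise DeadlineExceeded
    return do_work(req)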

Deadline-aware shedding

A queued request with deadline already past is wasted work. Drop it on dequeue:

Dispatcher shed. When pulling from the queue, check now > deadline - work_estimate. If true, fail it now without doing the work.
Cancel on disconnect. If the client TCP connection closes (they gave up), abort downstream work.
Mid-work check. Long-running work checks the deadline at coarse points (every 50 ms) and aborts early if exceeded.
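
The dispatcher-shed check can be sketched as a function over a queue of (request, deadline) pairs. `drain` is a hypothetical name, and a single fixed `work_estimate_ms` is an assumption made to keep the example short; a real dispatcher would likely estimate per request type.

```python
from collections import deque


def drain(queue_items, now_ms, work_estimate_ms):
    """Split queued (request, deadline_ms) pairs into runnable and shed lists.

    A request is shed when it can no longer finish before its deadline.
    """
    runnable, shed = [], []
    pending = deque(queue_items)
    while pending:
        request, deadline_ms = pending.popleft()
        if now_ms > deadline_ms - work_estimate_ms:
            shed.append(request)      # fail it now; do none of the work
        else:
            runnable.append(request)
    return runnable, shed


items = [("a", 1000), ("b", 1040), ("c", 1200)]
runnable, shed = drain(items, now_ms=1010, work_estimate_ms=50)
# "a" has already expired, "b" cannot finish in time, "c" still has budget.
```

Note that "b" is shed even though its deadline has not passed yet: with 30 ms left and 50 ms of work to do, running it would only produce a result nobody can use.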

Deadlines short-circuit retries safely. A retry policy that fires five attempts is fine when each attempt has 200 ms budget left. When the deadline is 50 ms away, the policy must stop. Retries that don't honour deadlines amplify load past the point the user gave up, which is the worst kind of waste.
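
A sketch of a retry loop that honours the deadline. Everything here is illustrative: `retry_until_deadline` is a hypothetical helper, `ConnectionError` stands in for a transient failure, backoff is left out for brevity, and the deadline is expressed on the local monotonic clock (converted once from the propagated wall-clock deadline) so the loop itself is immune to clock adjustments.

```python
import time


class DeadlineExceeded(Exception):
    pass


def retry_until_deadline(op, deadline_s, min_attempt_budget_s=0.2,
                         max_attempts=5, clock=time.monotonic):
    """Retry op() but stop as soon as too little budget remains for an attempt."""
    last_error = None
    for _ in range(max_attempts):
        remaining = deadline_s - clock()
        if remaining < min_attempt_budget_s:
            break  # not enough time left for a useful attempt
        try:
            return op(remaining)  # op is told how much budget it may use
        except ConnectionError as exc:  # stand-in for a transient failure
            last_error = exc
    else:
        raise last_error  # attempts exhausted with budget to spare
    raise DeadlineExceeded("stopped retrying: deadline too close") from last_error


# Usage with a fake clock that advances 0.3 s per attempt.
ticks = iter([0.0, 0.3, 0.6, 0.9])
attempts = []


def always_fails(remaining):
    attempts.append(round(remaining, 1))
    raise ConnectionError("transient")


try:
    retry_until_deadline(always_fails, deadline_s=1.0, clock=lambda: next(ticks))
    outcome = "ok"
except DeadlineExceeded:
    outcome = "deadline"
```

The policy allows five attempts, but only three run: by the fourth check just 0.1 s remains, which is below the 0.2 s an attempt needs, so the loop stops instead of sending work that cannot finish in time.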

The four together — a worked example

A request enters the gateway. The gateway's deadline is now+500 ms. It calls service A; A calls B; B calls C. Here's how all four primitives work together under normal and degraded conditions.

Step	Normal load	C is slow / overloaded
Gateway → A	deadline=500 ms; success at 50 ms	deadline=500 ms
A → B	deadline propagated; A's retry budget=10%	same; A may retry once if budget allows
B → C (call 1)	20 ms response	200 ms — slower than usual
B → C (hedge)	not fired (under P95)	at 100 ms B fires hedge to a different replica
Hedge result	—	second replica returns at 130 ms
Back-pressure signal from C	"can take 50 more" (credit)	503 / shed; A's circuit breaker may open
A's retry behaviour	not needed	blocked by retry budget after first 503; system gives up cleanly
Gateway sees	200 OK at about 50 ms	One slow replica: 200 OK via the hedge, shortly after 130 ms. C overloaded: 503 with Retry-After before the deadline → caller backs off

The four together: under degraded conditions the system loses some requests cleanly, with the user seeing an honest 503, instead of cascading into a brown-out where every layer queues, every retry amplifies, every wasted request crowds out real work, and recovery takes tens of minutes.

Real-world references

System	What they did	Lesson
Netflix Hystrix → resilience4j	Circuit breaker + thread-pool isolation + fallback. The per-dependency thread pool is a back-pressure mechanism.	Isolation per dependency; fail-fast over fail-late.
Google's "Tail at Scale" paper	Hedging at scale; tied requests with cancellation; backup-but-cancel pattern.	Hedging at the 95th-percentile delay costs roughly 5% extra requests and substantially shortens the tail.
AWS SDK retry modes	Client-side token-bucket retry quota; exponential backoff with jitter; SDK-level enforcement.	Standardise retry across the SDK; don't let services reinvent it.
gRPC deadlines	The deadline travels with each call (the grpc-timeout header); runtimes such as Go and Java can carry it, and cancellation, through to child calls.	Use a runtime that does this for you. Manual deadline plumbing is error-prone.
Stripe API retries	Idempotency-Key header at the public API; behind it, a retry-safe execution log.	Idempotency must be designed into the API surface, not bolted on later.
Envoy / Linkerd	Service-mesh-level retry budgets and deadlines, applied uniformly.	Push these primitives into the mesh; application code handles the happy path.

The interview answer that lands

In a system-design round, the question "what happens if the downstream is slow" is the cue. The candidate who passes ties the four together:

Back-pressure first. "The downstream's bounded queue and 503 with Retry-After tell us early."
Retries with budget. "Caller retries with full-jitter backoff, capped at 10% of recent successful requests."
Hedging on the tail. "Hedge requests to a second replica fire at P95 of the latency distribution; budget caps hedge rate at 5%."
Deadlines wrap everything. "Per-request deadline propagates down the call chain. Mid-work checks abort when deadline is missed; queue drains drop expired work."

And the closer: "These four are why the system degrades cleanly instead of cascading into brown-out." That one sentence is what interviewers score against and what production teams actually rely on.
