Handbook · Vol. IV · 2026 Track IV · Distributed mechanics · piece 1 of 4 Deep dive

Async architecture.

Events, queues, idempotence, at-least-once delivery: the three message-delivery semantics, what each costs the producer, and what each demands of the consumer.

Some work doesn't have to happen now. The act of admitting that — and putting a queue between the request and the work — is the single biggest reliability upgrade most systems will ever get.

So far, every system we've designed follows a synchronous request → response pattern: the user waits while the server does work. This pattern breaks at scale. Sending a welcome email, encoding a video, generating a report, fanning out a notification to a million followers — none of these need to finish before the user gets their answer. This module covers the asynchronous patterns that decouple "the user pressed a button" from "the work that the button kicked off." Done well, async architecture turns brittle systems resilient and slow systems fast.

[Figure: producer, queue/log, and consumers A, B, and C. The producer writes and returns 202; the queue/log is durable, ordered per partition, and retained. The producer is fast (write and ack), consumers scale independently, and the queue absorbs spikes. Retries, ordering, exactly-once, and dead letters: all the hard problems live in the queue.]
Producer writes, gets acknowledged immediately. Consumers process at their own pace. The queue absorbs traffic spikes and outlives crashes.

The four reasons to go async

Asynchronous architecture is a tool. Like any tool, it solves specific problems. The four reasons it earns its complexity:

Latency
Returning 202 Accepted in 50 ms is better than returning 200 OK in 5 seconds. Users don't care that the email hasn't been sent; they care that the page didn't hang.
Decoupling
The producer and consumer don't have to be online at the same time. The consumer can be redeployed, scaled, or even rewritten in a different language, and the producer never notices.
Smoothing
Bursty traffic gets absorbed by the queue. A flash sale that produces 50,000 orders in five seconds doesn't crash the order pipeline; it gets buffered and worked off over minutes.
Fan-out
One event triggers many actions. A new signup writes one row to the queue and is consumed by the email service, the analytics service, the recommendation service, and the audit log — independently, without the producer knowing they exist.
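
The latency and decoupling points can be sketched with nothing but the standard library. This is a minimal illustration, not a production design: the in-process `queue.Queue` stands in for a durable queue, and `handle_signup`, `worker`, and the `welcome_email` job type are hypothetical names chosen for the example.

```python
import queue
import threading
import uuid

jobs: "queue.Queue" = queue.Queue()   # stand-in for a durable queue (assumption)
sent: list[str] = []                  # records what the worker has done

def handle_signup(email: str) -> tuple[int, dict]:
    """Request handler: enqueue the slow work and return at once."""
    job_id = str(uuid.uuid4())
    jobs.put({"job_id": job_id, "type": "welcome_email", "email": email})
    return 202, {"job_id": job_id}    # 202 Accepted: the work is queued, not done

def worker() -> None:
    """Consumer: drains the queue at its own pace, independent of the handler."""
    while True:
        job = jobs.get()
        if job is None:               # sentinel value: shut the worker down
            jobs.task_done()
            break
        sent.append(job["email"])     # placeholder for the real email send
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
status, body = handle_signup("ada@example.com")
jobs.put(None)
jobs.join()                           # wait only so the demo can inspect `sent`
```

The handler never touches the email code path; it only needs the queue to accept the write. An in-memory queue loses its contents on a crash, which is exactly why real systems put a durable broker in this position.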

The patterns

Async architecture comes in four canonical patterns. Each maps to a different shape of work and a different failure model.

Task queue

One producer writes a job, one consumer pulls it, runs it once, acks. Workhorse for "do this now-ish but not synchronously." SQS, Sidekiq, Celery.

Pub/sub

One producer publishes, N subscribers each get a copy. Used for fan-out: notifications, cache invalidation, audit. Redis pub/sub, SNS, NATS.
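
A rough sketch of the fan-out shape, assuming an in-memory bus rather than a real broker. The `PubSub` class and the `user.signed_up` topic name are invented for illustration and do not mirror the API of Redis, SNS, or NATS.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]

class PubSub:
    """In-memory stand-in for a broker (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver a copy of the event to every subscriber; return the count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(dict(event))      # each subscriber gets its own copy
        return len(handlers)

bus = PubSub()
emails, audit = [], []
bus.subscribe("user.signed_up", lambda e: emails.append(e["email"]))
bus.subscribe("user.signed_up", lambda e: audit.append(e))
delivered = bus.publish("user.signed_up", {"email": "ada@example.com"})
```

The producer calls `publish` once and does not know how many subscribers exist, which is the property that distinguishes this pattern from a task queue, where exactly one consumer takes each job.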

Event log

Append-only, durable, replayable. Multiple independent consumers track their own offset. Foundation for event-sourced systems and stream processing. Kafka, Kinesis, Pulsar.
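
The offset idea can be shown with a toy log. This is a sketch under simplifying assumptions: one partition, everything in memory, and `poll` advancing the offset automatically. `EventLog` and its methods are hypothetical and are not the Kafka, Kinesis, or Pulsar client API.

```python
class EventLog:
    """Append-only log; every consumer tracks its own read offset."""

    def __init__(self) -> None:
        self._events: list[dict] = []
        self._offsets: dict[str, int] = {}

    def append(self, event: dict) -> int:
        self._events.append(event)
        return len(self._events) - 1              # offset of the new event

    def poll(self, consumer: str, max_events: int = 10) -> list[dict]:
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)   # auto-commit (simplification)
        return batch

    def seek(self, consumer: str, offset: int) -> None:
        """Rewind or skip ahead; this is what makes replay possible."""
        self._offsets[consumer] = offset

log = EventLog()
for n in range(3):
    log.append({"order": n})

analytics = log.poll("analytics")                 # reads all three events
audit_first = log.poll("audit", max_events=1)     # audit reads at its own pace
log.seek("analytics", 0)
replayed = log.poll("analytics")                  # same events again, nothing was deleted
```

Reading does not remove anything from the log, so a slow consumer and a replaying consumer never interfere with each other.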

Workflow

Long-lived state machines: orchestrated steps with retries, timers, branches. The pattern when "send email, wait 24h, if no reply send reminder, then mark stale" is one logical operation. Temporal, AWS Step Functions.

Delivery semantics — the trade-off no one explains clearly

Every queue offers one of three guarantees, and a system that doesn't pick one explicitly has the worst of all three. The choice cascades through the rest of the design.

| | At-most-once | At-least-once | Exactly-once |
|---|---|---|---|
| What can happen | Lost messages | Duplicate messages | Neither (in theory) |
| How | Producer doesn't retry; consumer ack is fire-and-forget | Producer retries on uncertainty; consumer must be idempotent | Producer dedupes by id; consumer is transactional |
| Cost | Cheap; lossy | Cheap; idempotency burden moves to the consumer | Expensive; coordination overhead |
| When to use | Metrics, telemetry, sampling | ~99% of business workflows | Money. Only money. |

The honest version of "exactly-once" is "at-least-once delivery + idempotent processing." Even Kafka's exactly-once semantics work by deduplicating producer writes (a producer ID plus per-partition sequence numbers) and wrapping reads and writes in transactions, all inside a single Kafka cluster. Once the data leaves the cluster, you're back to designing for at-least-once. Plan accordingly: every consumer should be idempotent, and every write should carry an idempotency key the system can dedupe on.
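
To see why at-least-once produces duplicates, consider a producer whose acknowledgement gets lost. The sketch below is illustrative only: `FlakyBroker` is a made-up test double that stores every message but drops the first ack, and the `id` and `payload` field names are assumptions.

```python
import itertools

class FlakyBroker:
    """Stores every message but loses the first ack, so the producer
    cannot tell whether its first write landed."""

    def __init__(self) -> None:
        self.messages: list[dict] = []
        self._calls = itertools.count()

    def send(self, msg: dict) -> bool:
        self.messages.append(msg)
        return next(self._calls) != 0     # False on the first call: ack "lost"

def send_at_least_once(broker: FlakyBroker, msg: dict, max_attempts: int = 3) -> bool:
    """Retry on uncertainty. The message may be stored more than once."""
    for _ in range(max_attempts):
        if broker.send(msg):
            return True
    return False

def consume_idempotently(messages: list[dict]) -> list[str]:
    """Process each message id once, however many copies arrive."""
    seen: set[str] = set()
    processed: list[str] = []
    for msg in messages:
        if msg["id"] in seen:
            continue                      # duplicate delivery: skip
        seen.add(msg["id"])
        processed.append(msg["payload"])
    return processed

broker = FlakyBroker()
acked = send_at_least_once(broker, {"id": "order-42", "payload": "charge $10"})
processed = consume_idempotently(broker.messages)
```

The broker ends up holding two copies of the same message, yet the consumer applies the effect once. That combination is what most systems mean in practice when they claim exactly-once.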

The moving parts of a production queue

Visibility / lease timeout
When a consumer pulls a message, the queue hides it for N seconds. If the consumer doesn't ack within N, the message becomes visible again. Set N to 2-3× the P99 processing time, not the average.
Retries with backoff
Failed messages should be re-tried with exponential backoff (1s → 2s → 4s → 8s) and jitter. Without backoff, a transient downstream failure becomes a thundering herd.
Dead-letter queue
After N retries the message goes to a DLQ for inspection. Critical: the DLQ must have alerting. An overflowing DLQ is the silent killer of async systems.
Ordering
Most queues offer per-partition or per-key ordering, not global. Choose a partition key (user_id, account_id) so related messages stay ordered without limiting parallelism.
Backpressure
What happens when consumers can't keep up? Queue depth grows. The producer either blocks (good — applies pressure upstream), drops (acceptable for telemetry), or paginates (good for batch).
Poison pill handling
One message that always crashes the consumer is a poison pill. It must move to the DLQ after a bounded number of retries; otherwise it stops processing for everyone.
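
The visibility timeout, bounded retries, and dead-letter queue interact, and a small model makes the interaction easier to follow. This is a sketch assuming a single in-memory queue with time passed in explicitly so the behaviour is deterministic. `LeaseQueue`, `max_receives`, and the message bodies are hypothetical names, not the API of SQS or any other broker.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)                      # compare messages by identity
class Message:
    body: str
    receives: int = 0
    visible_at: float = 0.0

@dataclass
class LeaseQueue:
    visibility_timeout: float
    max_receives: int
    messages: list = field(default_factory=list)
    dlq: list = field(default_factory=list)

    def send(self, body: str) -> None:
        self.messages.append(Message(body))

    def receive(self, now: float) -> Optional[Message]:
        for msg in list(self.messages):
            if msg.visible_at > now:
                continue                  # still leased to some consumer
            if msg.receives >= self.max_receives:
                self.messages.remove(msg) # poison pill: park it in the DLQ
                self.dlq.append(msg)
                continue
            msg.receives += 1
            msg.visible_at = now + self.visibility_timeout   # start the lease
            return msg
        return None

    def ack(self, msg: Message) -> None:
        self.messages.remove(msg)         # done: never redelivered

q = LeaseQueue(visibility_timeout=30.0, max_receives=3)
q.send("resize image 1")
q.send("poison")

first = q.receive(now=0.0)
q.ack(first)                              # processed within the lease
for t in (0.0, 30.0, 60.0):               # consumer crashes on "poison", never acks
    q.receive(now=t)
hidden = q.receive(now=70.0)              # lease still active: nothing to deliver
parked = q.receive(now=90.0)              # receive budget spent: moved to the DLQ
```

Without the `max_receives` check, the poison message would reappear every 30 seconds indefinitely, consuming a worker each time.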

The hard cases

The queue isn't free reliability. Adding a queue trades synchronous failure ("your request failed") for asynchronous failure ("the work silently never happened"). Without alerting on queue depth, DLQ count, and consumer lag, you have moved errors out of the user's view rather than fixed them. Wire up the dashboards before the queue.
Idempotency is not optional. At-least-once delivery means you will get duplicates. The consumer must be safe to call twice with the same input. The cleanest way: every message carries an idempotency key, and the consumer records "I've processed key X" in a database, ideally in the same transaction as the work itself, so that a crash between the two cannot leave a key marked as done for work that never happened. Without this, retries cause double-charges, double-emails, double-everything.
Don't queue everything. Async adds operational complexity: a queue, a worker pool, a DLQ, retry semantics, monitoring, deployment coordination. If the work takes under 100 ms and is not externally observable, do it inline. Reach for async when the work is slow, externally facing, fans out to several consumers, or has its own SLO.
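
One way to make the idempotency point concrete is to store the key and the side effect in a single database transaction. The sketch below uses SQLite from the standard library. The table names, the message fields (`key`, `account`, `cents`), and `handle_charge` are assumptions made for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY)")
db.execute("CREATE TABLE charges (account TEXT, cents INTEGER)")
db.commit()

def handle_charge(conn: sqlite3.Connection, msg: dict) -> bool:
    """Apply the charge once per idempotency key. Returns False for a duplicate."""
    try:
        with conn:                        # one transaction: both rows or neither
            conn.execute("INSERT INTO processed VALUES (?)", (msg["key"],))
            conn.execute(
                "INSERT INTO charges VALUES (?, ?)", (msg["account"], msg["cents"])
            )
        return True
    except sqlite3.IntegrityError:
        return False                      # key already recorded: skip the work

msg = {"key": "charge-123", "account": "acct-1", "cents": 1000}
results = [handle_charge(db, msg) for _ in range(3)]   # simulate redelivery
total = db.execute("SELECT COUNT(*) FROM charges").fetchone()[0]
```

The primary-key constraint does the deduplication, so two workers racing on the same message cannot both succeed. This only covers effects that live in the same database; a call to an external payment API would need that API to accept the idempotency key as well.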

Choosing the right tool

| Need | Pick |
|---|---|
| Background jobs in a single app | Sidekiq, Celery, BullMQ, RQ (language-native, fast to ship). |
| Fan-out across services | SNS + SQS, NATS, Redis Pub/Sub. |
| Event sourcing, replay, stream processing | Kafka, Pulsar, Kinesis. |
| Multi-step workflows with timers | Temporal, AWS Step Functions, Cadence. |
| Cron-like scheduled work | EventBridge Scheduler, Cloud Scheduler, or just Kubernetes CronJobs. |
| RPC-style with deferred response | RabbitMQ direct-reply, gRPC streaming, or just a polled job-id endpoint. |

Practical defaults

  1. Default to at-least-once delivery and idempotent consumers. Idempotency keys travel with every message.
  2. Visibility timeout = 2-3× P99 processing time, with a hard maximum of 12 hours (the ceiling SQS enforces).
  3. Retries: 5 attempts with exponential backoff (1s, 2s, 4s, 8s, 16s) + ±20% jitter, then to DLQ.
  4. Alert on DLQ count > 0, queue depth growth rate, and consumer lag — these are the three signals that tell you async is failing silently.
  5. Pick a partition key that reflects the natural ordering boundary (user_id, account_id) — global ordering is rarely needed and always expensive.
  6. Replay matters. Pick a queue/log that lets you replay from a point in time, because sooner or later you will want to backfill or recover.
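
Defaults 2, 3, and 5 reduce to a few lines of arithmetic. The helpers below are a sketch: the function names are hypothetical, and the SHA-256 based partitioner is one reasonable choice for a stable key-to-partition mapping, not the scheme any particular broker uses.

```python
import hashlib
import random
from typing import Optional

def backoff_delays(attempts: int = 5, base: float = 1.0, jitter: float = 0.2,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Exponential backoff (1s, 2s, 4s, ...) with +/-20% jitter on each delay."""
    rng = rng or random.Random()
    return [base * 2**i * rng.uniform(1 - jitter, 1 + jitter) for i in range(attempts)]

def visibility_timeout(p99_seconds: float, factor: float = 3.0,
                       ceiling: float = 12 * 3600) -> float:
    """2-3x the P99 processing time, capped at 12 hours."""
    return min(p99_seconds * factor, ceiling)

def partition_for(key: str, partitions: int) -> int:
    """Stable mapping: the same key always lands on the same partition."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions

delays = backoff_delays(rng=random.Random(0))   # seeded only to make the demo repeatable
timeout = visibility_timeout(p99_seconds=4.0)   # 12.0 seconds
p = partition_for("user-1", partitions=8)
```

Jitter matters because many consumers that fail at the same moment would otherwise retry at the same moment too. The partitioner uses a cryptographic hash rather than Python's built-in `hash`, since the built-in is randomized per process for strings and would scatter a key across partitions after a restart.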