Some work doesn't have to happen now. The act of admitting that — and putting a queue between the request and the work — is the single biggest reliability upgrade most systems will ever get.
So far, every system we've designed follows a synchronous request → response pattern: the user waits while the server does work. This pattern breaks at scale. Sending a welcome email, encoding a video, generating a report, fanning out a notification to a million followers — none of these need to finish before the user gets their answer. This module covers the asynchronous patterns that decouple "the user pressed a button" from "the work that the button kicked off." Done well, async architecture turns brittle systems resilient and slow systems fast.
The four reasons to go async
Asynchronous architecture is a tool. Like any tool, it solves specific problems. The four reasons it earns its complexity:
- Latency
- Returning 202 Accepted in 50 ms is better than returning 200 OK in 5 seconds. Users don't care that the email hasn't been sent; they care that the page didn't hang.
- Decoupling
- The producer and consumer don't have to be online at the same time. The consumer can be redeployed, scaled, or even rewritten in a different language, and the producer never notices.
- Smoothing
- Bursty traffic gets absorbed by the queue. A flash sale that produces 50,000 orders in five seconds doesn't crash the order pipeline; it gets buffered and worked off over minutes.
- Fan-out
- One event triggers many actions. A new signup writes one row to the queue and is consumed by the email service, the analytics service, the recommendation service, and the audit log — independently, without the producer knowing they exist.
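A minimal sketch of the latency and decoupling points, using only the Python standard library. The names `handle_signup`, `worker`, `jobs`, and `sent` are hypothetical, and the in-process `queue.Queue` stands in for a real broker; a production system would use a durable queue so jobs survive a restart.

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # stand-in for a durable broker (assumption)
sent = []              # stand-in for the real side effect, e.g. an email API call


def handle_signup(email):
    """Request path: enqueue the job and answer immediately."""
    job_id = str(uuid.uuid4())
    jobs.put({"job_id": job_id, "type": "welcome_email", "email": email})
    # 202 Accepted: the work has been queued, not completed.
    return 202, {"job_id": job_id}


def worker():
    """Background path: drain jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            sent.append(job["email"])  # the slow work happens here
        finally:
            jobs.task_done()


def start_worker():
    """Start one background consumer thread and return it."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The request handler never touches the slow work, so its latency depends only on the enqueue. The worker can be restarted or scaled without the handler changing.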
The patterns
Async architecture comes in four canonical patterns. Each maps to a different shape of work and a different failure model.
- Job queue
- One producer writes a job, one consumer pulls it, runs it once, acks. Workhorse for "do this now-ish but not synchronously." SQS, Sidekiq, Celery.
- Pub/sub
- One producer publishes, N subscribers each get a copy. Used for fan-out: notifications, cache invalidation, audit. Redis pub/sub, SNS, NATS.
- Event log
- Append-only, durable, replayable. Multiple independent consumers track their own offset. Foundation for event-sourced systems and stream processing. Kafka, Kinesis, Pulsar.
- Workflow
- Long-lived state machines: orchestrated steps with retries, timers, branches. The pattern when "send email, wait 24h, if no reply send reminder, then mark stale" is one logical operation. Temporal, AWS Step Functions.
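To make the append-only, replayable pattern concrete, here is a toy in-memory sketch. `EventLog` and its methods are hypothetical names, not the API of Kafka, Kinesis, or Pulsar. It only illustrates that records are never removed on read and that each consumer owns its own offset.

```python
class EventLog:
    """Append-only log; every consumer tracks its own read position."""

    def __init__(self):
        self._records = []   # records are never deleted on read
        self._offsets = {}   # consumer name -> next offset to read

    def append(self, record):
        """Add a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next batch for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer, offset):
        """Rewind (or skip ahead) one consumer without affecting the others."""
        self._offsets[consumer] = offset
```

Contrast this with a job queue, where an acked message is gone. Here a new consumer can start from offset 0 long after the events were written, and `seek` gives replay for backfills.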
Delivery semantics — the trade-off no one explains clearly
Every queue offers one of three guarantees, and a system that doesn't pick one explicitly has the worst of all three. The choice cascades through the rest of the design.
| | At-most-once | At-least-once | Exactly-once |
|---|---|---|---|
| What can happen | Lost messages | Duplicate messages | Neither (in theory) |
| How | Producer doesn't retry; consumer ack is fire-and-forget | Producer retries on uncertainty; consumer must be idempotent | Producer dedupes by id; consumer is transactional |
| Cost | Cheap; lossy | Cheap; idempotency burden moves to the consumer | Expensive; coordination overhead |
| When to use | Metrics, telemetry, sampling | ~99% of business workflows | Money. Only money. |
The honest version of "exactly-once" is "at-least-once delivery + idempotent processing." Even Kafka's exactly-once semantics rest on idempotent producers (a producer ID plus per-partition sequence numbers that let the broker discard duplicate writes) and transactions, and they hold only inside a single Kafka cluster. Once the data leaves the cluster, you're back to designing for at-least-once. Plan accordingly: every consumer should be idempotent, and every write should carry an idempotency key the system can dedupe on.
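One way to sketch "at-least-once delivery + idempotent processing" is to record the idempotency key in the same transaction as the side effect. The example below uses Python's built-in `sqlite3`. The table names, the message shape, and the `apply_credit` helper are assumptions made for illustration.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a dedupe table and a balances table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE balances (account TEXT PRIMARY KEY, cents INTEGER NOT NULL)"
    )
    return db


def apply_credit(db, message):
    """Apply a credit once. Return False if this key was already processed."""
    try:
        # One transaction: the dedupe record and the effect commit together,
        # or neither does (the context manager rolls back on an exception).
        with db:
            db.execute(
                "INSERT INTO processed (idempotency_key) VALUES (?)",
                (message["idempotency_key"],),
            )
            db.execute(
                "INSERT OR IGNORE INTO balances (account, cents) VALUES (?, 0)",
                (message["account"],),
            )
            db.execute(
                "UPDATE balances SET cents = cents + ? WHERE account = ?",
                (message["cents"], message["account"]),
            )
        return True
    except sqlite3.IntegrityError:
        # Primary-key violation on the key: a redelivered duplicate.
        return False
```

Redelivering the same message is now harmless: the second attempt hits the primary-key constraint and changes nothing. The key point of the design is that the dedupe check and the write share one transaction; checking first and writing later leaves a window for double processing.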
The moving parts of a production queue
- Visibility / lease timeout
- When a consumer pulls a message, the queue hides it for N seconds. If the consumer doesn't ack within N, the message becomes visible again. Set N to 2-3× the P99 processing time, not the average.
- Retries with backoff
- Failed messages should be retried with exponential backoff (1s → 2s → 4s → 8s) and jitter. Without backoff, a transient downstream failure becomes a thundering herd.
- Dead-letter queue
- After N retries the message goes to a DLQ for inspection. Critical: the DLQ must have alerting. An overflowing DLQ is the silent killer of async systems.
- Ordering
- Most queues offer per-partition or per-key ordering, not global. Choose a partition key (user_id, account_id) so related messages stay ordered without limiting parallelism.
- Backpressure
- What happens when consumers can't keep up? Queue depth grows. The producer either blocks (good — applies pressure upstream), drops (acceptable for telemetry), or paginates (good for batch).
- Poison pill handling
- One message that always crashes the consumer is a poison pill. It must move to the DLQ after a bounded number of retries; otherwise it stops processing for everyone.
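The lease timeout, bounded retries, and dead-letter handling described above can be sketched as a toy in-memory queue. `LeaseQueue` is a hypothetical class, not any real broker's API, and the clock is passed in as `now` so the behavior is easy to follow.

```python
import itertools


class LeaseQueue:
    """Toy queue with a visibility timeout, a receive limit, and a DLQ."""

    def __init__(self, visibility_timeout, max_receives):
        self.visibility_timeout = visibility_timeout
        self.max_receives = max_receives
        self._messages = {}          # id -> {"body", "visible_at", "receives"}
        self._ids = itertools.count(1)
        self.dead_letters = []       # inspect and alert on this in practice

    def send(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = {"body": body, "visible_at": 0.0, "receives": 0}
        return msg_id

    def receive(self, now):
        """Lease the oldest visible message, or return None if there is none."""
        for msg_id, msg in list(self._messages.items()):
            if msg["visible_at"] > now:
                continue  # still leased by another consumer
            if msg["receives"] >= self.max_receives:
                # Bounded retries exhausted: a likely poison pill goes to the DLQ.
                self.dead_letters.append(msg["body"])
                del self._messages[msg_id]
                continue
            msg["receives"] += 1
            msg["visible_at"] = now + self.visibility_timeout
            return msg_id, msg["body"]
        return None

    def ack(self, msg_id):
        """Acknowledge success; the message is removed for good."""
        self._messages.pop(msg_id, None)
```

A consumer that crashes simply never calls `ack`, so the message reappears once the lease expires. Because a poison pill is moved aside after `max_receives` attempts, it cannot block the messages behind it forever.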
The hard cases
Choosing the right tool
| Need | Pick |
|---|---|
| Background jobs in a single app | Sidekiq, Celery, BullMQ, RQ — language-native, fast to ship. |
| Fan-out across services | SNS + SQS, NATS, Redis Pub/Sub. |
| Event sourcing, replay, stream processing | Kafka, Pulsar, Kinesis. |
| Multi-step workflows with timers | Temporal, AWS Step Functions, Cadence. |
| Cron-like scheduled work | EventBridge Scheduler, Cloud Scheduler, or just Kubernetes CronJobs. |
| RPC-style with deferred response | RabbitMQ direct-reply, gRPC streaming, or just a polled job-id endpoint. |
Practical defaults
- Default to at-least-once delivery and idempotent consumers. Idempotency keys travel with every message.
- Visibility timeout = 2-3× P99 processing time, with a hard maximum of 12 hours.
- Retries: 5 attempts with exponential backoff (1s, 2s, 4s, 8s, 16s) and ±20% jitter, then to the DLQ.
- Alert on DLQ count > 0, queue depth growth rate, and consumer lag — these are the three signals that tell you async is failing silently.
- Pick a partition key that reflects the natural ordering boundary (user_id, account_id) — global ordering is rarely needed and always expensive.
- Replay matters. If you expect to backfill or recover, pick a queue or log that can replay from a point in time.
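As a rough illustration, several of these defaults reduce to a few lines of arithmetic. The function names and parameters below are hypothetical helpers, not part of any queue library; the numbers mirror the defaults listed above.

```python
import hashlib
import random


def backoff_delays(attempts=5, base=1.0, jitter=0.2, rng=random):
    """Delays of base * 2**n seconds, each scaled by a factor in [1-jitter, 1+jitter]."""
    return [
        base * 2 ** n * rng.uniform(1 - jitter, 1 + jitter)
        for n in range(attempts)
    ]


def visibility_timeout(p99_seconds, factor=3.0, cap_seconds=12 * 60 * 60):
    """Scale the P99 processing time by the chosen factor, capped at 12 hours."""
    return min(p99_seconds * factor, cap_seconds)


def partition_for(key, partitions):
    """Map a partition key (e.g. a user_id) to a stable partition number."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions
```

`partition_for` uses a cryptographic hash rather than Python's built-in `hash()` because the built-in is randomized per process for strings, which would send the same key to different partitions on different workers.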