When to introduce a queue — System Design Handbook

A queue between two services looks like architecture and feels like complexity. Half the time it earns its keep; the other half it just delays the problem by hiding it. The decision rule is simple — and it has almost nothing to do with throughput.

Engineers reach for queues for the wrong reasons constantly. "It scales": sure, but so does a thread pool. "It decouples services": only if you also fix the failure semantics, which most teams don't. "We need async processing" — fine, but a goroutine pool already gives you that. The actual reason to introduce a queue is narrower than people think, and the cost is higher.

The three reasons that earn a queue

You need durability across producer death.: Producer accepts a request, ack's the user, then crashes. If the work must still get done, you need durable storage between accept and process — and a queue is the cleanest abstraction for that storage. Without this requirement, an in-process worker pool is simpler.
Producer rate is bursty and consumer rate must be steady.: Classic case: a marketing campaign that drops 100k webhook invocations in five seconds, with downstream services that can process 1k/sec. The queue absorbs the burst; the consumer drains at its own pace. This is what "smoothing" actually buys you — never above the consumer's sustained capacity, always able to handle the burst's peak.
Multiple consumers need to process the same event independently.: Fan-out — one user-signup event feeds welcome-email, fraud-check, analytics-ingest, and CRM-sync, each owned by a different team. Without a queue (or pub-sub), the producer ends up knowing all the consumers. The queue inverts that dependency.

The cost a queue adds

Every queue you introduce makes the system harder in four ways. Be honest about which ones bite for your case.

Delivery semantics.: At-most-once, at-least-once, exactly-once. The middle one is the only realistic default and it means every consumer must be idempotent. Idempotency keys, dedup windows, the outbox pattern — none of these are free.
Ordering.: Most queues give you per-partition or per-key ordering, not global. If the downstream needs strict ordering (think: account-balance updates), you have to design the partitioning around that requirement, and a hot key now serialises an entire shard.
Backpressure visibility.: Without a queue, slow consumers slow producers immediately and you notice. With a queue, slow consumers just grow the queue depth. You need monitoring on queue depth, age of the oldest message, and dead-letter rate, plus alerting that fires before the queue runs out of disk.
Operational surface.: Another piece of infrastructure to provision, monitor, upgrade, secure, and pay for. A Kafka cluster is roughly the same operational weight as a Postgres cluster. RabbitMQ is lighter but still a thing. SQS is easiest but vendor-locked.

The decision flowchart, in prose

Ask, in order:

Does the producer need to survive crashing mid-handoff? If yes → queue. If no → continue.
Is the producer rate bursty by more than 5×, with a consumer that can't scale to match the burst peak? If yes → queue (or autoscaler, see below). If no → continue.
Will more than one independent service need to react to the same event? If yes → pub-sub (queue with fan-out). If no → continue.
You don't need a queue. Use a goroutine pool, a worker thread, or a synchronous call. The simpler thing wins.

The four alternatives that often win instead

Thread or worker pool.: If you just want "process this off the request thread", a bounded worker pool inside the same process is simpler. Loss-on-crash is the trade — fine for analytics, fatal for payments.
Autoscaling consumer.: If the burst problem is "downstream can't keep up", the right answer might be to scale the downstream instead. HPA on a stateless service can react in 60–120 seconds; if your burst is longer than that, autoscaling wins. Below that, you need the queue.
The database itself, as queue.: An events table with a status column and a worker that polls it does the same job as Kafka for small scale. Postgres can serve 5k events/sec as a queue without breaking sweat. Hand-off to a real queue when sustained rate exceeds ~10k/sec or queue depth exceeds 1M.
The outbox pattern.: Write the event to a database table in the same transaction as the state change; a separate poller copies it to the queue. Solves the dual-write problem (write to DB + queue atomically). Even if you decide you need a queue, the outbox is usually the right way to publish to it.

If you do introduce one — pick the right shape

Kafka.: Durable log, per-partition ordering, consumer groups, replay from offset. Right for high-throughput event streams (10k+/sec sustained), audit logs, CDC, anything where you want to add new consumers later that replay history. Operational weight: high.
RabbitMQ.: Traditional broker. Per-message acks, flexible routing (topic exchanges), dead-letter handling. Right for work queues with priorities, RPC-over-broker, complex routing. Throughput tops out lower than Kafka. Operational weight: medium.
SQS / Pub-Sub / Service Bus.: Managed. Higher per-message cost, no ops, simple semantics. Right for cloud-native systems where the volume doesn't justify running your own broker. Operational weight: ~zero.
NATS / Redis Streams.: Lighter weight. NATS for in-cluster microservice messaging with optional persistence. Redis Streams when you already have Redis and don't want another service. Right when you want a queue but don't want Kafka-scale operations.

What a defensible "yes, queue" looks like

Same six-line rule as capacity planning:

Reason. "Producer-crash durability and analytics fan-out across three teams."
Volume. "30k events/sec peak, 5k average, retention 7 days."
Pick. "Kafka, 3-broker cluster, 12 partitions per topic, replication factor 3."
Delivery semantics. "At-least-once. Consumers idempotent via event_id table with 7-day retention."
Ordering. "Per-user-id partition key. Strict order within user, no global order needed."
Monitoring. "Queue depth, consumer lag, oldest message age, DLQ rate. Page at 1M depth or lag > 5 min."

Common mistakes

Adding a queue to "decouple" without changing the failure semantics.: If service A still throws an error when service B is down, you have not actually decoupled anything — you've added a hop. Decoupling requires that A continues to function when B is unavailable, which requires queue persistence plus A being okay with eventual consistency.
Treating the queue as a database.: "I'll just keep events in Kafka forever." Kafka is not a database. Use it as a transport, with a real database as the system of record. The exception is event-sourcing, which is its own architecture and requires careful design.
Choosing exactly-once.: It's a marketing term. Pick at-least-once and make consumers idempotent. Every "exactly-once" implementation in production is at-least-once plus dedup at the consumer.
One topic, no partitioning.: A single-partition Kafka topic processes ~10k messages/sec at the consumer. If you need more, partition. Pick the partition key carefully — once data is partitioned by user_id, changing to order_id requires migrating every consumer.
Forgetting the DLQ.: Poison messages — malformed payloads, unrecoverable errors — will accumulate and either block the queue or be silently dropped. A dead-letter queue catches them and triggers an alert. Build it from day one.

When to introduce a queue.

The three reasons that earn a queue

The cost a queue adds

The decision flowchart, in prose

The four alternatives that often win instead

If you do introduce one — pick the right shape

What a defensible "yes, queue" looks like

Common mistakes

What to read next