A queue between two services looks like architecture and feels like complexity. Half the time it earns its keep; the other half it just delays the problem by hiding it. The decision rule is simple — and it has almost nothing to do with throughput.
Engineers reach for queues for the wrong reasons constantly. "It scales": sure, but so does a thread pool. "It decouples services": only if you also fix the failure semantics, which most teams don't. "We need async processing" — fine, but a goroutine pool already gives you that. The actual reason to introduce a queue is narrower than people think, and the cost is higher.
The three reasons that earn a queue
- You need durability across producer death.
- Producer accepts a request, ack's the user, then crashes. If the work must still get done, you need durable storage between accept and process — and a queue is the cleanest abstraction for that storage. Without this requirement, an in-process worker pool is simpler.
- Producer rate is bursty and consumer rate must be steady.
- Classic case: a marketing campaign that drops 100k webhook invocations in five seconds, with downstream services that can process 1k/sec. The queue absorbs the burst; the consumer drains at its own pace. This is what "smoothing" actually buys you — never above the consumer's sustained capacity, always able to handle the burst's peak.
- Multiple consumers need to process the same event independently.
- Fan-out — one user-signup event feeds welcome-email, fraud-check, analytics-ingest, and CRM-sync, each owned by a different team. Without a queue (or pub-sub), the producer ends up knowing all the consumers. The queue inverts that dependency.
The cost a queue adds
Every queue you introduce makes the system harder in four ways. Be honest about which ones bite for your case.
- Delivery semantics.
- At-most-once, at-least-once, exactly-once. The middle one is the only realistic default and it means every consumer must be idempotent. Idempotency keys, dedup windows, the outbox pattern — none of these are free.
- Ordering.
- Most queues give you per-partition or per-key ordering, not global. If the downstream needs strict ordering (think: account-balance updates), you have to design the partitioning around that requirement, and a hot key now serialises an entire shard.
- Backpressure visibility.
- Without a queue, slow consumers slow producers immediately and you notice. With a queue, slow consumers just grow the queue depth. You need monitoring on queue depth, age of the oldest message, and dead-letter rate, plus alerting that fires before the queue runs out of disk.
- Operational surface.
- Another piece of infrastructure to provision, monitor, upgrade, secure, and pay for. A Kafka cluster is roughly the same operational weight as a Postgres cluster. RabbitMQ is lighter but still a thing. SQS is easiest but vendor-locked.
The decision flowchart, in prose
Ask, in order:
- Does the producer need to survive crashing mid-handoff? If yes → queue. If no → continue.
- Is the producer rate bursty by more than 5×, with a consumer that can't scale to match the burst peak? If yes → queue (or autoscaler, see below). If no → continue.
- Will more than one independent service need to react to the same event? If yes → pub-sub (queue with fan-out). If no → continue.
- You don't need a queue. Use a goroutine pool, a worker thread, or a synchronous call. The simpler thing wins.
The four alternatives that often win instead
- Thread or worker pool.
- If you just want "process this off the request thread", a bounded worker pool inside the same process is simpler. Loss-on-crash is the trade — fine for analytics, fatal for payments.
- Autoscaling consumer.
- If the burst problem is "downstream can't keep up", the right answer might be to scale the downstream instead. HPA on a stateless service can react in 60–120 seconds; if your burst is longer than that, autoscaling wins. Below that, you need the queue.
- The database itself, as queue.
- An
eventstable with astatuscolumn and a worker that polls it does the same job as Kafka for small scale. Postgres can serve 5k events/sec as a queue without breaking sweat. Hand-off to a real queue when sustained rate exceeds ~10k/sec or queue depth exceeds 1M. - The outbox pattern.
- Write the event to a database table in the same transaction as the state change; a separate poller copies it to the queue. Solves the dual-write problem (write to DB + queue atomically). Even if you decide you need a queue, the outbox is usually the right way to publish to it.
If you do introduce one — pick the right shape
- Kafka.
- Durable log, per-partition ordering, consumer groups, replay from offset. Right for high-throughput event streams (10k+/sec sustained), audit logs, CDC, anything where you want to add new consumers later that replay history. Operational weight: high.
- RabbitMQ.
- Traditional broker. Per-message acks, flexible routing (topic exchanges), dead-letter handling. Right for work queues with priorities, RPC-over-broker, complex routing. Throughput tops out lower than Kafka. Operational weight: medium.
- SQS / Pub-Sub / Service Bus.
- Managed. Higher per-message cost, no ops, simple semantics. Right for cloud-native systems where the volume doesn't justify running your own broker. Operational weight: ~zero.
- NATS / Redis Streams.
- Lighter weight. NATS for in-cluster microservice messaging with optional persistence. Redis Streams when you already have Redis and don't want another service. Right when you want a queue but don't want Kafka-scale operations.
What a defensible "yes, queue" looks like
Same six-line rule as capacity planning:
- Reason. "Producer-crash durability and analytics fan-out across three teams."
- Volume. "30k events/sec peak, 5k average, retention 7 days."
- Pick. "Kafka, 3-broker cluster, 12 partitions per topic, replication factor 3."
- Delivery semantics. "At-least-once. Consumers idempotent via event_id table with 7-day retention."
- Ordering. "Per-user-id partition key. Strict order within user, no global order needed."
- Monitoring. "Queue depth, consumer lag, oldest message age, DLQ rate. Page at 1M depth or lag > 5 min."
Common mistakes
- Adding a queue to "decouple" without changing the failure semantics.
- If service A still throws an error when service B is down, you have not actually decoupled anything — you've added a hop. Decoupling requires that A continues to function when B is unavailable, which requires queue persistence plus A being okay with eventual consistency.
- Treating the queue as a database.
- "I'll just keep events in Kafka forever." Kafka is not a database. Use it as a transport, with a real database as the system of record. The exception is event-sourcing, which is its own architecture and requires careful design.
- Choosing exactly-once.
- It's a marketing term. Pick at-least-once and make consumers idempotent. Every "exactly-once" implementation in production is at-least-once plus dedup at the consumer.
- One topic, no partitioning.
- A single-partition Kafka topic processes ~10k messages/sec at the consumer. If you need more, partition. Pick the partition key carefully — once data is partitioned by user_id, changing to order_id requires migrating every consumer.
- Forgetting the DLQ.
- Poison messages — malformed payloads, unrecoverable errors — will accumulate and either block the queue or be silently dropped. A dead-letter queue catches them and triggers an alert. Build it from day one.
What to read next
- Idempotence at scale · learn path
- The deduplication patterns that make at-least-once delivery safe.
- Back-pressure, retries, hedging · learn path
- The four reliability primitives — queues solve back-pressure but require the others too.
- Async architecture · primer
- The broader pattern this chapter is the decision-rule for.
- Kafka, as a river · guide
- If Kafka is the answer, this is the implementation walk-through.