11 min read · Guide · Distributed systems
How it works · Distributed systems

How message queues let services talk without waiting.

A producer pushes. A broker holds. A consumer pulls. Three boxes; thirty thousand teams have built thirty thousand systems on top of them.

Parts01 – 11 InteractiveMode picker PrereqAsync messaging

What is a message queue?

Decouple in time, decouple in space.

A message queue is a broker that holds messages between producers and consumers, decoupling them in time and space. Producers can keep producing while consumers are slow; consumers can come and go without producers caring. The four major products — RabbitMQ, Apache Kafka, AWS SQS, NATS — make different choices about ordering, persistence, and delivery semantics.

A message queue inserts a buffer between two services. The producer no longer cares whether the consumer is up; the consumer no longer cares whether the producer is going faster than it can keep up with. The broker holds the difference.

That single trick gives you four properties: temporal decoupling (consumer can be down), throughput smoothing (spikes get absorbed), retry semantics (broker holds until ack), and topology decoupling (services don't need each other's addresses, that's service discovery's job). The trade is one more thing to operate, with its own failure modes.


Queue topologies: point-to-point, fan-out, work queue

One queue, three shapes.

Most queue patterns are one of three shapes — sometimes the broker calls them queues, sometimes topics, sometimes channels, but the semantics are these three.

Mode 01 · Point-to-point

One producer, one queue, one consumer.

A classic command queue. The producer enqueues a job; one of the consumers (if multiple) pulls it. Each message is delivered to exactly one worker. Used for jobs, async commands, anything where the work should not run twice.


Delivery semantics: at-most-once, at-least-once, exactly-once

How hard the broker tries to deliver each message.

At-most-once: the producer fires and forgets, or the broker delivers without retries. Possible to lose a message; never delivers twice. UDP-shaped — fine for telemetry, fatal for payments.

At-least-once: the broker holds until the consumer acks. Crashes mid-processing get redelivered. The consumer must be idempotent — same message twice should be a no-op the second time, often using a distributed ID as the dedup key. The default for most production systems.

Exactly-once: at-least-once plus a deduplication mechanism. Kafka does this with idempotent producers + transactional commits; SQS FIFO with content-hash dedup; databases with primary-key constraints. End-to-end exactly-once across multiple systems is a research-grade problem; usually "at-least-once + idempotent consumer" is the right shape.


Acknowledgement and visibility: the broker holds until you say done

A message stays in-flight until the consumer confirms it.

The consumer reads a message and the broker stores it as in-flight. The consumer then either acks (delete it), nacks (return to the queue immediately), or times out (return after visibility timeout). The visibility timeout is the maximum the consumer is allowed to take before the broker assumes it died and redelivers.

Pick the timeout to be just longer than your slowest reasonable handler. Too short and you get duplicates; too long and crashed consumers leave their work invisible for minutes. Long-poll, periodic renewLock calls (Service Bus), or SQS's ChangeMessageVisibility let you extend during processing.

# RabbitMQ basic ack/nack
channel.basicConsume("orders", false, (delivery) -> {
 try {
 process(delivery.getBody());
 channel.basicAck(delivery.getDeliveryTag(), false);
 } catch (RetryableException e) {
 channel.basicNack(delivery.getDeliveryTag(), false, true); // requeue
 } catch (PoisonException e) {
 channel.basicNack(delivery.getDeliveryTag(), false, false); // → DLQ
 }
});

Dead-letter queues: where poison messages go

Poison messages, parked aside.

A message that fails repeatedly is a poison message. Without a DLQ, it cycles forever and starves the rest of the queue. With one, after N retries the broker moves it to a separate queue for inspection. The main queue keeps moving; the operator can dig later.

A common smell: a DLQ that grows monotonically. Either the consumer has a bug, or the producer is sending malformed data. Page on DLQ growth rate, not just on the main queue's depth.


Designing a consumer for at-least-once delivery

The dedup key, the poison message, and the queue you hope stays empty.

At-least-once is the guarantee most production systems actually run, and it pushes the hard work onto the consumer. Take a concrete one: a worker that captures payments from a payment.authorized queue. The broker promises every message arrives at least once, which means the worker will, sooner or later, see the same authorization twice — a redelivery after a crash, a visibility timeout that fired while a slow card-network call was still in flight, a producer retry. Design for the duplicate first.

The dedup key is a producer-assigned ID that names the operation, not the message — here the authorization ID, never the broker's delivery tag, which changes on every redelivery. The consumer inserts it into a processed_messages table in the same transaction as the capture itself. Check-then-process without that shared transaction reintroduces the race: crash between the check and the write and the duplicate sails through. With the transactional insert, a redelivered message hits a primary-key conflict, the worker reads the conflict as "already done", acks, and moves on.

Now the poison message. One authorization arrives with a malformed amount field; the handler throws, the nack requeues it, and it comes straight back. On a plain queue it burns one worker's time on every lap. On a key-partitioned queue it is worse: per-key ordering means the consumer cannot step past it, so every message behind that customer's bad one waits. A stalled partition looks exactly like a consumer outage, for one key only.

The exit is a retry cap plus a dead-letter queue: after, say, five attempts, move the message aside with its attempt count and last exception attached, ack the original, let the partition drain. Then watch the DLQ itself. Depth above zero sustained for more than a few minutes is the page; growth rate tells you whether you have one bad message or a bad deploy. A DLQ nobody alerts on is a data-loss bin with extra steps.

One honest caveat, because this guide — like most — slides between two systems that share vocabulary but not mechanics. AMQP-style brokers (RabbitMQ, SQS, Service Bus) track per-message state: ack deletes, nack redelivers, the DLQ is a broker feature, and a poison message can be moved aside without touching its neighbours. Kafka-style logs have none of that machinery: the consumer holds an offset, "ack" means committing it, the broker never redelivers one message in isolation, and a DLQ is something you build yourself by producing the failure to a dead-letter topic and committing past it. The ack-and-visibility sections above describe the first family; the stalled partition is what poison looks like in the second.


Why FIFO ordering is hard at scale

FIFO is hard, at scale.

A single-partition, single-consumer queue trivially preserves order. Add parallelism and order is gone — multiple workers process in parallel, the broker hands them out by availability, not by enqueue time.

The trick most systems use: partition by key. SQS FIFO uses a "message group ID"; Kafka uses partition keys; RabbitMQ uses consistent-hash exchanges. Same key → same partition → ordered. Different keys → may interleave. You preserve per-entity order while keeping aggregate parallelism.


Message queue products: RabbitMQ, Kafka, SQS, NATS

Four houses, very different.

The popular brokers each represent a design point. Picking one is mostly about which trade-offs you want.

RabbitMQ

A smart broker that routes messages

Exchanges (direct, topic, fanout, headers) route to queues. Per-message ack. Strong routing primitives. Tens of thousands msg/s comfortably.

SQS

Fully managed, dead simple

AWS native. Standard (best-effort) or FIFO (300/s per group). Zero operations; no routing — just queues.

NATS

Lightweight publish/subscribe

Sub-millisecond fan-out. JetStream adds persistence. Tiny binary. Great for IoT, control planes, microservice eventing.

And then there is Kafka — technically a partitioned commit log, not a queue, but used as one constantly. Distinct enough that it has its own page. The mental model differs: messages are never deleted on consumption, consumers track offsets, replay is trivial. If you are actively weighing it against a broker, the Kafka vs RabbitMQ comparison walks the trade-offs dimension by dimension.


Message queue gotchas: three things that go wrong

Queue lag, hot partitions, and stuck consumers.

Queue lag: consumer falls behind producer. Either scale consumers (with autoscaling tied to depth), or accept that processing is async by definition. Always graph queue depth — flat is healthy, growing is shedding work into the future.

Hot partition: a key dominates traffic, one consumer is pinned to it. Re-partition by a more uniform key, or unique-shard the hot key.

Stuck consumers: a consumer holds a message past visibility timeout but doesn't actually crash. Other consumers see redelivery. Fix the slow path; or extend visibility on long jobs explicitly.


Message queue throughput: Kafka, RabbitMQ, SQS, NATS, Pulsar

The numbers that pick the broker.

Apache Kafka
~1 million msgs/sec per broker on commodity hardware (LinkedIn original benchmark, 2014). Modern Confluent Cloud claims 10× that with KRaft and tiered storage. Latency: ~5-10 ms p99 within a region. Best fit: high-throughput event streams, replayable logs.
RabbitMQ
~50k msgs/sec per node with persistence and ack; ~1 million msgs/sec without persistence. Latency: ~1-5 ms p99 in-region. Best fit: complex routing (exchanges, bindings), per-message acks, low-latency RPC patterns.
AWS SQS
Standard queue: effectively unlimited throughput, at-least-once, no ordering. FIFO queue: ~3,000 msgs/sec/group, exactly-once with dedup. Latency: ~10-30 ms p99 (HTTP-based, not gRPC). Best fit: serverless workflows, fully managed, no operations.
NATS / NATS JetStream
Core NATS: ~10 million msgs/sec per node (in-memory, fire-and-forget). JetStream (persistent): ~1 million msgs/sec. Sub-millisecond latency in-region. Best fit: latency-critical IoT, microservice messaging, real-time signalling.
Apache Pulsar
Throughput similar to Kafka (~1M msgs/sec), with a different storage model — separate broker and BookKeeper layers, geo-replication built in, multi-tenancy first-class. Best fit: large multi-tenant deployments, geo-replicated streaming.
Redis Streams
~100k msgs/sec per Redis instance, with Redis-grade reliability. Best fit: small-scale streaming inside an existing Redis stack; not a Kafka replacement.

The right pick rarely turns on raw throughput — almost any of these handles the load most teams have. Pick by operational model: managed (SQS) vs self-hosted; replay (Kafka, Pulsar) vs ack-and-discard (RabbitMQ); strict ordering (Kafka per partition, FIFO SQS) vs free-form (Standard SQS, RabbitMQ default).


Message queues in production: three case studies

What real shops route through queues.

LinkedIn — Kafka, the canonical case. Kafka was born at LinkedIn (2010-2011) to handle their activity-stream and operational metrics. By 2024 they run ~7 trillion messages per day across thousands of clusters. The original published throughput target: ~50 MB/sec/broker on commodity hardware; current production easily 10× that. Their engineering blog is the canonical reference.

Stripe — RabbitMQ for delivery, SQS for batch. Stripe routes API webhooks through RabbitMQ (per-customer ordering, configurable retry policy with exponential backoff, dead-letter topology). Background batch jobs (invoicing, reports) run through SQS for the simpler at-least-once semantics. Public engineering posts (2017, 2022) document both pipelines.

Slack — Kafka for fan-out, in-memory for hot paths. Slack's "Flannel" connection multiplexer reads from Kafka topics that fan out workspace events. ~10 million concurrent WebSocket connections per region; the Kafka topic-per-workspace model lets the connection layer scale independently of the event-bus throughput. Slack's KubeCon and re:Invent talks document the architecture.

The pattern across all three: queues separate the producer's RPS profile from the consumer's. The right operational property — replay (Kafka), strict per-key order (Kafka partitions, FIFO SQS), simple at-least-once (SQS Standard), routing flexibility (RabbitMQ) — is the choice that matters more than the broker brand.



A closing note

Message queues let your system go slow without going down. The headline questions are always the same — what's your delivery guarantee, what's your ordering guarantee, and what's your DLQ policy — and a senior engineer can probably answer them faster than they could finish typing the broker's name.

Found this useful?