ELI5 · Distributed systems

Service mesh.

Giving every service its own personal assistant that handles all the calls, so the app code can stay simple.

Once you have many services calling each other, every one of them needs the same boring-but-vital plumbing: retries, timeouts, encryption, load balancing, and metrics. Baking that into each service, in each language, is a lot of duplicated, error-prone work.

A service mesh pulls that plumbing out of your code and hands every service a personal assistant — a small proxy that sits beside it and handles all its incoming and outgoing calls. Your app just makes a plain request; the assistant takes care of the rest.

1. Retries, timeouts, encryption: five services in five languages means writing the same plumbing five times.
2. The sidecar intercepts everything in and out, so the app never learns its calls are being chaperoned.
3. Your app just makes a plain call ("Just get me payments.") and the sidecar quietly catches it.
4. The sidecar adds the retries, timeouts, and encryption on the way out.
5. A Go service and a Python one now behave identically on the wire, because the proxies speak for both.
6. Change a timeout policy in the control plane and every sidecar obeys at once; nothing is redeployed.
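To make the "plain call" step concrete, here is a toy sketch in Python of the load-balancing part of a sidecar's job. It is an illustration only, not how any particular mesh is implemented: the class name `RoutingSidecar`, the registry dictionary, and the addresses are all made up for the example. The app asks for "payments" by name and the sidecar picks an instance, round-robin.

```python
from itertools import cycle


class RoutingSidecar:
    """Toy stand-in for a mesh proxy (hypothetical, for illustration only)."""

    def __init__(self, registry):
        # registry maps a logical service name to its instance addresses.
        # A real mesh would learn these from service discovery; here they are hard-coded.
        self._pools = {name: cycle(addrs) for name, addrs in registry.items()}

    def route(self, service):
        # Round-robin: hand out the next address in the pool, wrapping around.
        return next(self._pools[service])


# The app only ever says "payments"; it never sees the addresses.
sidecar = RoutingSidecar({"payments": ["10.0.0.1:8080", "10.0.0.2:8080"]})
picks = [sidecar.route("payments") for _ in range(3)]
print(picks)
```

Three calls land on the first instance, the second, then the first again, and the app code never had to know there were two.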

The sidecar pattern

The mesh works by deploying a small proxy alongside each service instance — the "sidecar" — and quietly routing all of that service's traffic through it. Because the proxy, not your code, handles retries, timeouts, load balancing, mutual TLS encryption, and telemetry, you get consistent behaviour across every service regardless of what language it is written in. The application stays blissfully unaware it is even there.
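As a rough sketch of what "the proxy, not your code, handles retries and timeouts" might look like, the Python below wraps a flaky call in a retry loop driven by a policy object. Everything here is assumed for illustration: `Policy`, `RetryingSidecar`, and the fake `flaky_payments` service are hypothetical names, and a real sidecar is a separate network proxy rather than an in-process wrapper.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    max_attempts: int = 3   # assumed to be at least 1
    timeout_s: float = 0.5  # per-attempt timeout handed to the transport


class RetryingSidecar:
    """Hypothetical in-process stand-in for a sidecar proxy."""

    def __init__(self, policy):
        self.policy = policy
        self.attempts_made = 0  # crude telemetry: how many tries were needed

    def call(self, send, request):
        last_error = None
        for _ in range(self.policy.max_attempts):
            self.attempts_made += 1
            try:
                return send(request, timeout=self.policy.timeout_s)
            except TimeoutError as exc:
                last_error = exc  # remember the failure and try again
        raise last_error  # every attempt timed out


def flaky_payments():
    """Fake upstream service that times out twice, then answers."""
    calls = {"n": 0}

    def send(request, timeout):
        calls["n"] += 1
        if calls["n"] < 3:
            raise TimeoutError(f"no reply within {timeout}s")
        return f"ok: {request}"

    return send


sidecar = RetryingSidecar(Policy(max_attempts=3, timeout_s=0.5))
result = sidecar.call(flaky_payments(), "charge $5")
print(result, sidecar.attempts_made)
```

The caller sees one successful answer. The two timeouts were absorbed by the wrapper, which is the behaviour the application gets for free from the proxy.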

Control plane and the cost

A central control plane configures all those sidecars at once, so you can change a timeout policy or turn on encryption everywhere without touching application code or redeploying it. That uniformity, plus rich, automatic observability of every call, is the big draw at scale. The price is real: a mesh is another complex distributed system to run, every call now passes through a proxy on each end, which adds a little latency, and for a handful of services it is usually more machinery than the problem warrants.
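The control-plane idea can be sketched in a few lines of Python. This is a deliberately simplified model under stated assumptions: `ControlPlane`, `ManagedSidecar`, and the `timeout_s` setting are hypothetical names, and real meshes distribute configuration over the network rather than through direct method calls.

```python
class ManagedSidecar:
    """Hypothetical sidecar that accepts configuration from a control plane."""

    def __init__(self, service):
        self.service = service
        self.timeout_s = 1.0  # default until the control plane says otherwise

    def apply(self, config):
        self.timeout_s = config["timeout_s"]


class ControlPlane:
    """Hypothetical central place that knows about every sidecar."""

    def __init__(self):
        self.sidecars = []

    def register(self, sidecar):
        self.sidecars.append(sidecar)

    def push(self, config):
        # One policy change fans out to every registered sidecar.
        for sidecar in self.sidecars:
            sidecar.apply(config)
        return len(self.sidecars)


plane = ControlPlane()
for name in ["orders", "payments", "inventory"]:
    plane.register(ManagedSidecar(name))

updated = plane.push({"timeout_s": 0.25})
print(updated, [s.timeout_s for s in plane.sidecars])
```

One `push` changes the timeout for all three services, and none of the application code behind those sidecars was touched or restarted.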
