8 min read · Guide · Network · Operations

How an API gateway handles every request before your services do.

A single front door for a fleet of backend services. The gateway handles TLS, authentication, rate limits, routing, header rewriting, and logging in one place, so each service can stay focused on its own job.

Parts: 01–08 · Interactive: Pipeline visualizer · Prerequisites: HTTP, OAuth, load balancing

What an API gateway is

One front door for many services.

The pattern came out of Netflix around 2012. They had roughly a hundred backend services and a handful of client types — TVs, phones, set-top boxes, web — each wanting slightly different responses. Rather than ask each service team to add device-specific filtering, retry handling, and authentication checks, they built Zuul: a Java process at the edge that handled those cross-cutting concerns once. Service teams could go back to working on their own service.

That's the whole idea. An API gateway sits between clients and a fleet of backend services. It handles the work each service would otherwise have to repeat — authentication, rate limiting, routing, header rewriting, observability — and forwards the request to whichever service should answer. One edge, one place to enforce policy, one log line per request regardless of which service ended up handling it.

Without one, each backend tends to implement the same handful of concerns slightly differently. Rate-limit logic drifts between teams. CORS headers get set inconsistently. Auth checks evolve at different paces. A gateway consolidates these in one place. Today the pattern ships as Kong, AWS API Gateway, Apigee, Tyk, and Envoy. They differ in operational shape; the job is the same.


The eight stages a request passes through, in order

And where each one can stop a request short.

A useful way to think about the gateway is as a sequence of small policies, each able to short-circuit the request. A bad token doesn't reach the backend. A client that's exceeded its quota doesn't reach the database. A request to an unknown route returns a clean 404 instead of a confusing 502 from somewhere downstream.

Each stage either passes the request on or stops it. The success path, in order:

Client (browser)

  01 TLS termination: decrypt the request, then pass it on as plaintext or re-encrypt it with back-channel TLS to the upstream.
  02 Authentication: JWT / OAuth / API key; verify the caller's identity.
  03 Authorization: does this caller have permission for this route?
  04 Rate limit: per-key, per-route, per-tier. Return 429 if exceeded.
  05 Route: match path + method to a backend; pick a healthy instance.
  06 Mutate request: strip auth header, add internal X- headers, transform path.
  07 Forward upstream: open a connection to the backend; stream the request.
  08 Mutate response: strip internal headers, add CORS, log, return.

Upstream: backend service
Final response: 200 OK

Caller authenticated, authorized, under rate limit, route matched, upstream healthy. The boring case.
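
The short-circuit idea can be sketched as a list of stage functions, each returning either nothing (continue) or a response (stop). This is a minimal Python sketch, not how any particular gateway is implemented; the `Request` and `Response` types, the two stages, and the hard-coded token and route are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Request:
    method: str
    path: str
    headers: dict = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str = ""


# A stage returns None to let the request continue, or a Response to stop it.
Stage = Callable[[Request], Optional[Response]]


def authenticate(req: Request) -> Optional[Response]:
    # Hypothetical check; a real gateway would verify a JWT, API key, or client cert.
    if req.headers.get("Authorization") != "Bearer valid-token":
        return Response(401, "invalid or missing credentials")
    return None


def route(req: Request) -> Optional[Response]:
    # Hypothetical single route; unknown paths get a clean 404.
    if not req.path.startswith("/api/v1/users/"):
        return Response(404, "unknown route")
    return None


def run_pipeline(req: Request, stages: List[Stage]) -> Response:
    for stage in stages:
        early = stage(req)
        if early is not None:
            return early  # short-circuit: later stages and the backend never run
    return Response(200, "forwarded upstream")  # stand-in for the real upstream call
```

Ordering matters in a design like this: putting the cheap rejections first means a bad token costs one check, not a backend call.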


Authentication: verifying who is calling

Three mechanisms, usually settled at the edge.

Identity is one of the things most teams choose to verify at the gateway. The alternative — each service running its own auth check — works, but means more places to keep in sync when the auth provider changes, and more places to get it subtly wrong. Centralising it gives you a single point of policy and a single log line per request.

Once TLS is terminated, the three common mechanisms are:

  • API keys — a long random string in an Authorization or X-API-Key header. Simple and opaque. Fine for server-to-server, less suitable for end-user credentials because there's no built-in notion of expiry or scope.
  • OAuth bearer tokens — typically signed JWTs from an authorization server, carrying the user's identity and scopes inside the token. The usual default for systems with end users.
  • mTLS — the client presents a certificate, which the gateway verifies against a CA. Higher assurance, narrower use cases, more certificate management to handle.

JWTs in particular work well at gateway scale because verification is local. The gateway pulls the issuer's public key from a JWKS endpoint once and caches it, then verifies each token's signature against the cached key. No per-request round trip to the auth provider, which keeps added latency at the gateway in the low single-digit milliseconds rather than tens.

One thing worth checking when setting this up: key rotation behaviour. When the issuer rotates keys, the gateway will see a JWT signed with a key it hasn't cached yet. The expected behaviour is to re-fetch the JWKS when an unknown kid (the key ID in the token header) appears and retry the verification once. Most JWT libraries support this, but it's not always on by default.
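
The re-fetch-on-unknown-kid behaviour might look roughly like the sketch below. It covers only the key cache; signature verification is left to whichever JWT library is in use. The `fetch_jwks` callable and the `JwksCache` class are hypothetical, and the minimum re-fetch interval is an added assumption (not something described above) so that tokens with junk kids cannot trigger a fetch on every request.

```python
import time
from typing import Callable, Dict, Optional


class JwksCache:
    """Caches signing keys by kid and re-fetches when an unknown kid appears."""

    def __init__(self, fetch_jwks: Callable[[], Dict[str, str]],
                 min_refetch_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_jwks          # hypothetical: returns {kid: public_key}
        self._min_refetch = min_refetch_seconds
        self._clock = clock
        self._keys: Dict[str, str] = {}
        self._last_fetch: Optional[float] = None

    def _refresh(self) -> None:
        self._keys = dict(self._fetch())
        self._last_fetch = self._clock()

    def key_for(self, kid: str) -> Optional[str]:
        if self._last_fetch is None:
            self._refresh()               # first use: populate the cache
        if kid in self._keys:
            return self._keys[kid]
        # Unknown kid: the issuer may have rotated keys. Re-fetch, but at most
        # once per interval, then look the kid up one more time.
        if self._clock() - self._last_fetch >= self._min_refetch:
            self._refresh()
        return self._keys.get(kid)        # None means the token should be rejected
```

The interval is a trade-off: a long one delays acceptance of freshly rotated keys, a short one lets bad tokens generate traffic to the issuer.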


Rate limiting: stopping overload at the edge

Refuse the call before it reaches your services.

A misbehaving client can create production load that has nothing to do with real usage. A buggy retry loop, a cron firing every second instead of every minute, an SDK that batches more aggressively after an update. Rate limiting at the gateway turns those into 429s before they reach a service or a database.

The mechanism is straightforward. The gateway keeps a counter per API key (or per IP, per route, or some combination) and updates it on each request. When the counter crosses the limit, the gateway returns 429 Too Many Requests with a Retry-After header. The algorithm is usually a token bucket (which allows short bursts) or a sliding window (smoother but more expensive to track). Counter state usually lives in Redis so every gateway node shares the same view.
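
A token bucket is small enough to sketch in full. The version below is a single-process illustration with an in-memory dict; in a multi-node gateway the same state would live in a shared store such as Redis. The class name, parameters, and injected clock are all hypothetical.

```python
import math
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """In-memory token bucket, one bucket per key."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec      # tokens added per second (sustained rate)
        self.burst = burst            # bucket capacity (largest allowed burst)
        self._clock = clock
        self._state: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last_seen)

    def check(self, key: str) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds)."""
        now = self._clock()
        tokens, last = self._state.get(key, (float(self.burst), now))
        # Refill for the time elapsed since the last request, capped at capacity.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._state[key] = (tokens - 1.0, now)
            return True, 0
        self._state[key] = (tokens, now)
        # Seconds until one full token is available, rounded up for Retry-After.
        return False, math.ceil((1.0 - tokens) / self.rate)
```

When `check` returns `False`, the second value is what a gateway could send in the Retry-After header alongside the 429.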

Picking the actual number is harder than implementing the mechanism. 100 requests per minute is often too tight for SDKs that batch background syncs. 10,000 per minute is usually loose enough that a single buggy integration can still hurt. Most teams converge on per-tier limits with a generous burst and a tighter sustained rate — for instance, 1,000 requests per minute sustained with a 100/sec burst for a paid customer, lower for free. The numbers are workload-specific; instrument first, then tune.

One thing worth flagging: a too-tight limit can sometimes make load problems worse, because clients with poor retry logic respond to 429s by retrying harder. Setting Retry-After with a sensible backoff hint, and logging every limit hit, helps tell apart real attacks from customers who need a higher quota. The rate-limiter simulator walks through the algorithm trade-offs; the retry-strategy visualiser shows the feedback loop.
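
From the client's side, the well-behaved response to a 429 is roughly the following. This is a sketch of one common approach (exponential backoff with full jitter, never retrying sooner than Retry-After asks); the function name, defaults, and injected random source are hypothetical, and only the delta-seconds form of Retry-After is handled, not the HTTP-date form.

```python
import random
from typing import Callable, Optional


def retry_delay(attempt: int, retry_after: Optional[str],
                base: float = 0.5, cap: float = 30.0,
                rand: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    # Exponential backoff with full jitter, so clients do not retry in lockstep.
    backoff = min(cap, base * (2 ** attempt)) * rand()
    if retry_after is not None and retry_after.strip().isdigit():
        # Never retry sooner than the gateway asked for.
        return max(float(retry_after), backoff)
    return backoff
```

A client that ignores the header and retries immediately is the one that turns a tight limit into more load, which is why logging limit hits per key is useful for spotting it.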


Routing: picking the service, then the instance

Path resolves to a service; the load balancer picks a host.

Once auth and rate limiting are done, the gateway has to decide where the request goes. Routing rules match on path, method, headers, or query — in priority order — and resolve to a backend service. The gateway then asks the load balancer for a healthy instance of that service. The load-balancing guide covers the algorithms; from the gateway's perspective the question is just "give me a host I can send this to."

A typical routing table looks something like this:

GET    /api/v1/users/*       → users-svc            (priority 100)
GET    /api/v2/users/*       → users-svc-v2         (priority 100)
*      /api/admin/*          → admin-svc            (priority 50, requires admin role)
*      /api/payments/*       → payments-svc         (priority 100, mTLS required)
*      /api/*                → unknown-route 404    (priority 0, catch-all)
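
As a rough illustration of how a table like this resolves, the sketch below encodes the same five rules and walks them in priority order. It uses glob matching from Python's standard library, where `*` also matches across `/`; real gateways have their own matching syntax, and the `Route` type and `resolve` function are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    method: str      # HTTP method, or "*" for any
    pattern: str     # glob-style path pattern
    target: str
    priority: int


ROUTES = [
    Route("GET", "/api/v1/users/*", "users-svc", 100),
    Route("GET", "/api/v2/users/*", "users-svc-v2", 100),
    Route("*", "/api/admin/*", "admin-svc", 50),
    Route("*", "/api/payments/*", "payments-svc", 100),
    Route("*", "/api/*", "unknown-route 404", 0),
]


def resolve(method: str, path: str, routes: List[Route] = ROUTES) -> Optional[str]:
    # Highest priority first; sorted() is stable, so ties keep table order.
    for r in sorted(routes, key=lambda r: -r.priority):
        if r.method in ("*", method) and fnmatchcase(path, r.pattern):
            return r.target
    return None  # nothing matched, not even the catch-all
```

Note that under these rules a `POST` to `/api/v1/users/42` falls through to the catch-all, because the users routes only accept `GET`.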

The routing config tends to be the most frequently edited part of the gateway. New service rolls out, gets a route. Endpoint moves, route gets edited. Service deprecated, route gets a sunset header. Because a routing mistake can send traffic to the wrong service, most teams treat routing changes the way they treat code: version-controlled, reviewed, and deployed via CI/CD. A smoke test that hits each top-level prefix after a config push catches most mistakes before they reach customers.

Some gateways make this easier than others. Envoy with xDS and Kong with declarative config are designed around it; older AWS API Gateway setups can require more manual work.


Rewriting the request and response headers

The internal request isn't quite the public one.

The request that arrives at your service usually isn't identical to the one the client sent. The gateway rewrites it on the way in and rewrites the response on the way out. The goal is to keep the public API consistent and the backend simpler.

Inbound, the gateway typically strips the public Authorization header (the backend doesn't need to see the JWT — the gateway has already verified it), reads claims from the token, and adds internal headers the backend cares about: X-User-Id, X-Tenant-Id, X-Auth-Scopes, X-Trace-Id. Paths can be rewritten too — POST /api/v1/users becomes POST /users by the time it reaches the service. The version prefix exists for the public; the service doesn't need to know about it.
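
The inbound rewrite could be sketched like this. The claim names `sub` and `scope` are the standard JWT/OAuth ones; `tenant` is a hypothetical custom claim, and the function name and `/api/v1` prefix handling are illustrative. Dropping client-supplied copies of the internal headers is an added assumption here, so that a caller cannot set X-User-Id directly.

```python
from typing import Dict, Tuple

# Headers the gateway owns; any client-supplied copies are discarded.
GATEWAY_OWNED = {"authorization", "x-user-id", "x-tenant-id",
                 "x-auth-scopes", "x-trace-id"}
PUBLIC_PREFIX = "/api/v1"


def mutate_request(path: str, headers: Dict[str, str],
                   claims: Dict[str, str], trace_id: str) -> Tuple[str, Dict[str, str]]:
    """Return the (path, headers) the backend should see."""
    internal = {k: v for k, v in headers.items() if k.lower() not in GATEWAY_OWNED}
    # Identity comes from the already-verified token, never from the client.
    internal["X-User-Id"] = claims["sub"]
    internal["X-Tenant-Id"] = claims["tenant"]   # hypothetical claim name
    internal["X-Auth-Scopes"] = claims["scope"]
    internal["X-Trace-Id"] = trace_id
    # Strip the public version prefix: /api/v1/users -> /users
    if path.startswith(PUBLIC_PREFIX + "/"):
        path = path[len(PUBLIC_PREFIX):]
    return path, internal
```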

Outbound, the gateway does the reverse. Internal headers (anything the service shouldn't expose) get stripped before the response reaches the client. CORS headers get added once, consistently, instead of each service setting them independently. Error formats can be normalised so a 500 from one service looks like a 500 from another. Sensitive fields can be redacted as a defence in depth.
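
The outbound direction is the mirror image. In this sketch the internal-header prefixes and the origin allow-list are hypothetical conventions; a real deployment would use its own.

```python
from typing import Dict

INTERNAL_PREFIXES = ("x-internal-", "x-upstream-")   # hypothetical naming convention
ALLOWED_ORIGINS = {"https://app.example.com"}        # hypothetical allow-list


def mutate_response(headers: Dict[str, str], request_origin: str) -> Dict[str, str]:
    """Return the headers the client should see."""
    public = {k: v for k, v in headers.items()
              if not k.lower().startswith(INTERNAL_PREFIXES)}
    # Add CORS once, here, rather than in every service.
    if request_origin in ALLOWED_ORIGINS:
        public["Access-Control-Allow-Origin"] = request_origin
        # Simplification: this overwrites any Vary header the service set;
        # a fuller version would append "Origin" to it.
        public["Vary"] = "Origin"
    return public
```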

The pattern is essentially that the backend speaks an internal dialect and clients speak the public API, with the gateway translating between them. It's worth keeping mutation rules narrow; complex per-route transformations tend to drift over time and become hard to reason about.


Observability: one log line per request

The gateway sees what the public sees.

The gateway is a natural place for telemetry. Every request flows through it, including the ones that fail before reaching a service. If you want one place to ask "what does my API actually look like in production?", the gateway is usually it. Common signals worth capturing: request volume per route, p50/p95/p99 latency, error rates split between 4xx and 5xx, auth failures, rate-limit hits, geographic distribution.

Distributed tracing pairs well with this. The gateway is a good place to inject a traceparent (W3C Trace Context) header that propagates through every downstream service. End-to-end traces help locate the slow service in a request chain that crosses multiple teams.

One thing worth being careful with: metric cardinality. Each unique combination of metric labels becomes its own time series, and the cost in Prometheus or Datadog grows with the number of series. A useful rule of thumb is to label by route template rather than URL parameters, by status code class (2xx, 4xx, 5xx) unless individual codes are needed, and to keep user IDs in logs rather than metric labels.
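
That rule of thumb can be applied at the point where labels are built. The sketch below guesses at path parameters with a regular expression (numeric or UUID-like segments), which is an assumption for illustration; where the router already knows the matched route template, using that directly is more reliable. The function names and the `:id` placeholder are hypothetical.

```python
import re
from typing import Dict

# Hypothetical rule: numeric and UUID-like path segments are parameters.
_PARAM = re.compile(
    r"^(\d+|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$"
)


def route_template(path: str) -> str:
    # Replace parameter-looking segments so every user shares one time series.
    return "/".join(":id" if _PARAM.match(seg) else seg for seg in path.split("/"))


def metric_labels(method: str, path: str, status: int) -> Dict[str, str]:
    return {
        "method": method,
        "route": route_template(path),
        "status_class": f"{status // 100}xx",   # 404 -> "4xx"
    }
```

With, say, 50 route templates, 5 methods, and 5 status classes, the upper bound is 50 × 5 × 5 = 1,250 series per metric, whereas labelling by raw URL grows with the number of distinct IDs ever requested. The numbers are illustrative.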


What does not belong at the gateway

Some things are better left in the services.

Because the gateway is convenient, there's usually pressure to push more concerns into it. Auth and rate limits work well there, so why not validation, transformations, or simple business logic? The argument against is that the gateway sits on the request path of every API call. Anything added there runs for every request, and adds risk to the one piece of infrastructure that every team depends on.

A circuit breaker in front of each upstream is a reasonable thing to put at the gateway — it keeps the gateway responsive when a downstream service degrades. Three rough rules of thumb for everything else:

  • Cross-cutting: put it in the gateway. Auth, rate limits, observability, routing, header rewriting: concerns that every service would otherwise reimplement. Centralising these is usually a clear win.
  • Business logic: keep it in the service. Anything that depends on application state generally belongs in the service. A gateway that needs to know about billing cycles or domain models tends to become harder to change over time.
  • Heavy compute: keep it in the service. Image transformation, ML inference, PDF generation. Too expensive to do on the request path of every API call. Better suited to dedicated services or async jobs.

Gateways in production

Five common options and where each fits.

Kong · open-source + enterprise
Built on NGINX/OpenResty with Lua plugins. Used by Cargill, Honeywell, Yahoo Japan. Large plugin ecosystem, and writing custom Lua plugins is relatively approachable. Fits well when you want a programmable, self-hosted gateway.
AWS API Gateway
Fully managed. Priced per request plus data transfer — can get expensive at high volume, but there's no infrastructure to operate. Integrates tightly with Lambda, IAM, and CloudWatch. Added latency is typically 10–30 ms. Often the natural choice for serverless or AWS-heavy stacks.
Google Apigee
Enterprise-focused. Includes API monetisation, a developer portal, and quota management aimed at API-as-a-product use cases. Used by AT&T, Walgreens, Equifax.
Envoy (as gateway)
The data plane behind Istio and AWS App Mesh. xDS allows routes and policies to be pushed dynamically without restarts. A good fit when the gateway sits in front of an Envoy-based service mesh, since the same control plane and config language can be used for both. Used at Lyft (where it originated), Stripe, Square.
Tyk
Go-based, open-source, intentionally simpler than Kong. Used by Capital One and Domino's. Reasonable choice for smaller teams that want a programmable gateway without the OpenResty/Lua footprint.

For reference: a single Kong or Envoy node usually handles around 50,000 RPS at 1–2 ms p99 added latency on commodity hardware. Stripe's engineering write-ups describe their Envoy gateway tier handling roughly 300,000 RPS across the fleet, with custom rate-limiting and per-account routing pushed via xDS. Most systems run well below these numbers; the takeaway is that the gateway itself is usually not where the latency budget gets consumed, unless something expensive has been added to the path.


When a gateway is the wrong tool

Three situations where another approach usually fits better.

Service-to-service traffic inside the same trust zone. When service A calls service B and both live in the same VPC, an API gateway in the middle adds a hop and another piece of infrastructure to keep healthy. A service mesh — Istio, Linkerd, or Cilium with eBPF — handles cross-cutting concerns by putting them in a sidecar or kernel hook next to each service. The rough division: gateways for north–south traffic (external clients into your services), meshes for east–west (service to service).

Small systems with a single consumer. A two-service system serving an internal admin tool typically doesn't need a gateway. A plain reverse proxy that terminates TLS is enough — picking one is a much smaller decision, and Traefik vs Nginx covers the usual two candidates. The value of a gateway comes from multi-tenant routing, policy enforcement, and consolidation across many services — none of which are present at this scale.

Real-time bidirectional protocols. WebSockets, gRPC streaming, and WebRTC don't always fit the request/response model some gateways assume. Envoy and NGINX (with the right configuration) handle them; others can degrade in subtle ways. If streaming or long-lived connections matter, it's worth testing with the actual protocol under load before committing to a particular gateway.



A closing note

An API gateway is a small pipeline of unglamorous policies applied to every request that crosses the perimeter: terminate TLS, authenticate, authorize, rate-limit, route, mutate the request, forward, shape and log the response. When the stages are kept narrow and well-bounded, the gateway tends to fade into the background and the services on the other side stay simpler. When too much is pushed into the gateway it becomes a piece of infrastructure that every team has opinions about, which is usually a signal that some of those concerns belong elsewhere.
