What is the difference between L4 and L7 load balancing?

L4 (transport layer) load balancers route TCP/UDP traffic: they see source and destination IP and port, and pick a backend per connection. L7 (application layer) load balancers terminate the connection, parse HTTP, and pick a backend per request, possibly by URL or header. L4 is faster and protocol-agnostic; L7 is smarter and HTTP-aware. Most modern stacks layer both.

How load balancing spreads traffic across many servers.

Spreading traffic across a pool sounds like the easy part of distributed systems. It is not. The algorithm you pick decides who falls over first, who serves slow users, and whether "add another box" actually helps.

Parts 01–08 · Interactive: 4 algorithms · Prereq: TCP / HTTP

What is load balancing?

One IP, many backends.

Load balancing distributes incoming requests across a pool of backend servers. Layer 4 (TCP/UDP) load balancers operate on connections; Layer 7 (HTTP) load balancers operate on requests. The canonical algorithms (round-robin, weighted, least-connections, power of two choices or P2C, and consistent hashing) make different trade-offs around fairness, tail latency, and behaviour under heterogeneous backends.

A load balancer is the address the world talks to. Behind it, a fluctuating pool of machines actually does the work. The LB's job sounds trivial (pick a backend for each request), but it carries three responsibilities that hold most distributed systems together.

Distribution

Spread N requests across M backends so no single one falls behind. The algorithm in Part 03 is what makes this work — or fail invisibly.

Health

Probe backends; remove the dead from rotation; add them back when they recover. The whole point of multiple boxes is to survive one of them dying — but only if the LB notices fast enough.

Insulation

Hide the pool from the client. The world sees one stable IP. You can deploy, scale, and replace backends behind the LB without anyone reconnecting. This is what makes rolling deploys possible.

L4 vs L7 load balancing

Routing by connection versus routing by request.

The first decision is which layer of the stack the LB looks at. Both have a place; both have things they cannot do.

L4 · transport

Move bytes, fast.

The LB sees TCP connections (or UDP datagrams). It picks a backend at SYN time and forwards every byte after that. It cannot route based on URL or header because it never parses them. Examples: AWS NLB, HAProxy in tcp mode, kube-proxy in iptables mode. Lowest latency, lowest CPU, and some deployments can use direct server return (DSR), where responses bypass the LB, for even higher throughput.

L7 · application

Understand the request.

The LB terminates the connection itself, parses HTTP, then forwards. It can route /api/* → backend pool A and /static/* → backend pool B; rewrite headers; stream gRPC; speak HTTP/2 to clients and HTTP/1.1 upstream. Examples: Envoy, NGINX, AWS ALB. The cost is that every request is parsed twice: once at the LB and again at the backend. If you run the L7 box yourself, see Traefik vs Nginx for the trade-offs.

Most production stacks layer both

An L4 LB at the edge for raw throughput and DDoS handling, then an L7 LB inside the cluster for routing intelligence. Two boxes, two scopes, one stable IP for the user.
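The difference in what each layer can see can be sketched in a few lines. This is an illustration only: the pool addresses, the route table, and the function names l4_pick and l7_pick are hypothetical, and a real L4 balancer makes this decision in the kernel or in hardware, not in Python.

```python
import hashlib

POOL = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]              # hypothetical backends
ROUTES = [("/api/", "pool-a"), ("/static/", "pool-b")]   # hypothetical L7 route table
DEFAULT_POOL = "pool-default"

def l4_pick(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    """L4: choose once per connection, using only the address tuple."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return POOL[int.from_bytes(digest[:8], "big") % len(POOL)]

def l7_pick(raw_request: bytes) -> str:
    """L7: parse the HTTP request line, then choose per request by path."""
    request_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
    _method, path, _version = request_line.split(" ", 2)
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

l4_pick never looks at payload bytes, so every request on the same connection lands on the same backend; l7_pick can send two requests from one client connection to two different pools.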

The four load balancing algorithms, side by side

Four algorithms, five backends, one slow one.

Below: a five-backend pool. Backend C is slow (60 ms / request). Switch the algorithm and watch in-flight load balance — or fail to. Click a backend's health switch to simulate a failure mid-stream.

Round robin

Strict cycle through the pool. Every backend gets exactly 1/N of the requests, regardless of how loaded any of them is. Cheap, predictable, and quietly disastrous when one backend slows down.

Simulated pool (request rate 8/s): backend A 20 ms · backend B 22 ms · backend C 60 ms · backend D 18 ms · backend E 25 ms.
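For readers without the interactive, three of the pickers can be sketched as plain functions over a map of in-flight counts. The function names and the in_flight dictionary are assumptions made for this illustration, not part of any particular load balancer.

```python
import itertools
import random

def make_round_robin(backends):
    """Return a picker that cycles through the pool and ignores load."""
    order = itertools.cycle(backends)
    def pick(in_flight):
        return next(order)
    return pick

def least_connections(in_flight):
    """Pick the backend with the fewest in-flight requests (first wins ties)."""
    return min(in_flight, key=in_flight.get)

def power_of_two_choices(in_flight, rng=random):
    """P2C: sample two distinct backends, keep the less loaded one."""
    a, b = rng.sample(list(in_flight), 2)
    return a if in_flight[a] <= in_flight[b] else b
```

With in-flight counts of A: 1, B: 1, C: 6, D: 0, E: 2, round robin still hands every fifth request to the slow backend C, least-connections picks D, and P2C never picks C because whichever backend it is compared against has less load.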

Health checks are harder than they look

Knowing a backend is up is not the same as knowing it works.

The LB needs to know which backends are accepting traffic. A naïve TCP connect every few seconds catches "the process is gone" but misses "the process is up, but its database driver is deadlocked and every request hangs for 30 s." Production health checks are layered.

L4 · TCP connect

Cheap, frequent (every few seconds), catches process death and port issues. Tells you nothing about whether the application is actually serving requests correctly.

L7 · HTTP /healthz

The application returns 200 OK only when its own self-checks pass — DB reachable, caches warm, internal queues drained. Best practice: two endpoints — /livez for "the process is alive at all" and /readyz for "ready to accept new work."
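A minimal sketch of the two-endpoint pattern, using Python's standard http.server. The deps_ready flag is a hypothetical stand-in for whatever self-checks the application really runs (database reachable, caches warm, queues drained).

```python
from http.server import BaseHTTPRequestHandler

def health_status(path: str, deps_ready: bool) -> int:
    """Map a probe path to an HTTP status code."""
    if path == "/livez":
        return 200                        # the process is alive at all
    if path == "/readyz":
        return 200 if deps_ready else 503  # ready to accept new work?
    return 404

class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical flag; the application would flip it after its self-checks pass.
    deps_ready = False

    def do_GET(self):
        status = health_status(self.path, type(self).deps_ready)
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()
```

Serving it with http.server.HTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever() is enough to experiment; the address and port are arbitrary. Keeping /livez independent of dependencies means a database outage takes the backend out of rotation without causing the process itself to be restarted.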

passive · Outlier ejection

Watch real traffic. If a backend produces five 5xx in a row, eject it immediately — don't wait for the next health probe. Envoy's outlier detection, AWS Target Group unhealthy-host detection. Catches the partial failures L7 probes miss.
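The "five 5xx in a row" rule can be sketched as a small counter. This is a simplified illustration, not Envoy's implementation: the OutlierDetector class is hypothetical, and it omits the timed re-admission that real outlier detection performs.

```python
class OutlierDetector:
    """Eject a backend after `threshold` consecutive 5xx responses."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.consecutive_5xx = {}
        self.ejected = set()

    def record(self, backend: str, status: int) -> None:
        if 500 <= status <= 599:
            count = self.consecutive_5xx.get(backend, 0) + 1
            self.consecutive_5xx[backend] = count
            if count >= self.threshold:
                self.ejected.add(backend)
        else:
            # Any non-5xx response resets the streak.
            self.consecutive_5xx[backend] = 0

    def healthy(self, pool):
        return [b for b in pool if b not in self.ejected]
```

Because the counter resets on any success, a backend that fails intermittently is not ejected by this rule alone; that case needs a success-rate check as well.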

slow start · Ramp, don't dump

When a fresh backend joins the pool, do not send it 1/N of all traffic at once — its caches are cold, its connections are unwarmed. Start it at 5% and ramp linearly over thirty seconds. NGINX slow_start; Envoy slow-start mode.
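The ramp described here is a linear interpolation from 5% to 100% over thirty seconds. A sketch of that arithmetic follows; the function name and parameters are hypothetical, and NGINX and Envoy each use their own ramp formulas.

```python
def slow_start_weight(age_s: float, ramp_s: float = 30.0, floor: float = 0.05) -> float:
    """Fraction of a full traffic share for a backend that joined age_s seconds ago."""
    if age_s >= ramp_s:
        return 1.0
    if age_s <= 0:
        return floor
    return floor + (1.0 - floor) * (age_s / ramp_s)
```

The LB would multiply the backend's normal weight by this factor when picking, so a backend halfway through the ramp receives a little over half of its eventual share.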

Session affinity and stickiness

Sending a user back to the same backend every time.

Stateless backends are easy to load-balance: any backend can serve any request. WebSockets, server-sent events, in-memory sessions, and shard-locked queries are not stateless — they need the next request from the same user to land on the same backend or the connection breaks.

Affinity is implemented either by hashing a stable identifier (source IP, a cookie the LB sets, a header from the application) or by consistent-hashing a request key. The cost is sharp: when a backend dies, its sessions are lost. Affinity must be paired with graceful drain, sticky-cookie expiry, and a connection-aware deploy strategy.
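A consistent-hash ring can be sketched as follows, assuming SHA-256 as the hash and 100 virtual nodes per backend; both choices, and the HashRing name, are illustrative only.

```python
import bisect
import hashlib

def _h(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, backends, vnodes: int = 100):
        # Each backend owns `vnodes` points on the ring to smooth the distribution.
        self._points = sorted((_h(f"{b}#{i}"), b) for b in backends for i in range(vnodes))
        self._hashes = [h for h, _ in self._points]

    def pick(self, key: str) -> str:
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._hashes, _h(key)) % len(self._points)
        return self._points[idx][1]
```

The property that matters for affinity: when a backend leaves the ring, only the keys it owned move. Keys pinned to the surviving backends stay where they were, which a plain hash(key) % N does not guarantee.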

Source IP

Cheap and broken.

The LB hashes the client IP. Works until users sit behind a corporate NAT — then thousands of users share one IP and pile onto one backend. Or until your mobile user moves from Wi-Fi to LTE and changes IP mid-session.

Cookie / header

Application-aware.

The LB sets a cookie on the first response (SRV=A) and routes future requests with that cookie back to backend A. Survives NAT, survives IP change, expires when you say it does. The default for any L7 LB you'd deploy in 2026.
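A sketch of the cookie flow, using the SRV cookie name from the text. The route function, the fallback argument, and the cookie attributes (Path, HttpOnly, Max-Age) are assumptions for illustration; real L7 balancers expose these as configuration.

```python
from http.cookies import SimpleCookie

def route(cookie_header, healthy, fallback):
    """Return (backend, set_cookie_value_or_None) for one request.

    cookie_header: the raw Cookie header, or None if the client sent none.
    healthy:       backends currently in rotation.
    fallback:      picker used when there is no usable pin (any algorithm).
    """
    cookie = SimpleCookie(cookie_header or "")
    pinned = cookie["SRV"].value if "SRV" in cookie else None
    if pinned in healthy:
        return pinned, None               # honour the existing pin
    backend = fallback(healthy)           # first visit, or the pinned backend is gone
    return backend, f"SRV={backend}; Path=/; HttpOnly; Max-Age=3600"
```

Re-pinning when the pinned backend is unhealthy keeps the user served, but any state held only in that backend's memory is still lost.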

Pool churn and graceful drain

Adding and removing backends without dropping live requests.

Backends come and go: scale-out adds new ones, deploys rotate every one in turn, crashes remove them without warning. The LB has to handle each case without dropping live requests.

A normal shutdown follows the drain sequence: the orchestrator marks the backend not ready; the LB stops sending new requests; in-flight requests are allowed to complete (typically 30–60 s); then the process exits. In Kubernetes, a SIGTERM handler paired with a long enough terminationGracePeriodSeconds is what makes rolling deploys non-disruptive.
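The backend's side of the drain sequence can be sketched as a small in-process tracker. The Drainer class is hypothetical; in a real service, drain() would be triggered by SIGTERM handling and the ready flag would back the readiness endpoint.

```python
import threading

class Drainer:
    def __init__(self):
        self.ready = True                 # what a readiness probe would report
        self._in_flight = 0
        self._cond = threading.Condition()

    def begin_request(self) -> bool:
        with self._cond:
            if not self.ready:
                return False              # draining: refuse new work
            self._in_flight += 1
            return True

    def end_request(self) -> None:
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout_s: float = 30.0) -> bool:
        """Stop accepting work, then wait for in-flight requests to finish.

        Returns True if everything completed, False if the timeout expired.
        """
        with self._cond:
            self.ready = False
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout_s)
```

The timeout passed to drain() should be shorter than the orchestrator's grace period, otherwise the process is killed while it is still waiting.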

Crashes don't follow this pattern. The first signal the LB has is failed requests. With outlier ejection (Part 04) and a fast retry budget at the client, the broken backend is ejected within a second or two and traffic shifts to the rest of the pool. See the autoscaling guide for the metric loop that decides when to add and remove backends in the first place.

Global server load balancing (GSLB): balancing across regions

Routing each user to the nearest healthy datacenter.

A single L7 LB scales to hundreds of thousands of requests per second, but it lives in one region. Global Server Load Balancing (GSLB) is what routes a user in Tokyo to a datacenter in Tokyo and a user in London to one in Frankfurt — without their browser ever knowing.

There are two common implementations. DNS-based: the authoritative DNS server returns the IP of the nearest healthy region, with a sub-minute TTL so failover is fast. Anycast: multiple regions announce the same IP, and BGP routes packets to the topologically nearest one. Cloudflare and Google rely heavily on anycast; AWS Route 53 latency-based routing is DNS-based GSLB.
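The decision a DNS-based GSLB makes for each query can be sketched as "lowest latency among healthy regions". The region names and latency figures below are made up for illustration, and pick_region is a hypothetical helper, not any provider's API.

```python
def pick_region(latency_ms: dict, healthy: set) -> str:
    """Return the healthy region with the lowest measured latency to the client."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        raise LookupError("no healthy region")
    return min(candidates, key=candidates.get)

# Hypothetical latencies measured for a client in Tokyo.
tokyo_client = {"tokyo": 8, "frankfurt": 240, "virginia": 160}
```

With all three regions healthy, the Tokyo client is sent to tokyo; if tokyo fails its health checks, the next answer is virginia, and the short TTL bounds how long clients keep using the stale record.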

For deeper context, see the DNS guide on how authoritative servers work and the BGP guide on what anycast actually means.

Where load balancers fail in surprising ways

The failures that surprise you.

The LB is the single point through which every request flows. When it misbehaves, every request misbehaves. Three classes of failure are worth knowing by name.

01 · THUNDERING HERD

Thundering herd after a restart

An LB rolls — for thirty seconds, all traffic flows through one half of the pool, which has cold caches and warming pools. Latency spikes; clients retry; each retry hits the same overloaded half. Slow-start (Part 04) and client jitter break the cycle.
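Client jitter is usually implemented as randomized exponential backoff. A sketch of the "full jitter" variant follows; the base and cap values are arbitrary assumptions.

```python
import random

def backoff_with_jitter(attempt: int, base_s: float = 0.1, cap_s: float = 5.0, rng=random) -> float:
    """Delay before retry number `attempt` (0-based): uniform in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

Without the random component, every client that failed at the same moment retries at the same moment, and the overloaded half of the pool sees the spike again on each round.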

02 · UNEVEN HASHING

Uneven hashing behind a NAT

Source-IP hashing behind a single corporate NAT gateway can produce a pool where 95% of users hash to one backend. The LB is doing exactly what it was told; the inputs were the problem. Use cookie or header hashing instead.
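The effect is easy to reproduce. In this sketch, 1,000 users share one NAT address (203.0.113.7, a documentation-range IP used purely as a placeholder), and the same users are then hashed by a per-user key instead. The pool and the pick helper are hypothetical.

```python
import hashlib
from collections import Counter

POOL = ["A", "B", "C", "D", "E"]

def pick(key: str) -> str:
    digest = hashlib.sha256(key.encode()).digest()
    return POOL[int.from_bytes(digest[:8], "big") % len(POOL)]

# Every user behind the NAT presents the same source IP.
by_source_ip = Counter(pick("203.0.113.7") for _ in range(1000))

# The same users, keyed by a per-user cookie or header value.
by_user_key = Counter(pick(f"user-{i}") for i in range(1000))
```

by_source_ip contains a single backend holding all 1,000 users, while by_user_key spreads them across the whole pool.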

03 · RETRY AMPLIFICATION

Retries multiply at every layer

Retries compound: client retries × LB retries × inner-service retries. With three attempts at each of three layers, one failing request can become up to 27 attempts against a backend that is already struggling. Set a retry budget at the LB ("≤ 10% of requests may retry"); return 503 with Retry-After when overload is detected. See the retry-strategy simulator.
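A retry budget of "at most 10% of requests may retry" can be sketched as two counters. The RetryBudget class is hypothetical, and it counts over the whole lifetime of the object for simplicity; a production version would use a sliding time window.

```python
class RetryBudget:
    """Allow a retry only while retries stay within `ratio` of requests seen."""

    def __init__(self, ratio: float = 0.10):
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        self.requests += 1

    def try_retry(self) -> bool:
        if self.retries + 1 > self.ratio * self.requests:
            return False                  # budget exhausted: fail fast instead
        self.retries += 1
        return True
```

During normal operation the budget is never reached; during an outage it caps the extra load from retries at 10% instead of letting it multiply.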

Cloud load balancers compared: AWS ALB/NLB, GCP, Azure

What the managed services actually offer.

AWS ALB (Application Load Balancer): L7 HTTP/HTTPS. Supports WebSocket, HTTP/2, and gRPC (since 2020). Roughly $0.0225/hour plus LCU charges; pricing varies by region. Routes by host header, path, query string, or HTTP header. Native integration with WAF, Cognito, and OIDC.
AWS NLB (Network Load Balancer): L4 TCP/UDP/TLS. Static IP per AZ, handles millions of requests per second at very low latency. Client source IP is preserved by default for instance targets. Best fit: gaming servers, IoT, real-time bidding.
GCP Global Load Balancer: Anycast IPv4 in front of multi-region backend. Single global IP routed to the nearest healthy region. Probably the simplest cross-region LB story among the big three.
Azure Application Gateway / Front Door: Application Gateway is regional L7 with WAF; Front Door is global anycast (similar to GCP GLB or Cloudflare). Often deployed together: Front Door at the edge, App Gateway per region.
Cloudflare Load Balancing: DNS-based + anycast L4/L7. Health-check-aware traffic shifting between regions or providers. Often used for multi-cloud or hybrid backends.

The session-affinity gotcha. All cloud LBs offer sticky sessions via cookie or source-IP hashing, but the trade-offs are subtle. Cookie-based stickiness (the typical default) breaks if the client clears cookies; source-IP stickiness breaks behind proxies and NAT. Many deployments end up with subtly broken stickiness because the underlying assumption (each user has one consistent IP) doesn't hold on mobile.

A closing note

Load balancing is one of those topics that sounds dull until something melts. The algorithms themselves are well understood (round-robin, least-conn, P2C, hash), but the production failure modes are subtle, and the consequences cascade. Pick the algorithm to match your backends' shape; pair it with health checks deep enough to catch real failure; pair it with drains long enough to let in-flight requests finish. Most of the rest is the LB doing the boring, important thing it always does.
