Load balancing.
Three load balancers, all named "Elastic Load Balancer" — Application (L7, HTTP), Network (L4, TCP/UDP/TLS), and Gateway (transparent appliance steering). The choice is almost mechanical once you know the protocol you want to balance. The interesting parts are target groups, listener rules, the cross-zone toggle, and the WAF and TLS termination that ride on top.
1 · What a load balancer actually is (and isn't)
The mental model that survives every conversation: a load balancer is a reverse proxy whose job is to spread the population of incoming connections (or requests) across a population of backends, decide which backends are healthy, and hide the details from clients. The two big design choices — the layer it operates at, and the unit it balances — determine almost everything else about its behaviour.
L4 vs L7 is the most important split. An L4 load balancer (NLB, HAProxy in TCP mode, kube-proxy) sees a TCP/UDP 5-tuple — src IP, src port, dst IP, dst port, protocol — and forwards the bytes without looking at what's inside. Each connection gets balanced to a backend; once routed, that connection stays pinned for its lifetime. An L7 load balancer (ALB, nginx in HTTP mode, Envoy) terminates the TCP connection, parses the HTTP request (or gRPC frame), and forwards each request independently. The same client connection can have request 1 sent to backend A and request 2 sent to backend B.
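The pinning difference can be sketched in a few lines of shell. This is a toy model rather than how any real balancer chooses backends: the backend names A and B, the modulo pick, and the helper names `backend_at`, `l4_route` and `l7_route` are all assumptions made for illustration.

```shell
# Two hypothetical backends; names are placeholders.
BACKENDS="A B"
N=2

# backend_at INDEX: print the backend at a 0-based index, wrapping around.
backend_at() { echo "$BACKENDS" | cut -d' ' -f"$(( $1 % N + 1 ))"; }

# l4_route CONN_ID REQUESTS: one decision per connection; every request follows it.
l4_route() {
  pinned=$(backend_at "$1"); out=""; i=0
  while [ "$i" -lt "$2" ]; do out="$out$pinned"; i=$((i + 1)); done
  echo "$out"
}

# l7_route REQUESTS: one decision per request (round robin here), even on one connection.
l7_route() {
  out=""; i=0
  while [ "$i" -lt "$1" ]; do out="$out$(backend_at "$i")"; i=$((i + 1)); done
  echo "$out"
}

l4_route 0 4   # AAAA
l7_route 4     # ABAB
```

Four requests on one connection all land on the same backend at L4, while at L7 the same four requests alternate.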
What a load balancer isn't: a router. It sits in the data path; if it goes down, the service goes down. The "Elastic" in ELB is the property that AWS scales the LB capacity for you (more nodes, more bandwidth) — but those nodes still sit in your AZs, attached to ENIs in your subnets. A regional Route 53 outage can leave the LBs up; an AZ outage takes its LB nodes with it. The LBs are infrastructure, not magic.
And the third surface that confuses people: DNS-based load balancing (Route 53 weighted/latency/failover) isn't really load balancing in the same sense — it picks which endpoint to dial, then the client opens a fresh TCP connection to that endpoint. It has no visibility into ongoing traffic, can't shed load mid-flight, and is bounded by DNS TTL on every change. Treat DNS-based "load balancing" as traffic steering at the resolution layer; ALB/NLB do the actual balancing at the connection/request layer.
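A small sketch of why TTL bounds every DNS-level change. It models one caching client only; the function name `resolve`, the endpoint names `lb-old` and `lb-new`, and the timings are hypothetical.

```shell
# resolve NOW LAST_LOOKUP TTL CACHED CURRENT
# Prints the endpoint a caching client would dial at time NOW (times in seconds).
resolve() {
  if [ $(( $1 - $2 )) -lt "$3" ]; then
    echo "$4"   # cache entry still fresh: the steering change is invisible
  else
    echo "$5"   # TTL expired: the client re-resolves and sees the new answer
  fi
}

# Client resolved at t=0 with TTL 60; the record was flipped to lb-new shortly after.
resolve 30 0 60 lb-old lb-new   # lb-old
resolve 61 0 60 lb-old lb-new   # lb-new
```

Until the TTL runs out the client keeps dialling the old endpoint, which is why DNS steering cannot shed load mid-flight.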
| Load balancers are good for | Load balancers are bad for |
|---|---|
| Hiding pool membership from clients (scale up, scale down, no client config changes) | Replacing a session-affinity database — sticky sessions are fragile |
| Terminating TLS in one place (rotate certs centrally; ACM public certificates are free for AWS resources) | State-heavy protocols where the LB has no visibility (long-poll, raw TCP with custom framing) |
| Health-driven traffic shedding (deregister sick backends within seconds) | Per-tenant isolation in a multi-tenant pool (use separate target groups or LBs) |
| L7 routing — path, host, header — without each backend learning the rules | Cross-region failover (use Route 53; LBs are zonal/regional) |
| Sitting in front of an autoscaling group whose membership flexes | "Just give me a stable IP for an outbound API" — that's an EIP / NAT, not an LB |
2 · Three load balancers
| | ALB | NLB | Gateway LB | CLB (legacy) |
|---|---|---|---|---|
| Layer | L7 (HTTP/HTTPS, gRPC, WebSockets) | L4 (TCP, UDP, TLS) | L3/L4 (transparent IP) | L4 + basic L7 |
| Routing | Host, path, header, method, query, source IP | By port + protocol only | By 5-tuple to inline appliance | Port-based; HTTP listener with limited rules |
| Source IP to target | Via the X-Forwarded-For header only (no PROXY protocol) | Yes (client IP preservation, depending on target type; PROXY protocol v2 opt-in) | Yes, transparent | X-Forwarded-For (HTTP listeners); PROXY protocol v1 (TCP listeners) |
| TLS termination | Yes | Yes | No | Yes |
| Sticky sessions | Yes — cookie-based | Yes — source-IP affinity | n/a | Cookie + duration |
| WAF integration | Native | No (Shield only) | No (appliance handles its own) | No |
| Static IPs | No (DNS only) | Yes — one EIP per AZ | No | No |
| Latency overhead | ~5-10 ms added | ~100 µs added | Depends on appliance | ~5 ms |
| Bill | Per-hour + LCUs | Per-hour + NLCUs (cheaper per request) | Per-hour + GLCUs | Per-hour + per-GB (worst per-request) |
| Reach for it when | Anything HTTP-shaped | Non-HTTP, latency-sensitive, or need static IPs | Inserting a firewall/IDS | Don't — migrate to ALB or NLB |
3 · How an ALB request actually flows
An ALB is built as one regional service made of many AZ-local LB nodes. Each AZ where the ALB is enabled has one or more ENIs (LB nodes) in your subnets; DNS returns a different IP per AZ. The client's DNS-resolution step is what spreads load across AZs; what happens inside an AZ is what spreads load across targets.
An ELB is built of three things stacked vertically:
- Listener: protocol + port the LB listens on. HTTPS:443, HTTP:80 (probably redirects to 443), TCP:6379.
- Rules (ALB only): ordered match/action, such as "if path is /api/*, forward to tg-api." Catch-all default rule at the end.
- Target groups: the pool of backends, each registered as either an instance ID, an IP, a Lambda function, or another ALB (in nested-LB setups). The target group holds the health check definition.
Target groups are independent objects, reusable across listeners. A blue/green deploy works by swapping which target group the listener rule points at: instant cutover with no LB recreation. A weighted forward action lets one rule split traffic across up to five target groups by weight, which is the canary pattern.
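The listener, rule and target-group stacking can be sketched as a first-match-wins lookup plus a weighted split. This is a simplified model: the rule table, the group names (`tg-api`, `tg-static`, `tg-default`, `tg-blue`, `tg-green`) and both function names are hypothetical, and the weighted pick is made deterministic here so the split is easy to check, whereas a real ALB distributes by weight without a fixed sequence.

```shell
# Hypothetical rule table, highest priority first: "path-pattern target-group".
RULES='/api/* tg-api
/static/* tg-static'

# route PATH: print the target group of the first matching rule, else the default.
route() {
  while read -r pattern target; do
    case "$1" in
      $pattern) echo "$target"; return 0 ;;   # first match wins
    esac
  done <<EOF
$RULES
EOF
  echo "tg-default"                           # catch-all default rule
}

# pick_weighted SEQ WEIGHT: send WEIGHT out of every 100 requests to the canary group.
pick_weighted() {
  if [ $(( $1 % 100 )) -lt "$2" ]; then echo "tg-green"; else echo "tg-blue"; fi
}

route /api/users      # tg-api
route /index.html     # tg-default
pick_weighted 7 10    # tg-green
pick_weighted 42 10   # tg-blue
```

Moving a deploy from canary to full cutover is then only a change of weight, with the listener and both target groups left in place.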
4 · NLB at L4: flow hashing and source IP preservation
An NLB doesn't terminate TCP — it forwards. The way it picks a target is by hashing the 5-tuple (source IP, source port, dest IP, dest port, protocol) to a target slot in the target group. A given client connection always lands on the same target for its lifetime (this is why NLB is the L4 default for stateful protocols). Cross-AZ traffic is off by default, so the hash is computed against only the targets in the same AZ as the LB node that received the SYN.
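A minimal sketch of flow hashing, assuming CRC-32 from `cksum` as a stand-in for the NLB's real (undocumented here) hash function. The function name `pick_target`, the instance IDs and the addresses are placeholders.

```shell
# pick_target TUPLE TARGET...: hash the flow's 5-tuple onto one of the targets.
pick_target() {
  tuple=$1; shift
  # cksum (CRC-32) is an assumption standing in for the NLB's actual hash.
  flow_hash=$(printf '%s' "$tuple" | cksum | cut -d' ' -f1)
  shift $(( flow_hash % $# ))   # skip ahead to the chosen target slot
  echo "$1"
}

# src IP, src port, dst IP, dst port, protocol
flow="203.0.113.7 51514 10.0.1.20 443 tcp"
pick_target "$flow" i-aaa i-bbb i-ccc   # same flow, same target, every time
pick_target "$flow" i-aaa i-bbb i-ccc
```

Because the choice is a pure function of the 5-tuple, a new source port (a new connection) may land elsewhere, but packets of an existing connection never do. Passing only the same-AZ targets as arguments models the cross-zone-off default.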
Because the NLB doesn't terminate the connection, the target sees the actual client IP as the TCP source address — no X-Forwarded-For required, no PROXY protocol needed. This is the biggest functional difference from ALB. For protocols where the application needs to know the real client IP (logging, IP-based rate limiting, geolocation), NLB delivers it for free; ALB makes you parse X-Forwarded-For and convince yourself the upstream client didn't set it.
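For the ALB side, a hedged sketch of reading X-Forwarded-For safely. It assumes the target sits directly behind a single ALB in its default mode, where the ALB appends the address it saw to whatever the client sent, so only the right-most entry is trustworthy. The function name `client_ip_from_xff` is hypothetical, and the logic would need adjusting if CloudFront or another proxy sits in front of the ALB.

```shell
# client_ip_from_xff HEADER: return the right-most X-Forwarded-For entry.
client_ip_from_xff() {
  last=${1##*,}              # keep what follows the final comma
  echo "$last" | tr -d ' '   # drop padding spaces
}

# The client sent a forged "198.51.100.9"; the ALB appended the address it saw.
client_ip_from_xff "198.51.100.9, 203.0.113.7"   # 203.0.113.7
client_ip_from_xff "203.0.113.7"                 # 203.0.113.7
```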
One subtlety: client IP preservation depends on the target type and the network path. Instance-type targets in the same VPC see the real client IP. IP-type targets see it only when client IP preservation is enabled on the target group, and cross-VPC IP targets can lose it. For traffic that arrives through PrivateLink, the NLB SNATs to its own IP and the client address is lost. Test before you assume.
5 · TLS, certificates, SNI
Both ALB and NLB can terminate TLS using certificates from AWS Certificate Manager (ACM). Public ACM certificates used with AWS resources are free: you pay nothing for the cert itself. The catch: ACM certs can only be used with AWS services (ALB, NLB, CloudFront, API Gateway, etc.). To terminate TLS on an EC2 instance you run yourself, you still need a cert from Let's Encrypt or your CA.
- One listener, many certs (SNI). ALB and NLB support multiple certs per listener. The client's SNI extension picks which one to present. Useful for multi-tenant SaaS where each customer brings their own domain.
- HTTP → HTTPS redirect. Standard listener rule: HTTP:80 with a single rule "redirect to https://#{host}:443/#{path}?#{query}." Two clicks.
- End-to-end TLS. Terminate at the ALB, then re-encrypt to targets if you need defence-in-depth. Slight latency cost; usually worth it for compliance regimes. ALB always terminates TLS; there is no "pass-through" mode. If you need TLS pass-through, use an NLB with a TCP listener (an NLB TLS listener terminates TLS at the load balancer too).
- TLS on NLB. Supported, but the NLB terminating TLS gives up some of the L4 latency advantage. Many teams pass TLS through NLB and terminate at the application.
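The SNI selection in the first item can be pictured as a pattern lookup with a default. This is only an illustration of the idea: the hostnames and certificate names are hypothetical, and the real selection logic on ALB and NLB considers more than the hostname.

```shell
# select_cert SNI_HOSTNAME: print the name of the certificate to present.
select_cert() {
  case "$1" in
    app.example.com)  echo "cert-app" ;;        # exact match
    *.*.example.com)  echo "cert-default" ;;    # a certificate wildcard covers one label only
    *.example.com)    echo "cert-wildcard" ;;   # single-label wildcard match
    *)                echo "cert-default" ;;    # no match: fall back to the default cert
  esac
}

select_cert app.example.com       # cert-app
select_cert tenant1.example.com   # cert-wildcard
select_cert a.b.example.com       # cert-default
```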
6 · Cross-zone load balancing — the silent toggle
Each LB node forwards only to targets in its own AZ unless cross-zone load balancing is on; with it on, every node can forward to every registered target in any enabled AZ. ALBs ship with cross-zone on (it can be switched off per target group); NLBs ship with it off.
What this means in practice: if your 8 targets are split 6-1-1 across AZs and you have cross-zone off on NLB, each AZ still receives ~33% of all traffic. The six AZ-a instances share that 33% (about 5.6% each), while the lone instances in AZ-b and AZ-c each absorb a full 33%, roughly six times the per-instance load. Turn cross-zone on and the load redistributes evenly across all 8, at 12.5% each.
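The arithmetic for the 6-1-1 split, as a quick check. It assumes DNS spreads clients evenly across the enabled AZs; the function names are hypothetical.

```shell
# share_cross_zone_off AZ_COUNT TARGETS_IN_THIS_AZ: percent of all traffic per target.
share_cross_zone_off() {
  awk -v z="$1" -v n="$2" 'BEGIN { printf "%.1f\n", 100 / z / n }'
}

# share_cross_zone_on TOTAL_TARGETS: percent of all traffic per target.
share_cross_zone_on() {
  awk -v t="$1" 'BEGIN { printf "%.1f\n", 100 / t }'
}

share_cross_zone_off 3 6   # 5.6   (each of the six AZ-a targets)
share_cross_zone_off 3 1   # 33.3  (the single AZ-b or AZ-c target)
share_cross_zone_on 8      # 12.5  (every target, once cross-zone is on)
```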
7 · Deregistration delay — the graceful drain
When a target is removed from a target group (deploy, scale-in), the LB stops sending new connections but lets existing ones finish for the deregistration delay (default 300 s). Lower this for short-lived requests; raise it for long-lived ones (WebSockets, gRPC streams) — but cap at 3600 s.
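A rough way to reason about the right value: count how many in-flight requests would still be running when the delay expires. The sketch below is a simplification that assumes anything still open at that point is cut; the function name `cut_requests` and the sample durations are made up for illustration.

```shell
# cut_requests DELAY REMAINING...: count in-flight requests that outlive the drain window.
# DELAY is the deregistration delay in seconds; each REMAINING is the number of
# seconds one in-flight request still needs to finish.
cut_requests() {
  delay=$1; shift
  dropped=0
  for remaining in "$@"; do
    if [ "$remaining" -gt "$delay" ]; then dropped=$((dropped + 1)); fi
  done
  echo "$dropped"
}

# Four in-flight requests needing 2 s, 5 s, 45 s and 600 s more.
cut_requests 30 2 5 45 600     # 2
cut_requests 3600 2 5 45 600   # 0
```

A 30 s delay is plenty for the short requests but cuts the two long-lived ones, which is the trade-off between fast deploys and uninterrupted streams.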
During a rolling deploy this is the difference between "users see 504s" and "users notice nothing." ECS, EKS, and Auto Scaling can all wait for deregistration before terminating the target. The most common production bug is forgetting that your application has to handle SIGTERM and finish in-flight requests within the deregistration window. If the app exits on SIGTERM instead of draining, the requests still in flight on its existing connections die with it, and clients see resets.
8 · Real-world case studies
Three public stories give a sense of how the L4/L7 split plays out at scale.
Discord — many millions of WebSocket connections. Discord's "real-time" experience is, mechanically, a giant pool of persistent WebSocket connections. Their infrastructure posts describe a layered approach: NLBs at the edge (because WebSockets are L7 but stay open for hours, and they need predictable static IPs for the Discord clients to reconnect to), with consistent-hash session affinity in their own routing layer above the LBs. The lesson: at WebSocket scale, the LB is the connection pinner, not the request balancer. ALB would work, but NLB's per-connection latency and source-IP preservation are the right primitives.
Cloudflare — the L4-vs-L7 framework, and what each costs you. Cloudflare's engineering team has written extensively on the trade-offs between L4 and L7 load balancing — their blog includes posts on why their own edge runs L7 (TLS terminated, request-routed) but their internal fabric runs L4 (XDP-accelerated, line-rate). The framework they articulate, which translates directly to AWS: L4 is the right answer when you don't need to look inside the payload; the moment you do (routing on path, mutating headers, terminating TLS to inspect), the L7 hop is worth its overhead. NLB-vs-ALB is the AWS expression of the same trade-off.
Reddit — ALB with WAF and CloudFront in front. Reddit has published several infrastructure posts describing their move from EC2-with-HAProxy to ALBs behind CloudFront. The shape is the canonical consumer-internet shape: CloudFront caches and absorbs DDoS; ALB does L7 routing and integrates WAF; target groups behind that are ECS/EKS services. The interesting choice is that they don't use NLB at the edge — even for non-HTTP needs they generally find a way to put HTTP shape on it, because ALB's WAF, header-based routing, and integration with their service mesh outweigh NLB's lower-latency advantages for consumer traffic where humans tolerate tens of milliseconds.
The through-line: pick L4 when latency is the primary cost or you need static IPs; pick L7 when routing decisions, security inspection, or per-request observability are the primary cost. Most consumer-internet workloads end up at L7 because the cost of a few extra milliseconds is dwarfed by the cost of not being able to inspect requests.
9 · Build it yourself — ALB in front of two EC2 instances
- Reuse the VPC from the previous lab — or recreate the public subnets. ALB needs at least two public subnets in different AZs.
    VPC_ID=<your VPC>
    PUB_A=<public subnet a>
    PUB_B=<public subnet b>

- Launch two EC2 instances running a one-line web server.

    # The AWS CLI base64-encodes --user-data itself, so pass plain text.
    USERDATA='#!/bin/bash
    yum install -y python3
    echo "Hello from $(hostname)" > /tmp/index.html
    cd /tmp && nohup python3 -m http.server 8080 &
    '
    AMI=$(aws ssm get-parameter \
      --name /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64 \
      --query 'Parameter.Value' --output text)
    SG=$(aws ec2 create-security-group --group-name lab-targets --description "ALB targets" \
      --vpc-id $VPC_ID --query GroupId --output text)
    # Assumes the VPC CIDR is 10.0.0.0/16; adjust to match your VPC.
    aws ec2 authorize-security-group-ingress --group-id $SG --protocol tcp --port 8080 --cidr 10.0.0.0/16
    EC2_A=$(aws ec2 run-instances --image-id $AMI --instance-type t3.micro --subnet-id $PUB_A \
      --security-group-ids $SG --user-data "$USERDATA" \
      --query 'Instances[0].InstanceId' --output text)
    EC2_B=$(aws ec2 run-instances --image-id $AMI --instance-type t3.micro --subnet-id $PUB_B \
      --security-group-ids $SG --user-data "$USERDATA" \
      --query 'Instances[0].InstanceId' --output text)

- Create a target group and register both instances.

    # Instances must be in the running state before they can be registered.
    aws ec2 wait instance-running --instance-ids $EC2_A $EC2_B
    TG=$(aws elbv2 create-target-group --name lab-tg --protocol HTTP --port 8080 \
      --vpc-id $VPC_ID --health-check-path /index.html \
      --query 'TargetGroups[0].TargetGroupArn' --output text)
    aws elbv2 register-targets --target-group-arn $TG \
      --targets Id=$EC2_A Id=$EC2_B

- Create the ALB and a listener.

    ALB_SG=$(aws ec2 create-security-group --group-name lab-alb --description "ALB" \
      --vpc-id $VPC_ID --query GroupId --output text)
    aws ec2 authorize-security-group-ingress --group-id $ALB_SG --protocol tcp --port 80 --cidr 0.0.0.0/0
    aws ec2 authorize-security-group-ingress --group-id $SG --protocol tcp --port 8080 --source-group $ALB_SG
    ALB=$(aws elbv2 create-load-balancer --name lab-alb \
      --subnets $PUB_A $PUB_B --security-groups $ALB_SG --type application \
      --query 'LoadBalancers[0].LoadBalancerArn' --output text)
    aws elbv2 create-listener --load-balancer-arn $ALB --protocol HTTP --port 80 \
      --default-actions Type=forward,TargetGroupArn=$TG

- Wait for active, then hit it.

    aws elbv2 wait load-balancer-available --load-balancer-arns $ALB
    DNS=$(aws elbv2 describe-load-balancers --load-balancer-arns $ALB \
      --query 'LoadBalancers[0].DNSName' --output text)
    # Hammer it 20 times and watch the balancing
    for i in $(seq 1 20); do curl -s http://$DNS/index.html; done | sort | uniq -c

- Tear it all down.

    aws elbv2 delete-load-balancer --load-balancer-arn $ALB
    # The target group cannot be deleted while the ALB's listener still references it.
    aws elbv2 wait load-balancers-deleted --load-balancer-arns $ALB
    aws elbv2 delete-target-group --target-group-arn $TG
    aws ec2 terminate-instances --instance-ids $EC2_A $EC2_B
    aws ec2 wait instance-terminated --instance-ids $EC2_A $EC2_B
    # Delete the target SG first: its ingress rule references the ALB SG.
    aws ec2 delete-security-group --group-id $SG
    aws ec2 delete-security-group --group-id $ALB_SG
10 · What breaks
- Targets stuck "unhealthy." Check (a) security group on the target allows traffic from the LB's SG, (b) the health check path returns 200 (not 302 to the login page), (c) the app is bound to 0.0.0.0 not 127.0.0.1, (d) the timeout is long enough for cold-starting apps. Also: target group health check is separate from application or container-runtime health checks, so they can disagree.
- Cross-zone cost surprise. Enabling cross-zone on a high-throughput NLB starts billing cross-AZ data transfer ($0.01/GB each way) for traffic the LB now spreads across AZs. On TB/day workloads this is hundreds to thousands of dollars/month. Audit before flipping the toggle on production NLBs.
- 5XX after a deploy. Deregistration delay is too short — connections are being killed mid-request. Raise it; make sure your app handles SIGTERM. Default 300s is too high for short-lived HTTP requests (deploys feel slow) and too low for long-lived WebSockets (clients get cut).
- "My static IP keeps changing." ALBs don't have static IPs; you need an NLB if you need fixed IPs (for an allow-list on a third party). Or put an NLB in front of the ALB.
- 504 Gateway Timeout. ALB has a default idle-timeout of 60 s. Long-running requests (large file uploads, slow report queries) blow past it. Raise the idle-timeout or move the long work to async.
- ALB always terminates TLS. There is no pass-through mode. If you need end-to-end TLS without the LB seeing plaintext, use NLB with a TCP listener and terminate inside the application. An NLB TLS listener does not qualify: it decrypts at the load balancer even when the target-group protocol is TLS.
- WAF rule blocking legit traffic. Default rule sets are aggressive. Always start in count mode and only switch to block after a couple of weeks of false-positive tuning.
- NLB source-IP preservation lost via PrivateLink. If you front a service with NLB and expose it via PrivateLink, the NLB SNATs to its own IP — the target sees the NLB, not the original client. Account for it in any IP-based logic.
- Listener rule limit (100 per ALB). Heavy multi-tenant SaaS hitting "one rule per customer host" hits this fast. Either consolidate (wildcard host with header-routed customer ID) or shard customers across multiple ALBs.
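For the cross-zone cost item above, a back-of-the-envelope estimate. It uses the $0.01/GB each-way rate quoted there and assumes targets and clients are spread evenly, so that (AZs - 1)/AZs of the traffic leaves the AZ it arrived in. The function name and the 30-day month are assumptions; check current pricing before relying on the number.

```shell
# monthly_cross_zone_cost GB_PER_DAY AZ_COUNT: rough monthly USD for cross-AZ transfer.
monthly_cross_zone_cost() {
  awk -v gb="$1" -v az="$2" 'BEGIN {
    crossing = gb * (az - 1) / az           # GB per day that leave the receiving AZ
    printf "%.0f\n", crossing * 0.02 * 30   # 0.01 out + 0.01 in per GB, over 30 days
  }'
}

monthly_cross_zone_cost 1000 3    # 400
monthly_cross_zone_cost 10000 3   # 4000
```

At 1 TB/day across three AZs that is roughly 400 dollars a month, and it scales linearly with throughput.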
11 · Further reading
- ALB user guide. The canonical reference for rules, target types, and the WAF integration.
- NLB user guide. Includes the cross-zone defaults and the IP-preservation modes.
- Discord — handling millions of concurrent connections. NLB + custom routing layer for persistent WebSockets.
- Cloudflare blog — L4 vs L7. Several posts on when each layer is the right answer (XDP, Unimog, edge architectures).
- Reddit engineering blog. Infrastructure posts on CloudFront + ALB + WAF for consumer-scale traffic.
- How load balancing works. The algorithm side — round robin, least-conns, IP hash, EWMA.
- Cloud networking. Where ELBs fit in the VPC layer.
- Load balancer simulator. See the algorithms move in a browser.