Cloud Codex / 03
Networking — VPC, subnets, NAT.
Cloud networking takes the longest to internalise of any topic in this Codex. Three months in, you'll still occasionally route a private subnet to an internet gateway and wonder why your packets are vanishing. The good news: the mental model is the same on every cloud. Once you understand VPC, subnets, route tables, NAT, and peering on AWS, the same words on GCP and Azure read fluently.
1 · The model
- VPC. Your private IP space inside the cloud. A CIDR block, usually 10.0.0.0/16 for a small org. Nothing inside it is reachable from the internet by default.
- Subnet. A slice of the VPC, pinned to one availability zone. Public subnets have a route to the internet gateway; private subnets don't. Most production workloads live in private subnets.
- Route table. Per-subnet rules that say "for destination X, send to Y." Y is usually the internet gateway, a NAT gateway, a peer, or a transit gateway.
- Internet gateway (IGW). The VPC's door to the public internet. Public subnets route 0.0.0.0/0 to it. Without an IGW, the VPC is fully internal.
- NAT gateway. Lets a private subnet talk out to the internet (for package updates, calling external APIs) without being reachable from the internet.
- Security groups. Stateful firewalls attached to network interfaces. Allow-only; the default-deny is implicit.
- Network ACLs. Stateless subnet-level firewalls. Almost nobody touches these any more — SGs are enough.
- Peering / Transit Gateway. Connect VPCs to each other, or VPCs to on-premise networks. TGW is the modern hub-and-spoke version; peering is point-to-point.
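The route-table rule ("for destination X, send to Y") can be sketched as a longest-prefix match. This is a toy model, not how any cloud implements routing: the route tables, the `igw-example` and `nat-example` targets, and the helper names are all made up for illustration.

```python
import ipaddress

# Hypothetical route tables: (destination CIDR, target). Names are illustrative.
PUBLIC_ROUTES = [
    ("10.0.0.0/16", "local"),        # traffic inside the VPC stays local
    ("0.0.0.0/0", "igw-example"),    # everything else goes to the internet gateway
]
PRIVATE_ROUTES = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "nat-example"),    # outbound only, via the NAT gateway
]


def next_hop(routes, destination):
    """Return the target of the most specific route matching destination, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route: the packet is dropped
    # Longest prefix wins, so 10.0.0.0/16 beats 0.0.0.0/0 for in-VPC traffic.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def is_public(routes):
    """Treat a subnet as public when its default route targets an internet gateway."""
    return any(
        cidr == "0.0.0.0/0" and target.startswith("igw-")
        for cidr, target in routes
    )


if __name__ == "__main__":
    print(next_hop(PUBLIC_ROUTES, "10.0.3.7"))         # in-VPC destination
    print(next_hop(PRIVATE_ROUTES, "93.184.216.34"))   # internet destination
    print(is_public(PUBLIC_ROUTES), is_public(PRIVATE_ROUTES))
```

The only difference between the two tables is the target of the default route, which is the whole public/private distinction in one line.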
2 · The AWS canonical version
| Layer | AWS service | What it does |
|---|---|---|
| Private network | VPC | The /16 you own inside the cloud. |
| Internet ingress/egress (public subnets) | Internet Gateway | One per VPC. Target of the 0.0.0.0/0 route in public subnets. |
| Internet egress (private subnets) | NAT Gateway | Managed, HA, expensive. (Or NAT instance — DIY, cheap, fragile.) |
| Network firewalls | Security Groups + NACLs | SGs at the ENI level. NACLs at the subnet level. |
| VPC ↔ VPC | VPC Peering | Point-to-point. No transitive routing. |
| VPC ↔ many VPCs ↔ on-prem | Transit Gateway | Hub for any-to-any routing. Replaces meshes of peerings. |
| On-prem ↔ VPC | VPN / Direct Connect | VPN runs over the public internet; Direct Connect is dedicated fibre into an AWS POP. |
| Private access to AWS services | VPC Endpoints (Gateway / Interface) | Reach S3, DynamoDB, etc. without going over the public internet. Cheaper and more secure. |
| DNS | Route 53 (public + private hosted zones) | Public for external, private for internal-only names. |
| L7 load balancing | ALB (Application Load Balancer) | Path-based routing, WebSocket, gRPC. |
| L4 load balancing | NLB (Network Load Balancer) | TCP/UDP, ultra-low latency, preserves source IP. |
3 · GCP and Azure equivalents
| Concept | AWS | GCP | Azure |
|---|---|---|---|
| Private network | VPC | VPC (global, not regional) | VNet (regional) |
| Subnet | Subnet (per AZ) | Subnet (per region) | Subnet (per region) |
| Internet ingress | Internet Gateway | (Built into the VPC; no separate IGW) | Public IP + UDR |
| NAT egress | NAT Gateway | Cloud NAT | NAT Gateway |
| Firewall | Security Groups + NACLs | VPC Firewall Rules (per VPC network) | NSGs (Network Security Groups) |
| VPC ↔ VPC | VPC Peering / TGW | VPC Peering / Network Connectivity Center | VNet Peering / Virtual WAN |
| On-prem ↔ cloud | VPN / Direct Connect | Cloud VPN / Interconnect | VPN Gateway / ExpressRoute |
| L7 load balancing | ALB | Global HTTP(S) Load Balancer | Application Gateway / Front Door |
| L4 load balancing | NLB | TCP/UDP Network LB | Standard Load Balancer |
| DNS | Route 53 | Cloud DNS | Azure DNS / Private DNS |
GCP's VPCs are global by default. One VPC can span every region; subnets are regional. This is a real ergonomic win — no peering required between regions inside one VPC. AWS and Azure VPCs/VNets are regional, so multi-region usually means multiple VPCs plus TGW/Virtual-WAN to stitch them.
4 · The patterns you'll set up
- Public subnet for ALBs, private for app + DB. Internet-facing ALB sits in public subnets across multiple AZs; targets sit in private subnets. The pattern for ~every web stack.
- NAT for outbound from private subnets. Your app servers fetch package updates, call third-party APIs. One NAT per AZ (don't share across AZs — cross-AZ traffic costs money).
- VPC endpoints for S3 / DynamoDB. Calling S3 from a private subnet without an endpoint sends traffic over NAT — both expensive and slower. Endpoints keep it on the AWS backbone.
- Private hosted zones for internal DNS. Services talk to each other via friendly names, resolved only inside the VPC.
- Hub-and-spoke for multi-VPC. A central VPC with TGW; satellite VPCs (one per business unit, environment, or product) attach to it. Simpler than a full mesh of peerings, easier to govern.
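The first two patterns (a public and a private subnet in every AZ, one NAT per AZ) can be sketched as a small planning helper. It is a rough illustration under assumptions: the /20 subnet size, the AZ names, and the `igw` / `nat-<az>` labels are hypothetical, and nothing here calls a cloud API.

```python
import ipaddress


def plan_subnets(vpc_cidr, azs, new_prefix=20):
    """Carve one public and one private subnet per AZ out of the VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    if len(blocks) < 2 * len(azs):
        raise ValueError("VPC CIDR is too small for two subnets per AZ")

    plan = []
    remaining = iter(blocks)
    for az in azs:
        for tier in ("public", "private"):
            plan.append({
                "az": az,
                "tier": tier,
                "cidr": str(next(remaining)),
                # Public subnets default-route to the IGW; private subnets use
                # the NAT gateway in their own AZ to avoid cross-AZ charges.
                "default_route": "igw" if tier == "public" else f"nat-{az}",
            })
    return plan


if __name__ == "__main__":
    for subnet in plan_subnets("10.0.0.0/16", ["az-a", "az-b", "az-c"]):
        print(subnet)
```

A /16 split into /20 blocks gives 16 subnets, so three AZs use six of them and leave room for database or endpoint subnets later.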
5 · What breaks
- The VPC peering anti-pattern. Peering is non-transitive — if A↔B and B↔C, A still can't reach C through B. Once you have more than three VPCs, switch to TGW.
- NAT cost surprise. NAT Gateway costs ~$32/month per gateway plus $0.045/GB processed. A chatty app pushing 10 TB a month through NAT pays about $450/month in data-processing charges alone (10,000 GB × $0.045), on top of the hourly fee. VPC endpoints and direct-to-AWS-service traffic dodge most of it.
- Cross-AZ data charges. Within a region but across AZs, AWS charges per GB ($0.01/GB each way as of writing). A microservices mesh that doesn't pin to AZ-local replicas can rack up thousands a month.
- Security group references. SGs can allow traffic from another SG (not just CIDRs). Useful and the right pattern — but if you delete the source SG someone forgot was referenced, you get a quiet break that's hard to debug.
- DNS resolution inside the VPC. One Amazon-provided resolver, served from VPC CIDR base + 2 (10.0.0.2 in a 10.0.0.0/16 VPC), answers for both AWS-internal names and your private hosted zones. Custom resolvers (for split-horizon or on-prem integration) are an art unto themselves.
- MTU mismatch on VPN / Direct Connect. Standard VPC MTU is 9001 (jumbo frames inside the VPC, 1500 over the IGW). VPN tunnels lower the effective MTU further. Apps that don't honour Path MTU Discovery (looking at you, badly-configured proxies) hang weirdly.
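The peering anti-pattern from the first bullet is easy to see in a toy model. This sketch assumes peerings are just unordered pairs of VPC names; the names and helper functions are hypothetical.

```python
def can_reach(peerings, src, dst):
    """Peering is non-transitive: only a direct peering grants reachability."""
    return src == dst or frozenset((src, dst)) in peerings


def full_mesh_peerings(n_vpcs):
    """Peering connections needed for every VPC to reach every other one."""
    return n_vpcs * (n_vpcs - 1) // 2


def hub_attachments(n_vpcs):
    """With a transit gateway, each VPC needs a single attachment to the hub."""
    return n_vpcs


# A is peered with B, and B with C, but there is no A-to-C peering.
peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}

if __name__ == "__main__":
    print(can_reach(peerings, "A", "B"))
    print(can_reach(peerings, "A", "C"))
    for n in (3, 5, 10):
        print(n, full_mesh_peerings(n), hub_attachments(n))
```

The mesh grows quadratically while hub attachments grow linearly: ten VPCs need 45 peerings for full connectivity but only ten TGW attachments.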
6 · Cost note
Cloud networking has three line items that quietly stack up:
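- NAT Gateway data processing. $0.045/GB. Audit it. The fix is Gateway endpoints for S3 and DynamoDB (which are free) and Interface endpoints for other AWS services (billed hourly and per GB, but typically at a lower per-GB rate than NAT).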
- Cross-AZ traffic. $0.01/GB each direction. AZ-aware service discovery (or AZ-local replicas) is the durable fix.
- Egress to the internet. $0.05–$0.09/GB depending on volume and region. CDN in front of any external read traffic kills 80–95% of this.
The two-line summary: most cloud bills are eaten by data movement, not by compute. Audit the Cost Explorer's "Data Transfer" view every quarter.
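For a rough sense of how the three line items stack up, here is a back-of-envelope estimator. The rates are the figures quoted above, hard-coded as assumptions; real prices vary by region and change over time, and the function names are invented for this sketch. It treats 1 TB as 1,000 GB.

```python
# Rates taken from the figures in this section; treat them as assumptions.
NAT_GATEWAY_MONTHLY = 32.00       # approximate fixed fee per NAT gateway, USD
NAT_PER_GB = 0.045                # NAT data processing, USD per GB
CROSS_AZ_PER_GB_EACH_WAY = 0.01   # charged once out and once in, USD per GB
EGRESS_PER_GB = 0.09              # top of the quoted $0.05-$0.09 range


def nat_cost(gb_processed, gateways=1):
    """Fixed monthly fee per gateway plus per-GB processing."""
    return gateways * NAT_GATEWAY_MONTHLY + gb_processed * NAT_PER_GB


def cross_az_cost(gb_transferred):
    """Each GB is billed leaving one AZ and again entering the other."""
    return gb_transferred * CROSS_AZ_PER_GB_EACH_WAY * 2


def egress_cost(gb_out, rate_per_gb=EGRESS_PER_GB):
    """Internet egress at a flat per-GB rate (ignores volume tiers)."""
    return gb_out * rate_per_gb


def monthly_data_bill(nat_gb, cross_az_gb, egress_gb, nat_gateways=1):
    """Return the three line items plus their total, in USD."""
    items = {
        "nat": nat_cost(nat_gb, nat_gateways),
        "cross_az": cross_az_cost(cross_az_gb),
        "egress": egress_cost(egress_gb),
    }
    items["total"] = sum(items.values())
    return items


if __name__ == "__main__":
    bill = monthly_data_bill(nat_gb=10_000, cross_az_gb=50_000, egress_gb=5_000)
    for name, usd in bill.items():
        print(f"{name:>8}: ${usd:,.2f}")
```

Plugging in your own monthly volumes shows quickly which of the three deserves the first audit.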
Further reading
- AWS VPC User Guide. Dense but authoritative. The peering and TGW sections especially.
- "How AWS networking actually works" — various AWS re:Invent talks. NET402-shaped talks year over year; worth a watch when you join a new AWS-heavy team.
- Adjacent: Networking Codex. The protocols underneath the abstractions.
- Adjacent: How DNS works. Route 53 makes more sense when you understand the underlying protocol.
- Adjacent: Cost engineering. The networking cost surprises in detail.