Cloud Codex / 03

Networking — VPC, subnets, NAT.

Cloud networking takes the longest to internalise of any topic in this Codex. Three months in, you'll still occasionally route a private subnet to an internet gateway and wonder why your packets are vanishing. The good news: the mental model is the same on every cloud. Once you understand VPC, subnets, route tables, NAT, and peering on AWS, the same words on GCP and Azure read fluently.


1 · The model

  • VPC. Your private IP space inside the cloud. A CIDR block — usually 10.0.0.0/16 for a small org. Nothing inside it is reachable from the internet by default.
  • Subnet. A slice of the VPC, pinned to one availability zone. Public subnets have a route to the internet gateway; private subnets don't. Most production workloads live in private subnets.
  • Route table. Per-subnet rules that say "for destination X, send to Y." Y is usually the internet gateway, a NAT gateway, a peer, or a transit gateway.
  • Internet gateway (IGW). The VPC's door to the public internet. Public subnets route 0.0.0.0/0 to it. Without an IGW, the VPC is fully internal.
  • NAT gateway. Lets a private subnet talk out to the internet (for package updates, calling external APIs) without being reachable from the internet.
  • Security groups. Stateful firewalls attached to network interfaces. Allow-only; the default-deny is implicit.
  • Network ACLs. Stateless subnet-level firewalls. Almost nobody touches these any more — SGs are enough.
  • Peering / Transit Gateway. Connect VPCs to each other, or VPCs to on-premise networks. TGW is the modern hub-and-spoke version; peering is point-to-point.
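
The route-table bullet above can be made concrete with a small sketch. It assumes the usual rule that the most specific matching route wins. The route tables and target names (igw-example, nat-example, tgw-example) are made up for illustration, and the "default route points at an IGW" test is just this section's working definition of a public subnet.

```python
import ipaddress

# Hypothetical route tables; the target names are placeholders, not real IDs.
PUBLIC_ROUTES = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "igw-example",
}
PRIVATE_ROUTES = {
    "10.0.0.0/16": "local",
    "10.1.0.0/16": "tgw-example",
    "0.0.0.0/0": "nat-example",
}


def next_hop(route_table, destination):
    """Return the target of the most specific route matching destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no matching route: the packet goes nowhere
    # Longest prefix wins, as in ordinary IP routing.
    _, target = max(matches, key=lambda m: m[0].prefixlen)
    return target


def is_public(route_table):
    """Treat a subnet as public if its default route targets an IGW."""
    return route_table.get("0.0.0.0/0", "").startswith("igw-")


print(next_hop(PRIVATE_ROUTES, "10.0.3.7"))       # local
print(next_hop(PRIVATE_ROUTES, "10.1.9.9"))       # tgw-example
print(next_hop(PRIVATE_ROUTES, "93.184.216.34"))  # nat-example
print(is_public(PUBLIC_ROUTES), is_public(PRIVATE_ROUTES))  # True False
```

Reading a real route table the same way (find every matching CIDR, pick the longest prefix) is usually the fastest way to work out where a packet is actually going.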

2 · The AWS canonical version

| Layer | AWS service | What it does |
|---|---|---|
| Private network | VPC | The /16 you own inside the cloud. |
| Internet ingress | Internet Gateway | One per VPC. Routable from 0.0.0.0/0. |
| Internet egress (private subnets) | NAT Gateway | Managed, HA, expensive. (Or NAT instance: DIY, cheap, fragile.) |
| Network firewalls | Security Groups + NACLs | SGs at the ENI level. NACLs at the subnet level. |
| VPC ↔ VPC | VPC Peering | Point-to-point. No transitive routing. |
| VPC ↔ many VPCs ↔ on-prem | Transit Gateway | Hub for any-to-any routing. Replaces meshes of peerings. |
| On-prem ↔ VPC | VPN / Direct Connect | VPN over the public internet vs dedicated fibre into AWS POP. |
| Private access to AWS services | VPC Endpoints (Gateway / Interface) | Reach S3, DynamoDB, etc. without going over the public internet. Cheaper and more secure. |
| DNS | Route 53 (public + private hosted zones) | Public for external, private for internal-only names. |
| L7 load balancing | ALB (Application Load Balancer) | Path-based routing, WebSocket, gRPC. |
| L4 load balancing | NLB (Network Load Balancer) | TCP/UDP, ultra-low latency, preserves source IP. |

3 · GCP and Azure equivalents

| Concept | AWS | GCP | Azure |
|---|---|---|---|
| Private network | VPC | VPC (global, not regional) | VNet (regional) |
| Subnet | Subnet (per AZ) | Subnet (per region) | Subnet (per region) |
| Internet ingress | Internet Gateway | (Built into the VPC; no separate IGW) | Public IP + UDR |
| NAT egress | NAT Gateway | Cloud NAT | NAT Gateway |
| Firewall | Security Groups + NACLs | VPC Firewall Rules (project-wide) | NSGs (Network Security Groups) |
| VPC ↔ VPC | VPC Peering / TGW | VPC Peering / Network Connectivity Center | VNet Peering / Virtual WAN |
| On-prem ↔ cloud | VPN / Direct Connect | Cloud VPN / Interconnect | VPN Gateway / ExpressRoute |
| L7 load balancing | ALB | Global HTTP(S) Load Balancer | Application Gateway / Front Door |
| L4 load balancing | NLB | TCP/UDP Network LB | Standard Load Balancer |
| DNS | Route 53 | Cloud DNS | Azure DNS / Private DNS |

GCP's VPCs are global by default. One VPC can span every region; subnets are regional. This is a real ergonomic win: no peering is required between regions inside one VPC. AWS VPCs and Azure VNets are regional, so multi-region usually means multiple VPCs plus TGW or Virtual WAN to stitch them together.

4 · The patterns you'll set up

  • Public subnet for ALBs, private for app + DB. Internet-facing ALB sits in public subnets across multiple AZs; targets sit in private subnets. The pattern for ~every web stack.
  • NAT for outbound from private subnets. Your app servers fetch package updates, call third-party APIs. One NAT per AZ (don't share across AZs — cross-AZ traffic costs money).
  • VPC endpoints for S3 / DynamoDB. Calling S3 from a private subnet without an endpoint sends traffic over NAT — both expensive and slower. Endpoints keep it on the AWS backbone.
  • Private hosted zones for internal DNS. Services talk to each other via friendly names, resolved only inside the VPC.
  • Hub-and-spoke for multi-VPC. A central VPC with TGW; satellite VPCs (one per business unit, environment, or product) attach to it. Simpler than a full mesh of peerings, easier to govern.
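
As a rough sketch of the first two patterns, the layout below carves one public and one private subnet per AZ out of a VPC CIDR and gives each private subnet a default route to a NAT in its own AZ. The function name, the /24 subnet size, the AZ labels, and the "igw" / "nat-<az>" route labels are all assumptions made for illustration.

```python
import ipaddress


def plan_subnets(vpc_cidr, azs, new_prefix=24):
    """Carve one public and one private subnet per AZ out of vpc_cidr."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # yields consecutive subnets
    plan = []
    for tier in ("public", "private"):
        for az in azs:
            cidr = next(blocks)
            # Public subnets default to the IGW; private subnets default to
            # the NAT in the same AZ, which avoids cross-AZ data charges.
            default_route = "igw" if tier == "public" else f"nat-{az}"
            plan.append(
                {
                    "az": az,
                    "tier": tier,
                    "cidr": str(cidr),
                    "default_route": default_route,
                }
            )
    return plan


for row in plan_subnets("10.0.0.0/16", ["az-a", "az-b"]):
    print(row)
```

A /16 split into /24s leaves plenty of unused blocks, which is deliberate: adding a third AZ or a separate database tier later should not force a renumbering.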

5 · What breaks

  • The VPC peering anti-pattern. Peering is non-transitive — if A↔B and B↔C, A still can't reach C through B. Once you have more than three VPCs, switch to TGW.
  • NAT cost surprise. NAT Gateway is ~$32/month plus $0.045/GB processed. A chatty app pushing 10 TB a month through NAT pays about $450/month in data processing alone, on top of the fixed fee. VPC endpoints and direct-to-AWS-service traffic dodge most of it.
  • Cross-AZ data charges. Within a region but across AZs, AWS charges per GB ($0.01/GB each way as of writing). A microservices mesh that doesn't pin to AZ-local replicas can rack up thousands a month.
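  • Security group references. SGs can allow traffic from another SG (not just CIDRs). Useful and the right pattern, but the reference is a hidden dependency. Within one VPC, AWS refuses to delete an SG that another SG's rules still reference; across a peering connection the delete succeeds and leaves a stale rule behind, a quiet break that's hard to debug.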
  • DNS resolution inside the VPC. Each VPC gets one built-in resolver at the VPC CIDR base + 2 (10.0.0.2 for a 10.0.0.0/16 VPC). It answers AWS-internal names, records in any private hosted zone associated with the VPC, and public names. Custom resolvers (for split-horizon or on-prem integration) are an art unto themselves.
  • MTU mismatch on VPN / Direct Connect. Standard VPC MTU is 9001 (jumbo frames inside, 1500 over IGW). VPN tunnels drop the effective MTU below 1500 because of encapsulation overhead. Apps that don't honour Path MTU Discovery (looking at you, badly-configured proxies) hang weirdly.
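
The peering anti-pattern at the top of this list is easy to see in a toy model. The sketch below assumes peerings are stored as unordered pairs and that a hub-and-spoke design needs one attachment per VPC. The VPC names and function names are hypothetical.

```python
def can_reach_via_peering(peerings, src, dst):
    """Peering is non-transitive: only a direct peering connects two VPCs."""
    return frozenset((src, dst)) in peerings


def peerings_for_full_mesh(vpc_count):
    """Number of peering connections needed for any-to-any reachability."""
    return vpc_count * (vpc_count - 1) // 2


def attachments_for_hub(vpc_count):
    """A hub-and-spoke design needs one attachment per VPC."""
    return vpc_count


# A <-> B and B <-> C are peered; A and C are not.
peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}

print(can_reach_via_peering(peerings, "A", "B"))  # True
print(can_reach_via_peering(peerings, "A", "C"))  # False: no hop through B

for n in (3, 4, 10):
    print(n, peerings_for_full_mesh(n), attachments_for_hub(n))
```

The mesh count grows quadratically (3, 6, and 45 peerings for 3, 4, and 10 VPCs) while the hub grows linearly, which is the arithmetic behind switching to TGW once the VPC count passes three.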

6 · Cost note

Cloud networking has three line items that quietly stack up:

  • NAT Gateway data processing. $0.045/GB. Audit it. The fix is gateway endpoints for S3 and DynamoDB (which are free) and interface endpoints for other AWS services (billed hourly and per GB, but at a much lower per-GB rate than NAT).
  • Cross-AZ traffic. $0.01/GB each direction. AZ-aware service discovery (or AZ-local replicas) is the durable fix.
  • Egress to the internet. $0.05–$0.09/GB depending on volume and region. CDN in front of any external read traffic kills 80–95% of this.

The two-line summary: most cloud bills are eaten by data movement, not by compute. Audit the Cost Explorer's "Data Transfer" view every quarter.
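
To get a feel for how these line items stack up, here is a back-of-envelope calculator. It uses only the rates quoted in this section, which should be treated as illustrative, not as current pricing. The traffic volumes, the function names, and the decimal 1 TB = 1000 GB convention are assumptions.

```python
# Rates as quoted in this section; illustrative only, not current pricing.
NAT_FIXED_PER_MONTH = 32.0       # ~$32/month per NAT Gateway
NAT_PER_GB = 0.045               # NAT data processing
CROSS_AZ_PER_GB_EACH_WAY = 0.01  # charged in each direction
EGRESS_PER_GB = (0.05, 0.09)     # low and high end of the quoted range

TB = 1000  # GB per TB, decimal units assumed


def nat_cost(gb, gateways=1):
    """Monthly NAT cost: fixed fee per gateway plus per-GB processing."""
    return gateways * NAT_FIXED_PER_MONTH + gb * NAT_PER_GB


def cross_az_cost(gb):
    """Cross-AZ transfer, charged once in each direction."""
    return gb * CROSS_AZ_PER_GB_EACH_WAY * 2


def egress_cost_range(gb):
    """Low and high estimate for internet egress."""
    low, high = EGRESS_PER_GB
    return gb * low, gb * high


print(f"NAT, 10 TB:      ${nat_cost(10 * TB):,.2f}")       # $482.00
print(f"Cross-AZ, 50 TB: ${cross_az_cost(50 * TB):,.2f}")  # $1,000.00
low, high = egress_cost_range(20 * TB)
print(f"Egress, 20 TB:   ${low:,.2f} to ${high:,.2f}")
```

Plugging in the monthly volumes from Cost Explorer's "Data Transfer" view gives a quick sanity check on which of the three line items deserves attention first.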

Further reading

  • AWS VPC User Guide. Dense but authoritative. The peering and TGW sections especially.
  • "How AWS networking actually works" — various AWS re:Invent talks. NET402-shaped talks year over year; worth a watch when you join a new AWS-heavy team.
  • Adjacent: Networking Codex. The protocols underneath the abstractions.
  • Adjacent: How DNS works. Route 53 makes more sense when you understand the underlying protocol.
  • Adjacent: Cost engineering. The networking cost surprises in detail.