NAT and traversal
Most devices on the internet don't have a public IP address. They sit behind a NAT — a box that rewrites packet headers so many private hosts can share one public address. NAT was a stopgap for IPv4 exhaustion in the late 1990s and has been the default ever since. It also breaks the assumption every peer-to-peer protocol used to make: that you can just open a socket to the other side. The traversal stack — STUN, TURN, ICE — is how WebRTC, BitTorrent, and modern VPNs claw direct connections back from a world full of NATs.
Why NAT exists
IPv4 has a 32-bit address field. That's 4.3 billion addresses, and after several
large reserved ranges (loopback, multicast, RFC 1918 private, and so on) the usable
supply is closer to 3.7 billion. By the mid-1990s the allocation rate was already
alarming, and on 3 February 2011 IANA handed out its last /8 blocks to
the regional registries. APNIC ran out the same year, RIPE in 2012, ARIN in 2015,
LACNIC in 2020, and AFRINIC in 2019 (in the sense of entering its post-depletion
phase). For practical purposes the address space is gone.
NAT (Network Address Translation) was the stopgap. The original spec is RFC 1631 (1994), obsoleted by RFC 3022 (2001).
The idea is simple. Inside your network, you use private addresses from the RFC 1918
ranges — 10.0.0.0/8, 172.16.0.0/12,
192.168.0.0/16. A NAT box on the edge holds one (or a handful of) public
addresses and rewrites packet headers so the whole LAN appears to the outside as that
one public IP.
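As a quick illustration of the private ranges above, here is a minimal Python sketch that tests whether an address falls inside RFC 1918 space and counts how many addresses the three ranges hold. The helper names `is_rfc1918` and `rfc1918_address_count` are made up for this example; only the standard `ipaddress` module is used.

```python
import ipaddress

# The three RFC 1918 private ranges named in the text.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def rfc1918_address_count() -> int:
    """Total number of addresses across the three private ranges."""
    return sum(network.num_addresses for network in RFC1918_NETWORKS)
```

For example, `is_rfc1918("10.0.0.5")` is true while `is_rfc1918("203.0.113.4")` is false. The explicit list is deliberate: `ipaddress`'s own `is_private` flag covers more than RFC 1918 (loopback and link-local, for instance), so it answers a slightly different question.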
NAT was supposed to be temporary, the bridge to IPv6. IPv6 has been a working standard since 1998 and saw real production deployment in the 2010s, but global IPv6 adoption is still only around 45% of Google's traffic in 2026, and the share of services reachable purely over IPv6 is much smaller. Meanwhile NAT became permanent. Most people's home and mobile traffic crosses at least one NAT before it reaches the open internet.
How NAT actually works — the translation table
A NAT box sits on the boundary between a private network and the public internet. When a packet leaves an inside host for some outside address, the NAT rewrites the source IP (from the host's private address to the NAT's public address) and usually the source port too, picking a free port from its own pool. It records a mapping that looks like this:
(private_ip,  private_port, dst_ip,  dst_port)
 10.0.0.5,    51234,        1.1.1.1, 443
        ⇕
(public_ip,   public_port,  dst_ip,  dst_port)
 203.0.113.4, 40872,        1.1.1.1, 443
When the reply comes back, it's addressed to
203.0.113.4:40872. The NAT looks up the mapping, sees that this
external port belongs to 10.0.0.5:51234, rewrites the destination
IP and port back, and forwards it onto the LAN. The inside host has no idea any
of this happened.
The mapping isn't permanent. It has an idle timeout: typically about 5 minutes for TCP (because TCP carries its own connection state and the NAT can usually see FIN / RST), and often as little as 30 seconds for UDP (because UDP is stateless and the NAT has no other signal). RFC 4787 asks NATs to keep UDP mappings for at least two minutes, but plenty of devices ignore that, and mobile carriers are among the most aggressive: UDP timeouts of 30 to 60 seconds are common. When the timer expires, the mapping is deleted, and any inbound packet that arrives after that has nowhere to go and is dropped.
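The translation table and its idle timer can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular NAT is implemented: the class names `NatTable` and `Mapping`, the sequential port pool, and the 30-second default are all illustrative, time is passed in explicitly so the behaviour is easy to follow, and the mapping is keyed per destination exactly as in the table above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Mapping:
    private: tuple      # (private_ip, private_port)
    dest: tuple         # (dst_ip, dst_port)
    public_port: int
    last_seen: float    # time of the last outbound packet


class NatTable:
    """Toy NAT table: one mapping per (inside socket, destination) flow."""

    def __init__(self, public_ip: str, timeout: float = 30.0, first_port: int = 40000):
        self.public_ip = public_ip
        self.timeout = timeout
        self._next_port = first_port      # illustrative sequential port pool
        self._by_public_port = {}
        self._by_flow = {}

    def outbound(self, private, dest, now: float):
        """Rewrite the source of an outgoing packet and refresh its mapping."""
        self._expire(now)
        mapping = self._by_flow.get((private, dest))
        if mapping is None:
            mapping = Mapping(private, dest, self._next_port, now)
            self._next_port += 1
            self._by_flow[(private, dest)] = mapping
            self._by_public_port[mapping.public_port] = mapping
        mapping.last_seen = now
        return (self.public_ip, mapping.public_port)

    def inbound(self, source, public_port: int, now: float):
        """Return the private (ip, port) for a reply, or None if it is dropped."""
        self._expire(now)
        mapping = self._by_public_port.get(public_port)
        if mapping is None or mapping.dest != source:
            return None
        return mapping.private

    def _expire(self, now: float) -> None:
        """Delete every mapping whose idle timer has run out."""
        stale = [m for m in self._by_public_port.values()
                 if now - m.last_seen > self.timeout]
        for m in stale:
            del self._by_public_port[m.public_port]
            del self._by_flow[(m.private, m.dest)]
```

With `NatTable("203.0.113.4", first_port=40872)`, an outbound packet from `("10.0.0.5", 51234)` to `("1.1.1.1", 443)` at time 0 is rewritten to `("203.0.113.4", 40872)`. A reply at time 10 is translated back, while the same reply at time 31 finds no mapping and is dropped. In this sketch only outbound traffic refreshes the timer; real devices differ on whether inbound packets do too.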
If an inside socket such as 10.0.0.5:51234 sends to two different destinations, the NAT might hand out two
different external ports, depending on the NAT's type, which is exactly what we cover
next.
The four NAT types
The classic taxonomy comes from RFC 3489 (the original STUN spec, since obsoleted, but the names stuck). It sorts a NAT by two independent questions. First: when an inside host sends to two different destinations, does the NAT use the same external port for both? Second: what does the NAT let back in?
| NAT type | External port reuse | Inbound filter |
|---|---|---|
| Full-cone | Same port for any dest | Any outside host can send to it |
| Restricted-cone | Same port for any dest | Only IPs you sent to first |
| Port-restricted cone | Same port for any dest | Only the exact IP+port you sent to |
| Symmetric | New external port per dest | Only the exact IP+port you sent to |
Full-cone NAT is the most permissive. Once an inside host has sent anything
through, anyone on the internet who learns the external ip:port can send
to it. Restricted-cone tightens that to IPs the inside host has talked to;
port-restricted tightens further to specific peer ports. All three are "cone" NATs
because the inside-to-outside mapping is one external port per inside socket, whatever
the destination — a single cone pointing outward.
Symmetric NAT is the awkward one. Every distinct destination gets a fresh external port. The mapping the world sees when you talk to peer X isn't the mapping it sees when you talk to peer Y. This is the killer for hole-punching: STUN can tell you what your public address looks like to the STUN server, but that address is useless to anyone else, because they'd see a different one. Symmetric NATs are common in mobile carriers and many enterprise firewalls.
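The two questions in the table (does the external port depend on the destination, and who is allowed back in) can be modelled directly. The sketch below is a simplified simulation with hypothetical names (`ToyNat`, `NatType`); it ignores timeouts and port exhaustion and exists only to make the four behaviours concrete.

```python
from enum import Enum


class NatType(Enum):
    FULL_CONE = "full-cone"
    RESTRICTED_CONE = "restricted-cone"
    PORT_RESTRICTED_CONE = "port-restricted cone"
    SYMMETRIC = "symmetric"


class ToyNat:
    """Models only the table's two columns: port reuse and inbound filter."""

    def __init__(self, nat_type: NatType, first_port: int = 40000):
        self.nat_type = nat_type
        self._next_port = first_port
        self._ports = {}       # mapping key -> external port
        self._contacted = {}   # external port -> set of (dst_ip, dst_port) sent to

    def send(self, inside, dest) -> int:
        """Record an outbound packet and return the external port it uses."""
        # Symmetric NATs key the mapping on the destination too; cone NATs do not.
        key = (inside, dest) if self.nat_type is NatType.SYMMETRIC else inside
        if key not in self._ports:
            self._ports[key] = self._next_port
            self._next_port += 1
        port = self._ports[key]
        self._contacted.setdefault(port, set()).add(dest)
        return port

    def accepts(self, external_port: int, source) -> bool:
        """Would a packet from `source` to `external_port` be let back in?"""
        contacted = self._contacted.get(external_port)
        if contacted is None:
            return False  # no mapping exists for this external port
        if self.nat_type is NatType.FULL_CONE:
            return True
        if self.nat_type is NatType.RESTRICTED_CONE:
            return source[0] in {ip for ip, _ in contacted}
        return source in contacted  # port-restricted cone and symmetric
```

Sending from one inside socket to two destinations returns the same external port for the three cone types and two different ports for `NatType.SYMMETRIC`, which is the property that makes a STUN-learned address useless to a third party.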
Comparing the four NAT types
Inbound arrows in the diagrams show which outside hosts can get a packet back to the inside host through the NAT mapping. Symmetric NAT looks similar to the others on the inbound side, but the external port the world sees differs for every destination, so a peer-to-peer address swap doesn't help.
Hairpinning — the gotcha for self-hosted services
You run a service at home — a media server, a VPN endpoint, a game server — and expose it via the NAT's public IP plus a forwarded port. From the outside internet, it works. From a laptop on the same LAN as the server, hitting the public IP often doesn't.
The problem is called hairpinning, or NAT loopback. The laptop sends a packet to
203.0.113.4:8443, the LAN's gateway. The gateway is the NAT itself, which
has to recognise that the packet is destined for one of its own port forwards, rewrite
both the source (to its own public IP) and the destination (to the inside host's
private address), and loop the packet back onto the LAN. That's hairpinning: traffic
enters and leaves the same interface.
Cheaper consumer routers often skip this. The packet hits the NAT, the NAT sees its own external IP as the destination, gets confused, and drops it. The workaround is either split-horizon DNS (return the private IP for queries from inside the LAN), an explicit hosts-file override, or a router that supports hairpin NAT. The IETF requirement is in RFC 4787 §6; compliance in cheap consumer gear is patchy.
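The split-horizon workaround boils down to one decision: answer with the private address when the query comes from inside the LAN, and with the public address otherwise. A minimal sketch of that decision follows; the function name and the example LAN prefix are hypothetical, and a real deployment would express the same rule in its DNS server's configuration rather than in application code.

```python
import ipaddress


def resolve_for_client(client_ip: str, lan_cidr: str,
                       private_ip: str, public_ip: str) -> str:
    """Answer with the private address for LAN clients, the public one otherwise."""
    on_lan = ipaddress.ip_address(client_ip) in ipaddress.ip_network(lan_cidr)
    return private_ip if on_lan else public_ip
```

A laptop at 192.168.1.50 asking about a server on 192.168.1.0/24 would get the server's private address and never touch the NAT, so it no longer matters whether the router can hairpin.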
CGNAT — when your ISP NATs you too
Even after RFC 1918 addressing became universal, ISPs ran out of public IPv4 addresses to hand each customer. The answer was Carrier-Grade NAT (CGNAT), whose shared address space is reserved by RFC 6598 and whose operational requirements are set out in RFC 6888. CGNAT works just like home NAT, but the operator runs it for many subscribers at once. A single public IPv4 address can front thousands of customers, each with its own slice of the port space.
RFC 6598 reserves a dedicated address block — 100.64.0.0/10 — for the
hop "between the home router and the carrier NAT". The carrier hands customers an
address in this block, then NATs them again at the carrier edge. Two layers of
NAT for one connection: a packet from your phone might be translated by your home
router (if any), then by the carrier's CGNAT, before it ever reaches the open
internet.
CGNAT breaks several assumptions. Inbound peer-to-peer is much harder, because there's no port forward you can set on a NAT you don't control. IP-based rate-limiting and abuse-handling get noisier — banning one IP might cut off thousands of innocent customers behind the same CGNAT. Logging by source IP no longer identifies a user; you also need the source port and a precise timestamp, and you need the carrier to keep mapping logs (which not all do). Mobile networks worldwide — including T-Mobile, Verizon, and almost every cellular carrier — run CGNAT by default.
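To make the "slice of the port space" idea and the logging problem concrete, here is a small sketch that assumes a carrier hands each subscriber a fixed, contiguous block of ports on one public address. That static scheme is only one possible design, and the block sizes, subscriber names, and function names are invented for the example. With fixed blocks the source port alone identifies the subscriber; with dynamic allocation the carrier would also need the timestamp, as noted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortBlock:
    subscriber: str
    first_port: int
    last_port: int


def subscribers_per_address(block_size: int, first_port: int = 1024) -> int:
    """How many fixed-size port blocks fit on one public IPv4 address."""
    return (65536 - first_port) // block_size


def allocate_blocks(subscribers, block_size: int = 1000, first_port: int = 1024):
    """Give each subscriber a contiguous slice of one public address's ports."""
    blocks = []
    port = first_port
    for subscriber in subscribers:
        last = port + block_size - 1
        if last > 65535:
            raise ValueError("no ports left for " + subscriber)
        blocks.append(PortBlock(subscriber, port, last))
        port = last + 1
    return blocks


def find_subscriber(blocks, source_port: int):
    """Map a logged source port back to the subscriber that owns it."""
    for block in blocks:
        if block.first_port <= source_port <= block.last_port:
            return block.subscriber
    return None
```

Under these assumptions, 1000 ports per subscriber allows 64 subscribers per address, and shrinking the block to 32 ports allows 2016. A server log that records only the source IP cannot tell any of them apart.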
STUN — Session Traversal Utilities for NAT
STUN is the simplest of the three traversal protocols. The original spec was RFC 3489; the current version is RFC 8489. The whole idea fits in one paragraph: a STUN server runs on a public IP. A client behind a NAT sends it a request. The server replies with what it saw — the source IP and source port that arrived at its socket. The client now knows what its own public mapping looks like from outside.
That public mapping is then usable, but only if the NAT is full-cone,
restricted-cone, or port-restricted. The client tells its peer "send packets to
203.0.113.4:40872", and as long as the client has already sent something
to that peer (or the NAT is full-cone), the inbound packets get through. Combine that
with both sides sending outbound at once — the classic UDP hole-punching trick — and
most cone NATs can be traversed without a relay.
Symmetric NAT defeats STUN. The mapping the client learns from the STUN server
is only valid for the destination STUN-IP:STUN-port. When the client sends to the
peer, a symmetric NAT picks a fresh external port. The peer is handed the STUN-derived
address, but that address doesn't point at the client's new mapping. STUN can detect
this (by querying two different STUN servers from the same local socket and comparing
the external ports they report) but it can't fix it.
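The request and response described above are small enough to build and parse by hand. The sketch below follows the RFC 8489 wire layout as far as it goes: a 20-byte header with the magic cookie 0x2112A442 and an XOR-MAPPED-ADDRESS attribute (type 0x0020). It handles IPv4 only, skips retransmission, authentication, and transaction-ID matching, and the function names are this example's own rather than any library's API.

```python
import ipaddress
import os
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Build a 20-byte STUN Binding request with no attributes."""
    transaction_id = transaction_id or os.urandom(12)
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, transaction_id)


def parse_xor_mapped_address(message: bytes) -> Optional[Tuple[str, int]]:
    """Return (ip, port) from a Binding success response, or None (IPv4 only)."""
    if len(message) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    offset, end = 20, min(20 + length, len(message))
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            x_port, x_addr = struct.unpack("!HI", value[2:8])
            port = x_port ^ (MAGIC_COOKIE >> 16)   # port is XORed with the cookie's top 16 bits
            addr = x_addr ^ MAGIC_COOKIE           # IPv4 address is XORed with the whole cookie
            return str(ipaddress.IPv4Address(addr)), port
        offset += 4 + attr_len + (-attr_len % 4)   # attributes are padded to 4 bytes
    return None


def looks_symmetric(mapping_a, mapping_b) -> bool:
    """Different mappings reported by two servers for one socket suggest symmetric NAT."""
    return mapping_a != mapping_b
```

In use, a client would send `build_binding_request()` from one UDP socket to two different STUN servers, run each reply through `parse_xor_mapped_address`, and pass the two results to `looks_symmetric`. If they differ, the srflx address is not something a peer can use.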
# Public STUN servers, used by WebRTC clients worldwide
stun.l.google.com:19302
stun1.l.google.com:19302
global.stun.twilio.com:3478
stun.cloudflare.com:3478
STUN traffic is small and stateless. Google's free public STUN server has run for over a decade and handles a large share of WebRTC's STUN traffic. There's no authentication; STUN itself reveals nothing sensitive, just what the world already saw when the packet arrived.
TURN — relay when STUN isn't enough
TURN — Traversal Using Relays around NAT, specified by RFC 8656 — is the brute-force fallback. When two peers are both behind symmetric NATs, or behind firewalls that block UDP entirely, no direct path can be set up. A TURN server has a public IP and acts as a relay. Both peers send their media to the TURN server, and the server forwards each side's packets to the other.
TURN is expensive. The bandwidth cost is real (every byte of a relayed call passes through the server twice — in and out), and the server itself costs money to run. Production WebRTC stacks always try STUN-discovered direct paths first and fall back to TURN only when ICE connectivity checks fail. Twilio and Daily.co both report that 10–20% of WebRTC calls in the wild end up needing a TURN relay; the rest find a direct or srflx path.
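The "in and out" cost is easy to put numbers on. The following back-of-the-envelope sketch assumes one-to-one calls in which each peer sends a single media stream at a fixed bitrate; the function name and every input figure are illustrative, not measurements.

```python
def relay_bandwidth_mbps(concurrent_calls: int, relay_percent: float,
                         kbps_per_direction: float):
    """Estimate (ingress, egress) Mbit/s at a TURN relay for 1:1 calls."""
    relayed_calls = concurrent_calls * relay_percent / 100
    # Each relayed call has two peers, and each peer's stream enters the relay once...
    ingress_kbps = relayed_calls * 2 * kbps_per_direction
    # ...and leaves it once on the way to the other peer.
    egress_kbps = ingress_kbps
    return ingress_kbps / 1000, egress_kbps / 1000
```

With 1000 concurrent calls, 15% of them relayed, and 1500 kbit/s per direction, the relay carries 450 Mbit/s in and another 450 Mbit/s out, which is why keeping the relayed share low matters.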
TURN listens on UDP by default but can also run over TCP, or even TLS-over-TCP on
port 443. The TCP and TLS modes exist because restrictive corporate firewalls and
hotel Wi-Fi often block UDP entirely, or block every port except 80 and 443. A TURN
server at turns:relay.example.com:443 (the turns: URI scheme takes no slashes) looks
much like an HTTPS connection on the wire and gets through almost anything.
ICE — picking the best candidate
ICE — Interactive Connectivity Establishment, specified by RFC 8445 — is the glue. It runs on top of STUN and TURN and turns "we have several possible addresses for each peer" into "we picked one that works". Each peer gathers a set of candidates:
| Candidate type | What it is | Cost |
|---|---|---|
| host | Local IP and port on the device | Free, only works on the same LAN |
| srflx | Server-reflexive: the public address the STUN server saw | Free, fails on symmetric NAT |
| prflx | Peer-reflexive: discovered mid-connectivity-check | Free, often emerges during ICE |
| relay | Allocated on a TURN server | Expensive but always works |
Each peer hands its candidate list to the other through a signalling channel (usually WebSocket; SDP offer/answer carries them). ICE then pairs them — every local candidate with every remote candidate — assigns each pair a priority, and runs a STUN connectivity check from one side to the other along each path. The first successful check elects the working pair; higher-priority pairs are tried first, so direct paths win over relayed ones whenever they work.
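The ranking step can be written out using the priority formulas from RFC 8445: a candidate's priority is 2^24 × type preference + 2^8 × local preference + (256 − component ID), and a pair's priority is 2^32 × min(G, D) + 2 × max(G, D) + (1 if G > D else 0), where G and D are the controlling and controlled agents' candidate priorities. The sketch below uses the RFC's recommended type preferences and assumes a single interface (local preference 65535) and component 1; the function names are this example's own.

```python
from itertools import product

# Type preferences recommended by RFC 8445.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_preference: int = 65535,
                       component_id: int = 1) -> int:
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)."""
    return ((TYPE_PREFERENCE[cand_type] << 24)
            + (local_preference << 8)
            + (256 - component_id))


def pair_priority(controlling: int, controlled: int) -> int:
    """pair priority = 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


def ranked_pairs(local_types, remote_types):
    """Pair every local candidate type with every remote one, best pair first."""
    pairs = []
    for local, remote in product(local_types, remote_types):
        prio = pair_priority(candidate_priority(local), candidate_priority(remote))
        pairs.append((prio, local, remote))
    return sorted(pairs, reverse=True)
```

Because the pair formula is dominated by the weaker of the two candidates, any pair that includes a relay candidate sorts below every pair that doesn't, which is how direct paths get tried before relayed ones.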
Trickle ICE is the optimisation everyone runs in production: instead of waiting for all candidates to be gathered before sending the SDP, each side trickles candidates over the signalling channel as it discovers them, and connectivity checks start in parallel. The first working pair can be promoted to "nominated" while gathering is still in progress, which shaves seconds off call setup.
NAT keepalives
A NAT mapping exists only as long as the NAT thinks the flow is active. With TCP, the NAT usually watches the FIN/RST and times out around 5 minutes after the last packet. UDP is harder — there's no end-of-flow signal — so the timeout is shorter, often 30 seconds, sometimes 60 on mobile carriers. To keep a long-lived flow alive, you have to send something through the mapping before the timer expires.
That something is a keepalive: a tiny packet (often a single STUN binding request, a
no-op SRTP packet in WebRTC, or WireGuard's PersistentKeepalive option)
sent every 20 to 30 seconds for UDP, or every few minutes for TCP. The cadence is set
a little below the expected NAT timeout — 25 seconds is a common default because it
survives even the most aggressive mobile-carrier 30 s UDP timer.
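A UDP keepalive needs very little code. The sketch below picks an interval a few seconds under the expected NAT timeout and sends a tiny datagram on that cadence. The one-byte payload, the 5-second safety margin, and both function names are placeholders for this example; a WebRTC stack would send a STUN message instead.

```python
import socket
import time


def keepalive_interval(nat_timeout_s: float, safety_margin_s: float = 5.0) -> float:
    """Pick an interval a little below the expected NAT timeout (at least 1 s)."""
    return max(1.0, nat_timeout_s - safety_margin_s)


def send_keepalives(sock: socket.socket, peer, interval_s: float, count: int,
                    payload: bytes = b"\x00") -> None:
    """Send `count` small datagrams to `peer`, `interval_s` seconds apart."""
    for i in range(count):
        sock.sendto(payload, peer)
        if i + 1 < count:
            time.sleep(interval_s)
```

`keepalive_interval(30)` gives 25.0, matching the common default mentioned above. The keepalives have to leave from the same socket as the real traffic, because the NAT mapping belongs to that socket's source port.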
Long-lived TCP connections — WebSocket, SSH, gaming control channels — hit the
same problem. TCP keepalive at the OS level (SO_KEEPALIVE) is the obvious
tool, but it defaults to a 2-hour idle timer, far too long. Application keepalives at
30 to 60 seconds, or OS keepalive tuned with TCP_KEEPIDLE, are what
actually keep mobile sockets alive.
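Tuning OS-level TCP keepalive from Python looks roughly like this. `SO_KEEPALIVE` is widely available, but `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` are platform-specific, so the sketch only sets them where the `socket` module exposes them. The 30/10/3 values and the function name are illustrative choices, not recommendations from any standard.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle_s: int = 30,
                         interval_s: int = 10, probes: int = 3) -> None:
    """Turn on TCP keepalive and shorten its timers where the platform allows."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idle time before the first probe (replaces the 2-hour default).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    # Seconds between unanswered probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Unanswered probes before the connection is declared dead.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

Calling `enable_tcp_keepalive(sock)` on a socket before or after connecting is enough. Application-level pings remain useful alongside it, since OS keepalive probes carry no data and only prove the path is open, not that the remote application is responsive.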
Real-world NAT traversal — what WebRTC actually does
Put it together and a typical WebRTC call runs through this sequence:
- Each peer gathers host candidates from local interfaces, srflx candidates by hitting one or more STUN servers, and relay candidates by allocating on a TURN server.
- The signalling channel (usually WebSocket to an application server) exchanges SDP offer / answer, with the candidate list embedded. Trickle ICE means candidates flow as they're discovered, not in a single batch.
- ICE pairs candidates, ranks pairs by priority, and runs STUN connectivity checks in priority order, with several checks in flight at once. Host pairs are tried first; srflx pairs next; relay pairs last.
- The first pair to complete a check becomes nominated. DTLS handshake runs over the chosen path to set up keys for SRTP. Media starts flowing.
- Periodic STUN keepalives on the chosen path keep the NAT mapping live. If the chosen path degrades, ICE can re-nominate to a fresher pair without dropping the call.
Published stats from large WebRTC operators (Twilio, Daily.co, Jitsi) tend to land in the same range: 60–70% of calls use a direct srflx-to-srflx path, another 10–20% find a host-to-host or peer-reflexive path, and 10–20% end up relayed through TURN. The wide variation comes almost entirely from how many endpoints are on symmetric NATs — corporate and mobile traffic pushes the TURN share up.
Common mistakes
- Running a TURN server without auth. An open TURN server is free bandwidth for anyone — including spammers, botnets, and DDoS amplifiers. Always configure ephemeral TURN credentials, ideally with a short-lived HMAC username/password issued by your signalling server.
- Assuming a public IP means reachable. A datacentre VM may have a public IPv4 but a security group or host firewall that blocks every inbound port except 22 and 443. Hand that address out as a candidate, and ICE checks from outside fail silently.
- Forgetting hairpin. Self-hosted services tested only from outside the LAN often break for users on the local network. Either deploy a router that supports hairpin NAT or hand back the private IP via split-horizon DNS.
- Sharing port 443 between HTTPS and TURN-over-TLS. Putting TURN on 443 to bypass restrictive firewalls is fine, but if the same machine also serves HTTPS on 443, you need a multiplexer to route traffic correctly: HAProxy or an nginx stream block routing on the TLS SNI, or STUN magic-cookie detection once TLS has been terminated. Crossing the streams produces a baffling mix of TLS errors and STUN parse failures.
- Not setting a NAT keepalive on long-lived UDP. Works in dev, dies on cellular after 30–60 seconds. WireGuard's PersistentKeepalive = 25 exists precisely for this.
- Treating ICE as instant. Even with Trickle ICE, candidate gathering plus connectivity checks plus DTLS handshake routinely takes 500 ms to 2 s on the wide internet. Your UI should account for it.
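For the first mistake in the list, ephemeral TURN credentials are commonly issued with the time-limited shared-secret scheme that coturn supports in its `use-auth-secret` mode: the username is an expiry timestamp plus a user label, and the password is the base64-encoded HMAC-SHA1 of that username under a secret shared between the signalling server and the TURN server. The sketch below assumes that scheme; the function name, secret, and user label are placeholders.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional, Tuple


def turn_credentials(shared_secret: str, user_id: str, ttl_s: int = 3600,
                     now: Optional[float] = None) -> Tuple[str, str]:
    """Return a (username, password) pair that expires ttl_s seconds from now."""
    issued_at = time.time() if now is None else now
    expiry = int(issued_at) + ttl_s
    username = f"{expiry}:{user_id}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()
```

The signalling server hands the pair to the client alongside the TURN URL. The TURN server recomputes the HMAC from the username and rejects it once the embedded timestamp has passed, so a leaked credential is only useful for a short window.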
IPv6 doesn't end NAT
The textbook story is that IPv6's 128-bit address space removes any reason for NAT. In practice it's less clean. NPTv6 — Network Prefix Translation for IPv6, specified by RFC 6296 — is a stateless 1:1 prefix-rewrite that some enterprises deploy to keep renumbering independence. It avoids the port-mapping headaches of IPv4 NAT but still hides internal addresses from the outside.
Stateful IPv6 firewalls also produce NAT-ish behaviour without the address translation. Inbound connections are dropped unless they match a state entry set up by an outbound packet, functionally the same constraint as a restricted-cone NAT. Hosts that use temporary addresses (RFC 8981) also rotate the source interface identifier over time for privacy, so the address a peer learned through one channel may not match what it sees later on another, a milder cousin of the symmetric-NAT problem.
The upshot: STUN, TURN, and ICE all support IPv6 and are still useful in IPv6-only and dual-stack deployments. WebRTC stacks gather IPv6 host candidates alongside IPv4 ones. The traversal stack is here to stay.
Further reading
- RFC 8489 — Session Traversal Utilities for NAT (STUN) — the current STUN spec; replaces the older RFC 3489 (1.0) and RFC 5389 (2.0). Short, readable, the canonical reference.
- RFC 8656 — Traversal Using Relays around NAT (TURN) — TURN over UDP, TCP, and TLS. Includes the credential mechanism every production deployment relies on.
- RFC 8445 — Interactive Connectivity Establishment (ICE) — the candidate-pairing and connectivity-check algorithm. Long but worth a careful read if you're debugging WebRTC.
- RFC 6598 — Shared Address Space for CGNAT — the allocation of 100.64.0.0/10, with the rationale for not re-using RFC 1918.
- RFC 4787 — NAT Behavioural Requirements — replaces the simplistic four-type taxonomy with the modern mapping- and filtering-behaviour vocabulary. The reference for what a "well-behaved" NAT is supposed to do.
- Ilya Grigorik — High Performance Browser Networking, ch. 13–18 — the WebRTC chapters cover STUN/TURN/ICE from a developer perspective with worked examples. Free online.
- Tailscale — How NAT traversal works — the best plain-English walkthrough of hole-punching, with diagrams of every NAT type and what works against each.
- Cloudflare — Anycast WebRTC and TURN at scale — how a large CDN runs TURN globally; useful production-grade view of the infrastructure.