TCP Handshake Simulator: the three-way handshake, in slow motion.
The TCP handshake is how two machines agree to talk and sync their sequence numbers before any data moves. It takes three packets to set up a connection (SYN, SYN-ACK, ACK) and four to tear it down. This animates the most-quoted picture in networking, packet by packet.
Two boxes and a wire. The left box is the client, the right is the server, and the label in each one is its current TCP state, the same names you'd see in netstat: LISTEN, SYN_SENT, ESTABLISHED, TIME_WAIT. The arrow on the wire is the packet in flight, with its flags and sequence numbers printed beneath it. Each press of Step advances one packet, and the ladder below lists all eight phases of a connection's life, setup through teardown, with the current one highlighted. The RTT clock in the header counts elapsed round-trip time.
Step through the first three rows and keep an eye on that clock. The connection doesn't reach ESTABLISHED until 1.5 round trips have passed, and not one byte of application data has moved yet; data is row 04. That is the tax every fresh connection pays, and it's the whole reason connection pooling and keep-alive exist. Then keep stepping into the teardown and notice the asymmetry: the side that closes first ends up stuck in TIME_WAIT while its peer is already fully CLOSED. Setup is symmetric; teardown is not.
What is the TCP three-way handshake?
Three packets, a polite introduction.
The TCP three-way handshake is the connection-establishment ceremony at the start of every TCP session. The client sends a SYN, the server replies with SYN-ACK, the client sends a final ACK; after those three packets both sides have agreed on initial sequence numbers and confirmed bidirectional liveness. The protocol was specified by Vint Cerf and Bob Kahn in 1974 and standardised in RFC 793 (1981, refreshed as RFC 9293 in 2022). Every web request, every database query, and every SSH session opens with this handshake.
Imagine two computers that have never spoken before. One wants to send a stream of bytes to the other. They've never agreed on where in the conversation they are; they don't know whether the other is alive; they don't know what initial sequence number to use to detect lost or duplicated bytes; they may not even agree on basic parameters like the maximum segment size their networks can carry. The internet's underlying protocol (IP) provides none of this. It just delivers individual packets, sometimes, in some order, possibly more than once, possibly never. TCP's job is to build a reliable byte stream on top of that.
Before any data flows, both ends need a tiny ceremony — a handshake — that establishes mutual willingness, exchanges sequence numbers, and confirms that bytes can travel both directions. That ceremony is the three-way handshake. The client sends a SYN ("I want to talk; my starting sequence number is X"). The server replies with a SYN+ACK ("OK; my starting number is Y; I confirm I saw your X"). The client sends a final ACK ("confirmed; I saw your Y"). After those three packets, both sides have proven they can send and receive, and both know the other's starting sequence number. Data can flow.
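The exchange of X and Y described above can be sketched in a few lines. This is a toy model, not a network implementation: the `Segment` class and `three_way_handshake` function are hypothetical names introduced for illustration, and the sample ISNs are simply the ones that appear in the tcpdump capture later in this article.

```python
from dataclasses import dataclass
from typing import List, Optional

MOD = 2**32  # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str                  # "SYN", "SYN+ACK", or "ACK"
    seq: int                    # sender's sequence number
    ack: Optional[int] = None   # next sequence number the sender expects


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three handshake segments for the given initial sequence numbers."""
    syn = Segment("SYN", seq=client_isn)
    # A SYN consumes one sequence number, so the server acknowledges X + 1.
    syn_ack = Segment("SYN+ACK", seq=server_isn, ack=(syn.seq + 1) % MOD)
    # The client now sends from X + 1 and acknowledges Y + 1.
    ack = Segment("ACK", seq=syn_ack.ack, ack=(syn_ack.seq + 1) % MOD)
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1820301, server_isn=9421)
for segment in segments:
    print(segment)
```

Each acknowledgement number is the peer's ISN plus one, which is how each side proves it actually received the other's SYN.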
Why three? Two would not be enough. If the client sends SYN and the server replies SYN-ACK, the client knows the path works in both directions, but the server knows only that the client's SYN arrived; it has no evidence that its own SYN-ACK, and the sequence number inside it, got through. The third packet, the client's ACK of the server's SYN, closes the loop. Four would be redundant: the server's SYN and its ACK of the client's SYN can be carried in the same packet, which is why the middle step is called SYN-ACK and not two separate ones. The whole protocol is the smallest reliable mutual-liveness check that fits.
Concrete numbers anchor why this matters. On a fast home connection with a 20 ms round trip, the client waits one full round trip, about 20 ms, before it can send its first application byte, and the server does not reach ESTABLISHED until 30 ms (1.5 round trips) have passed. On a transcontinental connection at 100 ms RTT, those figures are 100 ms and 150 ms. Add TLS on top, the modern standard, and you pay another two round trips for TLS 1.2 or one for TLS 1.3. This is why opening a fresh HTTPS connection feels noticeably slow even on gigabit fibre: TCP and TLS each charge their own setup tax. Reusing a connection (HTTP keep-alive, HTTP/2 multiplexing, connection pooling) skips that tax. TCP Fast Open, QUIC's 0-RTT, and TLS session resumption are all variations on "pay the cost once, amortise across many requests".
The simulator above plays the handshake at human speed. Watch the SYN climb up to the server, the SYN-ACK come back down, the final ACK go up; then the data segments and the eventual FIN, ACK, FIN, ACK exchange that closes the connection. Every TCP connection on the internet, billions per second worldwide, opens and closes through exactly this choreography. The rest of this article walks through the eleven-state machine that governs the protocol, the security implications of how that initial sequence number gets chosen, the two related attacks (SYN flood and Mitnick), and the operational gotchas (TIME_WAIT, MSS clamping, Fast Open) that engineers actually run into.
Origins of TCP — from RFC 793 to RFC 9293
A protocol that predates the web.
The three-way handshake was specified by Jon Postel in RFC 793, September 1981, drawing on the earlier RFC 675 (December 1974) by Vint Cerf, Yogen Dalal, and Carl Sunshine that introduced the term “TCP”. The same shape still runs every TCP connection today — every web page, every API call, every database query that travels over a stream socket — more than four decades on. RFC 793 was finally obsoleted in August 2022 by RFC 9293, edited by Wesley Eddy, which folded forty-one years of clarifications, errata, and extensions into a single contemporary specification. The protocol the document describes is the same protocol; the difference is precision.
Why three packets, not two? The handshake has to convince both sides that they can both send and receive. SYN proves the client can send. SYN+ACK proves the server received and can both send and receive. ACK proves the client received the server's SYN. Two packets would leave one side guessing whether the other had heard. The handshake also exchanges initial sequence numbers (ISNs) so each direction has a baseline against which to detect retransmission, reordering, and replay. The choice of ISN is consequential: RFC 793 specified a 32-bit clock counter incrementing every 4 microseconds, on the assumption that segments from previous incarnations of a connection (same five-tuple) might still be in transit and could be confused with the current session. The 2-MSL TIME_WAIT delay at connection close is the same idea from the opposite direction.
The 4-microsecond counter turned out to be a security hazard. Robert Morris's 1985 paper “A weakness in the 4.2BSD Unix TCP/IP software” (Bell Labs Computing Science Technical Report 117) showed that a predictable ISN allowed source-address spoofing attacks: an attacker who could predict the ISN of a connection from a trusted host could complete the handshake without ever seeing the SYN+ACK, then inject commands on the connection. The Mitnick-Shimomura attack of December 1994 was the public demonstration. RFC 1948 (Steve Bellovin, May 1996) and RFC 6528 (Fernando Gont and Steven Bellovin, February 2012) tightened the requirement to a keyed pseudo-random function of the connection identifiers added to the four-microsecond clock; modern Linux uses SipHash over the connection five-tuple plus a per-boot secret to generate ISNs that are simultaneously increasing (so old segments are detected) and unpredictable (so spoofing is hard).
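The shape of that construction, a clock term plus a keyed per-connection offset, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `isn` is hypothetical, HMAC-SHA-256 stands in for the keyed hash (Python's standard library does not expose SipHash), and `SECRET` stands in for the per-boot secret.

```python
import hashlib
import hmac
import os
import time

SECRET = os.urandom(16)  # assumption: stand-in for the kernel's per-boot secret


def isn(local_ip, local_port, remote_ip, remote_port, now_us=None, secret=SECRET):
    """Return a 32-bit ISN of the form M + F(connection identifiers, secret)."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1_000
    m = (now_us // 4) % 2**32  # M: a counter that ticks every 4 microseconds
    msg = f"{local_ip}:{local_port}>{remote_ip}:{remote_port}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    f = int.from_bytes(digest[:4], "big")  # F: unpredictable per-connection offset
    return (m + f) % 2**32


print(isn("10.0.0.42", 51234, "93.184.216.34", 443))
```

For a fixed connection tuple the value still advances with the clock, so stale segments remain detectable; across tuples the offset is unpredictable to anyone who does not hold the secret.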
The handshake has stayed shaped the way Postel drew it because the underlying problem — mutual liveness and freshness across an unreliable network — has not changed. The optimisations that followed (SACK in RFC 2018, window scale and timestamps in RFC 7323, fast open in RFC 7413, multipath TCP in RFC 8684) extended the protocol without altering the three-packet rhythm. QUIC (RFC 9000) is the first transport to revisit that rhythm in earnest, and even QUIC's 1-RTT setup keeps the same logical exchange of nonces — it just folds them in with the cryptographic handshake.
The TCP state machine — eleven states, two halves
Eleven states, two halves.
The TCP state machine has eleven states. Five appear during connection setup; six during teardown. Most engineers never need to know more than the names — but a few states (TIME_WAIT, CLOSE_WAIT, FIN_WAIT_2) cause real production problems and are worth understanding in detail.
| State | Side | Meaning |
|---|---|---|
| LISTEN | server | Waiting for SYN. |
| SYN_SENT | client | Sent SYN; waiting for SYN+ACK. |
| SYN_RECEIVED | server | Sent SYN+ACK; waiting for final ACK. |
| ESTABLISHED | both | Connection live; data flowing. |
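| FIN_WAIT_1 / 2 | closer | Sent FIN. In FIN_WAIT_1, waiting for its ACK; in FIN_WAIT_2, waiting for the peer's FIN. |
| CLOSE_WAIT | peer | Got FIN; app must call close(). Common bug: app forgot. |
| LAST_ACK | peer | Sent its own FIN after CLOSE_WAIT; waiting for the final ACK. |
| CLOSING | both | Both sides sent FIN at the same time; waiting for the ACK of the local FIN. |
| TIME_WAIT | closer | 2×MSL (~60 s) wait. Famous source of port-exhaustion bugs. |
| CLOSED | both | No connection. |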
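
The common paths through the table can be written down as a transition map. This is a simplified sketch, not the full RFC 9293 diagram: the event labels and the `walk` helper are hypothetical, and the less common branches (CLOSING, simultaneous open, resets) are left out.

```python
# (current state, event) -> next state, covering only the ordinary open/close paths.
TRANSITIONS = {
    ("CLOSED", "active_open"): "SYN_SENT",        # client sends SYN
    ("CLOSED", "passive_open"): "LISTEN",         # server starts listening
    ("LISTEN", "recv_syn"): "SYN_RECEIVED",       # server replies SYN+ACK
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",  # client replies ACK
    ("SYN_RECEIVED", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",       # closer sends FIN
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",    # peer ACKs the FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",      # closer ACKs the peer's FIN
    ("CLOSE_WAIT", "close"): "LAST_ACK",          # peer sends its own FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
}


def walk(events, state="CLOSED"):
    """Follow a list of events and return every state visited along the way."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


closer = walk(["active_open", "recv_syn_ack", "close",
               "recv_ack", "recv_fin", "timeout_2msl"])
peer = walk(["passive_open", "recv_syn", "recv_ack",
             "recv_fin", "close", "recv_ack"])
print(" -> ".join(closer))
print(" -> ".join(peer))
```

The two printed paths show the asymmetry: only the side that closes first passes through TIME_WAIT.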
The state most worth knowing in depth is TIME_WAIT. The original Maximum Segment Lifetime was specified at two minutes; modern Linux fixes the wait at 60 seconds through the compile-time constant TCP_TIMEWAIT_LEN. (The net.ipv4.tcp_fin_timeout sysctl, despite its name, governs how long orphaned sockets linger in FIN_WAIT_2, not the length of TIME_WAIT.) The purpose of TIME_WAIT is to absorb stragglers from the previous connection: segments that took an unusually long path through the network and arrive after teardown. Without TIME_WAIT, a freshly opened connection on the same five-tuple could mistake old packets for fresh ones, causing data corruption or replay. The 2×MSL window is the upper bound on how long a packet can survive in the network plus the round-trip needed to deliver an in-flight FIN.
Simultaneous open occurs when both sides initiate a connection to each other at the same time, sending SYNs that cross in flight. RFC 793 specified the resolution: each side moves from SYN_SENT to SYN_RECEIVED on seeing the other's SYN, answers with a SYN+ACK, and enters ESTABLISHED when the peer's SYN+ACK arrives. Four segments cross the wire instead of three, and neither side passes through LISTEN. Simultaneous open is rare in client-server applications but not pathological in peer-to-peer NAT-traversal scenarios, where TCP hole punching relies on it; UDP hole-punching is the more common modern technique.
SYN floods and SYN cookies — handshake-level attacks
A handshake is also an attack surface.
The handshake's elegance is also its weakness. A SYN flood, which first drew wide public attention when an anonymous attacker hit Panix and other ISPs in September 1996 (and which Bill Cheswick and Steven Bellovin analysed in their book Firewalls and Internet Security), exploits the asymmetric cost of the second packet. An attacker sends a SYN with a forged source address; the server allocates a transmission control block, sends SYN+ACK, and waits up to 75 seconds for the ACK that will never arrive. A modest attacker on a 1990s residential link could fill the listener's half-open queue and deny service to legitimate users at near-zero cost.
SYN cookies (Daniel J. Bernstein, posted to USENET in September 1996, documented in RFC 4987 in August 2007) flip the trade-off: the server stores no state on SYN. Instead, it encodes the connection details into the SYN+ACK's sequence number itself: a hash of source IP, source port, destination IP, destination port, a per-boot secret, and a coarse timestamp. When the legitimate client returns the ACK, its acknowledgement number is the cookie plus one; the server subtracts one, recomputes and validates the cookie, and only then allocates a TCB. State is allocated on validation, not on speculation. Linux has had SYN cookies on by default, in net.ipv4.tcp_syncookies = 1 mode (engaged on queue overflow), since kernel 2.4 in 2001; FreeBSD added them in 2003.
The compromise is real but narrow. SYN cookies cannot encode the full TCP options that would normally be exchanged in the SYN+ACK; specifically, SACK and timestamps and window scale are dropped or approximated. For most workloads this is invisible; for high-bandwidth long-fat-pipe connections the loss of window-scale negotiation can halve throughput. Modern Linux engages cookies only when the queue overflows, so steady-state operation pays no cost.
# Linux SYN cookie behaviour
$ sysctl net.ipv4.tcp_syncookies net.ipv4.tcp_max_syn_backlog
net.ipv4.tcp_syncookies = 1          # 1 = enable when queue overflows
net.ipv4.tcp_max_syn_backlog = 2048
# When the SYN queue is full, the kernel emits cookies; on validation
# it reconstructs the TCB. Cost: a few microseconds of CPU per
# accept. Side-effects: TCP options (SACK, timestamps, window
# scale) are dropped or approximated.
The canonical large-scale SYN flood is the Mirai botnet's pair of 2016 attacks: Krebs on Security in September (about 620 Gbps) and Dyn DNS in October (reported at 1.2 Tbps), both sourced from compromised IoT cameras. Mirai's source is on GitHub; the SYN flood module is fewer than 200 lines of C. The defensive answer at that scale is upstream filtering at the network edge (Cloudflare, Akamai, AWS Shield Advanced) plus BGP flowspec announcements that drop traffic before it reaches the target's link.
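The stateless encode-then-validate idea can be sketched like this. It is an illustration only and does not reproduce the Linux implementation: the bit layout (5 bits of coarse time, 3 bits of MSS index, 24 bits of keyed hash) loosely follows Bernstein's description, while `MSS_TABLE`, the 64-second slot, HMAC-SHA-256 as the hash, and all function names are assumptions made for the example.

```python
import hashlib
import hmac

MOD = 2**32
SLOT_SECONDS = 64                     # assumption: granularity of the coarse timestamp
MSS_TABLE = [536, 1300, 1440, 1460]   # hypothetical table; the index fits in 3 bits


def _mac24(secret, four_tuple, slot):
    """24-bit keyed hash over the connection 4-tuple and the time slot."""
    msg = repr((four_tuple, slot)).encode()
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(secret, four_tuple, now_s, mss_index):
    """Build the sequence number the server would put in its SYN+ACK."""
    slot = now_s // SLOT_SECONDS
    return ((slot % 32) << 27) | (mss_index << 24) | _mac24(secret, four_tuple, slot)


def check_cookie(secret, four_tuple, ack_number, now_s, max_age_slots=1):
    """Validate the client's final ACK; return the MSS on success, else None."""
    cookie = (ack_number - 1) % MOD           # the client acknowledges cookie + 1
    now_slot = now_s // SLOT_SECONDS
    age = (now_slot - (cookie >> 27)) % 32    # how many slots ago it was issued
    mss_index = (cookie >> 24) & 0x7
    if age > max_age_slots or mss_index >= len(MSS_TABLE):
        return None
    if (cookie & 0xFFFFFF) != _mac24(secret, four_tuple, now_slot - age):
        return None
    return MSS_TABLE[mss_index]               # only now would a TCB be allocated


secret = b"k" * 16
conn = ("10.0.0.42", 51234, "93.184.216.34", 443)
cookie = make_cookie(secret, conn, now_s=1000, mss_index=3)
print(check_cookie(secret, conn, (cookie + 1) % MOD, now_s=1000))
```

Only the MSS survives the round trip in this sketch, which is the same limitation the text describes: anything that does not fit in the 32-bit sequence number is lost.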
TCP packet sizes — MSS, MTU, and what fits on the wire
How big is a packet, really?
The Maximum Segment Size option (RFC 879, November 1983; clarified by RFC 1122 in October 1989) is exchanged on the SYN. Each side advertises the largest segment it is willing to receive, and each sender caps its segments at the value its peer advertised; in practice both directions usually settle on the smaller of the two. On Ethernet (1500 byte MTU) the typical MSS is 1460 bytes (1500 minus 20 for IP and 20 for TCP); on PPPoE links (1492 byte MTU) it is 1452; on tunnels (GRE, IPsec, VXLAN) it can be much smaller. Get the MSS wrong and you pay either with fragmentation (slow, lossy, fragile) or with large segments dropped silently by middleboxes.
Path MTU Discovery (RFC 1191 by Jeff Mogul and Steve Deering, November 1990; the IPv6 equivalent in RFC 8201) is the dynamic version. The sender marks packets with the IPv4 Don't Fragment bit and listens for ICMP “fragmentation needed” replies; on receipt, it lowers its path MTU estimate, and with it the segment size, for that route. The algorithm works in theory and fails wherever an over-aggressive firewall blocks ICMP. Cisco published guidance as early as 2003 about “PMTUD blackholes” in production networks: the connection establishes, small packets succeed, large packets vanish. Linux can mitigate with Packetisation Layer Path MTU Discovery (PLPMTUD, RFC 4821, March 2007, enabled through net.ipv4.tcp_mtu_probing), which sends probe segments of increasing size and treats an unacknowledged probe as a sign that the size exceeds the path MTU, no ICMP required.
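The ICMP-free probing idea can be sketched as a search over packet sizes. This is a simplification under stated assumptions: RFC 4821 does not mandate a binary search, and `send_probe` is a hypothetical callback that returns True when a probe of the given size was acknowledged.

```python
def discover_path_mtu(send_probe, low=576, high=1500):
    """Return the largest packet size in [low, high] that gets through.

    `low` must be a size already known to work; `send_probe(size)` reports
    whether a probe of that size was acknowledged (no ICMP involved).
    """
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always makes progress
        if send_probe(mid):
            low = mid                 # probe arrived: the path carries at least mid
        else:
            high = mid - 1            # probe lost: assume mid exceeds the path MTU
    return low


# Simulated path whose true MTU is 1400 bytes (for example, behind a tunnel).
print(discover_path_mtu(lambda size: size <= 1400))
```

A real implementation also has to separate probe loss from ordinary congestion loss, which this sketch ignores.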
The interaction with VPNs and IPsec is the most common operational headache. An IPsec ESP tunnel adds 50–80 bytes of overhead; the inner MSS must be 1500 minus IPsec overhead minus IP minus TCP, often 1380 or 1340. Hosts that don't honor MSS clamping break in subtle ways: the SSH handshake works, the first command works, the reply with a long output hangs. The Linux iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu rule is the standard fix on the gateway.
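The MSS arithmetic from this section is easy to check by hand or in code. The sketch below assumes IPv4 and TCP headers without options (20 bytes each); the `mss_for` helper and the sample overhead values are illustrative, chosen to match the figures quoted above.

```python
IPV4_HEADER = 20  # bytes, no IP options
TCP_HEADER = 20   # bytes, no TCP options


def mss_for(mtu, tunnel_overhead=0):
    """Largest TCP payload that fits in one unfragmented packet."""
    return mtu - tunnel_overhead - IPV4_HEADER - TCP_HEADER


print(mss_for(1500))                      # plain Ethernet
print(mss_for(1492))                      # PPPoE
print(mss_for(1500, tunnel_overhead=80))  # IPsec ESP at the high end of 50 to 80 bytes
print(min(mss_for(1500), mss_for(1492)))  # smaller of two advertised values
```

Clamping on the gateway rewrites the MSS option in the SYN to this smaller value, so endpoints never attempt a segment the tunnel cannot carry.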
QUIC sidesteps the issue by running over UDP and doing its own datagram framing — but the underlying IP fragmentation problem still exists; QUIC just exposes it to the application layer where it can be controlled explicitly. The 1500-byte Ethernet frame is the dominant constraint on Internet packet size for the same reason 1981's MSS choice still echoes today: the installed base does not change, even as everything around it does.
TCP Fast Open — skipping the handshake on repeat connections
Skipping the handshake, on repeat connections.
TCP Fast Open (RFC 7413, December 2014, Yuchung Cheng, Jerry Chu, Sivasankar Radhakrishnan, Arvind Jain) lets the client send data on the SYN itself for repeat connections to the same peer. The server hands out a fast open cookie on the first connection, encrypted with a server-side secret. The client presents the cookie on the SYN of subsequent connections, and the server processes the included payload immediately, saving one full round-trip for repeat visitors. On a 100 ms RTT mobile connection, that is a noticeable wall-clock improvement on TLS-protected page loads.
Adoption has been bumpy. Many middleboxes — carrier-grade NATs, transparent proxies, broken home routers — strip the TFO option from the SYN, causing the client to fall back to the legacy handshake. Linux has supported TFO since kernel 3.7 (2012); macOS added it in 10.11 (2015); Windows in 10 (2016); but the server-side adoption has been slow because most public-facing CDNs would rather invest in QUIC. In practice, TFO is most used inside controlled networks — Google's frontend, Facebook's mobile API, the Cloudflare edge to its origin servers — rather than on the open Internet.
QUIC (RFC 9000, May 2021, Jana Iyengar and Martin Thomson, editors) takes the next step: combine transport setup with cryptographic handshake. A 1-RTT QUIC handshake performs the equivalent of a TCP three-way handshake plus a TLS 1.3 handshake in a single round trip. A 0-RTT QUIC connection (a repeat visit to a known server with a cached session ticket) sends application data on the very first UDP packet. New connections take 1 RTT instead of TCP+TLS's two or three; HTTP/3 rides on QUIC for exactly this reason. Deployments report noticeable mobile-network speedups: Google reported 8 percent reduction in search latency in 2017; Facebook reported similar numbers in their 2020 QUIC writeup; YouTube quotes between 9 and 18 percent reductions in startup time depending on RTT.
The cost of QUIC is a different shape of operational complexity. Connection migration (the same QUIC connection survives an IP address change, useful when a phone moves from Wi-Fi to LTE) relies on connection IDs (CIDs) that endpoints issue and retire over the life of the connection. The encryption is per-packet rather than per-connection, which means most of the header is inaccessible to middleboxes, and the layered networking model breaks. Google's load balancer, Cloudflare's edge, and Apple's iCloud connections all run QUIC in production at scale; smaller deployments often find that the operational tooling is still maturing. The TCP three-way handshake will not disappear in this decade.
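The round-trip counts in this paragraph translate directly into wall-clock delay. The sketch below counts round trips before the client can send its first application byte, taking one for TCP, two for TLS 1.2, and one for TLS 1.3 as stated in this article; the dictionary and function names are hypothetical.

```python
# Round trips completed before the client may send application data.
SETUP_RTTS = {
    "TCP + TLS 1.2": 1 + 2,
    "TCP + TLS 1.3": 1 + 1,
    "QUIC 1-RTT": 1,
    "QUIC 0-RTT": 0,
}


def setup_delay_ms(stack, rtt_ms):
    """Connection-setup delay for a given protocol stack and round-trip time."""
    return SETUP_RTTS[stack] * rtt_ms


for stack in SETUP_RTTS:
    print(f"{stack:14s} {setup_delay_ms(stack, 100):4d} ms at 100 ms RTT")
```

The model ignores packet loss, server processing time, and DNS, so real page loads will show smaller relative gains than the raw round-trip counts suggest.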
TIME_WAIT, MSS clamping, and other production gotchas
Where the handshake breaks in production.
TIME_WAIT exhaustion. The closing side of a TCP connection waits 2×MSL (~60 s on Linux, ~240 s on classical BSD) before reusing the source-IP, source-port, destination-IP, destination-port tuple. A high-throughput proxy making short-lived connections to one upstream IP and port can exhaust the ephemeral port range (default 32768–60999 on Linux) within seconds and start failing with EADDRNOTAVAIL. The mitigations, in order of preference: HTTP keep-alive on the proxy-to-origin connection (the right answer 95 percent of the time), connection pooling, expanding the ephemeral port range with net.ipv4.ip_local_port_range, enabling net.ipv4.tcp_tw_reuse = 1 (which permits reuse of TIME_WAIT sockets when the timestamp option proves the new connection is newer). The tcp_tw_recycle option that used to be the third lever was removed in Linux 4.12 (2017) because it broke catastrophically with NAT.
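A back-of-envelope calculation shows how quickly the ephemeral range runs out. The sketch assumes the Linux defaults quoted above (ports 32768 to 60999, 60 seconds in TIME_WAIT) and a single destination IP and port; the function names are illustrative.

```python
def ephemeral_ports(port_low=32768, port_high=60999):
    """Number of source ports available towards one destination IP and port."""
    return port_high - port_low + 1


def max_sustained_rate(time_wait_s=60, **port_range):
    """Highest connections-per-second rate that never runs out of ports."""
    return ephemeral_ports(**port_range) / time_wait_s


def seconds_until_exhaustion(conn_per_s, **port_range):
    """Time to burn through every port when opening faster than they recycle."""
    return ephemeral_ports(**port_range) / conn_per_s


print(ephemeral_ports())
print(round(max_sustained_rate(), 1))
print(round(seconds_until_exhaustion(5000), 2))
```

With the defaults, anything much above a few hundred short-lived connections per second to one upstream eventually fails, which is why keep-alive and pooling come first in the list of mitigations.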
CLOSE_WAIT leaks. When the peer closes a connection, the local kernel transitions to CLOSE_WAIT and waits for the application to call close(). If the application has a bug that fails to close the socket on the error path, sockets accumulate in CLOSE_WAIT indefinitely. The symptom is ss -atn | grep CLOSE-WAIT | wc -l growing over hours; the diagnostic is to find the offending file descriptor with lsof and trace back to the application code. Java's HttpClient pre-2018 and Go's net/http with idle-timeout zero are common offenders.
Half-open connections. When a peer is forcibly removed (NAT timeout, machine reboot, network partition) without sending FIN, the local kernel still believes the connection is ESTABLISHED. Subsequent writes return success until a TCP retransmit timeout fires, often 15 minutes later (net.ipv4.tcp_retries2 = 15). The application sees a long hang followed by a delayed error. The TCP keepalive option (RFC 1122, default off, controlled by SO_KEEPALIVE) probes idle connections every two hours by default, which is far too infrequent for most modern applications. Java HTTP clients, Postgres connection pools, and Kafka producers all expose tighter keepalive knobs; setting them is one of the most useful operational defaults a service can ship with.
RST flood and reset-injection. An attacker who can observe the connection on-path, or an off-path attacker who can guess a sequence number inside the receive window, can inject RST packets that cause one side to drop the connection. The Great Firewall of China is the most-studied production deployment of this technique against TLS connections to censored hosts; cloud providers occasionally see RST injection from misconfigured middleboxes. The defence at the protocol level is RFC 5961 (Anantha Ramaiah, Randall Stewart, Mitesh Dalal, August 2010), which tightens the validation rules for in-window RSTs and challenges suspicious ones with an ACK probe.
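Tightening keepalive on a socket takes only a few calls. This is a minimal sketch: `enable_keepalive` is a hypothetical helper, the timing values are illustrative rather than recommendations, and the three per-socket timing options are platform-specific, so the code skips any that the local `socket` module does not define.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive and, where supported, shorten its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These option names exist on Linux; other platforms may lack some of them.
    for name, value in (("TCP_KEEPIDLE", idle_s),       # idle time before first probe
                        ("TCP_KEEPINTVL", interval_s),  # gap between probes
                        ("TCP_KEEPCNT", probes)):       # failed probes before giving up
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

With these values a dead peer is detected after roughly 60 + 10 × 5 seconds of silence instead of two hours.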
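The RFC 5961 rule for incoming RSTs reduces to a three-way decision on the segment's sequence number. The sketch below is a simplified reading of that rule, not kernel code: the function names and return labels are hypothetical, and real stacks also rate-limit the challenge ACKs.

```python
MOD = 2**32  # sequence arithmetic is modulo 2**32


def in_window(seq, rcv_nxt, rcv_wnd):
    """True when seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), allowing for wraparound."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def handle_rst(seq, rcv_nxt, rcv_wnd):
    """Decide what to do with an incoming RST segment."""
    if seq == rcv_nxt:
        return "reset"           # exact match: accept and tear down
    if in_window(seq, rcv_nxt, rcv_wnd):
        return "challenge_ack"   # plausible but not exact: ask the peer to confirm
    return "drop"                # outside the window: silently discard


print(handle_rst(seq=1000, rcv_nxt=1000, rcv_wnd=65535))
print(handle_rst(seq=1500, rcv_nxt=1000, rcv_wnd=65535))
print(handle_rst(seq=999, rcv_nxt=1000, rcv_wnd=65535))
```

A blind attacker now has to hit one exact sequence number rather than any value inside the window, which raises the guessing cost by a factor of the window size.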
Linux TCP_DEFER_ACCEPT and FreeBSD accept_filter. Both options delay the accept() system call until the first data segment arrives, not just the third handshake packet. The effect is to skip the wakeup and context switch for connections that complete the handshake but never send data — a useful filter against probe scanners and a small efficiency win for HTTP servers that always expect the client to speak first.
BGP and IPsec interactions. BGP sessions run over TCP on port 179 with a long-lived connection; an unstable connection causes peer flapping that withdraws and re-announces routes globally. RFC 5925 (TCP-AO, the modern replacement for the deprecated MD5 signature option) authenticates BGP TCP sessions cryptographically. IPsec tunnels add per-packet overhead that interacts with MSS as discussed in the MSS section above, and IPsec rekey events can cause TCP retransmits even on otherwise-healthy connections. Both protocols predate modern transport thinking but remain operationally critical; understanding their TCP-level behaviour is the difference between a stable carrier network and a flapping one.
# Inspect socket states live
$ ss -tan state time-wait | wc -l
27483
$ ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c
3 LISTEN
412 ESTAB
8 SYN-SENT
19 CLOSE-WAIT
27483 TIME-WAIT
2 FIN-WAIT-2
# tcpdump capture of a single handshake
$ sudo tcpdump -ni any -c 3 'tcp port 443'
12:04:51.220133 IP 10.0.0.42.51234 > 93.184.216.34.443: Flags [S], seq 1820301
12:04:51.262871 IP 93.184.216.34.443 > 10.0.0.42.51234: Flags [S.], seq 9421, ack 1820302
12:04:51.262998 IP 10.0.0.42.51234 > 93.184.216.34.443: Flags [.], ack 9422
If a stale Stack Overflow answer tells you to set net.ipv4.tcp_tw_recycle = 1, ignore it. The flag broke catastrophically with carrier-grade NAT and was removed in Linux 4.12 (2017). Use tcp_tw_reuse, expand the ephemeral port range, or, better, turn on HTTP keep-alive.
TCP options — MSS, SACK, window scale, timestamps
The handshake's option budget.
The 40-byte TCP options field, exchanged on the SYN, is tiny but high-value real estate. Every meaningful TCP extension since 1981 has lived inside it. The original RFC 793 only specified MSS; subsequent RFCs added optional negotiations that have become essentially mandatory for modern performance.
Window scale (RFC 7323, originally RFC 1323 from 1992, Van Jacobson, Bob Braden, David Borman) extends the 16-bit window field with a left-shift factor of 0 to 14, allowing windows up to 1 gigabyte. Without window scale, a connection on a 100 ms RTT link is capped at 64 KB / 100 ms = 640 KB/s — less than a megabyte per second on any link with cross-continent latency. Window scale is the difference between TCP being usable on a transatlantic fibre and not.
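The throughput ceiling follows from one division: at most one window of data can be in flight per round trip. The sketch below reproduces the figures above; the helper names are illustrative, and the model ignores loss and congestion control.

```python
def scaled_window(window_field, shift):
    """Effective receive window for a 16-bit window field and a scale shift."""
    if not 0 <= shift <= 14:
        raise ValueError("window scale shift must be between 0 and 14")
    return window_field << shift


def max_throughput_bytes_per_s(window_bytes, rtt_ms):
    """Upper bound on throughput: one full window per round trip."""
    return window_bytes * 1000 / rtt_ms


unscaled = scaled_window(65535, 0)
largest = scaled_window(65535, 14)
print(max_throughput_bytes_per_s(unscaled, 100))  # bytes/s without window scale
print(largest)                                    # bytes, just under 1 GiB
print(max_throughput_bytes_per_s(largest, 100))   # bytes/s with the maximum shift
```

At 100 ms RTT the unscaled ceiling is about 655 KB/s regardless of link speed, while the largest scaled window lifts it above 10 GB/s, far beyond most links.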
TCP timestamps (also RFC 7323) carry a 32-bit sender timestamp and a 32-bit echo. They serve two purposes: more accurate round-trip-time estimation for the congestion-control algorithm, and protection against sequence-number wraparound on very fast connections (PAWS, Protection Against Wrapped Sequences). On a 10 Gbps link, the 32-bit sequence space wraps in roughly three seconds; the timestamp option lets the receiver distinguish wrapped sequence numbers from new ones.
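The wrap time quoted above comes from dividing the size of the sequence space by the link rate. A short sketch, assuming the link is kept fully busy with payload (the function name is illustrative):

```python
SEQUENCE_SPACE_BYTES = 2**32  # one sequence number per byte of payload


def seq_wrap_seconds(link_bits_per_s):
    """Seconds until the 32-bit sequence space wraps on a saturated link."""
    return SEQUENCE_SPACE_BYTES * 8 / link_bits_per_s


for label, rate in (("1 Gbps", 1e9), ("10 Gbps", 10e9), ("100 Gbps", 100e9)):
    print(f"{label:8s} wraps in {seq_wrap_seconds(rate):6.2f} s")
```

Once the wrap time approaches the time a delayed segment can survive in the network, the sequence number alone can no longer distinguish old data from new, and the timestamp carries that burden.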
Selective Acknowledgement (SACK, RFC 2018, October 1996, Matt Mathis, Jamshid Mahdavi, Sally Floyd, Allyn Romanow) lets the receiver tell the sender about non-contiguous received ranges. Without SACK, the cumulative ACK reports only the highest contiguous byte received, so after a drop in the middle of a 64-segment window the sender cannot tell which later segments arrived: it either retransmits data the receiver already holds or recovers one loss per round trip. With SACK, the sender retransmits only the missing segments. It is fundamental for any connection that experiences loss, and on by default in every modern stack.
Explicit Congestion Notification (ECN, RFC 3168, September 2001, K. K. Ramakrishnan, Sally Floyd, and David Black) lets routers mark packets to signal congestion rather than dropping them. The endpoints negotiate ECN capability on the SYN; routers along the path can then mark the IP ECN bits, and the receiver echoes the marks back via TCP. Adoption has been slow because legacy middleboxes drop ECN-marked packets, but Apple enabled it for iOS in 2016, the Linux kernel by default accepts ECN when the peer requests it but does not request it on outgoing connections (net.ipv4.tcp_ecn = 2), and DCTCP (Data Center TCP, RFC 8257, October 2017) makes ECN the basis for fine-grained congestion control inside hyperscaler data centres.
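From the sender's side, SACK turns loss recovery into finding the holes between the cumulative ACK and the reported blocks. The sketch below is a simplified illustration: `holes` is a hypothetical helper, byte ranges are half-open `(start, end)` pairs, and sequence-number wraparound is ignored.

```python
def holes(cumulative_ack, sack_blocks):
    """Return the byte ranges the receiver is still missing.

    `cumulative_ack` is the next byte the receiver expects in order;
    `sack_blocks` are (left, right) ranges it holds beyond that point.
    Data past the highest SACK block may still be in flight, so it is
    not reported as missing.
    """
    missing = []
    cursor = cumulative_ack
    for left, right in sorted(sack_blocks):
        if left > cursor:
            missing.append((cursor, left))  # gap before this block
        cursor = max(cursor, right)
    return missing


# 64 segments of 1460 bytes; the 11th (bytes 14600 to 16060) was dropped.
print(holes(14600, [(16060, 93440)]))
# Two separate losses inside one window.
print(holes(0, [(1460, 2920), (4380, 5840)]))
```

In the first case the sender retransmits a single 1460-byte segment instead of guessing about the 53 segments that followed it.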
The MultiPath TCP extension (RFC 8684, March 2020) is the most ambitious recent addition: a single logical connection that uses multiple subflows over multiple paths, allowing simultaneous WiFi and LTE on a phone, or LACP-bonded multipath on a server. Apple uses MPTCP for Siri and other latency-sensitive iOS services; the Linux kernel gained mainline MPTCP support in 5.6 (March 2020). The handshake option negotiates the capability; subsequent SYNs add subflows. The original three-packet shape still anchors the negotiation, even as the protocol's semantics evolve underneath it.
Further reading on the TCP handshake
Primary sources, in order.
- IETF · 2022 · RFC 9293: Transmission Control Protocol. The current canonical TCP specification. Supersedes RFC 793 from 1981 by folding 41 years of clarifications into one document. Long, formal, exact.
- Postel · 1981 · RFC 793: the original TCP. The historical document. Worth reading for the prose alone; the algorithms are still the same algorithms.
- Eddy · 2007 · RFC 4987: TCP SYN Flooding Attacks and Common Mitigations. The reference document on SYN floods, SYN cookies, and the operational tradeoffs between them.
- Cheng et al · 2014 · RFC 7413: TCP Fast Open. The cookie-based 0-RTT extension. Shorter than you'd expect, with a clear discussion of the security tradeoffs.
- Iyengar & Thomson · 2021 · RFC 9000: QUIC. The next-generation transport. Combines TCP and TLS into one round trip and runs over UDP. The companion RFC 9001 covers the TLS integration.
- Borman et al · 2014 · RFC 7323: TCP Extensions for High Performance. Window scale, timestamps, PAWS. The options that make TCP usable on fast or long-RTT links.
- Linux Kernel docs · ip-sysctl: the production knobs. The complete list of net.ipv4.tcp_* tuning parameters with their defaults, semantics, and historical notes.
- Langley et al · SIGCOMM 2017 · The QUIC Transport Protocol: Design and Internet-Scale Deployment. Google's deployment paper on QUIC, with measured latency improvements at scale across YouTube and Search.
- Semicolony guide · TCP, in slow motion. The full state machine in narrative form: handshake plus everything that comes after.
- Semicolony simulator · HTTP request flow. What rides on top of the handshake. Headers, methods, status codes, the round-trip animated.