TCP vs UDP Simulator: reliable bytes, or just bytes.

Two channels, same loss conditions. TCP runs a handshake, retransmits lost packets, and reassembles in order. UDP fires datagrams and shrugs at whatever doesn't arrive. Crank up loss and watch each protocol behave the way it was designed to.

[Interactive simulator: sliders for Loss % (15%), Reorder % (10%), and Packets (8), plus a Run button; counters for TCP delivered, Retransmits, UDP lost, and UDP reordered.]

TCP: reliable byte stream
handshake = SYN / S/A / ACK · D# = data seq # · A# = cumulative ACK · R# = retransmit

UDP: fire-and-forget datagrams
U# = UDP datagram · no handshake · no ACK · loss = silent drop · order = arrival order

TCP receiver: delivered (in-order)
app sees a clean sequence · gaps blocked at receiver until predecessor arrives

UDP receiver: arrival order
app sees whatever arrived · out-of-order pieces in orange · missing seqs = lost

Legend: handshake, data, ACK, retransmit / drop

What you're looking at

Two wires fed by the same network conditions. The top one runs TCP: purple packets are the handshake, teal are data, green are the ACKs flowing back, and a rust X marks a packet dying mid-flight, followed shortly by a rust R chip retransmitting the same sequence number. The bottom wire runs UDP: just numbered datagrams, no return traffic at all. The two receiver panels below show what each application sees, and the sliders set the loss rate, reorder rate, and packet count before you press Run.

Set loss to about 30% and run it. UDP finishes almost immediately with holes in its sequence; TCP keeps grinding until every number is delivered, in order, no matter how many times a packet dies. Two things should surprise you. First, TCP's delivered row sometimes stalls completely while packets keep landing: a lost packet blocks everything behind it until the retransmit arrives. Second, neither protocol loses fewer packets than the other. TCP just converts loss into waiting, and UDP converts it into gaps.
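The "waiting versus gaps" behavior can be reproduced outside the simulator with a toy model. The sketch below is not the simulator's code: `udp_run` and `tcp_run` are hypothetical names, every transmission is dropped independently with probability `loss`, and the TCP side simply resends each packet until a copy survives (so `loss` must be below 1).

```python
import random


def udp_run(n_packets, loss, rng):
    """Send each datagram once; return the sequence numbers that arrived."""
    return [seq for seq in range(n_packets) if rng.random() >= loss]


def tcp_run(n_packets, loss, rng):
    """Resend each packet until it gets through (requires loss < 1).

    Returns (delivered sequence numbers, total transmissions).
    """
    delivered, transmissions = [], 0
    for seq in range(n_packets):
        while True:
            transmissions += 1
            if rng.random() >= loss:   # this copy survived the channel
                delivered.append(seq)
                break
    return delivered, transmissions


if __name__ == "__main__":
    rng = random.Random(1)
    print("UDP arrived:", udp_run(8, 0.3, rng))
    delivered, sent = tcp_run(8, 0.3, rng)
    print("TCP delivered:", delivered, "using", sent, "transmissions")
```

Both functions face the same drop probability. The TCP model pays for it in extra transmissions (and, on a real link, time); the UDP model pays in missing sequence numbers.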

Reliable byte stream vs unreliable datagrams

Everything else follows from that choice.

TCP gives you a reliable, ordered, byte-oriented stream between two endpoints. Once a connection is established, anything you write to the socket on one side will arrive at the other side, in the same order, or the connection will eventually die trying. UDP gives you nothing of the sort. You hand a datagram to the kernel; it tries to send it once; whether it arrives, whether anything else arrives before or after, is not its problem.

Every difference you see in the simulator above grows from that one design choice. The handshake exists because reliability requires shared state. The sequence numbers and ACKs exist because reliability requires accountability. The reorder buffer exists because ordering requires waiting. The retransmit timer exists because reliability requires recovery. UDP skips all of it because it promised none of it.

For most beginners the question becomes "which should I use." But that's the wrong question. The right question is "what does my application do when the network drops a packet?" If the honest answer is "stop and wait for that exact packet" you want TCP. If the honest answer is "keep going with whatever arrived" you want UDP.


Three-way handshake, step by step

Why TCP costs at least one round trip before any data moves.

Before TCP sends any payload, both sides exchange three packets. The client sends SYN with an initial sequence number. The server replies SYN-ACK acknowledging that number and offering its own. The client sends ACK. After those three packets, both sides have agreed on starting sequence numbers, window sizes, and a few options, and the connection is "established."

The point is not that three packets is a lot. The point is that the client cannot send its first byte of useful data until a full round trip has passed, and that byte does not reach the server until one and a half round trips after the SYN left. On a 60-millisecond RTT link, that's ninety milliseconds spent on bookkeeping. On a satellite link with 600 ms RTT, it's nearly a second. This is the cost of reliability: both sides need synchronised state before they can track losses and order.

You can poke at this on the simulator: hit Run and watch the purple handshake packets fly first. Only once they're done do the teal data packets start moving. UDP has no such ceremony. Packets one through N go out one after another, the moment Run is pressed.

SYN: the client opens

The client picks an initial sequence number (modern TCP picks it pseudo-randomly, so that off-path attackers cannot guess it and segments from stale connections are not mistaken for new ones) and sends a SYN segment. The SYN flag is one bit; the rest of the segment carries options like MSS (maximum segment size), window-scale factor, SACK-permitted, and timestamps. The server, upon receiving SYN, allocates a small block of state called a TCB (transmission control block). This is one reason SYN flood attacks work: the kernel allocates memory for every fake SYN that arrives.

SYN-ACK: the server responds

The server picks its own initial sequence number, acks the client's, and sends back SYN-ACK. Modern TCP can piggyback options here too: it can echo the timestamp, negotiate SACK, or set the window-scale factor. The server is now half-open; if it never hears back from the client (because of a network drop or because the client was attacking), it will eventually time out the half-open connection and free the TCB.

ACK: the client confirms

The client sends a final ACK that acknowledges the server's sequence number. With this third packet the connection is established on both sides. Standard TCP already lets the client attach payload data to this third packet. TCP Fast Open goes further: on subsequent connections the client can carry data in the SYN itself, saving one RTT. But it requires both ends to support the option and a cookie from a prior connection.

Once the handshake is done, both sides know what sequence number to start sending from, what sequence number to expect, what their peer's window size is, and which options are in play. From here on, the data plane takes over.
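The number exchange in the three steps above can be written out in a few lines. This is a minimal sketch of the sequence and ACK arithmetic only (no options, windows, or timers); `three_way_handshake` and the example ISNs are made up for illustration.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as (sender, flags, seq, ack) tuples.

    A SYN consumes one sequence number, so each side acknowledges ISN + 1.
    """
    mod = 2 ** 32  # sequence numbers are 32-bit and wrap around
    syn = ("client", "SYN", client_isn, None)
    syn_ack = ("server", "SYN-ACK", server_isn, (client_isn + 1) % mod)
    ack = ("client", "ACK", (client_isn + 1) % mod, (server_isn + 1) % mod)
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=1000, server_isn=5000):
        print(segment)
```

After the third tuple, the client's first data byte carries sequence number ISN + 1, which is exactly what the server said it expects.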


Each byte gets a number

The accounting system that makes reliability possible.

TCP numbers every byte that flows across a connection. Not every packet — every byte. The sequence number in a TCP header is the byte offset of the first byte in this segment. The ACK number is the next byte the receiver is expecting. Both run from the initial sequence numbers picked during the handshake and wrap around at 4 GB (so on a 100 Gbps link you can wrap the sequence space in under a second, which is why PAWS, protection against wrapped sequences using timestamps, exists).
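The wraparound arithmetic is easy to check. The sketch below computes how long a link takes to consume the 32-bit sequence space and shows one common way to compare sequence numbers modulo 2^32; the function names are illustrative, and real stacks combine this comparison with timestamps (PAWS) rather than relying on it alone.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers count bytes modulo 2**32


def wrap_time_seconds(link_bits_per_second):
    """Time to send 2**32 bytes at the given line rate."""
    return SEQ_SPACE * 8 / link_bits_per_second


def seq_before(a, b):
    """True if sequence number a comes before b, allowing for wraparound.

    Only meaningful when the two numbers are less than 2**31 apart.
    """
    return a != b and (b - a) % SEQ_SPACE < SEQ_SPACE // 2


if __name__ == "__main__":
    print(f"100 Gbps wraps in {wrap_time_seconds(100e9):.3f} s")
    print(seq_before(2 ** 32 - 5, 3))   # a number just before the wrap precedes one just after
```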

ACKs are cumulative by default: an ACK number of 1001 means "I've received every byte through 1000 and I'm expecting byte 1001 next." Selective ACK (SACK) is an option that lets the receiver also describe specific out-of-order ranges, which dramatically reduces unnecessary retransmits. Modern TCP stacks negotiate SACK during the handshake; almost all implementations support it.

On the simulator above, each data packet shows its sequence number, and the ACK packets show the cumulative ack. When you crank up loss, watch how the ACK number stays pinned at the last in-order byte even as out-of-order packets arrive at the receiver. That's cumulative acking in action.
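The pinned ACK number falls out of a very small receiver loop. The following is a simplified sketch, assuming segments never partially overlap and ignoring sequence wraparound and SACK; `cumulative_acks` is a hypothetical helper, not a real stack API.

```python
def cumulative_acks(first_byte, segments):
    """Feed (seq, length) segments to a receiver; return the ACK sent after each.

    The ACK is always the next in-order byte expected, so it stays pinned
    while out-of-order data is buffered and jumps once the gap is filled.
    """
    expected = first_byte
    buffered = {}                      # seq -> length of segments not yet delivered
    acks = []
    for seq, length in segments:
        if seq >= expected:            # ignore duplicates of delivered data
            buffered[seq] = length
        # Deliver every buffered segment that is now contiguous.
        while expected in buffered:
            expected += buffered.pop(expected)
        acks.append(expected)
    return acks


if __name__ == "__main__":
    # Four 100-byte segments; the second one (seq 101) arrives last.
    print(cumulative_acks(1, [(1, 100), (201, 100), (301, 100), (101, 100)]))
```

In the example the ACK sits at 101 for three arrivals in a row, then jumps straight to 401 when the missing segment lands.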


Send several, wait, send several more

Why TCP doesn't wait for each ACK before sending the next packet.

If TCP sent one packet, waited for the ACK, then sent the next, throughput would be capped at packet-size divided by round-trip-time. On a 60 ms RTT link with 1500-byte packets, that's 25 KB/s. Abysmal. The sliding window solves this by letting the sender have several packets in flight at once, up to a negotiated window size.

The window size is the smaller of (a) what the receiver advertises as buffer space and (b) what congestion control allows. Receiver window comes from the rwnd field in the TCP header; congestion window is tracked by the sender as a local variable. The effective window is min(rwnd, cwnd). Modern TCP can scale rwnd up to a gigabyte with the window-scale option negotiated during the handshake.

In the simulator, the window is fixed at 4 for visual clarity. Real connections have windows of dozens to hundreds of packets in flight on healthy paths. The key insight: pipelining converts a serialized request-response into a streaming pipe limited only by bandwidth, not by latency.
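The stop-and-wait cap and the effect of a window are both one-line formulas. A rough sketch, assuming a fixed RTT, equal-sized packets, and no loss (the function names are illustrative):

```python
def stop_and_wait_throughput(packet_bytes, rtt_seconds):
    """One packet per round trip, in bytes per second."""
    return packet_bytes / rtt_seconds


def windowed_throughput(window_packets, packet_bytes, rtt_seconds, link_bytes_per_second):
    """A full window per round trip, capped by the link rate."""
    return min(window_packets * packet_bytes / rtt_seconds, link_bytes_per_second)


if __name__ == "__main__":
    link = 100e6 / 8   # a 100 Mbps link, in bytes per second
    print(stop_and_wait_throughput(1500, 0.060))        # about 25 KB/s
    print(windowed_throughput(4, 1500, 0.060, link))    # window of 4: about 100 KB/s
    print(windowed_throughput(1000, 1500, 0.060, link)) # large window: link-limited
```

With a window of 4, as in the simulator, throughput is still latency-bound; only once the window is large enough does the link rate become the limit.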

Bandwidth-delay product

For a link with bandwidth B and round-trip time R, the bandwidth-delay product B × R is the amount of data that can be "in flight" on the wire at once. A 100 Mbps link with 60 ms RTT can hold 6 Mbits = 750 KB of data in flight. If your window is smaller than the BDP, the link is idle waiting for ACKs. If it's larger, you're filling buffers and causing latency.

This is why long-fat-network (LFN) tuning matters for high-bandwidth, high-latency paths: a 10 Gbps satellite link with 600 ms RTT has a BDP of 750 MB, far larger than the default window of any TCP stack. You'd configure window-scale, you'd enable a higher initial cwnd, and you'd probably switch to BBR or CUBIC for the congestion-control algorithm.
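Both BDP figures can be checked with a few lines, along with the window-scale shift needed to advertise a window that large. This is a back-of-the-envelope sketch: `bdp_bytes` and `window_scale_shift` are illustrative names, and the shift calculation assumes the standard 16-bit window field with a maximum shift of 14.

```python
def bdp_bytes(bandwidth_bits_per_second, rtt_seconds):
    """Bandwidth-delay product in bytes: the data needed in flight to fill the pipe."""
    return bandwidth_bits_per_second * rtt_seconds / 8


def window_scale_shift(target_window_bytes):
    """Smallest window-scale shift (0 to 14) whose scaled 16-bit window covers the target."""
    for shift in range(15):
        if 65535 << shift >= target_window_bytes:
            return shift
    return None  # larger than the option can express (about 1 GiB)


if __name__ == "__main__":
    for bandwidth, rtt in [(100e6, 0.060), (10e9, 0.600)]:
        bdp = bdp_bytes(bandwidth, rtt)
        print(f"BDP {bdp:,.0f} bytes needs window-scale shift {window_scale_shift(bdp)}")
```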


When a packet doesn't come back

How TCP knows something was lost, and when it gives up.

TCP has two ways to detect loss. The first is the retransmission timeout (RTO): a timer started when each packet is sent. If the ACK doesn't arrive before the timer expires, the sender retransmits. The RTO is computed from a smoothed RTT estimate plus four times the RTT variance, with a floor and a ceiling. Each successful round trip adjusts the estimate; each retransmit doubles it (Karn's algorithm prevents ambiguous samples).
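The timer arithmetic can be sketched as a small class. It follows the RFC 6298 formulas (gains of 1/8 and 1/4, RTO = SRTT + 4 × RTTVAR, doubling on timeout) but leaves out clock granularity; the class name, the default floor of 1 second, and the ceiling of 60 seconds are choices made for this sketch, and real stacks often use a lower floor.

```python
class RtoEstimator:
    """Retransmission timeout estimator following the RFC 6298 formulas."""

    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the RTT variance

    def __init__(self, min_rto=1.0, max_rto=60.0):
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0                 # initial value before any sample
        self.min_rto = min_rto
        self.max_rto = max_rto

    def on_rtt_sample(self, rtt):
        """Update from a fresh RTT sample (never from a retransmitted segment)."""
        if self.srtt is None:          # first measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = min(max(self.srtt + 4 * self.rttvar, self.min_rto), self.max_rto)

    def on_timeout(self):
        """Exponential backoff: double the timer each time it fires."""
        self.rto = min(self.rto * 2, self.max_rto)


if __name__ == "__main__":
    est = RtoEstimator(min_rto=0.2)
    for sample in (0.100, 0.100, 0.180):
        est.on_rtt_sample(sample)
        print(f"sample {sample:.3f}s -> rto {est.rto:.3f}s")
```

The comment on `on_rtt_sample` is Karn's rule in practice: a sample taken from a retransmitted segment is ambiguous, so it is simply not fed in.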

The second is fast retransmit: if the sender receives three duplicate ACKs (because the receiver keeps getting out-of-order packets that don't move the cumulative ack forward), it retransmits the missing packet immediately, without waiting for the timer. This shaves a full RTO off recovery time and is the workhorse for packet loss on healthy connections.
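The duplicate-ACK rule amounts to a counter on the sender. A minimal sketch, assuming the sender sees a plain list of ACK numbers (no SACK, no window updates); `fast_retransmit_points` is a hypothetical name:

```python
def fast_retransmit_points(acks, threshold=3):
    """Return the ACK numbers that trigger a fast retransmit.

    A duplicate ACK repeats the previous ACK number; the third duplicate
    in a row signals that the segment starting at that number is missing.
    """
    triggers = []
    last_ack, duplicates = None, 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                triggers.append(ack)   # resend the segment starting at `ack`
        else:
            last_ack, duplicates = ack, 0
    return triggers


if __name__ == "__main__":
    # The original ACK for 201 plus three duplicates: one retransmit of 201.
    print(fast_retransmit_points([101, 201, 201, 201, 201, 601]))
```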

You can see retransmit in the simulator: when a data packet gets dropped (rust X), the sender waits a beat, then sends an R# packet. That's the retransmit. The receiver, once it sees the missing piece, finally advances its cumulative ACK and the rest of the buffered out-of-order data joins the "delivered" list.

Why retransmits compound latency

A single packet drop costs at least one round trip for the retransmit, plus the wait for the receiver to actually receive the duplicate ACKs (which depends on cwnd and how many packets are in flight). On a 60 ms RTT link, a single drop can add 100+ ms of latency to that flow. That's why a 1% loss rate on a TCP stream feels much worse than 1%. Every flow with a drop has a stall, and stalls dominate the user-perceived latency.

UDP has none of this. A lost UDP packet is lost. The application can choose to retransmit, or it can choose to skip ahead. That's why audio codecs use forward error correction, and why game-state protocols send periodic full-state snapshots instead of relying on incremental updates.


Decision table

Need | Pick | Why
Every byte in order, no loss tolerable | TCP | Web pages, file transfer, database protocols. Loss = retry, never silent.
Low-latency, loss-tolerant media | UDP (+ RTP/SRT/WebRTC) | Audio/video. A late packet is worse than a lost one: late = stuttered playback.
One-shot request/response, small payload | UDP (DNS-style) | Handshake overhead would dominate. App-level retry is one line.
Gaming, telemetry | UDP | Frequent updates, old data is worthless; the new packet supersedes.
HTTP-style streams over lossy links | QUIC (HTTP/3) | Per-stream loss recovery; no transport-layer head-of-line blocking.
Multicast or broadcast | UDP | TCP is strictly point-to-point.
Need exactly-once over network | TCP + app-level idempotency keys | Neither protocol gives exactly-once on its own; TCP is at-least-once with ordering.
Tunnels and overlays | UDP (Wireguard, GENEVE) | Encapsulation in TCP causes TCP-meltdown over lossy paths.
Streaming live video to many viewers | UDP-based (SRT, WebRTC, QUIC) | HTTP-over-TCP can't keep up with playback during loss spikes.
Mostly anything else | TCP | The defaults are right for most use cases. Reach for UDP only when latency or loss-tolerance demands it.

The "TCP over TCP" trap. Tunneling TCP inside TCP (e.g., a VPN over a TCP socket) causes a vicious feedback loop on lossy links: both layers retransmit, the outer retransmit triggers inner retransmits, congestion windows fight. This is why production VPNs (Wireguard, IPsec) use UDP underneath even though the payload is reliable.

Things that bite people

Nagle's algorithm vs delayed ACKs

Nagle's algorithm batches small writes on the sender to reduce header overhead; delayed ACKs batch ACKs on the receiver to piggyback on outgoing data. When they interact badly (small writes on one side, no return traffic on the other), you get up to 200 ms of latency for no reason. The fix on most stacks is setsockopt with TCP_NODELAY (or use a single write() call for the whole logical message).
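In Python the two fixes look roughly like this. The sketch only sets the option and shows the single-write pattern; `make_low_latency_socket` and `send_message` are illustrative helper names, and whether disabling Nagle actually helps depends on the application's write pattern.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def send_message(sock, header, body):
    """Send header and body in one call so they are not split into tiny writes."""
    sock.sendall(header + body)


if __name__ == "__main__":
    sock = make_low_latency_socket()
    print("TCP_NODELAY:", sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    sock.close()
```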

Head-of-line blocking

Because TCP is a single ordered byte stream, a lost packet stalls the entire stream until it's recovered. If you're multiplexing many requests over one TCP connection (HTTP/2 does this), a single drop stalls all of them. HTTP/3 with QUIC fixes this by giving each stream its own loss-recovery state on top of UDP.

Connection-establishment latency

That 1.5-RTT cost adds up. On mobile networks with 100-200 ms RTT, ten new TCP connections opened one after another add 1.5 to 3 seconds of pure handshake latency (opening them in parallel hides some of that, but each connection still pays its own 1.5 RTT). TLS adds another round trip on top. This is why HTTP/2 multiplexes onto one TCP connection, why TLS 1.3 is one-RTT, and why TCP Fast Open + 0-RTT TLS resume can save another round trip for repeat connections.

TIME_WAIT

After closing a TCP connection, the side that sends the final ACK enters TIME_WAIT for 2×MSL (typically 60–120 seconds). This protects against late packets from the old connection being mistaken for the new one if the 4-tuple is reused. On busy short-connection services, TIME_WAIT can exhaust ephemeral ports. The fix is usually fewer, longer-lived connections (connection pooling, HTTP keep-alive); SO_REUSEADDR/SO_REUSEPORT mainly help a restarted server rebind its listening port while old sockets linger in TIME_WAIT.

UDP "reliable" hacks

Every UDP-based protocol that needs some reliability ends up reinventing pieces of TCP. RTP has sequence numbers and timestamps; SRT adds full retransmission; QUIC has streams with per-stream loss recovery. The lesson: if you find yourself needing all of TCP's mechanisms, use TCP. If you need a subset, building a custom UDP-based protocol can be worth it.

Bufferbloat

Oversized buffers in routers cause TCP congestion control to behave badly: packets aren't dropped (the loss signal), they're just buffered for ages, so TCP thinks the link is fine and keeps pushing. The result is huge latency for short flows sharing the link. The fix is at the router (CoDel, FQ_CoDel, PIE) rather than the endpoint.


Order-of-magnitude expectations

Metric | TCP | UDP
Connection setup | 1.5 RTT (handshake) + 1 RTT (TLS 1.3) = ~250 ms on 100 ms RTT link | 0 RTT; first packet contains data
Per-packet overhead | 20-byte TCP header + 20-byte IPv4 header = 40 bytes | 8-byte UDP header + 20-byte IPv4 header = 28 bytes
Throughput on clean link | ~95% of bandwidth, provided the window covers the bandwidth-delay product | ~99% of bandwidth (no ACK overhead)
Throughput at 1% loss | ~50% of clean throughput (cwnd halves on each loss event) | 99% of bandwidth, but only 99% of packets arrive
Throughput at 5% loss | ~10% of clean throughput | 95% of bandwidth, 95% delivery rate
Throughput at 20% loss | often near-zero; TCP stalls in repeated loss recovery | 80% of bandwidth, 80% delivery rate
Memory per connection (kernel) | ~10-30 KB (TCB, buffers) | ~1 KB (socket struct only)
Connections/sec (single thread, accept loop) | ~50k/s (epoll-based) | ~500k+ packets/s
Latency of one round-trip request/response | ~3 RTT (handshake + req + resp) on cold connection; 1 RTT on warm | 1 RTT; first packet is the request

The 5% loss row is illustrative: at meaningful loss rates, TCP throughput collapses while UDP throughput holds steady (at the cost of delivering only the packets that survived). This is the trade-off in one number.


The simplest reliable echo and the simplest datagram echo

TCP server (Python)

import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(('0.0.0.0', 9000))
    s.listen(128)
    while True:
        conn, addr = s.accept()
        with conn:
            while True:
                data = conn.recv(4096)
                if not data: break
                conn.sendall(data)    # echo back; sendall handles partial writes

UDP server (Python)

import socket

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    s.bind(('0.0.0.0', 9000))
    while True:
        data, addr = s.recvfrom(65535)  # one full datagram per recv
        s.sendto(data, addr)            # echo back; one packet per send

Two key differences in code. First: TCP needs listen and accept because connections are explicit; UDP has no connections to accept. Second: TCP's recv returns whatever bytes arrived in the stream (it might be half a message or two messages glued together), so application-level framing is needed; UDP's recvfrom always returns exactly one datagram.
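One common form of application-level framing is a length prefix. The sketch below is one possible scheme, not something the echo servers above use: the 4-byte big-endian prefix, `encode_frame`, and `FrameDecoder` are all choices made for illustration.

```python
import struct


def encode_frame(payload):
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(payload)) + payload


class FrameDecoder:
    """Reassemble whole messages from arbitrary chunks returned by recv()."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk):
        """Add received bytes; return the list of messages now complete."""
        self._buffer += chunk
        messages = []
        while len(self._buffer) >= 4:
            (length,) = struct.unpack("!I", self._buffer[:4])
            if len(self._buffer) < 4 + length:
                break                      # message not fully received yet
            messages.append(bytes(self._buffer[4:4 + length]))
            del self._buffer[:4 + length]
        return messages


if __name__ == "__main__":
    stream = encode_frame(b"hello") + encode_frame(b"world!")
    decoder = FrameDecoder()
    # Simulate recv() handing back awkward chunk boundaries.
    for chunk in (stream[:3], stream[3:11], stream[11:]):
        print(decoder.feed(chunk))
```

A receive loop would call `decoder.feed(conn.recv(4096))` and handle each returned message, regardless of how the stream happened to be chunked.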

TCP client (Go)

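package main

import ("fmt"; "log"; "net")

func main() {
	conn, err := net.Dial("tcp", "server:9000")
	if err != nil { log.Fatal(err) }
	defer conn.Close()

	conn.Write([]byte("hello\n"))      // newline-terminated message
	buf := make([]byte, 1024)
	n, _ := conn.Read(buf)             // may return only part of the reply
	fmt.Printf("got: %s", buf[:n])
}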

UDP client (Go)

package main

import ("fmt"; "log"; "net"; "time")

func main() {
	conn, err := net.Dial("udp", "server:9000")
	if err != nil { log.Fatal(err) }
	defer conn.Close()

	conn.Write([]byte("hello"))        // one datagram, fire-and-forget
	buf := make([]byte, 1500)
	conn.SetReadDeadline(time.Now().Add(time.Second))
	n, err := conn.Read(buf)           // times out if the request or reply was lost
	if err != nil { log.Fatal("no reply: ", err) }
	fmt.Printf("got: %s", buf[:n])
}

The Go UDP client adds a read deadline because the reply may never arrive. The app must decide how long to wait and whether to retry. TCP moves that work into the kernel: while the connection is up, lost bytes are retransmitted for you, though a read can still block indefinitely if the peer stops sending.
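The wait-and-retry decision can be wrapped in a small helper. This is a sketch in Python rather than Go, with a hypothetical `udp_request` function and arbitrary defaults (1 second, 3 attempts); it assumes the request is safe to repeat, because a retry after a lost reply makes the server see the request twice.

```python
import socket


def udp_request(server, payload, timeout=1.0, attempts=3):
    """Send a datagram and wait for a reply, resending on timeout.

    Returns the reply bytes, or None if every attempt timed out.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(attempts):
            sock.sendto(payload, server)
            try:
                reply, _addr = sock.recvfrom(65535)
                return reply
            except socket.timeout:
                continue               # request or reply was lost; try again
    return None


if __name__ == "__main__":
    # Assumes a UDP echo server is listening on this address.
    print(udp_request(("127.0.0.1", 9000), b"hello", timeout=0.5))
```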


What's in each header

TCP header (20 bytes minimum)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Off  | Reserved  |U|A|P|R|S|F|            Window             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options (variable, 0-40 bytes)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The flag bits U-A-P-R-S-F are URG, ACK, PSH, RST, SYN, FIN. Options carry MSS, window-scale, SACK, timestamps. Modern stacks typically put around 20 bytes or more of options on the SYN (MSS, SACK-permitted, timestamps, window-scale), then 12 bytes of padded timestamp option on each data segment when timestamps are in use.
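The fixed 20-byte layout maps directly onto a `struct` format string. A minimal decoding sketch, assuming the input starts at the first byte of the TCP header; `parse_tcp_header` is an illustrative name, options are returned as raw bytes rather than parsed, and only the six classic flag bits are decoded.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 upward


def parse_tcp_header(segment):
    """Decode the fixed 20-byte part of a TCP header into a dict."""
    (src, dst, seq, ack, offset_flags, window,
     checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4      # data offset counts 32-bit words
    flags = [name for bit, name in enumerate(FLAG_NAMES) if offset_flags & (1 << bit)]
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags, "window": window,
        "checksum": checksum, "urgent": urgent,
        "options": segment[20:header_len],
    }


if __name__ == "__main__":
    # A hand-built SYN-ACK: data offset 5 words, SYN and ACK bits set.
    raw = struct.pack("!HHIIHHHH", 443, 51000, 5000, 1001, (5 << 12) | 0x12, 65535, 0, 0)
    print(parse_tcp_header(raw))
```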

UDP header (8 bytes, fixed)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Length            |           Checksum            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

That's the whole header. Length is the UDP header plus payload. Checksum is optional on IPv4 (rarely turned off), required on IPv6. No state, no options, no sequencing: every datagram is independent.
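Because the header is four 16-bit fields, building and parsing it takes a few lines. The sketch below leaves the checksum at zero, which on IPv4 means "not computed" and is not valid on IPv6; the function names are illustrative.

```python
import struct


def build_udp_datagram(src_port, dst_port, payload):
    """Prepend the 8-byte UDP header, leaving the checksum at 0 (IPv4 only)."""
    length = 8 + len(payload)              # header plus payload
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + payload


def parse_udp_datagram(datagram):
    """Split a UDP datagram into its header fields and payload."""
    src, dst, length, checksum = struct.unpack("!HHHH", datagram[:8])
    return {"src_port": src, "dst_port": dst, "length": length,
            "checksum": checksum, "payload": datagram[8:length]}


if __name__ == "__main__":
    datagram = build_udp_datagram(5353, 53, b"query")
    print(datagram.hex())
    print(parse_udp_datagram(datagram))
```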


How TCP shares the network

The most complicated thing about TCP, and the part UDP delegates to the application.

If TCP just sent as fast as the receiver could accept, every TCP connection on the internet would compete for bandwidth and the network would collapse. Congestion control is the algorithm that keeps that from happening. It treats packet loss as a signal of congestion and backs off the sending rate; on success, it slowly ramps the rate back up. Every TCP flow runs an instance of this algorithm.

Slow start

New connections start with a small congestion window (cwnd), historically 1, now 10 MSS in modern stacks (RFC 6928). For every ACK received, cwnd grows by one MSS, which doubles cwnd every round trip. This is exponential growth; the connection rapidly probes for the available bandwidth. Slow start exits when cwnd reaches ssthresh or when loss is detected.

Congestion avoidance

Past ssthresh, cwnd grows linearly, by one MSS per RTT. This is the gentler "fill the available pipe" mode. On loss, cwnd is halved (Reno) or reduced by a less-aggressive factor (CUBIC), and the algorithm returns to congestion avoidance from that lower starting point.

Fast recovery

When fast retransmit fires (three duplicate ACKs), cwnd is halved but the connection stays in a "recovery" state. It can still send new packets to keep the pipe partially full while the retransmit propagates. Without fast recovery, every loss event would drain the pipe back to a small window, hurting throughput.
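The three phases combine into the familiar sawtooth. The sketch below is a coarse, Reno-style model that works in whole round trips and whole MSS units: `simulate_cwnd` is a hypothetical helper, loss is injected at chosen rounds rather than derived from a queue, and a timeout-driven reset to a small window is not modeled.

```python
def simulate_cwnd(rounds, ssthresh, loss_rounds=(), initial_cwnd=10):
    """Return cwnd (in MSS) at the start of each round trip, Reno-style.

    Slow start doubles cwnd per RTT until it reaches ssthresh; congestion
    avoidance then adds one MSS per RTT; a loss halves cwnd.
    """
    cwnd = initial_cwnd
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            ssthresh = max(cwnd // 2, 2)    # remember half the window at loss
            cwnd = ssthresh                 # continue in congestion avoidance
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start
        else:
            cwnd += 1                       # congestion avoidance
    return history


if __name__ == "__main__":
    print(simulate_cwnd(8, ssthresh=64, loss_rounds={5}))
```

With a loss in round 5, the window climbs 10, 20, 40, 64, creeps to 66, then drops to 33 and resumes linear growth from there.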

Modern algorithms

CUBIC (default on Linux) uses a cubic function of time-since-last-loss to grow cwnd, which is friendlier to high-bandwidth long-RTT paths than Reno's linear growth. BBR (Google's algorithm, ships in Linux) estimates bottleneck bandwidth and round-trip time directly, and tries to fill the pipe without inducing queue buildup. It doesn't rely on loss as the only congestion signal. Both share the basic shape but differ in growth and reaction.

UDP has no built-in congestion control. Applications using UDP at scale must build their own (or use a protocol like QUIC that bakes it in). Otherwise a single high-rate UDP sender can crowd out every TCP flow sharing the link. This is "fairness": TCP is fair by construction; UDP is fair only if the app makes it so.


UDP, but with TCP's promises baked in per stream

QUIC takes UDP and builds on top of it everything TCP gives you (reliability, ordering, congestion control, encryption) but with one critical difference: it does so per logical stream instead of per connection. A single QUIC connection can carry hundreds of independent streams; loss on one stream does not block the others. HTTP/3 is HTTP semantics over QUIC.

Why UDP underneath? Two reasons. First, kernel TCP stacks are slow to change and middleboxes don't allow new TCP options to be deployed widely. By building QUIC as a userspace protocol on top of UDP, Google could iterate and deploy without waiting for kernel releases or middlebox cooperation. Second, integrating TLS and transport allowed a combined 1-RTT handshake (0-RTT on resumed connections), which is hard to retrofit into TCP+TLS.

What QUIC does that TCP can't

Connection migration. A QUIC connection survives a change of IP address. You can switch from WiFi to LTE without dropping streams. TCP connections are pinned to a 4-tuple and die when any element changes.

Independent streams. Loss on stream A blocks only stream A. HTTP/2 over TCP suffers head-of-line blocking because TCP is one ordered stream. HTTP/3 over QUIC eliminates this.

Better RTT estimation. Every QUIC packet carries a packet number that is monotonically increasing and never reused (unlike TCP sequence numbers, which can be ambiguous after retransmit). RTT estimates are cleaner.

Integrated encryption. QUIC packets are encrypted from the very first byte (after a tiny header). There's no "plaintext handshake then start TLS." Encryption is baked into the protocol design.

The trade-offs: QUIC currently uses more CPU per packet (because crypto is per-packet and the stack is in userspace), and many older networks block UDP or rate-limit it. Most CDNs support HTTP/3 today (Cloudflare, Fastly, Akamai); browser support shipped in 2020-2021.


Who uses what

Service | Transport | Why
HTTP/1.1, HTTP/2 | TCP (+ TLS) | Order matters; the web grew up on TCP.
HTTP/3 | QUIC over UDP | Per-stream loss recovery, faster connection setup, connection migration.
DNS (small queries) | UDP | Single packet roundtrip; app retries. Falls back to TCP for large responses.
DNS-over-TLS, DNS-over-HTTPS | TCP | Encrypted and reliable.
WebRTC media | UDP (via RTP/SRTP) | Real-time audio/video; late packets are useless.
WebRTC data channel | UDP (via SCTP) | Reliable or unreliable, configurable per channel.
Discord voice | UDP | Voice latency dominates user experience.
Discord chat | TCP (over WebSocket) | Messages must arrive in order; loss = retry.
Zoom, Google Meet, Teams | UDP for media, TCP for signaling | Standard pattern for real-time conferencing.
Netflix streaming | TCP (HLS over HTTPS) | Adaptive bitrate over HTTP; TCP is fine because of buffering.
YouTube Live | QUIC (HTTP/3) for delivery, RTMP for ingest | Lower latency on the receive side.
Twitch ingest | RTMP (TCP) | Encoder-friendly, established standard.
Online games (FPS) | UDP custom | 20-60 Hz state updates; old data discarded.
League of Legends | UDP with custom reliable layer | Game actions reliable; pings unreliable.
Postgres, MySQL | TCP | Query/response protocol over a stream.
MongoDB | TCP | Same.
Memcached | TCP, UDP | Both supported; UDP saves connection overhead for tiny gets.
Redis | TCP | RESP is a text-based stream protocol.
NTP | UDP | Tiny packets, app retries, no need for ordering.
SNMP | UDP | Polling small status from many devices; UDP is cheap.
NFS | TCP (UDP legacy) | TCP is the default since NFSv4.
gRPC | HTTP/2 over TCP (or HTTP/3 over QUIC) | Application-layer streams over a transport that already multiplexes.
Kafka | TCP | Long-lived connections, batched writes.
SSH | TCP | Interactive shell needs ordered bytes.
SIP (telephony signaling) | UDP (or TCP for large) | Small, frequent messages.
SMTP | TCP | Email needs to arrive correctly.
Syslog | UDP (or TCP for reliable) | UDP for cheap fire-and-forget logs.
StatsD | UDP | Metric pings: drop a few, you don't care.
Wireguard VPN | UDP | Tunnel transport; avoids TCP-over-TCP.
IPsec ESP | UDP (or raw IP) | Same reason.

Notice the pattern: anything chat-like or transactional (web, databases, queues, SSH, email) lives on TCP. Anything real-time or stateless-tolerant (media, DNS, telemetry, gaming, tunnels) lives on UDP. HTTP/3 is the visible exception, building reliability back on top of UDP at the application layer.


How to inspect real flows

tcpdump and Wireshark

tcpdump -i any -w trace.pcap captures everything; Wireshark opens the pcap and lets you follow streams, see retransmits highlighted in red, view RTT graphs, and look at every byte of every header. Indispensable for any network debugging. The Statistics → TCP Stream Graphs → Round Trip Time view is the first thing to open when investigating a slow connection.

ss (Linux)

ss -tan lists all TCP sockets with their state, send and receive queue sizes, and local and peer addresses. ss -ti adds internal TCP info such as the congestion-control algorithm in use, rto, rtt, and cwnd. Essential for seeing what the kernel thinks of an in-progress connection.

netstat

Older but available everywhere. netstat -s dumps per-protocol counters: SYN cookies sent, segments retransmitted, out-of-order packets, ACKs lost. The aggregate counters tell you what's hurting at the system level.

iperf3

Bandwidth and throughput testing. iperf3 -c host -t 30 measures TCP throughput over 30 seconds; iperf3 -c host -u -b 100M sends UDP at 100 Mbps and reports loss/jitter. Use it to baseline a link before debugging app-level latency.

tc (traffic control)

Linux tc lets you simulate the network conditions in this simulator: tc qdisc add dev eth0 root netem loss 5% delay 50ms reorder 10% adds 5% loss, 50 ms delay, and 10% reorder. Use it to validate your application's behavior under bad network conditions before shipping.

eBPF / bpftrace

For deep visibility into per-connection kernel-level metrics. tcpconnect, tcpaccept, tcpretrans, and tcprtt from bcc/bpftrace are pre-built scripts that surface what the kernel is doing in real time. The most modern way to debug.


What goes wrong in adversarial settings

SYN flood (TCP)

An attacker sends millions of SYN packets with spoofed source addresses. The server allocates a TCB for each one and waits for a final ACK that never arrives. The pool of half-open connections fills up and the server can't accept new legitimate ones. Defense: SYN cookies (RFC 4987) encode the connection state in the SYN-ACK so no TCB is allocated until the final ACK proves the client is real.

UDP amplification

An attacker sends a small UDP query with a spoofed source IP (the victim) to a service that returns a large response. The service sends the response to the victim. With DNS, NTP, memcached, and similar protocols, amplification factors can reach 50x to 50,000x. This is why memcached must never be exposed to the public internet, why DNS servers should enforce response-rate limiting, and why network operators should deploy BCP38 ingress filtering.

TCP sequence-number guessing

If an attacker can predict the initial sequence number a server picks, they can inject packets into an established connection (a "blind RST" or "blind data injection"). Modern stacks pick the ISN with cryptographically strong randomness, making this very hard. The off-path attack is still possible against very old stacks.

UDP source spoofing

Because UDP has no handshake, any source IP can appear on a datagram. Without authentication at the application layer, a UDP-based service that accepts queries from any source is open to spoofing, amplification, and impersonation. This is why DNSSEC, QUIC's integrated TLS, and protocols like Wireguard with built-in cryptographic identity exist.

Connection hijacking

Both TCP and UDP are vulnerable to on-path attackers who can read and inject packets. The defense is end-to-end encryption (TLS for TCP, DTLS for UDP, or QUIC which integrates encryption into the transport). Anything else can be modified, re-ordered, or replayed by an active attacker on the path.


When TCP and UDP both struggle

Mobile networks present a particular set of pathologies. Latency can swing from 30 ms to 2000 ms within seconds as the radio switches power states. Bandwidth can drop from 50 Mbps to 100 Kbps moving from outside to inside a building. Loss rates can spike during handover between cells. IPs change as devices switch from WiFi to LTE.

TCP suffers because the RTO and cwnd estimates calibrated to one set of conditions become wildly wrong when the conditions change. A long stall in the radio causes the connection to time out and retransmit, but the radio is back by the time the retransmit goes out, so the retransmit just wastes bandwidth and the connection has to climb out of a too-low cwnd.

UDP-based protocols can do better here because they can react more quickly to changing conditions. QUIC's connection migration directly addresses the IP-change problem: the connection survives a WiFi-to-LTE transition by binding to a connection ID instead of a 4-tuple. WebRTC's adaptive bitrate algorithms can respond to RTT changes within milliseconds, not seconds.

This is one of the strongest arguments for HTTP/3 in 2026: not just performance under loss, but resilience to the kind of network turbulence mobile users experience daily.


The 1970s decision to split

The original ARPANET protocol, NCP, was a single layered protocol that handled everything. By the mid-1970s it was clear NCP didn't scale, and that different applications wanted different properties from the network. Real-time voice (the ARPANET had voice experiments going back to 1974) didn't want retransmission; bulk file transfer absolutely did.

Vint Cerf and Bob Kahn's 1974 TCP paper described a single reliable protocol. Within a few years it was clear that the reliability layer should be separable, so that applications wanting only datagram delivery could skip the overhead. By RFC 768 (1980), UDP was specified as a thin layer on top of IP. TCP and UDP both shipped as parts of the same internet stack, occupying the same protocol number space.

That layering decision is still paying dividends. QUIC sits on UDP, building reliability back on top with new design freedom. WireGuard sits on UDP, providing encrypted tunneling without TCP-over-TCP pathologies. Real-time media protocols sit on UDP, choosing exactly which reliability mechanisms to build at the app layer.

The lesson for protocol design: separate the layers, let applications choose what they need. The Internet was designed end-to-end, with the network providing a minimal service and the endpoints providing whatever else they wanted. That principle (the "end-to-end argument" — Saltzer, Reed, Clark 1984) is why the internet scaled.


Common interview prompts and how to answer them

"What's the difference between TCP and UDP?"

Name the four properties: reliability, ordering, flow control, and connection state. TCP provides all four; UDP provides none. Then give the consequence: TCP costs at least 1.5 RTT before any data and re-sends lost packets; UDP fires immediately and ignores loss. Finish with a real-world example: HTTP/1.1 and HTTP/2 run on TCP, DNS queries run on UDP. Don't just recite features; frame each as a trade-off.

"Why would you ever use UDP?"

When timeliness beats completeness. Voice and video where a late packet is worse than a lost one. DNS queries where the round-trip needs to be tiny. Game-state updates where the new packet supersedes the old. And as the foundation for protocols like QUIC where the app builds custom reliability semantics.

"Walk me through what happens when I type a URL in a browser."

This is where the TCP/UDP distinction matters: DNS resolution is one UDP round trip (falling back to TCP when the response is too large for UDP); TCP connection setup is 1.5 RTT; TLS handshake is 1 more RTT (TLS 1.3) or 2 (TLS 1.2); then the HTTP request is sent and the response begins. Mention HTTP/3 + QUIC as the modern shortcut that combines transport and TLS handshakes into 1 RTT (or 0 on resume).

"How does TCP know a packet was lost?"

Two mechanisms: retransmission timeout (a timer that fires if the oldest unacknowledged segment is not ACKed in time) and fast retransmit (three duplicate ACKs from the receiver). The first is the safety net; the second is the workhorse for healthy connections. Mention that congestion control cuts cwnd on loss (Reno halves it, CUBIC trims it to about 70%, and a timeout collapses it to one segment), which is why packet loss is so costly to throughput.
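The duplicate-ACK rule is easy to sketch. The function below is a simplified illustration, not a real stack: it assumes cumulative ACK numbers arrive as a plain list, ignores SACK and window updates, and the name fast_retransmits is made up for this example.

```python
def fast_retransmits(acks, dup_threshold=3):
    """Return the ACK numbers at which fast retransmit would fire.

    `acks` is the sequence of cumulative ACK numbers the sender receives.
    A repeat of the previous ACK number counts as a duplicate; the
    `dup_threshold`-th duplicate in a row triggers a retransmit of the
    segment the receiver is still waiting for.
    """
    fired = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                fired.append(ack)
        else:
            # New data was acknowledged: start counting from scratch.
            last_ack = ack
            dup_count = 0
    return fired


# Segment 3 is lost: the receiver keeps answering "still waiting for 3"
# as segments 4, 5 and 6 arrive, then jumps ahead once 3 is retransmitted.
print(fast_retransmits([1, 2, 3, 3, 3, 3, 7]))
```

The first ACK for 3 is an ordinary one; only the three repeats after it count as duplicates, which is why a burst of at least four segments after the loss is needed before fast retransmit can help.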

"What's the maximum size of a UDP datagram?"

The 16-bit length field allows 65,535 bytes including the 8-byte header, so 65,527 payload bytes in theory. Over IPv4 the 20-byte IP header must also fit inside a 65,535-byte packet, which leaves 65,507. But in practice, anything over the path MTU (typically 1500 bytes) gets fragmented, and fragmentation is unreliable: if any fragment is lost, the whole datagram is lost. Best practice is to send UDP datagrams no larger than 1472 bytes (1500 - 20 IP - 8 UDP) to avoid fragmentation entirely.
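The header arithmetic is worth having as a helper when sizing datagrams. This is a minimal sketch under stated assumptions: IPv4 headers carry no options, IPv6 has no extension headers, and the function name max_udp_payload is hypothetical.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
IPV6_HEADER = 40  # bytes, fixed header only, no extension headers
UDP_HEADER = 8    # bytes


def max_udp_payload(mtu=1500, ipv6=False):
    """Largest UDP payload that fits in one unfragmented packet of size `mtu`."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - UDP_HEADER


print(max_udp_payload())            # standard Ethernet, IPv4
print(max_udp_payload(ipv6=True))   # standard Ethernet, IPv6
print(max_udp_payload(65535))       # largest possible IPv4 packet
```

Passing 65535 gives the IPv4 ceiling of 65,507 bytes, since the IP header also has to fit inside the largest IPv4 packet. Tunnels and some mobile links have a smaller MTU, so a conservative application may pick a payload well under 1472.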

"How does HTTP/3 differ from HTTP/2?"

HTTP/3 runs over QUIC (which is over UDP) instead of TCP. Two benefits: no head-of-line blocking (independent streams), and connection migration (survives IP changes). Setup is faster too — QUIC integrates TLS, so connection establishment is 1 RTT instead of 2-3.

"What's a TIME_WAIT and why does it matter?"

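The state a TCP endpoint enters when it is the side that closes the connection first. It holds for 2×MSL (typically 1-2 minutes) and prevents stale packets from the old connection from being interpreted as part of a new one if the 4-tuple gets reused. On busy services that open many short outbound connections it can exhaust ephemeral ports. Fix with connection pooling, net.ipv4.tcp_tw_reuse on Linux, or making the server be the one to close. SO_REUSEADDR addresses a different symptom: it lets a restarted server rebind its listening port while old connections sit in TIME_WAIT.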

"What does the sliding window do?"

Lets the sender have multiple packets in flight at once, up to the current window size. Without it, TCP throughput would be capped at one packet per RTT (packet size divided by RTT). The window is min(receiver window, congestion window): the receiver bounds it for buffer safety, the sender bounds it for network fairness.
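The window arithmetic fits in a few lines. This is a back-of-envelope sketch: the function names and the 50 ms / 100 Mbps figures are illustrative assumptions, and real throughput also depends on loss and on how cwnd evolves.

```python
def effective_window(rwnd_bytes, cwnd_bytes):
    """The sender may have at most min(rwnd, cwnd) bytes unacknowledged."""
    return min(rwnd_bytes, cwnd_bytes)


def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput: one full window delivered per round trip."""
    return window_bytes * 8 / rtt_seconds


def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product: the window needed to keep the pipe full."""
    return bandwidth_bps * rtt_seconds / 8


rtt = 0.050  # 50 ms round trip (assumed)

# Stop-and-wait: a single 1460-byte segment per round trip.
print("one segment in flight:", max_throughput_bps(1460, rtt), "bit/s")

# A 64 KiB receive window, but congestion control only allows 10 segments.
window = effective_window(rwnd_bytes=65535, cwnd_bytes=10 * 1460)
print("window-limited:", max_throughput_bps(window, rtt), "bit/s")

# Window needed to fill a 100 Mbps path at this RTT.
print("BDP:", bdp_bytes(100e6, rtt), "bytes")
```

With one segment in flight the cap is roughly 234 kbit/s regardless of link speed, which is why the window, not the wire, usually sets TCP's ceiling on long paths.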


Bugs both protocols invite

Assuming TCP preserves message boundaries

It doesn't. TCP is a byte stream, not a message protocol. If you write("ab") then write("cd"), the receiver might read "ab" then "cd", or "abcd", or "a" then "bcd". Always frame your messages with a length prefix or a delimiter. This is the most common bug in any TCP code written by a UDP-experienced engineer.
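Length-prefix framing can be sketched without touching a socket. The 4-byte big-endian header is an assumed wire format, and encode_frame and FrameDecoder are hypothetical names; the point is that the decoder gives the same messages however the byte stream happens to be chopped up.

```python
import struct

HEADER = struct.Struct("!I")  # assumed format: 4-byte big-endian payload length


def encode_frame(payload: bytes) -> bytes:
    """Prefix a message with its length so the receiver can find its end."""
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Reassembles messages from arbitrarily sized chunks of a byte stream."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes):
        """Add the bytes from one recv() call; return every complete message."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # payload not fully arrived yet; wait for more bytes
            messages.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return messages


stream = encode_frame(b"ab") + encode_frame(b"cd")

# Deliver the stream one byte at a time, the worst case for boundaries.
decoder = FrameDecoder()
received = []
for i in range(len(stream)):
    received.extend(decoder.feed(stream[i:i + 1]))
print(received)
```

A production decoder would also cap the accepted length so a corrupt or hostile header cannot make it buffer gigabytes.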

Treating a UDP socket like a stream (the reverse mistake)

UDP does preserve message boundaries: each recvfrom returns exactly one datagram, and silently drops whatever doesn't fit in the buffer you pass. The flip side is that sendto must be called with the entire message; if you call sendto twice, those are two datagrams, not one. Engineers used to TCP often try to "append" to a UDP socket, which doesn't make sense.
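The boundary behaviour can be checked on loopback. This is a sketch under assumptions: the function name is made up, it uses 127.0.0.1 with a kernel-chosen port, and it relies on loopback not dropping the two datagrams (a timeout guards against hanging if it does).

```python
import socket


def udp_boundary_demo():
    """Send two datagrams over loopback; return what two recvfrom calls see."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
        rx.settimeout(1.0)          # never block forever on a lost datagram
        addr = rx.getsockname()

        tx.sendto(b"ab", addr)      # datagram 1
        tx.sendto(b"cd", addr)      # datagram 2, not an "append" to the first

        first, _ = rx.recvfrom(2048)
        second, _ = rx.recvfrom(2048)
        return [first, second]
    finally:
        rx.close()
        tx.close()
```

Calling udp_boundary_demo() yields two separate two-byte messages, never one four-byte message, which is the opposite of what the same two writes can produce on a TCP socket.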

Not handling partial writes on TCP

send() can return fewer bytes than you asked it to write. The kernel's send buffer might be full. You must loop, advancing your offset. Many languages provide sendall (Python) or net.Conn.Write (Go, which loops internally), but raw C send needs manual handling.
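The manual loop looks like this in Python, written against the plain send() call to mirror what raw C code has to do. TrickleSocket is a hypothetical stand-in that accepts at most 3 bytes per call so the loop can be exercised without a network; the sketch assumes a blocking socket.

```python
def send_all(sock, data: bytes) -> int:
    """Keep calling send() until every byte has been accepted by the kernel."""
    view = memoryview(data)  # slicing a memoryview avoids copying the tail
    sent_total = 0
    while sent_total < len(view):
        sent = sock.send(view[sent_total:])
        if sent == 0:
            raise ConnectionError("socket connection broken")
        sent_total += sent   # advance the offset past what was accepted
    return sent_total


class TrickleSocket:
    """Hypothetical fake socket: send() accepts at most 3 bytes per call."""

    def __init__(self):
        self.received = bytearray()
        self.calls = 0

    def send(self, data):
        self.calls += 1
        chunk = bytes(data[:3])
        self.received += chunk
        return len(chunk)


fake = TrickleSocket()
send_all(fake, b"hello world")
print(bytes(fake.received), "in", fake.calls, "calls")
```

On a non-blocking socket send() raises BlockingIOError when the buffer is full instead of returning a short count, so that case needs a readiness wait (select/epoll) rather than a tight loop.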

UDP without timeouts

A UDP recv() can wait forever if no datagram ever arrives — your packet may have been lost. Always set SO_RCVTIMEO or use select/epoll. Otherwise your application will hang on the first packet loss.
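In Python the simplest form of this is settimeout, which bounds each blocking call on the socket. The helper below is a minimal sketch; the name recv_with_timeout and the choice to return None on timeout are assumptions made for the example.

```python
import socket


def recv_with_timeout(sock, timeout_s, bufsize=2048):
    """Wait up to `timeout_s` seconds for one datagram; return None if none came."""
    sock.settimeout(timeout_s)
    try:
        data, _addr = sock.recvfrom(bufsize)
        return data
    except socket.timeout:
        # No datagram arrived in time: the request or the reply may be lost.
        return None
```

A caller can then decide what a missing reply means, for example `reply = recv_with_timeout(sock, 0.2)` followed by a resend when reply is None. That decision (retry, give up, or carry on without the data) is exactly the part UDP leaves to the application.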

Ignoring application-layer keepalive

TCP connections can sit idle for hours without either side knowing the other is gone. Without TCP keepalive (off by default) or an application-layer ping, you'll discover a dead connection only when you next write, and sometimes not until that write's retransmissions give up minutes later. WebSockets, gRPC, and most modern protocols have explicit keepalive precisely for this.
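Turning on kernel-level keepalive is a handful of socket options. This is a sketch: the function name and the 60 s / 10 s / 5-probe values are illustrative assumptions, and the three tuning options are platform-specific (available on Linux), so they are guarded.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Enable TCP keepalive and, where the platform allows, tune its timing."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Not every platform exposes these constants, so check before using them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

With these values a silent peer is detected after roughly idle_s + interval_s × probes seconds. An application-layer ping is still worth having: it also catches a peer whose kernel is alive but whose process is wedged.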

Buffer-size mismatch

If the receiving application reads more slowly than the sender writes, the receive buffer fills, the receiver advertises rwnd=0, and the sender stops. The connection isn't broken; it's just stuck until the receiver drains its buffer. Size receive buffers for at least the expected throughput × processing latency.

UDP socket without bind on outgoing-only client

A UDP client that calls sendto without bind picks an ephemeral source port automatically — which means every restart picks a different port and any peer using source-port for routing will get confused. Bind to a known port for stable behavior, or accept the ephemeral nature.


Terms used above, defined

RTT: round-trip time. Time for a packet to go from sender to receiver and back. Typical values: localhost 0.1 ms, same-region 1-5 ms, cross-country 30-80 ms, cross-ocean 100-200 ms, satellite 600 ms.

MSS: maximum segment size. The largest payload TCP will send in a single segment. Derived from path MTU (usually 1500) minus IP header (20) minus TCP header (20) = 1460.

MTU: maximum transmission unit. The largest frame the underlying link can carry. 1500 on standard Ethernet; 9000 on jumbo-frame networks; lower on tunnels and some mobile links.

cwnd: congestion window. Sender's estimate of how many bytes can be in flight without overloading the network. Grows on success, shrinks on loss.

rwnd: receive window. Receiver's advertised buffer space, in bytes. Sender must not have more than rwnd bytes unacknowledged in flight.

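RTO: retransmission timeout. The timer that triggers a retransmit when the oldest unacknowledged segment goes un-ACKed for too long. Computed from smoothed RTT plus four times the RTT variance (SRTT + 4×RTTVAR, per RFC 6298).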

SACK: selective acknowledgment. TCP option that lets the receiver describe which out-of-order ranges have arrived, so sender can retransmit only the missing pieces.

Nagle's algorithm: TCP's small-write batching. Disable with TCP_NODELAY for latency-sensitive workloads.

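Delayed ACK: receiver's ACK batching. The receiver holds an ACK briefly (commonly 40-200 ms depending on the stack) hoping to piggyback it on an outgoing data segment. TCP_QUICKACK on Linux turns it off temporarily; the flag is not permanent and has to be re-set.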

SYN cookie: defense against SYN flood. Server encodes the connection state in the SYN-ACK rather than allocating a TCB.

TFO: TCP Fast Open. Carries application data in the SYN on subsequent connections, saving one RTT.

BBR: Google's bottleneck-bandwidth congestion-control algorithm. Estimates pipe capacity directly rather than using loss as the only signal.

CUBIC: default Linux congestion-control algorithm. Cubic growth function from last-loss time.

Reno, NewReno: older TCP congestion control. Linear growth, halve-on-loss.

SCTP: Stream Control Transmission Protocol. RFC 9260 (which obsoletes RFC 4960). Combines TCP's reliability with multi-streaming. Used inside WebRTC data channels and SS7 over IP.

DCCP: Datagram Congestion Control Protocol. RFC 4340. UDP with congestion control. Rarely deployed.

QUIC: RFC 9000. Transport protocol on UDP that provides TCP-like reliability per stream, plus integrated TLS.

DTLS: Datagram TLS. RFC 6347 (DTLS 1.2) and RFC 9147 (DTLS 1.3). Adds TLS to UDP without requiring stream semantics.

Datagram: a self-contained packet with header and payload, delivered as a unit or not at all.

Byte stream: an arbitrary-length sequence of bytes with no inherent message boundaries. TCP delivers a byte stream.

Head-of-line blocking: when one slow item blocks all items behind it. TCP suffers this for application streams multiplexed over one connection (HTTP/2 problem; HTTP/3 fix).

Bandwidth-delay product: bandwidth × RTT. The amount of data that can be in flight on a link at once. Determines minimum useful window size.

TIME_WAIT: TCP state after close, held for 2×MSL to absorb late packets. Causes port exhaustion on busy short-connection services.

Path MTU discovery: process of finding the smallest MTU along a path. Modern stacks do PLPMTUD (packetization-layer PMTUD) for robustness in the face of ICMP blocking.


Try these on the simulator and in code

1. Set loss to 0% and reorder to 0%. Both protocols complete in roughly the same time, but TCP took an extra 1.5 RTT of handshake. Verify by watching the timing.

2. Set loss to 20% and reorder to 0%. Watch how TCP's completion time grows non-linearly. Each loss costs an RTT-plus for the retransmit; multiple losses interact.

3. Set loss to 0% and reorder to 30%. TCP still delivers everything in order. Watch how out-of-order arrivals at the receiver get buffered until the predecessor catches up. UDP shows the reorders clearly in arrival order.

4. Set loss to 30% and packets to 14. Watch the simulator carefully: TCP's effective throughput drops drastically; UDP delivers a few then loses many. At what loss rate does TCP completion become more than 3x UDP delivery time?

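5. Use tc on a Linux box (as root): tc qdisc add dev eth0 root netem loss 5% delay 100ms. With iperf3 -s running on a peer, run iperf3 -c <peer> for TCP and iperf3 -c <peer> -u for UDP. Compare actual throughput numbers to the table in part 8. Remove the rule afterwards with tc qdisc del dev eth0 root.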

6. Open Wireshark and capture a curl https://example.com/. Identify the TCP handshake (3 packets), the TLS handshake (more packets), and the HTTP request/response. Note the time between each. This is where latency-budget pages get their numbers.

7. Write a UDP "ping" client and server in your language of choice. Send 1000 packets at 1ms intervals; report loss rate, RTT mean/p99. Now run it against a server in another region. Surprise: even healthy paths often have 0.1-1% loss.

8. Modify the UDP client to retransmit on timeout (resend if no reply within 200 ms). Now you've reinvented half of TCP. Notice how many edge cases there are — duplicate replies, out-of-order replies, ambiguous RTT samples after retransmit.

9. Read RFC 793 (TCP, since superseded by RFC 9293) and RFC 768 (UDP) side by side. Note that UDP is 3 pages and TCP is 85 pages. Most of TCP's complexity is reliability and connection state, and congestion control (RFC 5681) adds more on top.

10. Build a toy file-transfer tool over UDP that handles reordering, drops, and back-pressure. Compare against scp or rsync (both TCP-based) — at what loss rate does your UDP version win? At what loss rate does it lose to TCP because of fewer optimisations?


If you remember nothing else

1. TCP gives reliable, ordered bytes between two endpoints; UDP gives best-effort datagrams.

2. TCP costs 1.5 RTT of handshake plus more for TLS; UDP starts with the first packet.

3. TCP turns loss into latency (retransmit RTTs); UDP turns loss into missing data.

4. Web, databases, SSH, gRPC live on TCP; DNS, real-time media, gaming, tunnels live on UDP.

5. HTTP/3 puts TCP-like reliability on UDP per stream — best of both for the web.

Internalise those and you can hold your own in almost any networking conversation. The rest is detail.
