
QUIC and HTTP/3

QUIC is the transport protocol that took the IETF roughly five years to standardise and the wider industry about twenty years to want. It runs on top of UDP, ships its own loss recovery and congestion control, bakes TLS 1.3 into the handshake, and gives applications independent streams instead of one big byte queue. HTTP/3 is the HTTP semantics layer that sits on top. Together they cut the time to first byte on a fresh connection from two round-trips to one — and to zero on the second visit — while sidestepping the head-of-line blocking that has haunted HTTP/2 since 2015.


Why QUIC exists — the head-of-line problem

TCP delivers an ordered byte stream. The kernel hands bytes up to the application in the order they were sent, no exceptions. When a packet is lost, every byte that arrived after it sits in the receive buffer and waits until the retransmission lands and fills the gap. The application doesn’t see the packetisation; it sees a stall. This is head-of-line blocking, and it’s fundamental — you can’t fix it without changing the transport.

Browsers hid the problem under HTTP/1.1 by opening up to six TCP connections per origin. A loss on one only stalled the request riding on it. HTTP/2, shipped in 2015, tried to do better: one connection, many multiplexed streams interleaved as HTTP/2 frames. Smaller memory footprint, fewer handshakes, better header compression. But the multiplexed streams still ride a single TCP byte stream, and one lost segment stalls every stream behind it. On a clean fibre link the difference is invisible. On a 1% loss mobile link, with a page that pulls 80 small subresources, it’s painfully visible.

Why did it take twenty years to fix? Because the obvious solution — change TCP — runs into a wall. TCP lives in the kernel. Middleboxes (NATs, firewalls, traffic shapers) inspect TCP options and drop packets they don’t recognise. A new TCP extension can take a decade to deploy widely enough to be useful; even MPTCP, standardised in 2013, is still rare in practice in 2026. QUIC sidesteps the whole problem by riding inside UDP, where middleboxes only see opaque datagrams, and by living in userspace, where every application can ship a new version with its next release.

Why this matters. Head-of-line blocking is the reason a single lost packet on a 4G connection can stall a whole HTTP/2 page load by 100 ms or more. QUIC turns that 100 ms stall into one stalled stream — usually invisible, because the browser can render the partial page from the streams that did arrive.

QUIC's two big ideas

Strip QUIC down and there are two real innovations underneath, both pragmatic responses to twenty years of lessons from HTTP-over-TCP.

The first is independent streams. A QUIC connection carries many streams at once, each with its own ordering and flow control. A lost packet only stalls the streams whose bytes it carried; everything else keeps delivering. Stream 0 can be 80% delivered while stream 4 is blocked on a retransmission, and the receiving application can act on stream 0’s data right away. This is the feature HTTP/2 wanted but couldn’t have, because HTTP/2 was bolted onto TCP.
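A toy model makes the contrast concrete. The sketch below assumes one chunk of data per packet and uses hypothetical helper names (delivered_single_stream, delivered_per_stream); it models delivery order only, not real TCP or QUIC.

```python
from typing import List, Tuple

Packet = Tuple[int, str]  # (stream id, chunk of data); one chunk per packet


def delivered_single_stream(packets: List[Packet], lost: int) -> List[Packet]:
    """TCP-style delivery: nothing after the gap reaches the application."""
    return packets[:lost]


def delivered_per_stream(packets: List[Packet], lost: int) -> List[Packet]:
    """QUIC-style delivery: only the stream that lost data stalls."""
    blocked_stream = packets[lost][0]
    delivered = packets[:lost]  # slicing copies, so the input list is untouched
    for stream_id, chunk in packets[lost + 1:]:
        if stream_id != blocked_stream:
            delivered.append((stream_id, chunk))
    return delivered


if __name__ == "__main__":
    flight = [(0, "a"), (4, "b"), (0, "c"), (4, "d"), (8, "e")]
    print(delivered_single_stream(flight, lost=1))
    print(delivered_per_stream(flight, lost=1))
```

With the second packet lost, the single-stream model delivers only the first chunk, while the per-stream model keeps delivering streams 0 and 8 and holds back only stream 4.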

The second is the integrated TLS 1.3 handshake. Classic TCP+TLS is a chain: one round trip to establish TCP, then one more for TLS 1.3 (two for TLS 1.2), then your first HTTP byte goes on the wire. That’s 2–3 RTTs of dead air before the application sees anything. QUIC merges the two: the very first packet a client sends carries both transport setup and the TLS ClientHello, protected with Initial keys derived from the Destination Connection ID the client picked for that packet. Anyone who sees the packet can derive those keys, so they guard against off-path tampering rather than provide confidentiality. By the end of one round trip you have a working transport and a working encrypted session. On a return visit, a saved TLS session ticket lets you skip even that one RTT.

Everything else in QUIC — connection migration, packet number spaces, the specific frame types — falls out of these two choices. Independent streams need a richer frame format than TCP’s flat segment header. The integrated handshake forces packet number spaces, because you have to send and acknowledge packets before you’ve negotiated app-data keys.

The QUIC handshake — one RTT, often zero

QUIC sorts its packets into three packet number spaces: Initial, Handshake, and Application data. Each space has its own monotonically increasing packet number and its own keys (the Application data space covers both 0-RTT and 1-RTT keys). The handshake interleaves them: the client sends an Initial packet carrying a TLS ClientHello; the server responds with an Initial (ServerHello) and a Handshake packet (EncryptedExtensions, Certificate, CertificateVerify, Finished); the client completes with a Handshake (Finished) and immediately starts sending 1-RTT (application-data) packets.

Compared to the legacy stack: TCP costs you one RTT for SYN / SYN-ACK / ACK, then TLS 1.3 costs one more for ClientHello / ServerHello+Finished / Finished, then your first HTTP byte. Two RTTs of dead air, and the second one only began once the first finished. QUIC does both in one round trip. On a 50 ms path that’s 50 ms saved on every cold connection — small per request, large in aggregate across a page that touches multiple hostnames.
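The arithmetic is simple enough to put in a few lines. This is a back-of-the-envelope sketch: the round-trip counts come from the text, the names (HANDSHAKE_RTTS, setup_cost_ms) are made up for illustration, and it ignores server processing time and packet loss.

```python
# Round trips spent on connection setup before the first HTTP byte can be sent.
HANDSHAKE_RTTS = {
    "TCP + TLS 1.3": 2,  # one for TCP, one for TLS
    "QUIC 1-RTT": 1,     # transport and TLS handshake combined
    "QUIC 0-RTT": 0,     # resumed session, request rides the first flight
}


def setup_cost_ms(rtt_ms: float) -> dict:
    """Setup delay in milliseconds for each stack on a path with the given RTT."""
    return {name: rtts * rtt_ms for name, rtts in HANDSHAKE_RTTS.items()}


if __name__ == "__main__":
    for name, cost in setup_cost_ms(50).items():
        print(f"{name:<14} {cost:>5.0f} ms")
```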

[Figure: time to first byte, three handshakes. TCP + TLS 1.3: SYN ↔ SYN-ACK ↔ ACK (RTT 1, TCP), then ClientHello ↔ ServerHello+Fin ↔ Fin (RTT 2, TLS 1.3), then first byte. QUIC 1-RTT: Initial+ClientHello ↔ Initial+ServerHello+Hsk ↔ Hsk-Fin (RTT 1, combined), then first byte. QUIC 0-RTT: Initial + 0-RTT app data with resumption ticket (RTT 0, first byte rides the first packet). Horizontal axis: 0 to 3 RTT.]

0-RTT works by remembering. After a successful 1-RTT handshake, the server sends a TLS session ticket. The client stashes it, and on the next connection sends its Initial packet together with 0-RTT packets that already carry application data encrypted with keys derived from the ticket. The server can read the request and start sending the response before it has finished the handshake.

The 0-RTT caveat. 0-RTT data is replayable. A network attacker who captures a 0-RTT request can resend it later, and the server can’t tell the replay from a fresh request. RFC 9001 leaves it to each application protocol to define what may safely ride 0-RTT; HTTP/3 (RFC 9114) applies the early-data rules of RFC 8470, which in practice means idempotent requests, typically GETs. POSTs, charges, and anything with side effects shouldn’t ride 0-RTT, and most HTTP/3 stacks enforce this by default.
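For a custom protocol on top of QUIC, the same rule has to be written down somewhere. A minimal sketch of such a gate follows; the policy set and the function name are assumptions for illustration, not the behaviour of any particular HTTP/3 library.

```python
# Hypothetical policy: methods this sketch treats as safe to replay.
SAFE_FOR_EARLY_DATA = frozenset({"GET", "HEAD", "OPTIONS"})


def may_send_as_early_data(method: str, have_ticket: bool) -> bool:
    """True if the request may ride 0-RTT under this conservative policy.

    Without a resumption ticket there are no 0-RTT keys at all, so the
    answer is False whatever the method is.
    """
    return have_ticket and method.upper() in SAFE_FOR_EARLY_DATA


if __name__ == "__main__":
    for method in ("GET", "POST"):
        print(method, may_send_as_early_data(method, have_ticket=True))
```

A method-based check is only a first filter: a GET endpoint that triggers side effects is still unsafe to replay, so the server-side handler needs to be idempotent too.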

Streams — independent byte streams in one connection

A QUIC stream is a lightweight, ordered byte sequence with its own flow control. Inside QUIC packets, application data rides in STREAM frames; each frame carries a stream ID, an offset into that stream, and a chunk of bytes. The receiver reassembles them in order per stream. If a packet is lost, only the streams whose STREAM frames were in it are blocked; everything else delivers normally.

Stream IDs are 62-bit integers with structure baked into the low two bits. Bit 0 says client-initiated (0) or server-initiated (1). Bit 1 says bidirectional (0) or unidirectional (1). So stream IDs 0, 4, 8, 12… are client-initiated bidi; 1, 5, 9… server-initiated bidi; 2, 6, 10… client-initiated uni; 3, 7, 11… server-initiated uni. The structure means either endpoint can open a new stream without coordinating IDs.
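The bit layout translates directly into code. The helper names below (classify_stream_id, nth_stream_id) are illustrative rather than taken from any QUIC library.

```python
from typing import Tuple


def classify_stream_id(stream_id: int) -> Tuple[str, str]:
    """Return (initiator, direction) from the two low bits of a stream ID."""
    if not 0 <= stream_id < 2**62:
        raise ValueError("stream IDs are 62-bit unsigned integers")
    initiator = "server" if stream_id & 0x1 else "client"
    direction = "unidirectional" if stream_id & 0x2 else "bidirectional"
    return initiator, direction


def nth_stream_id(n: int, initiator: str, direction: str) -> int:
    """ID of the n-th stream (counting from 0) of the given type."""
    low_bits = (0x1 if initiator == "server" else 0) | (
        0x2 if direction == "unidirectional" else 0
    )
    return (n << 2) | low_bits


if __name__ == "__main__":
    for sid in (0, 1, 2, 3, 4):
        print(sid, classify_stream_id(sid))
```

Because each endpoint only ever allocates IDs with its own value of bit 0, the two sides can open streams concurrently without colliding.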

QUIC packet payload (decrypted):

+------------------+--------------------+--------+------------+
| frame type=0x0e  | stream id (varint) | offset | length     |
| (STREAM, with    | e.g. 0  (client    | 0      | 312        |
|  OFF+LEN bits)   |  bidi stream 0)    |        |            |
+------------------+--------------------+--------+------------+
| ... 312 bytes of stream 0 application data ...              |
+-------------------------------------------------------------+
| frame type=0x0e  | stream id          | offset | length     |
| (STREAM)         | 4 (client bidi 4)  | 0      | 89         |
+------------------+--------------------+--------+------------+
| ... 89 bytes of stream 4 application data ...               |
+-------------------------------------------------------------+

One packet can carry frames from many streams. One lost packet
only stalls those specific streams until the bytes are resent.
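The layout above can be reproduced in a few lines. This sketch encodes and parses STREAM frames using QUIC's variable-length integers, where the top two bits of the first byte give the length (1, 2, 4, or 8 bytes). The function names are illustrative, and the parser deliberately understands STREAM frames only.

```python
from typing import List, NamedTuple, Tuple

OFF_BIT, LEN_BIT, FIN_BIT = 0x04, 0x02, 0x01  # low bits of STREAM types 0x08..0x0f


class StreamFrame(NamedTuple):
    stream_id: int
    offset: int
    data: bytes
    fin: bool


def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer: 2-bit length prefix, big-endian value."""
    for prefix, nbytes in ((0, 1), (1, 2), (2, 4), (3, 8)):
        if 0 <= value < 1 << (8 * nbytes - 2):
            return (value | prefix << (8 * nbytes - 2)).to_bytes(nbytes, "big")
    raise ValueError("value does not fit in 62 bits")


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, position just past the integer)."""
    nbytes = 1 << (buf[pos] >> 6)
    raw = buf[pos:pos + nbytes]
    if len(raw) != nbytes:
        raise ValueError("truncated varint")
    return int.from_bytes(raw, "big") & ((1 << (8 * nbytes - 2)) - 1), pos + nbytes


def encode_stream_frame(stream_id: int, offset: int, data: bytes, fin: bool = False) -> bytes:
    """Always emits explicit offset and length fields (type 0x0e or 0x0f)."""
    frame_type = 0x08 | OFF_BIT | LEN_BIT | (FIN_BIT if fin else 0)
    header = b"".join(encode_varint(v) for v in (frame_type, stream_id, offset, len(data)))
    return header + data


def decode_stream_frames(payload: bytes) -> List[StreamFrame]:
    """Parse a decrypted payload that contains only STREAM frames."""
    frames, pos = [], 0
    while pos < len(payload):
        frame_type, pos = decode_varint(payload, pos)
        if not 0x08 <= frame_type <= 0x0F:
            raise ValueError("this sketch only understands STREAM frames")
        stream_id, pos = decode_varint(payload, pos)
        offset = 0
        if frame_type & OFF_BIT:
            offset, pos = decode_varint(payload, pos)
        if frame_type & LEN_BIT:
            length, pos = decode_varint(payload, pos)
        else:
            length = len(payload) - pos  # no length field: data runs to the end
        data = payload[pos:pos + length]
        if len(data) != length:
            raise ValueError("truncated STREAM frame")
        pos += length
        frames.append(StreamFrame(stream_id, offset, data, bool(frame_type & FIN_BIT)))
    return frames


if __name__ == "__main__":
    payload = encode_stream_frame(0, 0, b"x" * 312) + encode_stream_frame(4, 0, b"y" * 89)
    for frame in decode_stream_frames(payload):
        print(frame.stream_id, frame.offset, len(frame.data), frame.fin)
```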

Flow control is per-stream and per-connection — the receiver tells the sender both "stream X can accept up to byte Y" and "the whole connection can accept up to byte Z". Both are advertised with MAX_STREAM_DATA and MAX_DATA frames, both can be updated mid-flight, and both do the same job as TCP’s window: keep the sender from drowning the receiver.
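From the sender's side, the two limits combine as a simple minimum. The class below is a sketch with made-up names; it assumes every stream starts with the same initial limit and treats both limits as absolute byte offsets that only ever grow, which is how MAX_DATA and MAX_STREAM_DATA behave.

```python
from typing import Dict


class SendFlowControl:
    """Tracks how many bytes the peer currently allows us to send."""

    def __init__(self, initial_max_data: int, initial_max_stream_data: int) -> None:
        self.max_data = initial_max_data
        self.initial_max_stream_data = initial_max_stream_data
        self.max_stream_data: Dict[int, int] = {}
        self.sent_on_stream: Dict[int, int] = {}
        self.sent_total = 0

    def on_max_data(self, limit: int) -> None:
        # Limits are absolute offsets; a smaller value is stale and ignored.
        self.max_data = max(self.max_data, limit)

    def on_max_stream_data(self, stream_id: int, limit: int) -> None:
        current = self.max_stream_data.get(stream_id, self.initial_max_stream_data)
        self.max_stream_data[stream_id] = max(current, limit)

    def sendable(self, stream_id: int) -> int:
        """Bytes we may send on this stream right now."""
        stream_limit = self.max_stream_data.get(stream_id, self.initial_max_stream_data)
        stream_room = stream_limit - self.sent_on_stream.get(stream_id, 0)
        connection_room = self.max_data - self.sent_total
        return max(0, min(stream_room, connection_room))

    def record_sent(self, stream_id: int, nbytes: int) -> None:
        if nbytes > self.sendable(stream_id):
            raise ValueError("would exceed the peer's flow control limit")
        self.sent_on_stream[stream_id] = self.sent_on_stream.get(stream_id, 0) + nbytes
        self.sent_total += nbytes
```

A stream can be blocked by its own limit while the connection has room, or the other way round; the sender has to respect whichever is tighter.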

Packet number spaces

TCP uses one sequence number space for the whole connection. QUIC uses three: Initial, Handshake, and Application data. Each space has its own keys, its own packet number counter starting at zero, and its own acknowledgement and loss-detection state. The split exists because the handshake itself has to send and acknowledge packets before the application-data keys are available, and mixing them in one sequence space would either leak handshake structure or force re-keying gymnastics.

Packet numbers are monotonically increasing within a space and never reused, even across retransmissions. If you send packet 42 and it gets lost, the lost data goes out again in a new packet, say packet 67 (or whatever the current value is). That kills the TCP ambiguity where the sender can’t tell whether an ACK is for the original send or the retransmission: QUIC’s loss recovery gets usable RTT samples even for retransmitted data, which TCP can only manage with help from its timestamp option.

On the wire, the packet number is encoded truncated to one, two, three, or four bytes. The receiver reconstructs the full value from the largest packet number it has successfully processed in that space, and the sender picks an encoding wide enough for that reconstruction to be unambiguous. This saves header bytes (most packets carry one or two) without losing the monotonicity property. Header protection (an XOR mask derived from a sample of the encrypted payload) then hides the packet number itself from passive observers, so middleboxes can’t learn anything from it.
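The reconstruction step is short. The function below follows the decoding pseudocode in RFC 9000, Appendix A.3, transcribed into Python: it picks the value closest to the next expected packet number whose low bits match what was on the wire.

```python
def decode_packet_number(largest_pn: int, truncated_pn: int, pn_nbits: int) -> int:
    """Recover a full packet number from its truncated wire encoding.

    largest_pn   -- largest packet number processed so far in this space
    truncated_pn -- the 8, 16, 24 or 32 bits read from the packet header
    pn_nbits     -- how many bits were on the wire
    """
    expected_pn = largest_pn + 1
    pn_win = 1 << pn_nbits
    pn_hwin = pn_win // 2
    pn_mask = pn_win - 1
    # Splice the received low bits onto the high bits of the expected value.
    candidate_pn = (expected_pn & ~pn_mask) | truncated_pn
    # If that lands more than half a window away, the true value sits in the
    # neighbouring window.
    if candidate_pn <= expected_pn - pn_hwin and candidate_pn < (1 << 62) - pn_win:
        return candidate_pn + pn_win
    if candidate_pn > expected_pn + pn_hwin and candidate_pn >= pn_win:
        return candidate_pn - pn_win
    return candidate_pn


if __name__ == "__main__":
    print(hex(decode_packet_number(0xA82F30EA, 0x9B32, 16)))
```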

Why monotonicity matters. Reusing a packet number with a different payload would let an attacker who captured the original learn the plaintext difference. QUIC’s AEAD construction uses the packet number as a nonce input, so the never-reuse rule is a hard security invariant, not a performance choice.
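The nonce construction shows why the rule is structural. Following RFC 9001, the nonce is the packet protection IV XORed with the packet number, left-padded with zeros to the IV's length. The sketch below uses an arbitrary 12-byte IV for illustration; real IVs come out of the TLS key schedule, and the PacketNumberSpace class is a hypothetical stand-in for a stack's own counter.

```python
def aead_nonce(iv: bytes, packet_number: int) -> bytes:
    """AEAD nonce for one packet: IV XOR zero-padded packet number."""
    if not 0 <= packet_number < 2**62:
        raise ValueError("packet numbers are 62-bit unsigned integers")
    padded = packet_number.to_bytes(len(iv), "big")
    return bytes(i ^ p for i, p in zip(iv, padded))


class PacketNumberSpace:
    """Hands out each packet number exactly once."""

    def __init__(self) -> None:
        self._next = 0

    def take(self) -> int:
        pn = self._next
        self._next += 1
        return pn


if __name__ == "__main__":
    iv = bytes(range(12))  # arbitrary example IV, not a real key-schedule output
    space = PacketNumberSpace()
    for _ in range(3):
        pn = space.take()
        print(pn, aead_nonce(iv, pn).hex())
```

With a fixed IV, two packets get the same nonce only if they share a packet number, so a counter that never repeats is enough to keep nonces unique for the lifetime of a key.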

Connection IDs and connection migration

TCP pins a connection to a five-tuple — protocol, source IP, source port, destination IP, destination port. Change any of those and the connection is gone. That’s why your TCP-based video call drops when your phone switches from wifi to LTE: the source IP and source port change, the server’s kernel doesn’t recognise the new packets as part of your existing connection, the connection breaks, and the app has to reconnect with a fresh TCP+TLS handshake.

QUIC connections are identified by Connection IDs, not by the five-tuple. Each endpoint assigns IDs that the other end uses on its outgoing packets. When your phone hands off to LTE and your source IP changes, the packets still carry the same Connection ID, the server recognises them as part of the existing session, and the connection survives. No re-handshake, no fresh TLS, no dropped call.
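On the server this comes down to what the connection table is keyed by. A minimal sketch, with hypothetical Connection and ConnectionTable classes: lookups use the Destination Connection ID from the packet, and the source address is just an attribute that may change. A real stack would validate the new path before trusting it.

```python
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # (IP, UDP port)


class Connection:
    def __init__(self, peer: Address) -> None:
        self.peer = peer
        self.address_changes = 0


class ConnectionTable:
    """Routes incoming packets by Connection ID, not by source address."""

    def __init__(self) -> None:
        self._by_cid: Dict[bytes, Connection] = {}

    def register(self, cid: bytes, conn: Connection) -> None:
        self._by_cid[cid] = conn

    def route(self, dest_cid: bytes, source: Address) -> Optional[Connection]:
        conn = self._by_cid.get(dest_cid)
        if conn is not None and conn.peer != source:
            # Same connection, new path. Path validation would start here.
            conn.peer = source
            conn.address_changes += 1
        return conn


if __name__ == "__main__":
    table = ConnectionTable()
    call = Connection(("10.0.0.7", 54321))
    table.register(bytes.fromhex("4fc28ae1d3"), call)
    same = table.route(bytes.fromhex("4fc28ae1d3"), ("172.16.34.89", 43210))
    print(same is call, call.peer)
```

A table keyed by the five-tuple would return nothing for the second packet, which is exactly the TCP failure described above.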

[Figure: connection migration, wifi → LTE, same Connection ID. The phone hands off mid-call from wifi (10.0.0.7:54321) to LTE (172.16.34.89:43210); packets on both paths carry CID 4f c2 8a e1 d3 … (same CID, new path), and the server answers the new path with PATH_CHALLENGE.]

Migration isn’t free of risk. An attacker who spoofs a victim’s address as the source of QUIC packets carrying a valid Connection ID could trick the server into aiming a large response at that victim, which is an amplification attack. QUIC defends with path validation: when packets arrive from a new address, the server sends a PATH_CHALLENGE frame carrying 8 random bytes and waits for a PATH_RESPONSE echoing them before it will send much traffic on the new path. Until validation completes, the server may send at most three times the bytes it received from the new address.
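Both halves of the defence, the byte budget and the challenge, fit in one small state machine. The PathValidator class below is an illustrative sketch, not any library's API; it tracks a single unvalidated path.

```python
import hmac
import os
from typing import Optional


class PathValidator:
    """Anti-amplification budget plus PATH_CHALLENGE bookkeeping for one path."""

    AMPLIFICATION_FACTOR = 3
    CHALLENGE_LEN = 8  # PATH_CHALLENGE carries 8 bytes of unpredictable data

    def __init__(self) -> None:
        self.bytes_received = 0
        self.bytes_sent = 0
        self.validated = False
        self._challenge: Optional[bytes] = None

    def on_datagram_received(self, size: int) -> None:
        self.bytes_received += size

    def on_datagram_sent(self, size: int) -> None:
        self.bytes_sent += size

    def send_budget(self) -> Optional[int]:
        """Bytes we may still send; None means this rule no longer applies."""
        if self.validated:
            return None
        return max(0, self.AMPLIFICATION_FACTOR * self.bytes_received - self.bytes_sent)

    def start_challenge(self) -> bytes:
        self._challenge = os.urandom(self.CHALLENGE_LEN)
        return self._challenge

    def on_path_response(self, data: bytes) -> bool:
        if self._challenge is not None and hmac.compare_digest(data, self._challenge):
            self.validated = True
        return self.validated
```

An attacker who only spoofs the source address never sees the challenge bytes, so the path stays unvalidated and the victim receives at most three times what the attacker sent.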

In practice, browsers do connection migration silently for HTTP/3, and apps like Chrome hold connections through wifi-to-cellular transitions that would have cost TCP a full reconnect. The benefit is largest for long-lived connections — video calls, gRPC streams, server-sent events — and modest for short HTTP fetches.

Pacing — why QUIC needs to slow down sending

Kernel TCP has had pacing built into the qdisc layer for years. tc-fq (fair queue) spreads sends out evenly so a 100 Mbit/s flow doesn’t fire packets in 1 Gbit/s bursts. Without pacing, congestion controllers like CUBIC or BBR will happily send a whole cwnd of packets back-to-back, drown some downstream buffer, and cause the very loss the controller was trying to avoid.

QUIC runs in userspace over UDP. The kernel has no idea this is a connection, let alone what its congestion window is, so kernel pacing doesn’t apply. The QUIC implementation has to pace itself. The simple way is a token bucket that releases one packet every RTT / cwnd, with cwnd counted in packets. But sendto calls have overhead, and at 10 Gbit/s with roughly 1,200-byte datagrams you’re making about a million of them a second.
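The one-packet-per-interval rule looks like this in code. The Pacer class is a minimal sketch with a burst size of one; it takes the current time as an argument so it can be driven by any clock, and it assumes cwnd is counted in packets. The packets_per_second helper is only there for the syscall arithmetic.

```python
class Pacer:
    """Spaces packets RTT / cwnd apart instead of sending a window at once."""

    def __init__(self, rtt_s: float, cwnd_packets: int) -> None:
        if cwnd_packets <= 0:
            raise ValueError("cwnd must be at least one packet")
        self.interval_s = rtt_s / cwnd_packets
        self._next_send_s = 0.0

    def schedule(self, now_s: float) -> float:
        """Reserve the next send slot and return the time it may go out."""
        send_at = max(now_s, self._next_send_s)
        self._next_send_s = send_at + self.interval_s
        return send_at


def packets_per_second(rate_bps: float, packet_bytes: int) -> float:
    """How many datagrams per second a given bit rate implies."""
    return rate_bps / (packet_bytes * 8)


if __name__ == "__main__":
    pacer = Pacer(rtt_s=0.050, cwnd_packets=100)
    # Three packets handed over at the same instant leave one interval apart.
    print([round(pacer.schedule(0.0) * 1e6) for _ in range(3)], "microseconds")
    print(round(packets_per_second(10e9, 1200)), "datagrams per second")
```

The timestamps that schedule returns are the kind of value a sender could hand to the kernel for timed transmission instead of sleeping between sends in userspace.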

Linux gave QUIC two helpers. SO_TXTIME lets you stamp each outgoing packet with a future send time, and the kernel emits it at that moment — pacing without per-packet syscalls. UDP Generic Segmentation Offload (GSO) lets you pass a 64 KB buffer in one syscall and have the kernel split it into MTU-sized datagrams. Modern QUIC stacks use both, plus io_uring, plus AF_XDP on the highest-throughput servers. Without these, a naive QUIC sender on Linux burns roughly three times the CPU per byte that kernel TCP does. With them, the gap closes to maybe 30%.

The pacing trap. Skipping pacing and relying on cwnd alone causes microbursts — short bursts that fit inside the cwnd but exceed the bottleneck rate. They look fine on average and trigger loss in practice. Any QUIC stack that means to behave well in production paces by default.

QUIC and HTTP/3 — what's the difference

QUIC is the transport: streams, packets, congestion control, loss recovery, integrated TLS. It’s standardised in RFC 9000 (the core), RFC 9001 (TLS binding), and RFC 9002 (loss detection and congestion control). None of them depends on HTTP. QUIC could carry anything, and it does: DNS-over-QUIC (DoQ, RFC 9250), Microsoft’s SMB-over-QUIC, and experimental QUIC transports for gRPC.

HTTP/3 is the application protocol on top, standardised in RFC 9114. It maps HTTP semantics onto QUIC streams: each request/response pair gets its own bidirectional stream, request and response frames flow inside, and the stream closes when the response is complete. Independent streams mean independent requests — no head-of-line blocking between concurrent fetches, the single biggest source of HTTP/2 disappointment.

HTTP/2’s header compression scheme, HPACK, doesn’t survive the transition. HPACK needs both ends to process header table updates in order — fine with one TCP byte stream, impossible with many independent QUIC streams that might arrive out of order. HTTP/3 ships QPACK instead. It carries dynamic-table updates on a dedicated unidirectional stream and lets header frames reference table entries with a small commit lag, trading a touch of compression efficiency for the freedom to deliver headers out of order.

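+---------------+------+---------------------------------------------------------+
| Layer         | RFC  | What it handles                                         |
+---------------+------+---------------------------------------------------------+
| QUIC core     | 9000 | Streams, packets, frames, congestion control framework  |
| QUIC + TLS    | 9001 | How TLS 1.3 keys derive the QUIC packet protection      |
| QUIC recovery | 9002 | Loss detection, congestion controllers, pacing          |
| HTTP/3        | 9114 | HTTP semantics over QUIC streams                        |
| QPACK         | 9204 | Header compression that tolerates out-of-order delivery |
| DoQ           | 9250 | DNS over QUIC: same transport, different application    |
+---------------+------+---------------------------------------------------------+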

Why userspace transport — the engineering case

TCP lives in the kernel, and that used to be a feature. The kernel can see all flows and arbitrate between them, the network stack runs at high privilege without context-switching to userspace for every packet, and optimisations like TSO and GRO sit close to the NIC. The cost is the deployment cycle. A new TCP feature ships in a Linux kernel release every two years or so, then takes another three to five years to percolate into the LTS kernels that datacenters and devices actually run. ECN took fifteen years. MPTCP, fourteen. TCP Fast Open was approved in 2014 and is still off-by-default on most servers.

Userspace transport flips the deployment problem. Cloudflare’s quiche, Google’s quiche (different project), Meta’s mvfst, Microsoft’s msquic, Apple’s network.framework — all ship as libraries that update with the application. Google deployed BBR2 to YouTube’s QUIC stack in 2018; the same congestion controller took years longer to land in mainline Linux as a TCP option. The next time someone has a better loss recovery algorithm, the QUIC ecosystem can ship it in one release cycle.

The cost is CPU. Naive QUIC on commodity Linux burned roughly three times the CPU per gigabit of kernel TCP, mainly because every packet means a sendto/recvfrom syscall and a userspace crypto operation. The kernel community’s answer has been to push the bottlenecks out: UDP GSO/GRO for batched syscalls, SO_TXTIME for pacing, io_uring for amortised submission, AF_XDP for bypassing the kernel stack entirely. Production QUIC stacks that combine these are now within 20–40% of kernel TCP on CPU and beat it on latency-sensitive workloads.

The hybrid future. Some shops are pushing parts of QUIC back into the kernel — Microsoft’s msquic has a kernel-mode variant for SMB workloads, and there’s active work on Linux kernel QUIC. The userspace-versus-kernel question may not have a single answer; high-throughput servers will pick whichever path costs them less.

Adoption — who runs it in production

Google built the first version, called gQUIC, and turned it on for YouTube and Chrome-to-Google traffic in 2013. By 2017 over a third of Google’s client-to-server traffic was gQUIC. The IETF, which chartered its QUIC working group in 2016, spent roughly five years standardising a cleaner version (sometimes called iQUIC) with TLS 1.3 properly integrated instead of gQUIC’s bespoke crypto. RFC 9000 was published in May 2021. Google moved its production fleet from gQUIC to IETF QUIC over the next 18 months.

Cloudflare turned on HTTP/3 in late 2019 with their quiche library built in Rust. By 2023 they reported roughly 30% of all Cloudflare web requests were riding HTTP/3 — most from Chrome and Safari clients. Meta uses mvfst in C++ for Facebook and Instagram; Apple shipped HTTP/3 in Safari and the network.framework stack across iOS and macOS; Microsoft built msquic in C and uses it for HTTP/3 in Windows and as the transport for SMB-over-QUIC.

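+-------------------------+-----------+-----------------------------------------------------+
| Implementation          | Language  | Used by                                             |
+-------------------------+-----------+-----------------------------------------------------+
| Google QUICHE           | C++       | Chrome, Google servers (Search, YouTube, Maps)      |
| Cloudflare quiche       | Rust      | cloudflared, Cloudflare edge, NGINX HTTP/3 module   |
| Meta mvfst              | C++       | Facebook, Instagram, WhatsApp                       |
| Apple network.framework | C / Swift | Safari, iOS, macOS, all Apple system traffic        |
| Microsoft msquic        | C         | Windows HTTP/3, SMB-over-QUIC, .NET HttpClient      |
| quinn                   | Rust      | The de-facto Rust ecosystem QUIC library            |
| quic-go                 | Go        | Caddy HTTP/3, much of Go’s third-party QUIC tooling |
| ngtcp2                  | C         | curl, nghttp3, embeddable in C/C++ apps             |
+-------------------------+-----------+-----------------------------------------------------+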

On the client side, Chrome, Firefox, Safari, and Edge all ship HTTP/3 enabled by default. curl speaks HTTP/3 when it is built against an HTTP/3-capable QUIC backend and invoked with the --http3 flag. Server-side adoption lags a little (nginx’s HTTP/3 support stabilised in 2023, Apache’s mod_http3 is newer still), but the major CDNs and any cloud load balancer worth using support it today.

Common mistakes

  • Assuming QUIC works through every middlebox. Some carrier-grade NATs and corporate firewalls block UDP on all ports except 53 (DNS). Browsers handle this gracefully by falling back to HTTP/2 over TCP when the QUIC attempt fails, but server operators see "HTTP/3 didn’t help on this network" and the cause is upstream of them.
  • Using 0-RTT for non-idempotent operations. A POST that charges a credit card sent over 0-RTT can be replayed by an on-path attacker. Most HTTP/3 stacks refuse to send non-GET requests on 0-RTT by default, but if you’re using QUIC for a custom protocol you have to enforce this yourself.
  • Blocking UDP at the firewall and expecting HTTP/3 to work. QUIC rides UDP on port 443 by default. If the firewall allows TCP/443 and blocks UDP/443, clients silently downgrade to HTTP/2 and you wonder why the latency improvements never showed up. Check that the Alt-Svc: h3=":443" advertisement reaches clients and that their follow-up connections over UDP actually complete.
  • Not pacing your sender. A QUIC implementation that lets its congestion controller fire packets back-to-back will lose performance on bottleneck links. Use SO_TXTIME if you’re on Linux; many userspace QUIC libraries do this for you, but if you wrote the loop, you wrote the bug too.
  • Reusing Connection IDs predictably. A connection ID is a tracking primitive; if it never rotates, an observer can correlate your traffic across networks. RFC 9000 has each endpoint hand its peer spare CIDs in NEW_CONNECTION_ID frames and forbids reusing a CID when sending from a different local address, so a migrating client should switch to a fresh CID to avoid linking the old and new paths.
  • Treating HTTP/3 enablement as a magic latency button. HTTP/3’s biggest wins are on cold connections, lossy links, and pages with many small subresources. On a warm pool of HTTP/2 connections over fibre, HTTP/3 makes very little difference. Measure on your real traffic before celebrating.
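For the Alt-Svc check in the firewall item above, a few lines are enough to confirm what a response header actually advertises. This is a deliberately naive parser with made-up function names: it splits on commas and semicolons and does not handle every quoting case the header grammar allows.

```python
from typing import Dict, List, Tuple

AltSvcEntry = Tuple[str, str, Dict[str, str]]  # (protocol, authority, parameters)


def parse_alt_svc(header_value: str) -> List[AltSvcEntry]:
    """Split an Alt-Svc header value into its advertised alternatives."""
    value = header_value.strip()
    if not value or value.lower() == "clear":
        return []
    entries = []
    for item in value.split(","):
        parts = [part.strip() for part in item.split(";")]
        protocol, _, authority = parts[0].partition("=")
        params = {}
        for part in parts[1:]:
            if part:
                key, _, val = part.partition("=")
                params[key.strip()] = val.strip().strip('"')
        entries.append((protocol.strip(), authority.strip().strip('"'), params))
    return entries


def advertises_h3(header_value: str) -> bool:
    """True if the header offers HTTP/3 as an alternative service."""
    return any(protocol == "h3" for protocol, _, _ in parse_alt_svc(header_value))


if __name__ == "__main__":
    print(parse_alt_svc('h3=":443"; ma=86400'))
    print(advertises_h3('h2=":443"'))
```

Seeing h3 in the header only proves the server offers HTTP/3; whether clients can reach it over UDP/443 still has to be checked from the client's network.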
