TLS

TLS gives you a confidential, authenticated channel over an unauthenticated network. Two parties agree on a cipher suite, the server presents a certificate (the client too, under mTLS), both derive a shared secret, and they start encrypting. The 1.3 handshake takes one round-trip; the 1.2 handshake took two. Most of what people find hard about TLS (handshake debugging, certificate chains, mTLS) gets much easier once you've watched a single handshake byte by byte.

The 1.3 handshake, in one round trip

TLS 1.3 (RFC 8446, August 2018) cut the handshake from two round-trips to one by being more aggressive about what each side can guess. Everything here runs on top of a TCP connection that is already established — the three-way handshake completes first, then TLS begins, which is why a fresh HTTPS connection costs TCP setup plus the TLS handshake before any request goes out.

Client                                          Server
  | --- ClientHello ---------------------------> |
  |     · supported cipher suites                |
  |     · supported groups (curves)              |
  |     · key_share for guessed group(s)         |
  |     · extensions: SNI, ALPN, ...             |
  |                                              |
  | <-- ServerHello ---------------------------- |
  |     · selected cipher suite                  |
  |     · key_share (server's half)              |
  | <-- {EncryptedExtensions} ------------------ |
  | <-- {Certificate} -------------------------- |
  | <-- {CertificateVerify} -------------------- |
  | <-- {Finished} ----------------------------- |
  |                                              |
  | --- {Finished} ----------------------------> |
  | === application data ======================> |
  | <== application data ======================= |

Everything in {braces} is encrypted with handshake keys derived from the key_shares. After a single round-trip, both sides have a shared secret, the server's identity is verified, and application data can flow.

The timeline below shows where the round-trip cost actually sits. TLS 1.2 needed two round-trips before the first byte of application data because the server could not start deriving keys until it had seen the client's key exchange, which arrived in a second flight. TLS 1.3 has the client guess the group and send its key_share in the very first message, so the server can finish its half of the key agreement immediately and send its certificate already encrypted. One trip up, one trip back, then data.

The TLS 1.3 handshake. One trip up, one trip back, then encrypted data — the second round-trip that TLS 1.2 spent on key exchange is gone.

What got dropped from 1.2: RSA key exchange (no forward secrecy), CBC modes, compression, renegotiation, weak hash functions. The result is a smaller, safer protocol that's faster on the wire too.
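To watch the negotiated result from code rather than from a packet capture, a minimal sketch with Python's standard `ssl` module might look like the following. The function names `make_client_context` and `describe_handshake` are illustrative, not part of any library, and the sketch assumes a Python build linked against an OpenSSL that supports TLS 1.3.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client context that verifies the server and refuses anything below TLS 1.3."""
    ctx = ssl.create_default_context()            # system trust store, hostname check on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # fail the handshake rather than fall back
    return ctx


def describe_handshake(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect, complete the handshake, and report what was negotiated."""
    ctx = make_client_context()
    # TCP three-way handshake first, then the TLS handshake on top of it.
    with socket.create_connection((host, port), timeout=timeout) as tcp:
        with ctx.wrap_socket(tcp, server_hostname=host) as tls:
            name, _protocol, secret_bits = tls.cipher()
            return {
                "version": tls.version(),
                "cipher": name,
                "secret_bits": secret_bits,
            }
```

Calling `describe_handshake("example.com")` against a TLS 1.3 server should return the protocol version and one of the five 1.3 cipher suites; against a 1.2-only server the handshake fails with an `ssl.SSLError` instead of silently downgrading.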

SNI and ALPN

Two extensions in the ClientHello carry information the server needs before it can respond:

SNI (Server Name Indication, RFC 6066). "I'm trying to connect to api.example.com." The server uses this to pick which certificate to present. Without SNI, you can't host multiple certificates on a single IP.
ALPN (Application-Layer Protocol Negotiation, RFC 7301). "I support h2 and http/1.1." The server picks one. This is how a single TLS endpoint can serve both HTTP/1.1 and HTTP/2 — the client tells the server which protocols it supports during the handshake, the server chooses, and the application layer matches.

SNI is sent in the clear in TLS 1.3, which means a passive observer can see which hostname you're connecting to. Encrypted Client Hello (ECH, RFC 9849) addresses this; it's implemented in Cloudflare's edge and Firefox but not yet universal. The SNI guide covers the wire format, SNI routing, and ECH end to end.
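Both extensions are small length-prefixed structures inside the ClientHello. As a rough sketch of the layouts described in RFC 6066 and RFC 7301, the two helpers below (hypothetical names, written for illustration only) encode them by hand; a real client would let its TLS library do this.

```python
import struct


def sni_extension(hostname: str) -> bytes:
    """Encode the server_name extension (type 0) for a single DNS hostname."""
    name = hostname.encode("ascii")
    entry = struct.pack("!BH", 0, len(name)) + name      # name_type 0 = host_name
    name_list = struct.pack("!H", len(entry)) + entry    # ServerNameList with one entry
    return struct.pack("!HH", 0, len(name_list)) + name_list


def alpn_extension(protocols: list) -> bytes:
    """Encode the ALPN extension (type 16); protocols are listed in preference order."""
    entries = b"".join(bytes([len(p)]) + p.encode("ascii") for p in protocols)
    proto_list = struct.pack("!H", len(entries)) + entries
    return struct.pack("!HH", 16, len(proto_list)) + proto_list
```

With the standard `ssl` module the same two values are set by passing `server_hostname=` to `wrap_socket()` and calling `set_alpn_protocols(["h2", "http/1.1"])` on the context; `selected_alpn_protocol()` on the connected socket reports the server's choice.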

The certificate chain

A certificate is a signed statement: "this public key belongs to this name". The signature is from a CA whose public key the client already trusts. Most production certificates are signed by an intermediate CA that's signed by a root CA in the client's trust store, which means the chain has three certificates.

leaf:         api.example.com         ← signed by intermediate
intermediate: Example CA Inc.         ← signed by root
root:         ISRG Root X1            ← in the trust store

The server sends the leaf and the intermediate (modern best practice). The client walks the chain: the leaf's signature must verify under the intermediate's public key, the intermediate's signature must verify under the root's public key, and the root must be in the trust store.

The chain of trust. The client only trusts the root directly; everything else is trusted because a trusted certificate signed it.

Two operational details:

Forgetting to send the intermediate is the #1 production TLS bug. Browsers' "AIA fetching" papers over this for HTTPS in browsers; most other clients (curl with strict checks, Java, Go) don't, and you get "unable to verify the first certificate".
Cross-signed roots and intermediates. Let's Encrypt's chain has been cross-signed multiple times to maintain trust on older devices. The 2021 Let's Encrypt chain swap broke a lot of OpenSSL 1.0.2 clients; if your runtime is more than a few years old, watch your CA's announcements.
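The missing-intermediate failure is easy to reproduce in a toy model. The sketch below only follows issuer names from the leaf to the trust store; it does not check signatures, validity dates, or key usage, all of which a real validator must do. `Cert` and `build_path` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str


def build_path(leaf: Cert, sent: list, trust_store: dict) -> list:
    """Follow issuer names from the leaf until a trusted root is reached.

    Toy model: path building only, no cryptographic verification.
    """
    by_subject = {c.subject: c for c in sent}
    path, current = [leaf.subject], leaf
    for _ in range(len(sent) + 1):          # bounded, so a cycle cannot loop forever
        if current.issuer in trust_store:
            path.append(current.issuer)
            return path
        parent = by_subject.get(current.issuer)
        if parent is None or parent is current:
            break
        path.append(parent.subject)
        current = parent
    raise ValueError(f"no path to a trusted root: issuer {current.issuer!r} not available")


leaf = Cert("api.example.com", "Example CA Inc.")
intermediate = Cert("Example CA Inc.", "ISRG Root X1")
trust_store = {"ISRG Root X1": Cert("ISRG Root X1", "ISRG Root X1")}
```

`build_path(leaf, [intermediate], trust_store)` finds the three-certificate path; `build_path(leaf, [], trust_store)` raises, which is the toy equivalent of "unable to verify the first certificate".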

Mutual TLS

In a normal handshake, the server proves its identity to the client. With mutual TLS, the client also presents a certificate signed by a CA the server trusts. After the handshake, both sides know who they're talking to.

mTLS is excellent for service-to-service authentication inside a controlled environment — service meshes, internal APIs, anywhere you can run a private CA. The cost is operational: you need a way to issue, rotate, and revoke client certs as fast as your fleet changes. SPIFFE/SPIRE and AWS Private CA exist for this.

It scales poorly for public APIs because every caller has to enrol a cert. For server callers you control, the security wins (no bearer tokens to leak, no token refresh logic, identity bound to the host) usually justify the operational cost.
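On the server side, mTLS mostly comes down to one setting: require a client certificate and trust only your private CA. A minimal sketch with Python's `ssl` module, assuming PEM files whose paths are supplied by the caller (the parameter names are placeholders, not a convention from any tool):

```python
import ssl


def make_mtls_server_context(cert_file=None, key_file=None, client_ca_file=None) -> ssl.SSLContext:
    """Server context that rejects any client without a certificate from our CA."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED           # the handshake fails without a client cert
    if client_ca_file:
        # Trust only the private CA that issues client certificates,
        # not the public roots in the system store.
        ctx.load_verify_locations(cafile=client_ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

After `ctx.wrap_socket(conn, server_side=True)` completes, `getpeercert()` on the resulting socket returns the verified client certificate, which is where the caller's identity comes from.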

Session resumption and 0-RTT

After a successful handshake, the server can give the client a "session ticket" — essentially a serialised, encrypted note saying "this client and I have already done a handshake, here's the resulting key material". On the next connection, the client sends the ticket back and both sides skip most of the handshake.

With session resumption alone, TLS 1.3 can do a 1-RTT handshake using the cached key. With 0-RTT, the client can send application data alongside the ClientHello in its very first flight, encrypted with a key derived from the resumed secret. The server can act on that data immediately, before any round-trip.

The 0-RTT replay risk. An attacker who captures a 0-RTT request can replay it; the server has no way to tell the second copy from the first. This is fine for idempotent reads (GET /api/me) and disastrous for non-idempotent writes (POST /api/charge). Servers and reverse proxies need to be configured to only accept 0-RTT for safe methods, or to require an idempotency key. RFC 8470 has the gory details.
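RFC 8470 gives the origin two tools for this: the `Early-Data: 1` request header, which a TLS-terminating proxy adds to requests that arrived as early data, and the `425 Too Early` status, which tells the client to retry after the handshake completes. A sketch of the gate, assuming your proxy sets that header and that your handlers actually deduplicate on an `Idempotency-Key` header (both are assumptions about your deployment):

```python
from typing import Optional

SAFE_METHODS = {"GET", "HEAD"}
TOO_EARLY = 425  # "Too Early", defined by RFC 8470


def early_data_decision(method: str, headers: dict) -> Optional[int]:
    """Return 425 to force a retry after the handshake, or None to process the request."""
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    if normalized.get("early-data") != "1":
        return None                      # not early data, nothing to decide
    if method.upper() in SAFE_METHODS:
        return None                      # replaying an idempotent read is harmless
    if normalized.get("idempotency-key"):
        return None                      # assumption: the handler dedups on this key
    return TOO_EARLY
```

A replayed `POST /api/charge` without an idempotency key gets a 425 and is retried by a compliant client over the completed, replay-safe connection.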

Cipher suites in 1.3

TLS 1.3 trimmed the cipher suite catalogue from hundreds of combinations to five:

Cipher suite	Notes
TLS_AES_128_GCM_SHA256	Default. Hardware-accelerated on most x86 server and desktop CPUs since Westmere.
TLS_AES_256_GCM_SHA384	256-bit AES; marginally slower; useful only for very long-lived secrets.
TLS_CHACHA20_POLY1305_SHA256	Better on devices without AES hardware (older mobile). The default for Cloudflare on mobile.
TLS_AES_128_CCM_SHA256	For constrained devices (IoT) without GCM.
TLS_AES_128_CCM_8_SHA256	Same, with shorter authentication tag.

Each is "AEAD + key length + hash". The handshake separately negotiates the key exchange group (X25519 or P-256 in practice).
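Because the names are that regular, they can be taken apart mechanically. The helper below is a small illustration (the function name and the returned keys are made up for this sketch) that splits a 1.3 suite name into its AEAD, key length, hash, and tag length.

```python
def parse_suite(name: str) -> dict:
    """Split a TLS 1.3 cipher suite name into AEAD, key size, hash and tag size."""
    if not name.startswith("TLS_"):
        raise ValueError(f"not a TLS 1.3 suite name: {name!r}")
    aead, hash_name = name[len("TLS_"):].rsplit("_", 1)
    if aead.startswith("AES_"):
        key_bits = int(aead.split("_")[1])      # AES_128_... or AES_256_...
    elif aead.startswith("CHACHA20"):
        key_bits = 256                          # ChaCha20 always uses a 256-bit key
    else:
        raise ValueError(f"unknown AEAD in {name!r}")
    tag_bytes = 8 if aead.endswith("CCM_8") else 16
    return {"aead": aead, "key_bits": key_bits, "hash": hash_name, "tag_bytes": tag_bytes}
```

Note what is absent: no key exchange and no signature algorithm. Those are negotiated through separate extensions, which is why the 1.3 list stays this short.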

Post-quantum, briefly

Sufficiently large quantum computers can break the elliptic-curve key exchange every modern TLS handshake uses. They don't exist yet. The risk that does matter is "harvest now, decrypt later" — a recorded handshake from today could be decrypted in the 2030s if the math works out.

The current state: hybrid key exchange. The handshake uses X25519 and a post-quantum KEM (ML-KEM, formerly Kyber) and combines the two outputs. If either holds up, the connection holds up. Chrome enabled X25519MLKEM768 by default in 2024; Cloudflare and Google support it on the server side. If you're running a modern proxy and a recent Chrome, you're already negotiating it on the wire.

Tools — what to reach for

Tool	Use for
`openssl s_client -connect host:443 -showcerts`	The chain, the cipher suite, the protocol version. The first thing to run.
`openssl x509 -in cert.pem -text -noout`	Read a single certificate.
`curl -v https://host`	Real-world client behaviour with full handshake logging.
SSL Labs server test (ssllabs.com)	Detailed grade and recommendations for any public endpoint.
Wireshark with the SSLKEYLOGFILE	Decrypt your own traffic for debugging. Set `SSLKEYLOGFILE=...` in your client; tell Wireshark where it is.
`bpftrace -e 'usdt:libssl:rsa-private-decrypt { ... }'`	Trace inside OpenSSL when something is taking forever.

How keys actually get derived — HKDF and the key schedule

The handshake's job is to agree on a shared secret without ever sending it. TLS 1.3 uses ECDHE (Elliptic-Curve Diffie-Hellman Ephemeral) over X25519 or one of the NIST curves. Client and server each generate a fresh keypair for the connection, exchange public keys in ClientHello / ServerHello, and each derive the same shared secret from their own private key plus the other's public key. The "ephemeral" matters: the keys exist only for this connection, so even a future leak of the long-term certificate private key cannot retroactively decrypt past traffic. That property is called forward secrecy. TLS 1.2 offered it only when an (EC)DHE suite was negotiated; TLS 1.3 removed static RSA key exchange, so every certificate-based handshake has it. That was the single biggest security win of the move from 1.2 to 1.3.

From the shared secret, TLS uses HKDF (HMAC-based Key Derivation Function) to derive a cascade of keys: early-traffic secret, handshake-traffic secret, application-traffic secret, exporter secret. Each is derived from the previous plus a label and the handshake transcript hash. The "key schedule" in RFC 8446 §7.1 spells out exactly which input goes through which HKDF-Expand-Label call. The structure looks like overkill until you see what it buys 1.3: each key has a precise scope, and leaking one does not compromise the others. The primitives underneath — HMAC, the hash function, the AEAD cipher — are the same building blocks covered in the crypto engineer's handbook; TLS is the most-deployed assembly of them in the world.

Key schedule (simplified):

PSK or 0  → HKDF-Extract → Early Secret
                              ├── binder_key
                              ├── client_early_traffic_secret
                              └── early_exporter_secret

DH shared → HKDF-Extract → Handshake Secret
                              ├── client_handshake_traffic_secret
                              └── server_handshake_traffic_secret

  0       → HKDF-Extract → Master Secret
                              ├── client_application_traffic_secret
                              ├── server_application_traffic_secret
                              ├── exporter_secret
                              └── resumption_master_secret

After the handshake, the application-traffic secrets are expanded once more with HKDF to produce the AEAD key and IV for record-layer encryption. Records are encrypted with AES-GCM, ChaCha20-Poly1305, or AES-CCM depending on the negotiated cipher suite. Each record's nonce is the IV XORed with a 64-bit record sequence number that increments per record, so a nonce never repeats under the same key.
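The schedule is compact enough to write out with nothing but `hmac` and `hashlib`. The sketch below follows the structure of RFC 8446 §7.1 for a SHA-256 suite with no PSK; the transcripts are passed in as opaque byte strings, the function names are mine, and it is meant for reading, not for protecting real traffic.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size
ZEROS = bytes(HASH_LEN)


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        out += block
        counter += 1
    return out[:length]


def hkdf_expand_label(secret: bytes, label: bytes, context: bytes, length: int) -> bytes:
    full_label = b"tls13 " + label
    info = (length.to_bytes(2, "big")
            + bytes([len(full_label)]) + full_label
            + bytes([len(context)]) + context)
    return hkdf_expand(secret, info, length)


def derive_secret(secret: bytes, label: bytes, transcript: bytes) -> bytes:
    return hkdf_expand_label(secret, label, HASH(transcript).digest(), HASH_LEN)


def key_schedule(dh_shared: bytes, hello_transcript: bytes, full_transcript: bytes) -> dict:
    """No-PSK schedule: Early Secret -> Handshake Secret -> Master Secret."""
    early = hkdf_extract(ZEROS, ZEROS)
    handshake = hkdf_extract(derive_secret(early, b"derived", b""), dh_shared)
    master = hkdf_extract(derive_secret(handshake, b"derived", b""), ZEROS)
    return {
        "early_secret": early,
        "client_handshake_traffic_secret": derive_secret(handshake, b"c hs traffic", hello_transcript),
        "server_handshake_traffic_secret": derive_secret(handshake, b"s hs traffic", hello_transcript),
        "client_application_traffic_secret": derive_secret(master, b"c ap traffic", full_transcript),
        "server_application_traffic_secret": derive_secret(master, b"s ap traffic", full_transcript),
    }


def traffic_keys(traffic_secret: bytes, key_len: int = 16, iv_len: int = 12):
    """AEAD key and IV for one direction (defaults fit AES-128-GCM)."""
    key = hkdf_expand_label(traffic_secret, b"key", b"", key_len)
    iv = hkdf_expand_label(traffic_secret, b"iv", b"", iv_len)
    return key, iv


def record_nonce(iv: bytes, sequence_number: int) -> bytes:
    """Per-record nonce: the sequence number, left-padded to the IV length, XOR the IV."""
    padded = sequence_number.to_bytes(len(iv), "big")
    return bytes(a ^ b for a, b in zip(iv, padded))
```

The scoping argument is visible in the code: each traffic secret comes out of a one-way expand with its own label, so knowing one output tells you nothing about its siblings or about the secret it was derived from.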

Record-layer mechanics — what goes on the wire

Above the handshake, the TLS record layer is what application data actually travels in. Each record is protected by the negotiated AEAD, which encrypts and authenticates in one pass, and the spec caps a record at 2^14 bytes (16 KB) of plaintext. Many servers let you configure smaller records (4-8 KB) to keep buffering latency down.

The plaintext that goes into AEAD encryption is the application data plus a 1-byte content type marker (handshake, application data, or alert) and optional zero padding. The ciphertext that comes out is the same length plus an authentication tag: 16 bytes for every suite except TLS_AES_128_CCM_8_SHA256, which uses 8. The ciphertext is sent prefixed with a 5-byte TLS record header (type, version, length).

One AEAD record. The tag both authenticates and integrity-protects; a single bit flip in the ciphertext makes the tag fail and the record is rejected.

Three operational properties of the record layer that matter in practice:

Record size affects latency. A web server sending 100 KB of HTML in 16 KB records writes 7 records; the client must receive each completely before decrypting. Smaller records (say, 4 KB) let the browser start parsing earlier at the cost of more per-record overhead. Nginx's ssl_buffer_size tunes this.

Coalescing matters more than it looks. TLS records on the wire often get coalesced with subsequent records into a single TCP segment by the kernel. This is fine for throughput but can hurt latency for streaming protocols if the kernel waits to fill the segment. Disabling Nagle (TCP_NODELAY) on the socket is the usual fix.

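Each record is independently authenticated. The receiver can decrypt and verify a record as soon as it has arrived in full, without waiting for the rest of the stream. The flip side is that TLS does not tolerate damage: a record that fails AEAD verification triggers a fatal bad_record_mac alert and the connection is closed. Random network corruption is expected to be caught and retransmitted by TCP underneath; only DTLS, which runs over UDP, silently discards bad records and carries on.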
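The framing and record-size arithmetic from this section fits in a few lines. This is a back-of-the-envelope sketch assuming a 16-byte tag and no padding; the constant and function names are illustrative.

```python
import math

HEADER_BYTES = 5          # type, version, length
CONTENT_TYPE_BYTES = 1    # inner content type, encrypted with the payload
TAG_BYTES = 16            # 8 for the CCM_8 suite
MAX_PLAINTEXT = 2 ** 14   # 16 KB cap per record


def record_wire_size(payload_len: int, tag_len: int = TAG_BYTES) -> int:
    """Bytes on the wire for one record carrying payload_len bytes of application data."""
    if not 0 <= payload_len <= MAX_PLAINTEXT:
        raise ValueError("payload must fit in one record")
    return HEADER_BYTES + payload_len + CONTENT_TYPE_BYTES + tag_len


def records_needed(total_bytes: int, record_payload: int = MAX_PLAINTEXT) -> int:
    return math.ceil(total_bytes / record_payload)


def framing_overhead(total_bytes: int, record_payload: int = MAX_PLAINTEXT) -> int:
    """Total header + content type + tag bytes added to a response of total_bytes."""
    per_record = HEADER_BYTES + CONTENT_TYPE_BYTES + TAG_BYTES
    return records_needed(total_bytes, record_payload) * per_record
```

For the 100 KB page, 16 KB records mean 7 records and 154 bytes of framing; 4 KB records mean 25 records and 550 bytes. The overhead is negligible either way, which is why the record-size decision is about latency, not bandwidth.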

Performance — what TLS actually costs in 2026

TLS overhead has dropped enormously over the last decade. The 2025-2026 numbers worth knowing are these.

CPU cost of the handshake: ~0.5-2ms on a modern server core for ECDHE over X25519 + RSA-2048 cert signature. EC-only certificates (ECDSA P-256) drop this to ~0.3-1ms. The signature verification on the client is similar. With an RSA certificate the server's private-key signature dominates; with ECDSA, most of the cost is the elliptic-curve point multiplications.

CPU cost of bulk encryption: hardware AES (AES-NI plus carry-less multiply on x86 since Westmere, the ARMv8 Cryptography Extensions on most 64-bit ARM cores) makes encryption nearly free, typically 2-5 GB/s per core. Without hardware AES (very old hardware or some embedded platforms), ChaCha20-Poly1305 is the faster software fallback at ~1-2 GB/s.

Latency cost of the handshake: One round trip in TLS 1.3 (down from two in 1.2). For a cross-region connection (100ms RTT), that means the first byte of application data arrives 100ms after TCP is up. 0-RTT cuts this to zero for resumed sessions, at the cost of replay risk on the first request.
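The round-trip accounting is simple enough to put in a function. This sketch counts only network round trips before the client can send its first request byte and ignores CPU time, DNS, and packet loss; the mode labels are made up for the example.

```python
TLS_ROUND_TRIPS = {"1.2": 2, "1.3": 1, "1.3-0rtt": 0}
TCP_ROUND_TRIPS = 1   # the three-way handshake costs one RTT before TLS can start


def ms_until_first_request(rtt_ms: float, tls_mode: str) -> float:
    """Time from opening the socket until the first request byte can leave the client."""
    return (TCP_ROUND_TRIPS + TLS_ROUND_TRIPS[tls_mode]) * rtt_ms
```

At 100 ms RTT that is 300 ms for TLS 1.2, 200 ms for a full TLS 1.3 handshake, and 100 ms for 1.3 with 0-RTT, where the only remaining wait is TCP itself.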

The Google study, often cited: in 2010 Adam Langley reported that TLS at scale on Gmail accounted for less than 1% of CPU and less than 2% of network bandwidth. The numbers in 2026 are even lower because of widespread AES-NI and the 1.3 simplifications. The "TLS is expensive" framing has been wrong for over a decade. The right answer to "should we use TLS" is yes, always.

Kernel TLS (kTLS) for static content. Linux 4.13+ supports offloading the record-layer encryption to the kernel after the handshake. Files served via sendfile() are encrypted in-kernel without being copied into user space. Nginx, HAProxy, and Netflix's edge servers (which run FreeBSD's kTLS implementation) all use kTLS for large-file serving. The throughput improvement is dramatic: Netflix reported up to 90% CPU reduction on TLS-encrypted video streaming after kTLS rollout.

Revocation — the unsolved problem

What happens when a certificate's private key leaks before the certificate expires? In theory the certificate authority publishes a Certificate Revocation List (CRL) and clients check it before accepting. In practice, this is one of the messiest corners of the TLS ecosystem.

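CRLs. The original mechanism. The CA publishes a list of revoked certs; clients download it. CRLs grow to megabytes for major CAs; clients cache aggressively; recently-revoked certs may not appear in the cache for hours. Browsers no longer fetch raw CRLs during the handshake, so for browser TLS the lists survive mainly as the input to the vendor-curated sets described below.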

OCSP (Online Certificate Status Protocol). The client queries the CA's OCSP responder per cert. This hits performance (an extra DNS + TCP + HTTP round-trip per TLS connection) and privacy (the CA learns which sites the client visits). Browsers mostly soft-fail OCSP (if the responder is unreachable, accept the cert anyway), which makes revocation largely toothless.

OCSP Stapling. The server periodically fetches its own OCSP response from the CA and includes it ("staples" it) in the TLS handshake. This solves the performance and privacy problems, and most web servers and CDNs support it. It only works while your CA still runs an OCSP responder, though: Let's Encrypt shut its responders down in 2025 and now publishes revocation through CRLs only.

CRLite and CRLSets. Browser-vendor-curated compressed revocation lists distributed via the browser update channel. Mozilla's CRLite, Chrome's CRLSets and the newer CRLite-equivalent. Effective for end-user browsers; not relevant for server-to-server TLS.

Short-lived certificates. If certs are valid for 90 days (Let's Encrypt) or about an hour (SPIFFE / SPIRE), revocation matters far less: the window in which a leaked key is useful closes on its own. This is the direction the industry is moving. The CA/Browser Forum voted in 2025 to lower the maximum public certificate lifetime in steps, reaching 47 days in March 2029.
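Short lifetimes only work if renewal is automatic, and the renewal check itself is a few lines. The sketch below uses `ssl.cert_time_to_seconds` to parse the `notBefore` / `notAfter` strings in the format that `getpeercert()` returns; renewing once a third of the lifetime remains is a common rule of thumb that this example assumes, not a requirement of any standard.

```python
import ssl
import time
from typing import Optional


def days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days until expiry, given a notAfter string such as 'Jan  5 09:34:43 2018 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def should_renew(not_before: str, not_after: str,
                 now: Optional[float] = None, fraction: float = 1 / 3) -> bool:
    """True once less than `fraction` of the certificate's lifetime is left."""
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (end - current) <= (end - start) * fraction
```

Expressing the threshold as a fraction rather than a fixed number of days means the same check keeps working as maximum lifetimes shrink from 90 days toward 47.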

Certificate Transparency — the audit log

Certificate Transparency (CT, RFC 6962) is a set of append-only public logs of certificates issued by participating CAs. Browsers that enforce CT (Chrome and Safari among them) only accept a certificate that comes with Signed Certificate Timestamps (SCTs) from at least two logs. An SCT is a log's signed promise to include the certificate, and it is usually embedded in the certificate itself.

Why this matters: a CA that misissues a certificate for your domain (whether by malice or compromise) can't do it secretly. The misissuance shows up in a public log within the log's merge delay, typically minutes to hours; tools like Cert Spotter, crt.sh, and Censys index the logs and let you watch for certificates issued under your domain. Several major misissuance incidents were caught this way, including the Symantec test certificates for google.com that were found in the logs in 2015 and fed into its 2017 distrust.

For server operators, the practical advice is to set up CT monitoring on your domains — free services like Cert Spotter email you whenever a new cert is issued for a name you care about. Catches both misissuance and the "someone in marketing ordered a cert from a different CA without telling us" case.

TLS in the service mesh — short-lived certs at scale

Service-to-service TLS in 2026 looks nothing like browser TLS. The dominant pattern goes like this.

SPIFFE identities. Each workload gets a SPIFFE ID (a URI like spiffe://example.com/ns/prod/sa/api-server) that uniquely names it within the cluster. SPIRE — the reference SPIFFE implementation — runs as a daemon on each node and issues short-lived X.509 certificates (1 hour TTL is typical) to each workload.

Sidecar-terminated mTLS. The application doesn't speak TLS itself. A sidecar proxy (Envoy in Istio and Consul Connect, linkerd2-proxy in Linkerd) handles TLS on both ingress and egress. The application makes plain HTTP calls to localhost; the sidecar wraps them in mTLS with the SPIFFE-issued cert.

Mesh-wide PKI. The service mesh has its own internal CA (Citadel in old Istio, istiod in newer versions, the Linkerd identity controller, the SPIRE server). Cert lifecycle, rotation, and revocation are all handled within the mesh. Browser-style PKI is irrelevant inside the cluster.

The win: every internal connection is authenticated and encrypted automatically, with automatic rotation, without application code touching TLS. The cost: running the mesh's PKI is a substantial job of its own. Read the linkerd-identity and istio-security docs once if you are going to live with this.
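Authorization in this model keys off the SPIFFE ID, which an X.509 SVID carries as a URI in the certificate's subject alternative name. A minimal sketch of parsing and checking one, using a hypothetical allowlist policy shape (real meshes express this in their own policy resources):

```python
from urllib.parse import urlparse


def parse_spiffe_id(uri: str) -> tuple:
    """Split a SPIFFE ID into (trust_domain, workload_path)."""
    parts = urlparse(uri)
    if parts.scheme != "spiffe" or not parts.netloc or parts.query or parts.fragment:
        raise ValueError(f"not a valid SPIFFE ID: {uri!r}")
    return parts.netloc, parts.path


def is_allowed(peer_id: str, trust_domain: str, allowed_paths: set) -> bool:
    """Accept the peer only if it is in our trust domain and on the allowlist."""
    try:
        domain, path = parse_spiffe_id(peer_id)
    except ValueError:
        return False
    return domain == trust_domain and path in allowed_paths
```

For `spiffe://example.com/ns/prod/sa/api-server` the trust domain is `example.com` and the workload path is `/ns/prod/sa/api-server`; identity is the name in the certificate, not the IP address the connection came from.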

Common mistakes

Disabling certificate verification "to make it work". Eventually goes to production. Eventually breaks security. The right answer is always to fix the trust store or send the right intermediate.
Pinning a certificate (not a key) and getting locked out on rotation. HTTP Public Key Pinning (HPKP), the standardized browser mechanism, has been removed from browsers; if you pin in your own clients, pin the public-key hash, never the cert hash, and ship a backup pin.
Forgetting SNI in non-browser clients. Older Java and some embedded HTTP libraries default to no SNI. The server returns the wrong cert; the client fails verification.
Accepting 0-RTT data on non-idempotent endpoints. Replay attack, duplicate charges. Configure your reverse proxy to only allow 0-RTT for GET / HEAD or to require an Idempotency-Key.
