WebSockets, a two-way connection that stays open over HTTP

An HTTP request that never ends. One tiny handshake, ten thousand frames, a TCP socket that sometimes outlives the person who opened it. The protocol that made the web bidirectional.

Parts 01 – 08 · Interactive: full lifecycle · Prereq: HTTP / TCP / TLS

What is a WebSocket?

The client asks; the server answers.

WebSockets are a bidirectional, full-duplex communication channel over a single TCP connection. RFC 6455 (2011) standardised them. The connection starts as HTTP and upgrades to WebSocket via the Upgrade header; once upgraded, both client and server can send messages at any time. Modern realtime apps (chat, collaboration tools, live dashboards) use WebSockets; Server-Sent Events, which run over plain HTTP including HTTP/2 and HTTP/3, are the usual alternative when only the server needs to push.

For fifteen years, HTTP was strictly half-duplex: the browser asked, the server answered, and that was the entire conversation. If the server had news — a chat message, a stock tick, the other player’s move — it had no way to say so. It waited to be asked.

The workarounds were increasingly baroque. Polling: hammer the server every second and waste 95% of the requests. Long polling: hold the request open and answer it the moment something happens, after which the client instantly re-opens it. Hidden iframes: stream an infinite HTML document and execute <script> tags as they arrive. Each trick traded bandwidth, latency, or complexity for the one thing the protocol wouldn’t give you: a server-initiated message.

WebSockets ended the workaround era. RFC 6455, published in December 2011, defines a way to start a connection as HTTP, talk the server into abandoning HTTP mid-request, and keep the underlying TCP socket open for as long as both sides want. After the handshake, either side can send at any time. The request/response shape is gone.

Before

Request · response · repeat.

Half-duplex. The server has no standing invitation to speak. Every update is the result of a client poll. Thirty requests a minute to check if anything happened; twenty-nine of them are wasted.

After

One socket · both sides speak.

Full-duplex. The handshake happens once; the socket stays open for minutes, hours, days. Either side sends the moment it has something to say. Overhead drops from ~800-byte HTTP headers to ~6-byte frame headers.


The upgrade handshake: enter as HTTP, leave as a WebSocket

Enter as HTTP, leave as something else.

The handshake is not a hack — it is a deliberate piece of engineering theatre. WebSockets live on ports 80 and 443, wearing HTTP clothes, specifically so corporate firewalls that allow "web traffic" allow them. The disguise is load-bearing: the protocol could have defined its own port, as FTP (21) or SSH (22) did. It didn't, because a protocol no firewall permits is a protocol no one can deploy.

Three headers carry the whole conversion. Connection: Upgrade and Upgrade: websocket signal the protocol switch. Sec-WebSocket-Key is a random 16-byte nonce, base64-encoded — and what the server does with it is the entire point.

Client → Server

GET, with upgrade intent.

An ordinary HTTP GET, but with Upgrade: websocket, Connection: Upgrade, Sec-WebSocket-Version: 13 (the only version still in use), and the 16-byte nonce in Sec-WebSocket-Key. Origin and any subprotocols ride along.

Server → Client

101 Switching Protocols.

A rarely-seen 1xx status that means "the next byte on this socket is a different protocol." The server proves it read the nonce by returning Sec-WebSocket-Accept — the SHA-1 of the key concatenated with a magic GUID, base64-encoded.

The magic string

258EAFA5-E914-47DA-95CA-C5AB0DC85B11 is a 36-character GUID written literally into RFC 6455. It exists so that a confused or malicious HTTP endpoint, echoing the key back blindly, will fail the check. The client knows it is talking to a real WebSocket server and not a proxy that misread the headers. After the blank line that ends the response headers, both sides dismantle their HTTP parsers.
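The key-to-accept computation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a full handshake implementation: the function names (make_client_key, accept_for, handshake_response) are made up for this example, and the response omits subprotocol and extension negotiation. The sample key and expected result are the ones given in RFC 6455.

```python
import base64
import hashlib
import os

# GUID fixed by RFC 6455; the server appends it to the client's key before hashing.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def make_client_key() -> str:
    """Return a fresh Sec-WebSocket-Key: 16 random bytes, base64-encoded."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def accept_for(key: str) -> str:
    """Compute Sec-WebSocket-Accept: base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(key: str) -> bytes:
    """Build a minimal 101 response (hypothetical helper, no extensions)."""
    lines = [
        "HTTP/1.1 101 Switching Protocols",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Accept: {accept_for(key)}",
        "",  # blank line that ends the HTTP headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    # Sample nonce from RFC 6455.
    print(accept_for("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client performs the same computation on the key it sent and compares the result with the header it received; an endpoint that merely echoes the key cannot produce a matching value.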


The full WebSocket lifecycle: connect, converse, keepalive, close

Connect, converse, keepalive, close.

The whole lifecycle runs on one TCP socket, full duplex, in four phases: the upgrade, a handful of data frames in both directions, a keepalive pair, and the close. Between a browser calling new WebSocket(…) and the server behind wss://example.com/chat, the ten steps are:

01 TCP + TLS connect
02 GET /chat · Upgrade: websocket
03 101 Switching Protocols
04 TEXT frame · masked
05 TEXT frame · unmasked
06 BINARY frame · protobuf
07 PING · keepalive
08 PONG · reply
09 CLOSE · status 1000
10 CLOSE · echo, then FIN

Step 01. The socket opens like any other web request: TCP handshake, then TLS if it is wss://. The WebSocket protocol deliberately inherits HTTP’s plumbing so it passes every firewall that already lets browsers talk to port 443.


WebSocket framing starts at just two bytes

Two bytes of headroom.

Every WebSocket message is one or more frames. The frame header is where almost all of the protocol’s intelligence lives — opcodes, length, masking, fragmentation — crammed into as few as two bytes. Compare to HTTP, where headers routinely hit 800.

Frame layout, in wire order (the first 16 bits are always present):

FIN · 1 bit
RSV 1·2·3 · 3 bits
OPCODE · 4 bits
MASK · 1 bit
PAYLOAD LEN · 7 bits · 0-125 = literal, 126 = read 16 more, 127 = read 64 more
EXTENDED PAYLOAD LENGTH · 16 or 64 bits · only present when needed
MASKING KEY · 32 bits · present if and only if MASK = 1 (i.e. client → server)
PAYLOAD DATA · UTF-8 if opcode = 1 (TEXT) · arbitrary bytes if opcode = 2 (BINARY) · XORed with masking key if MASK = 1
Frame header. Minimum 2 bytes, maximum 14; the rest is payload.

Read the header top-to-bottom. FIN is a single bit: set it when this is the last frame of a message, which lets a sender split a big message across several frames (control frames may be slipped in between the fragments; other data messages may not). RSV1-3 are reserved for extensions; RSV1 marks a compressed message when permessage-deflate has been negotiated. Opcode is four bits, which we spell out below. MASK is a single bit: always 1 from client to server, always 0 from server to client. Payload length is the clever part: seven bits that can stand in for 16-bit or 64-bit lengths depending on a sentinel value, so short messages pay for only seven bits of length while long messages can still express up to 2^63 bytes, about eight exbibytes (the top bit of the 64-bit length must be zero).
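To make the length sentinels concrete, here is a small sketch that encodes and parses a frame header with Python's struct module. The helper names (encode_header, parse_header) and the returned dictionary shape are illustrative choices, not part of any library, and the sketch skips validation such as rejecting non-zero RSV bits.

```python
import struct
from typing import Optional


def encode_header(opcode: int, length: int, fin: bool = True,
                  mask_key: Optional[bytes] = None) -> bytes:
    """Build a frame header: 2 bytes minimum, 14 bytes maximum."""
    first = (0x80 if fin else 0x00) | (opcode & 0x0F)  # FIN, RSV=000, opcode
    mask_bit = 0x80 if mask_key is not None else 0x00
    if length <= 125:
        header = struct.pack("!BB", first, mask_bit | length)
    elif length <= 0xFFFF:
        header = struct.pack("!BBH", first, mask_bit | 126, length)  # 16-bit length follows
    else:
        header = struct.pack("!BBQ", first, mask_bit | 127, length)  # 64-bit length follows
    return header + (mask_key or b"")


def parse_header(data: bytes) -> dict:
    """Decode the header fields at the start of `data`."""
    first, second = data[0], data[1]
    length = second & 0x7F
    offset = 2
    if length == 126:
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    masked = bool(second & 0x80)
    mask_key = data[offset:offset + 4] if masked else None
    if masked:
        offset += 4
    return {
        "fin": bool(first & 0x80),
        "rsv": (first >> 4) & 0x07,
        "opcode": first & 0x0F,
        "masked": masked,
        "length": length,
        "mask_key": mask_key,
        "header_size": offset,  # payload starts at this index
    }


if __name__ == "__main__":
    # Unmasked TEXT frame carrying "Hello": header is just 0x81 0x05.
    print((encode_header(0x1, 5) + b"Hello").hex())  # 810548656c6c6f
```

A 5-byte message costs two header bytes, a 256-byte message costs four, and only payloads above 65,535 bytes pay for the full 64-bit length.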


WebSocket opcodes: sixteen values, six defined

Six opcodes, ten reserved.

The four-bit opcode field has sixteen possible values; RFC 6455 assigns six and reserves the other ten. Three carry application payloads (TEXT, BINARY, CONTINUATION). Three are control frames sent by the protocol itself (CLOSE, PING, PONG). Control frames must fit in a single unfragmented frame with a payload of ≤125 bytes; they are never part of a streaming message.

0x0 CONTINUATION. Middle or end of a fragmented message. Only legal after a TEXT or BINARY with FIN=0.
0x1 TEXT. UTF-8 payload. Receiver must close the connection with status 1007 if the bytes are not valid UTF-8.
0x2 BINARY. Opaque bytes: protobuf, MessagePack, raw frames. No encoding contract.
0x3 – 0x7 Reserved. For future non-control frames. A receiver that sees an unknown opcode must fail the connection.
0x8 CLOSE. First two bytes of payload are a status code (1000 = normal, 1001 = going away, 1008 = policy violation, 1011 = internal server error). Rest is a UTF-8 reason.
0x9 PING. Liveness probe. Payload is echoed back in the PONG. Sent by either side, any time.
0xA PONG. Reply to a PING. May also be sent unsolicited as a unidirectional heartbeat.
0xB – 0xF Reserved for future control frames.
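The table translates directly into an enum plus two checks. The sketch below is one possible shape, with hypothetical names (Opcode, is_control, check_frame, parse_close); a real implementation would also track fragmentation state and validate the close code's range.

```python
import struct
from enum import IntEnum


class Opcode(IntEnum):
    CONTINUATION = 0x0
    TEXT = 0x1
    BINARY = 0x2
    CLOSE = 0x8
    PING = 0x9
    PONG = 0xA


def is_control(opcode: int) -> bool:
    """Control opcodes are 0x8-0xF: the top bit of the 4-bit field is set."""
    return bool(opcode & 0x8)


def check_frame(opcode: int, fin: bool, length: int) -> None:
    """Raise ValueError for frames the receiver must reject."""
    try:
        Opcode(opcode)
    except ValueError:
        raise ValueError(f"reserved opcode 0x{opcode:X}") from None
    if is_control(opcode) and (not fin or length > 125):
        raise ValueError("control frames must be unfragmented and at most 125 bytes")


def parse_close(payload: bytes):
    """Split a CLOSE payload into (status code, reason); the body may be empty."""
    if not payload:
        return None, ""
    if len(payload) == 1:
        raise ValueError("CLOSE payload must be empty or at least 2 bytes")
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")


if __name__ == "__main__":
    print(parse_close(struct.pack("!H", 1001) + b"deploy"))  # (1001, 'deploy')
```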

Masking: clients always mask, servers never do

Clients mask, servers never do.

Every frame from the client carries a 32-bit masking key. Every byte of the payload is XORed with a rotating byte of that key before it hits the wire. The server reverses the XOR on receipt. The payload’s content is unchanged in net effect — so why bother?

Because of cache-poisoning intermediaries. A sufficiently clever attacker, controlling a browser via JavaScript, could try to craft a WebSocket payload whose bytes look like a valid HTTP GET request to a transparent proxy sitting between the client and the server. The proxy — not knowing WebSockets — might interpret those bytes as a new request, fetch the response, and cache it under a URL the attacker chose. Later users get the poisoned cache entry.

Masking makes this attack impractical. The attacker controls the plaintext but not the random key, which the browser picks fresh for every frame, so the bytes that actually traverse the intermediary are unpredictable. A proxy that decodes them as HTTP sees gibberish. Server-to-client frames are never masked because the threat runs in one direction only: it is attacker-controlled script in a browser, sending through the victim’s corporate proxy, that has to be defused. The server does not live behind that proxy.
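The masking transform itself is tiny. A minimal sketch, assuming a 4-byte key and using hypothetical helper names (apply_mask, new_mask_key); because XOR is its own inverse, the same function masks on the client and unmasks on the server. The sample key and bytes below are the masked "Hello" example from RFC 6455.

```python
import os


def new_mask_key() -> bytes:
    """Pick a fresh, unpredictable 32-bit masking key for one frame."""
    return os.urandom(4)


def apply_mask(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with key[i % 4]; applying it twice restores the input."""
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


if __name__ == "__main__":
    key = bytes.fromhex("37fa213d")
    masked = apply_mask(b"Hello", key)
    print(masked.hex())             # 7f9f4d5158
    print(apply_mask(masked, key))  # b'Hello'
```

Production implementations usually XOR several bytes at a time for speed, but the per-byte loop is the definition the faster versions must match.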

Cost

Masking is pure CPU — one XOR per byte on each side. On modern hardware this is negligible; on very constrained embedded clients it is not. RFC 6455 considered this carefully and decided defense-in-depth beat saving a few cycles.


Scaling WebSockets from one box to fifty

A long-lived socket ties each user to one server.

A stateless HTTP API is embarrassingly easy to scale. Put fifty machines behind a load balancer; any request goes to any machine; if one dies, another answers in milliseconds. WebSockets subvert every assumption in that sentence.

Because the socket lives for minutes or hours, a user is physically tethered to one server — call it Server A — for the duration of the session. The load balancer cannot round-robin individual frames. If Server A restarts, every connected user is violently disconnected and has to reconnect, potentially to a different server. The state that server held — room membership, typing indicators, subscription filters — is gone unless it was written somewhere durable.

This creates two separate scaling problems. The first is the C10K problem: how many idle sockets can one box hold? In 1999 the answer was about ten thousand, limited by the threads-per-connection model. Modern runtimes (Node, Go, Rust’s Tokio, Java’s Netty) hold each socket as a small amount of event-loop state rather than a dedicated thread, and a single well-tuned machine can serve on the order of a million. The second is fan-out: when a user connected to Server B sends a message meant for a user whose socket lives on Server A, how does it get there?

PUB/SUB

Redis, NATS, Kafka.

Every server subscribes to a common broker. Server B publishes the message to a topic; every server with a subscriber in that topic gets a copy; each server hands the message to the right socket. Fan-out is decoupled from the connection graph.

STICKY LB

Affinity on the balancer.

The load balancer pins a user to the same backend on reconnect (cookie or IP hash). This keeps session-local state warm but does not solve the cross-server problem — you still need pub/sub for messages that cross users.

DRAIN

Graceful restart.

Deploys cannot just kill the process — they must refuse new connections, send CLOSE frames with status 1001 "going away", wait for clients to reconnect elsewhere, then exit. Rolling deploys take minutes instead of seconds.
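The pub/sub pattern above can be illustrated without any real broker. In this sketch, Broker is an in-memory stand-in for Redis, NATS, or Kafka, Node stands in for one WebSocket server, and each "socket" is just a list that collects delivered messages; all of these names and shapes are assumptions made for the example, not any library's API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class Broker:
    """In-memory stand-in for a shared broker: topic -> subscriber callbacks."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subs[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        for callback in list(self._subs.get(topic, [])):
            callback(message)


class Node:
    """One server. `sockets` maps user -> outbox list (a fake socket)."""

    def __init__(self, name: str, broker: Broker) -> None:
        self.name = name
        self.broker = broker
        self.sockets: Dict[str, List[str]] = {}
        self.rooms: Dict[str, Set[str]] = defaultdict(set)

    def connect(self, user: str, room: str) -> None:
        self.sockets[user] = []
        if not self.rooms[room]:
            # First local member: subscribe this node to the topic exactly once.
            self.broker.subscribe(room, lambda msg, room=room: self._deliver(room, msg))
        self.rooms[room].add(user)

    def _deliver(self, room: str, message: str) -> None:
        # Hand the broker's copy to every local socket in the room.
        for user in self.rooms[room]:
            self.sockets[user].append(message)

    def send(self, room: str, message: str) -> None:
        # Publishing never needs to know which node holds the recipients.
        self.broker.publish(room, message)


if __name__ == "__main__":
    broker = Broker()
    a, b = Node("A", broker), Node("B", broker)
    a.connect("alice", "lobby")
    b.connect("bob", "lobby")
    b.send("lobby", "hi from B")
    print(a.sockets["alice"])  # ['hi from B']
```

The sender on Server B never learns where alice is connected; the broker delivers one copy per subscribed node, and each node handles its own local sockets.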


Cross-Site WebSocket Hijacking and security

Cross-Site WebSocket Hijacking.

The handshake is a plain HTTP GET. Browsers attach cookies to HTTP GETs — same-origin or not, unless SameSite says otherwise. The implication: a malicious page at evil.example can execute new WebSocket("wss://yourbank.com/api") in a background script; the browser cheerfully sends the user’s authenticated session cookie along with the upgrade request; the bank’s server accepts; the attacker now has a live, authenticated WebSocket to someone else’s bank session.

This is Cross-Site WebSocket Hijacking — CSWSH — and it is the CSRF of the real-time era. The twist that matters: CORS does not apply to WebSockets. The browser will not block the upgrade on Origin mismatch the way it would block a cross-origin fetch. The defense has to live on the server.

  1. Origin

    Validate it yourself.

    Parse the Origin header during the upgrade request and reject any value not on an explicit allow-list. Refuse the handshake with 403. Never echo the client’s Origin back as the allowed origin — maintain a server-side allow-list.

  2. SameSite

    Set cookies correctly.

    SameSite=Lax blocks most cross-site attachment. SameSite=Strict blocks it entirely. Either dramatically reduces the blast radius — the browser refuses to send the session cookie with the upgrade request in the first place.

  3. CSRF token

    One-time upgrade ticket.

    For high-sensitivity endpoints, require a server-issued token (fetched via a separate authenticated request) to be passed as a query parameter in the WS URL. Cookies alone are insufficient; the attacker must also prove they can read a response from your origin.

  4. wss://

    Always TLS.

    ws:// is cleartext — every intermediary sees every frame. wss:// wraps the whole thing in TLS 1.2+, on port 443, so the handshake and every subsequent frame are encrypted. In production, never use ws://.
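The first defense, the Origin allow-list, is small enough to sketch. The origins listed and the function names (origin_allowed, handshake_status) are hypothetical, and the sketch assumes the server wants to reject handshakes that carry no Origin header at all, which also turns away non-browser clients; adjust that choice if native clients must connect.

```python
from typing import Dict, FrozenSet

# Hypothetical allow-list; exact origins only, never patterns built from the request.
ALLOWED_ORIGINS: FrozenSet[str] = frozenset({
    "https://app.example.com",
    "https://admin.example.com",
})


def origin_allowed(headers: Dict[str, str],
                   allowed: FrozenSet[str] = ALLOWED_ORIGINS) -> bool:
    """Return True only when the Origin header exactly matches an allowed origin."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    origin = lowered.get("origin")
    return origin is not None and origin in allowed


def handshake_status(headers: Dict[str, str]) -> int:
    """101 to continue with the upgrade, 403 to refuse it."""
    return 101 if origin_allowed(headers) else 403


if __name__ == "__main__":
    print(handshake_status({"Origin": "https://app.example.com"}))  # 101
    print(handshake_status({"Origin": "https://evil.example"}))     # 403
```

Exact string comparison is deliberate: substring or suffix matching would accept look-alike origins such as https://app.example.com.evil.example.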

WebSockets at scale: connections per node and the load balancer

How real services handle a million sockets.

One node holds many sockets. Modern Linux easily handles 1M+ idle TCP sockets per box (with ulimits and sysctl tuned). The harder problem is the application: each socket holds buffer memory; the application must process inbound frames; per-connection ping/pong eats CPU. Commonly cited figures, which vary widely with hardware and workload: the Phoenix team (Elixir) benchmarked roughly 2M concurrent Phoenix Channels connections on a single node; Go services are often quoted at around 1M; Node.js at around 300-500k.

Load-balancer stickiness. WebSockets are long-lived and stateful — the same client must keep hitting the same server. L7 load balancers (NGINX, Envoy, HAProxy) handle this with cookie-based or hash-based affinity. AWS ALB and GCP LB both support WebSocket out of the box; sticky sessions need to be enabled explicitly.

Reconnect storms. A node failure or deploy causes every connected client to reconnect at once. Without staggered backoff, the surviving nodes get a stampede. Production systems implement exponential backoff with jitter on the client (pause 1s, then 2s, then 4s with ±50% jitter). Slack's Flannel and Discord's gateway both ship documented stagger strategies.

Cross-node fan-out. When user A on node 1 sends a message to user B on node 17, the application needs a pub/sub layer. Common patterns: Redis Pub/Sub (simplest, lossy under load), Kafka (durable, higher latency), NATS (low-latency in-memory pub/sub). Choose by whether messages must survive a node restart.
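The reconnect schedule mentioned under reconnect storms (1s, 2s, 4s with ±50% jitter) can be sketched as follows. The function name, the 30-second cap, and the injectable rng parameter are assumptions added for the example; the source only specifies the doubling and the jitter range.

```python
import random
from typing import List


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0,
                   jitter: float = 0.5, rng=random) -> List[float]:
    """Return one delay per reconnect attempt: base * 2^n, capped, then jittered."""
    delays = []
    for attempt in range(attempts):
        nominal = min(cap, base * (2 ** attempt))      # 1, 2, 4, 8, ... up to cap
        factor = 1.0 + rng.uniform(-jitter, jitter)    # spread clients by +/-50%
        delays.append(nominal * factor)
    return delays


if __name__ == "__main__":
    # Each run prints different values; only the ranges are fixed.
    for n, delay in enumerate(backoff_delays(5)):
        print(f"attempt {n}: wait {delay:.2f}s")
```

The jitter matters more than the doubling: without it, every client dropped by the same failure retries at the same instants, and the stampede simply repeats at 1s, 2s, and 4s.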



A closing note

WebSockets are the protocol equivalent of a back door: the web was built for request/response, and this is the compromise by which we pretended otherwise. It works, and works well, because the compromise was engineered precisely: the handshake is honest HTTP, the frame format is ruthless about overhead, the security choices were opinionated, the fan-out problem got punted to pub/sub. If you need real-time over the web, there are three reasonable answers: Server-Sent Events when the server does all the talking, WebRTC when you need peer-to-peer, and WebSockets when both sides need to speak at low overhead. The protocol is boring; boring, in protocols, is the highest compliment.
