
Bytes on the wire

A network carries one thing: an ordered stream of bytes. No types, no strings, no structs, no message boundaries, just bytes, in the order they were sent. Everything you think of as data is something both sides agreed to read into those bytes. This page is about that agreement: how a number turns into bytes and back, why two machines can read the same bytes and disagree, how you find where one message ends and the next begins, and why text is its own hard problem. Get this layer right and every protocol above it stops feeling abstract; get it wrong and nothing higher up can save you.

The wire is a stream of bytes

Start from the only fact the network guarantees: bytes arrive in the order they were sent. That is the whole contract. A byte is eight bits, a number from 0 to 255, and the medium moves a run of those numbers from one machine to another. It has no idea that some of them spell a port, some hold a timestamp, and some are the letters of your name. The meaning lives entirely in an agreement between the two programs, written down as a protocol. The sender encodes its data into bytes by the rules of that protocol; the receiver decodes the same bytes by the same rules. When the rules match, you get your data back. When the two sides disagree about even one byte, you get garbage, and the network reports no error, because as far as it is concerned it delivered exactly what it was given.

This is why two questions sit under every protocol. The first is encoding: given a value in memory — an integer, a string, a record with several fields — what exact sequence of bytes represents it? The second is structure: when the bytes arrive, how does the receiver tell which bytes belong to which field, and where one message stops and the next starts? The rest of this page is those two questions, worked out. Encoding brings in endianness and character sets; structure brings in framing. Both are invisible when they work and maddening when they do not.

Encoding and decoding are mirror operations governed by the protocol. The wire moves bytes and nothing else.

Endianness — what byte order means

Computers store multi-byte numbers in two possible orders. Take the 32-bit number 0x12345678:

Order	Address 0	Address 1	Address 2	Address 3
Big-endian	`12`	`34`	`56`	`78`
Little-endian	`78`	`56`	`34`	`12`

One value, two memory layouts. Read the wrong way, 0x12345678 becomes 0x78563412, a different number entirely.

Big-endian stores the most-significant byte first, the way you’d write the number out by hand. Little-endian stores the least-significant byte first, which has a practical convenience for the CPU: the low-order byte sits at the same address whether the value is read as 8, 16, or 32 bits, and multi-byte arithmetic naturally starts from the low byte. Both conventions exist for historical reasons; both work; neither is morally superior.

What matters is that they’re different. If a machine that uses one order writes a 32-bit number to a network packet, and a machine that uses the other order reads it back without conversion, the number comes back with its bytes reversed.
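Python's built-in int.to_bytes and int.from_bytes make the table above concrete. A minimal sketch: the same value laid out both ways, then read back with the wrong order.

```python
value = 0x12345678

big = value.to_bytes(4, "big")        # most-significant byte first
little = value.to_bytes(4, "little")  # least-significant byte first

print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12

# Reading little-endian bytes as if they were big-endian reverses the value.
misread = int.from_bytes(little, "big")
print(hex(misread))     # 0x78563412
```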

Architecture	Order
x86 / x86-64	Little-endian
Apple Silicon, modern ARM	Little-endian (configurable, in practice always LE)
Older PowerPC, SPARC, MIPS	Big-endian
Network protocols (IP, TCP, UDP)	Big-endian — "network byte order"

The Internet protocols are big-endian by historical accident, from the 1970s, when many of the influential systems (PDP-10, IBM mainframes) were big-endian. It stuck. So today, almost every machine on the network is little-endian internally, and almost every byte they put on the network is big-endian — they convert at the boundary.

Network byte order, host byte order

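The BSD sockets API, which POSIX standardizes and most other languages wrap, gives C four functions for the conversion. They’re named after what they do: host to network or network to host, short (16-bit) or long (32-bit). The names predate fixed-width types; the signatures now use uint16_t and uint32_t: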

#include <arpa/inet.h>

uint16_t htons(uint16_t hostshort);   // host  → network, 16-bit
uint32_t htonl(uint32_t hostlong);    // host  → network, 32-bit
uint16_t ntohs(uint16_t netshort);    // network → host,  16-bit
uint32_t ntohl(uint32_t netlong);     // network → host,  32-bit

On a big-endian machine, all four are no-ops — host order already is network order. On a little-endian machine, they swap bytes. Either way, the calling code is portable: write htons(8080) and the right thing happens.

The most common place beginners hit endianness is the port number in a sockaddr_in. sin.sin_port = 8080 looks correct, but 8080 is 0x1F90, and a little-endian machine stores it in memory as the bytes 90 1F. The kernel reads sin_port in network byte order, so it sees 0x901F and uses port 36895 instead. The fix is sin.sin_port = htons(8080).
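You can check that arithmetic without a C compiler. This Python sketch uses the struct module to mimic the bug: packing with "<H" stands in for a little-endian host's native store, and unpacking with "!H" stands in for the kernel reading network order. It illustrates the byte swap only, not how the kernel is implemented.

```python
import struct

port = 8080  # 0x1F90

# Simulate "sin.sin_port = 8080" on a little-endian host:
# the 16-bit value is stored low byte first.
stored = struct.pack("<H", port)
print(stored.hex(" "))                      # 90 1f

# The kernel interprets sin_port as network (big-endian) order.
seen_by_kernel = struct.unpack("!H", stored)[0]
print(seen_by_kernel, hex(seen_by_kernel))  # 36895 0x901f

# What htons(8080) gives you: the bytes already in network order.
correct = struct.pack("!H", port)
print(correct.hex(" "))                     # 1f 90
```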

The shortcut for IP addresses. inet_addr("192.168.1.1") returns, and inet_pton(AF_INET, "192.168.1.1", &addr) writes into addr, an address already in network byte order. Don’t pass either result through htonl; the conversion is already done.
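Python's socket module wraps the same conversions, so it is easy to see that the packed address comes out in network order with no htonl anywhere. A minimal sketch:

```python
import socket

# inet_aton / inet_pton produce the 4 address bytes in network order.
packed = socket.inet_aton("192.168.1.1")
print(packed.hex(" "))           # c0 a8 01 01

# The first octet is the first byte: that is big-endian.
as_int = int.from_bytes(packed, "big")
print(hex(as_int))               # 0xc0a80101

# Round trip back to dotted-decimal.
print(socket.inet_ntoa(packed))  # 192.168.1.1
```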

Serializing numbers, strings, and structs

Turning a value in memory into bytes for the wire is called serialization. Numbers are the easy case once endianness is settled. A 32-bit integer is always four bytes; you pick an order, both sides agree, done. The questions you have to answer up front are width and signedness: is this field 1, 2, 4, or 8 bytes, and can it go negative? A protocol spec pins both. Signed integers on the wire almost always use two's complement, the same representation your CPU uses, so minus one in a 32-bit field is ff ff ff ff. Floating-point numbers are sent as their IEEE 754 bit pattern, which is portable across essentially every machine you will meet, though it still carries a byte order.
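Python's struct format strings spell out width, signedness, and byte order as separate characters, which makes them a handy way to see the three decisions independently. This sketch shows two's complement for minus one, the same four bytes reinterpreted as unsigned, and the IEEE 754 pattern for 1.0 in both byte orders.

```python
import struct

# "!" = network order, "i" = signed 32-bit, "I" = unsigned 32-bit.
print(struct.pack("!i", -1).hex(" "))            # ff ff ff ff
print(struct.unpack("!I", b"\xff\xff\xff\xff"))  # (4294967295,)  same bytes, unsigned

# IEEE 754 single precision: the bit pattern is fixed, the byte order is not.
print(struct.pack("!f", 1.0).hex(" "))           # 3f 80 00 00
print(struct.pack("<f", 1.0).hex(" "))           # 00 00 80 3f
```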

Strings are where it gets interesting, because a string has no fixed width. The bytes that spell "hello" are five long; the bytes that spell a username could be any length. A receiver reading a stream of bytes has no way to know where the string ends unless the protocol tells it. There are two ways to say so, and they are the same two ways you frame whole messages, which is not a coincidence. You can put the length in front of the string as a length prefix, say a 2-byte count followed by exactly that many bytes. Or you can mark the end with a sentinel byte the string itself can never contain, the classic being the C convention of a trailing 0x00. Length-prefixed strings are safer and faster to read, because you know up front how many bytes to grab and can allocate exactly the right buffer. Null-terminated strings are compact, but they force a scan to find the end and break the moment the data legitimately contains a zero byte.
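Here is a sketch of both string conventions in Python. It assumes a 2-byte big-endian length and UTF-8 bodies; those choices, and the helper names, are illustrative rather than taken from any particular protocol.

```python
from __future__ import annotations

import struct

def encode_lp(s: str) -> bytes:
    """Length-prefixed: 2-byte big-endian byte count, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("string too long for a 2-byte length prefix")
    return struct.pack("!H", len(data)) + data

def decode_lp(buf: bytes) -> tuple[str, bytes]:
    """Return (string, remaining bytes); refuse to read past the buffer."""
    if len(buf) < 2:
        raise ValueError("need 2 bytes for the length")
    (n,) = struct.unpack_from("!H", buf)
    if len(buf) < 2 + n:
        raise ValueError("string body incomplete")
    return buf[2:2 + n].decode("utf-8"), buf[2 + n:]

def encode_nul(s: str) -> bytes:
    """Null-terminated: the bytes, then a 0x00 sentinel."""
    data = s.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("a NUL byte cannot appear in a NUL-terminated string")
    return data + b"\x00"

def decode_nul(buf: bytes) -> tuple[str, bytes]:
    """Scan for the sentinel; bytes.index raises ValueError if it is missing."""
    end = buf.index(b"\x00")
    return buf[:end].decode("utf-8"), buf[end + 1:]

wire = encode_lp("hello") + encode_lp("world")
print(wire.hex(" "))   # 00 05 68 65 6c 6c 6f 00 05 77 6f 72 6c 64
first, rest = decode_lp(wire)
second, rest = decode_lp(rest)
print(first, second)   # hello world
```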

A struct, a record with several fields, is just those rules applied one field after another, in the exact order the spec lists. The IPv4 header you will walk later is a struct: a 1-byte field, then another 1-byte field, then a 2-byte field, and so on, packed with no gaps. That last word matters. In memory, your compiler is free to insert padding between struct fields so each one lands on a convenient address, and the amount of padding depends on the platform. On the wire there is no padding; fields butt up against each other exactly as the spec draws them. This is the trap behind casting a wire buffer straight onto a C struct pointer: the in-memory layout, with its padding and native byte order, is almost never the wire layout. You serialize field by field, or you use a packed struct and convert byte order explicitly, but you do not assume the two layouts match.
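Python's struct module exposes the padding difference directly: "@" follows the platform's C layout rules, while "!" packs fields back to back in big-endian order. The record here is hypothetical, and the native size is platform-dependent, which is exactly the point.

```python
import struct

# A hypothetical record: a 1-byte type followed by a 4-byte ID.
# "@" = native mode: native sizes plus alignment padding, like a C struct.
# "!" = network mode: standard sizes, no padding, big-endian.
print(struct.calcsize("@BI"))  # typically 8: 1 + 3 padding + 4
print(struct.calcsize("!BI"))  # 5: fields butt up against each other

def encode_record(kind: int, ident: int) -> bytes:
    # Serialize in spec order with an explicit byte order; "!" adds no padding.
    return struct.pack("!BI", kind, ident)

wire = encode_record(7, 0x01020304)
print(wire.hex(" "))           # 07 01 02 03 04
```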

The rule that prevents most bugs. Never send a value without first deciding its width, its signedness, and its byte order, and never read one back without checking you actually have that many bytes. Width, sign, order, length. Four decisions per field, written into the protocol, and serialization stops being mysterious.

The framing problem

Here is the single most common source of real bugs at this layer, and the one most tutorials skip. TCP is a byte stream, not a message stream. When you call send() twice, once with 100 bytes and once with 50, TCP does not promise the other side gets a 100-byte read followed by a 50-byte read. It promises only that the 150 bytes arrive in order. The receiver might get all 150 in one recv(), or 30 then 120, or one byte at a time. TCP coalesces and splits your writes however the network and its buffers see fit. There are no message boundaries on the wire because TCP never put any there.

So if your application has messages, and almost every protocol does, you have to build the boundaries yourself, on top of the stream. That is framing: a rule the receiver uses to carve the incoming byte stream back into the messages the sender meant. There are two classic rules, and every protocol you know picks one or the other.

Length-prefixing puts a fixed-size count at the front of each message. The receiver reads the count first, say 4 bytes giving the body length, then reads exactly that many more bytes, and now it holds one complete message. This is what most binary protocols do, because it is unambiguous and the body can contain any byte at all, including the bytes that would otherwise look like a delimiter. Delimiters instead mark the end of each message with a sentinel sequence the body is guaranteed not to contain, a newline for line-based text protocols, \r\n\r\n for the end of HTTP/1 headers. The receiver reads until it sees the delimiter, and everything before it is one message.

Two framing rules, and the reason both need a buffer: TCP splits and joins your writes, so a frame can straddle reads.
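The sending side of both framing rules fits in a few lines. A sketch assuming a 4-byte big-endian length prefix and a newline delimiter; the function names are hypothetical.

```python
import struct

def frame_length_prefixed(body: bytes) -> bytes:
    # 4-byte big-endian length, then the body; the body may contain any byte.
    return struct.pack("!I", len(body)) + body

def frame_delimited(body: bytes, delim: bytes = b"\n") -> bytes:
    # The body must not contain the delimiter, or the receiver will split it.
    if delim in body:
        raise ValueError("body contains the delimiter")
    return body + delim

print(frame_length_prefixed(b"hi\n").hex(" "))  # 00 00 00 03 68 69 0a
print(frame_delimited(b"PING"))                 # b'PING\n'
```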

Both rules share a consequence that trips up everyone the first time: you cannot assume one recv() equals one message. Because TCP chops the stream wherever it likes, the length prefix can arrive in one read and the body in the next, or two whole messages can arrive in a single read, or a message can be split straight down the middle. So a correct reader keeps a buffer. It appends whatever each recv() returns to that buffer, then repeatedly checks whether the buffer now holds a complete frame, enough bytes for the declared length, or a delimiter somewhere inside it. When it does, it removes that frame and processes it, leaving any leftover bytes in the buffer as the start of the next message. This buffer-and-check loop is the heart of every protocol reader ever written, and skipping it is the bug behind half of all "works on localhost, fails over the real network" reports, because a fast local loopback often delivers a whole small message in one read and hides the problem.
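The buffer-and-check loop for length-prefixed frames can be sketched as a small Python class. It assumes a 4-byte big-endian prefix and a hypothetical 1 MiB cap on declared lengths; recv() itself is left out, so you feed the reader whatever chunks a socket returns.

```python
from __future__ import annotations

import struct

class LengthPrefixedReader:
    """Buffer-and-check loop for frames with a 4-byte big-endian length prefix."""

    HEADER = 4
    MAX_BODY = 1 << 20          # hypothetical cap: refuse bodies over 1 MiB

    def __init__(self) -> None:
        self.buf = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        """Append one recv() result; return every frame that is now complete."""
        self.buf += chunk
        frames = []
        while len(self.buf) >= self.HEADER:
            (n,) = struct.unpack_from("!I", self.buf)
            if n > self.MAX_BODY:
                raise ValueError(f"declared length {n} exceeds cap")
            if len(self.buf) < self.HEADER + n:
                break           # partial frame: wait for more bytes
            frames.append(bytes(self.buf[self.HEADER:self.HEADER + n]))
            del self.buf[:self.HEADER + n]  # leftovers start the next frame
        return frames

# Two messages, delivered in awkward chunks, as TCP is allowed to do.
stream = struct.pack("!I", 5) + b"hello" + struct.pack("!I", 3) + b"bye"
reader = LengthPrefixedReader()
for chunk in (stream[:2], stream[2:7], stream[7:]):
    print(reader.feed(chunk))
# prints [], then [], then [b'hello', b'bye']
```

The same reader works whether the bytes arrive all at once or one at a time, which is the property a loopback test tends to hide.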

Each rule has its own pitfalls. A length prefix too narrow for the real message size cannot describe large messages at all: a 2-byte prefix tops out at 65,535 bytes. A length that is attacker-controlled lets a peer claim a four-gigabyte body and exhaust your memory, so real servers cap the declared length before allocating. Delimiters need escaping or a guarantee the body cannot contain the marker; a line protocol that forgets to handle a newline inside a value will split one message into two. And a partial read of a delimiter-framed stream forces you to scan the buffer again each time more data lands (ideally resuming where the last scan stopped), which is one reason high-throughput systems lean toward length prefixes. The deeper trade-offs of schemas, versioning, and self-describing formats show up once you reach a real serialization format like Protocol Buffers, which length-prefixes every variable-length field (strings, bytes, nested messages) precisely to dodge these problems.
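For delimiter framing, the rescanning cost can be kept down by remembering how far the last search got. A sketch for newline-delimited frames, with a hypothetical cap on how much unterminated data it will buffer. With a one-byte delimiter it is safe to resume exactly where the previous scan stopped; a multi-byte delimiter such as \r\n\r\n would need to back up a few bytes.

```python
from __future__ import annotations

class LineReader:
    """Newline-delimited frames with an incremental scan and a buffer cap."""

    def __init__(self, max_line: int = 8192) -> None:  # hypothetical limit
        self.buf = bytearray()
        self.scanned = 0        # bytes already searched for b"\n"
        self.max_line = max_line

    def feed(self, chunk: bytes) -> list[bytes]:
        self.buf += chunk
        lines = []
        while True:
            i = self.buf.find(b"\n", self.scanned)  # resume where we stopped
            if i < 0:
                self.scanned = len(self.buf)
                if self.scanned > self.max_line:
                    raise ValueError("line exceeds limit without a delimiter")
                return lines
            lines.append(bytes(self.buf[:i]))
            del self.buf[:i + 1]    # drop the line and its delimiter
            self.scanned = 0

r = LineReader()
for chunk in (b"PI", b"NG\nPO", b"NG\n"):
    print(r.feed(chunk))
# prints [], then [b'PING'], then [b'PONG']
```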

Reading a hex dump

Every byte-level tool you’ll use — Wireshark, tcpdump, hexdump, xxd — eventually shows you a hex dump. The format is consistent enough to learn once.

0000:  45 00 00 3c 1c 46 40 00 40 06 b1 e6 c0 a8 00 68    E..<.F@.@......h
0010:  ac d9 0e 8e 0d be 01 bb 5b 8b 9d 4c 00 00 00 00    ........[..L....
0020:  a0 02 fa f0 91 7c 00 00 02 04 05 b4 04 02 08 0a    .....|..........
0030:  00 67 fb 21 00 00 00 00 01 03 03 07                .g.!........

Three columns. The leftmost is the offset into the dump (in hex). The middle is 16 bytes per line, in hex, two characters per byte, separated by spaces. The right is the same bytes shown as ASCII where the byte falls in the printable range (0x20–0x7e), with a dot for everything else.

You read it left to right, top to bottom, exactly like text. Offset 0x10 is the byte 16 positions in (counting from zero); offset 0x20 is byte 32; and so on. To find the byte at offset 22 (0x16), jump to the 0x10 line and skip six bytes: the seventh byte on that line is 01.
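The three-column layout is simple enough to reproduce. This sketch of a hypothetical hexdump helper prints the dump above from its raw bytes, pasted in as a hex string, and confirms the byte at offset 22.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Offset, hex bytes, and printable-ASCII columns, like the dump above."""
    lines = []
    for off in range(0, len(data), width):
        row = data[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in row)
        # Printable range 0x20-0x7e shows as itself; everything else is a dot.
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in row)
        lines.append(f"{off:04x}:  {hex_part:<{width * 3 - 1}}    {text}")
    return "\n".join(lines)

packet = bytes.fromhex(
    "4500003c1c4640004006b1e6c0a80068acd90e8e"
    "0dbe01bb5b8b9d4c00000000a002faf0917c0000"
    "020405b40402080a0067fb210000000001030307"
)
print(hexdump(packet))
print(hex(packet[22]))  # 0x1  (offset 22 is 0x16)
```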

Walking a real packet

The dump above is a TCP SYN packet, the first packet of a connection. Let’s walk it byte by byte. It starts at the IP header: the Ethernet header is absent because tcpdump -X prints each packet minus its link-level header (-XX includes it).

IPv4 header — bytes 0x00 through 0x13

0000:  45 00 00 3c 1c 46 40 00 40 06 b1 e6 c0 a8 00 68    E..<.F@.@......h
0010:  ac d9 0e 8e ...                                    ....

Bytes	Field	Value	Decoded
0	Version + IHL	`0x45`	Version 4, header length 5×4 = 20 bytes
1	DSCP / ECN	`0x00`	No QoS markings
2-3	Total length	`0x003c`	60 bytes (IP header + payload)
4-5	Identification	`0x1c46`	For fragmentation reassembly
6-7	Flags + frag offset	`0x4000`	"Don't Fragment" set, offset 0
8	TTL	`0x40`	64 hops remaining
9	Protocol	`0x06`	6 = TCP
10-11	Header checksum	`0xb1e6`	Verified by routers; recomputed on TTL change
12-15	Source IP	`c0 a8 00 68`	192.168.0.104
16-19	Destination IP	`ac d9 0e 8e`	172.217.14.142

Notice how the multi-byte fields are big-endian: total length 0x003c is "00, 3c" on the wire, which reads as the number 60 directly. Source IP c0 a8 00 68 is 192.168.0.104 — each byte one octet of the dotted-decimal form, in order.
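struct.unpack_from can read the fixed IPv4 header straight off the table above. A sketch, with the 60 bytes of the dump pasted in as a hex string and a hypothetical ipv4_checksum helper that follows the RFC 1071 one's-complement sum. Running it is instructive: the checksum computed from the other header bytes is 0xa1fe, not the 0xb1e6 stored in the dump, so treat this sample's checksum field as illustrative rather than as a value a router would accept.

```python
import socket
import struct

packet = bytes.fromhex(
    "4500003c1c4640004006b1e6c0a80068acd90e8e"  # IPv4 header (20 bytes)
    "0dbe01bb5b8b9d4c00000000a002faf0917c0000"  # TCP header, fixed part
    "020405b40402080a0067fb210000000001030307"  # TCP options
)

# Fixed IPv4 header layout, all big-endian ("!"), 20 bytes total.
(ver_ihl, tos, total_len, ident, flags_frag, ttl, proto, checksum,
 src, dst) = struct.unpack_from("!BBHHHBBH4s4s", packet, 0)

print(ver_ihl >> 4, (ver_ihl & 0x0F) * 4)            # 4 20
print(total_len, ttl, proto)                         # 60 64 6
print(bool(flags_frag & 0x4000))                     # True  (Don't Fragment)
print(socket.inet_ntoa(src), socket.inet_ntoa(dst))  # 192.168.0.104 172.217.14.142

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, computed with the checksum zeroed."""
    h = header[:10] + b"\x00\x00" + header[12:]
    total = sum(struct.unpack(f"!{len(h) // 2}H", h))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)     # fold carries back in
    return ~total & 0xFFFF

print(hex(checksum), hex(ipv4_checksum(packet[:20])))  # 0xb1e6 0xa1fe
```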

TCP header: bytes 0x14 through 0x3b

0010:              0d be 01 bb 5b 8b 9d 4c 00 00 00 00        ....[..L....
0020:  a0 02 fa f0 91 7c 00 00 02 04 05 b4 04 02 08 0a    .....|..........
0030:  00 67 fb 21 00 00 00 00 01 03 03 07                .g.!........

Bytes	Field	Value	Decoded
20-21	Source port	`0x0dbe`	3518
22-23	Destination port	`0x01bb`	443 (HTTPS)
24-27	Sequence number	`0x5b8b9d4c`	1535876428
28-31	ACK number	`0x00000000`	0 (SYN only; the ACK flag is not set)
32	Data offset + reserved	`0xa0`	Data offset 10 (× 4 = 40 bytes of TCP header)
33	Flags	`0x02`	SYN bit set
34-35	Window size	`0xfaf0`	64240 bytes the sender can receive
36-37	Checksum	`0x917c`	Over a pseudo-header (IP addresses, protocol, length), the TCP header, and the payload
38-39	Urgent pointer	`0x0000`	0; no urgent data
40-59	TCP options	...	MSS 1460, SACK-permitted, timestamps, NOP, window scale 7

That’s the entire packet: 60 bytes that say "I’m 192.168.0.104, I want to start a TCP connection to 172.217.14.142 port 443, my starting sequence number is 1535876428, here are some options I support". Wireshark turns these bytes into the friendly tree view, but the bytes themselves are the truth.
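The TCP header yields to the same approach, and its options are a first taste of a kind-length-value format inside a header. This sketch unpacks the fixed 20 bytes of the dump's TCP header, then walks the options, treating kinds 0 (end of list) and 1 (no-op) as single bytes and every other kind as kind, length, value. The parse_options name is hypothetical.

```python
from __future__ import annotations

import struct

# TCP segment from the dump (packet bytes 20-59).
tcp = bytes.fromhex(
    "0dbe01bb5b8b9d4c00000000a002faf0917c0000"
    "020405b40402080a0067fb210000000001030307"
)

src_port, dst_port, seq, ack, off_byte, flags, window = struct.unpack_from(
    "!HHIIBBH", tcp)
header_len = (off_byte >> 4) * 4
print(src_port, dst_port, seq, header_len, hex(flags))  # 3518 443 1535876428 40 0x2

def parse_options(opts: bytes) -> list[tuple[int, bytes]]:
    """Walk TCP options; the length byte counts the kind and length bytes too."""
    out, i = [], 0
    while i < len(opts):
        kind = opts[i]
        if kind == 0:           # End of Option List
            break
        if kind == 1:           # No-Operation, used as padding
            out.append((1, b""))
            i += 1
            continue
        if i + 1 >= len(opts) or opts[i + 1] < 2:
            raise ValueError("truncated or malformed option")
        length = opts[i + 1]
        if i + length > len(opts):
            raise ValueError("option runs past the header")
        out.append((kind, opts[i + 2:i + length]))
        i += length
    return out

for kind, value in parse_options(tcp[20:header_len]):
    print(kind, value.hex())
# expected kinds: 2 (MSS, 0x05b4 = 1460), 4 (SACK permitted), 8 (timestamps),
# 1 (NOP), 3 (window scale, value 7)
```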

Bit-level fields and bitfields

Some header fields are smaller than a byte. The first byte of the IPv4 header carries two: the 4-bit version (top half) and the 4-bit header length (bottom half), packed into one byte as (version << 4) | header_length. So 0x45 = 0100 0101 = version 4, header length 5.
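Splitting and joining nibbles is a shift and a mask. A minimal sketch with hypothetical helper names, applied to the IPv4 version/IHL byte and the TCP data-offset byte from the dump:

```python
from __future__ import annotations

def split_nibbles(byte: int) -> tuple[int, int]:
    """Return (high 4 bits, low 4 bits) of one byte."""
    return byte >> 4, byte & 0x0F

def join_nibbles(high: int, low: int) -> int:
    """Pack two 4-bit values into one byte, high nibble first."""
    return (high << 4) | low

version, ihl = split_nibbles(0x45)
print(version, ihl, ihl * 4)                 # 4 5 20

data_offset, _reserved = split_nibbles(0xA0)  # TCP data-offset byte, dump offset 0x20
print(data_offset, data_offset * 4)          # 10 40

print(hex(join_nibbles(4, 5)))               # 0x45
```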

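TCP’s flags sit in byte 13 of its header (offset 33 in the dump above), one bit each, most significant first: CWR, ECE, URG, ACK, PSH, RST, SYN, FIN. (Older specifications also defined a ninth flag, NS, in the low bit of byte 12.) 0x02 = 0000 0010 = SYN only. 0x12 = 0001 0010 = SYN and ACK (the second packet of a handshake).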

# decoding TCP flags in Python
flags = 0x12
print("FIN:" , bool(flags & 0x01))   # bit 0
print("SYN:" , bool(flags & 0x02))   # bit 1
print("RST:" , bool(flags & 0x04))   # bit 2
print("PSH:" , bool(flags & 0x08))   # bit 3
print("ACK:" , bool(flags & 0x10))   # bit 4
print("URG:" , bool(flags & 0x20))   # bit 5
print("ECE:" , bool(flags & 0x40))   # bit 6
print("CWR:" , bool(flags & 0x80))   # bit 7
# → SYN: True, ACK: True, others: False

Building a packet by hand

Once you can read a hex dump, building one is the same operation in reverse. Python’s struct module turns values into bytes in the right order:

import struct

# Build a minimal TCP-style header by hand.
src_port = 12345
dst_port = 80
seq      = 1000
ack      = 0
offset   = 5            # 5 × 4 = 20 bytes, no options
flags    = 0x02         # SYN
window   = 65535
checksum = 0            # placeholder; a real sender computes it over a pseudo-header
urg      = 0

# Pack format:
#   ! = network byte order (big-endian), standard sizes, no padding
#   H = unsigned 16-bit, L = unsigned 32-bit, B = unsigned 8-bit
header = struct.pack(
    "!HHLLBBHHH",
    src_port, dst_port, seq, ack,
    (offset << 4), flags, window, checksum, urg
)

assert len(header) == 20
print(header.hex(" ", 4))   # group the output into 4-byte words
# 30390050 000003e8 00000000 5002ffff 00000000

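The ! at the start of the format string says "network byte order". Without a prefix, struct defaults to native mode: native byte order, native sizes, and native alignment padding. On a typical 64-bit Linux or macOS machine that makes each L eight bytes rather than four, so the header comes out the wrong length as well as in the wrong byte order, and your packet won’t be portable.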

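Go has encoding/binary with binary.BigEndian / binary.LittleEndian. Rust’s integer types have to_be_bytes / from_be_bytes in the standard library, and the byteorder crate adds reader and writer helpers. C has htonl / htons as covered above. The exact API differs; the underlying operation is the same.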

Text protocols versus binary protocols

Once you can put any value on the wire, a design choice opens up: send data as human-readable text, or as packed binary. HTTP/1, SMTP, Redis's older protocol, and most of the early Internet are text. You can read them in a terminal, type them by hand into nc, and debug them with your eyes. The cost is size and speed. The number 1000000 is seven bytes as the ASCII text "1000000" but four bytes as a binary integer, and the receiver has to parse the digits back into a number rather than just reading four bytes. Text is also full of ambiguity the parser must resolve: whitespace, case, where a field ends, how to escape special characters.
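The size difference is easy to measure. A sketch comparing the number as ASCII digits with the same number as a 4-byte big-endian integer:

```python
import struct

n = 1_000_000
as_text = str(n).encode("ascii")     # b"1000000": one byte per digit
as_binary = struct.pack("!I", n)     # 00 0f 42 40

print(len(as_text), len(as_binary))  # 7 4

# Text must be parsed digit by digit; binary is a fixed-width read.
print(int(as_text.decode("ascii")) == struct.unpack("!I", as_binary)[0])  # True
```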

Binary protocols, the kind TCP and IP headers themselves use, pack fields into fixed positions with no separators and no parsing beyond reading the right number of bytes at the right offset. They are smaller and faster and harder to get subtly wrong, at the price of being unreadable without a tool and unforgiving of version drift, since a field that moves by one byte breaks every reader. The rough rule is that control-plane and developer-facing protocols lean text for legibility, while high-volume data-plane protocols lean binary for efficiency. Newer designs often split the difference: HTTP/2 reframes the same text-shaped HTTP semantics as a binary, length-prefixed wire format, getting the speed of binary without throwing away the model people already knew.

Character encodings, and why UTF-8 won

Text is its own serialization problem, because "the letter A" is not a byte until you fix an encoding. ASCII was the first widely agreed answer: 128 characters, each one byte, the top bit always zero. It covers the English alphabet, digits, punctuation, and control codes, and it is the reason the right-hand column of a hex dump is readable at all. But 128 characters cannot hold the world's scripts, and the decades of incompatible 8-bit and multi-byte encodings that tried to extend it (Latin-1, Shift-JIS, the many code pages) meant the same bytes rendered as different characters depending on which encoding the reader guessed. Text that crossed systems turned to mojibake.

Unicode fixed the first half of the problem by giving every character in every script a single number, a code point, with room for over a million of them. UTF-8 fixed the second half by defining how to turn those code points into bytes, and it did so with a design that explains why it took over completely. UTF-8 is variable-width: a code point becomes one to four bytes. The first 128 code points encode as a single byte identical to ASCII, so every ASCII file is already valid UTF-8 and every ASCII-speaking program keeps working unchanged. Code points beyond that use two, three, or four bytes, and the encoding is self-synchronizing: the leading byte of a multi-byte sequence announces how many bytes follow, and continuation bytes are tagged so they can never be mistaken for the start of a character. Drop into the middle of a UTF-8 stream and you can always find the next character boundary.
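Python makes the byte patterns visible. This sketch prints each sample character's code point, encoded length, and bytes in binary, so you can see the leading-byte prefixes (0, 110, 1110, 11110) and the 10 prefix on continuation bytes. The is_continuation and next_boundary helpers are hypothetical names for the resynchronization rule described above.

```python
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:   # A, e-acute, euro sign, emoji
    data = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", len(data), " ".join(f"{b:08b}" for b in data))
# U+0041 1 01000001
# U+00E9 2 11000011 10101001
# U+20AC 3 11100010 10000010 10101100
# U+1F600 4 11110000 10011111 10011000 10000000

def is_continuation(b: int) -> bool:
    """Continuation bytes always look like 10xxxxxx."""
    return b & 0xC0 == 0x80

def next_boundary(data: bytes, i: int) -> int:
    """From any offset, skip continuation bytes to the next character start."""
    while i < len(data) and is_continuation(data[i]):
        i += 1
    return i
```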

That backward compatibility is the whole story. UTF-8 needed no flag day, no coordinated switch; an ASCII world could adopt it one file and one program at a time, and the new characters simply used byte values ASCII never touched. It is also endianness-free, since it is defined as a sequence of bytes rather than a sequence of wider code units, which is why it sidesteps the byte-order marks that haunt UTF-16. The practical advice is short: send and store text as UTF-8 unless a protocol forces otherwise, declare it explicitly, and never assume one character equals one byte. A name with an accent, an emoji, or any non-Latin script will quietly be several bytes, and code that confuses byte length with character length will truncate strings mid-character and corrupt the output.

One byte is not one character. The string length your protocol cares about on the wire is a byte count. The character count your user cares about can be smaller. Length-prefix your strings in bytes, validate that the bytes are well-formed UTF-8 on the way in, and keep the two notions of "length" separate in your head.
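A sketch of those habits in Python: cutting an encoded string to a byte budget without splitting a character, and decoding strictly so malformed input raises instead of being silently repaired. Both helper names are hypothetical.

```python
def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Encode text and cut to at most max_bytes without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # Back up while the byte at the cut point is a continuation byte (10xxxxxx),
    # so the cut lands on a character boundary.
    while cut > 0 and data[cut] & 0xC0 == 0x80:
        cut -= 1
    return data[:cut]

def decode_strict(data: bytes) -> str:
    """Validate on the way in: malformed UTF-8 raises UnicodeDecodeError."""
    return data.decode("utf-8", errors="strict")

name = "Ren\u00e9e"                            # "Renée": 5 characters, 6 bytes
print(len(name), len(name.encode("utf-8")))   # 5 6
print(truncate_utf8(name, 4))                 # b'Ren'  (a 4-byte cut would split the é)
```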

MTU and fragmentation

The stream of bytes does not actually travel as one continuous flow; the lower layers chop it into packets, and there is a ceiling on how big each packet can be. That ceiling is the MTU, the maximum transmission unit, and on ordinary Ethernet it is 1500 bytes for the IP packet. Anything larger has to be carried in more than one packet. This is why the reference table lists a TCP MSS of 1460: take the 1500-byte MTU, subtract 20 bytes of IP header and 20 of TCP header, and what is left is the most application data one segment can carry.
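The MSS arithmetic as a few lines of Python, assuming IP and TCP headers with no options. In practice, options shrink the payload further; the TCP timestamps option, when used, typically costs 12 more bytes per segment once padded.

```python
import math

MTU = 1500          # default Ethernet MTU, from the quick reference
IPV4_HEADER = 20    # no IP options
IPV6_HEADER = 40    # fixed IPv6 header
TCP_HEADER = 20     # no TCP options

mss_v4 = MTU - IPV4_HEADER - TCP_HEADER
mss_v6 = MTU - IPV6_HEADER - TCP_HEADER
print(mss_v4, mss_v6)                 # 1460 1440

# Segments needed for one megabyte (10**6 bytes) at the IPv4 MSS.
print(math.ceil(1_000_000 / mss_v4))  # 685
```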

When an IP packet is larger than the MTU of a link it must cross, something has to give. In IPv4 a router may fragment the packet, splitting it into MTU-sized pieces that the destination reassembles using the identification and fragment-offset fields from the header walk above; if the Don't Fragment bit is set, the router drops the packet instead and sends back an ICMP "fragmentation needed" message. Fragmentation works but is best avoided: losing any one fragment forces the whole original packet to be resent, reassembly costs the receiver memory and time, and fragments have historically been a rich source of security bugs. IPv6 removed in-flight fragmentation by routers entirely; the sender must size its packets correctly instead.
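The fragment arithmetic is worth doing once by hand. A sketch of a hypothetical fragment helper, assuming a 20-byte header with no options on every fragment: each fragment except the last carries a multiple of 8 payload bytes, because the fragment-offset field counts in 8-byte units, and the More Fragments (MF) flag is set on all but the last.

```python
from __future__ import annotations

def fragment(total_len: int, mtu: int, header_len: int = 20) -> list[tuple[int, int, bool]]:
    """Split an IPv4 datagram into (payload bytes, offset in 8-byte units, MF flag)."""
    payload = total_len - header_len
    # Largest multiple of 8 that fits after the header.
    per_frag = (mtu - header_len) // 8 * 8
    frags, sent = [], 0
    while sent < payload:
        size = min(per_frag, payload - sent)
        more = sent + size < payload
        frags.append((size, sent // 8, more))
        sent += size
    return frags

for size, offset, more in fragment(4000, 1500):
    print(size, offset, more)
# 1480 0 True
# 1480 185 True
# 1020 370 False
```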

In practice the sender discovers the largest packet that fits the whole path without fragmenting, the path MTU, and stays under it. TCP does this almost invisibly, which is part of why TCP feels like a clean byte stream even though the bytes are being diced into segments and packets the entire way. You write a megabyte; TCP and IP turn it into hundreds of MTU-sized packets, number them, and the far side stitches the bytes back into the same megabyte, in order, with the packet boundaries erased. The framing problem from earlier is the same idea one layer up: TCP hides the packet boundaries, so your application must invent its own message boundaries on top.

Why all of this underlies every higher protocol

Everything above the byte stream is built from the pieces on this page. HTTP is text framed by a delimiter for its headers and a length prefix, the Content-Length, for its body, sent over a TCP byte stream. DNS is a compact binary format with big-endian fields and length-prefixed labels. TLS wraps everything in length-prefixed records. Protocol Buffers, Thrift, and the binary halves of HTTP/2 and gRPC are length-prefixed binary all the way down. Each one is a different answer to the same two questions: how do we encode our values into bytes, and how do we frame those bytes into messages.
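HTTP/1's mix of the two framing rules is compact enough to sketch. This hypothetical helper finds the blank line that ends the headers, then uses Content-Length to carve out the body and hand back any leftover bytes. It ignores chunked transfer encoding and most of the validation a real HTTP parser needs, and it decodes header bytes as ISO-8859-1 so any byte value survives.

```python
def read_http_message(buf: bytes):
    """Return (headers, body, leftover), or None if buf holds no complete message.

    Headers end at the first CRLF CRLF (the delimiter rule); the body length
    comes from Content-Length (the length-prefix rule).
    """
    end = buf.find(b"\r\n\r\n")
    if end < 0:
        return None                          # headers still incomplete
    head = buf[:end].decode("iso-8859-1")
    headers = {}
    for line in head.split("\r\n")[1:]:      # skip the request/status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", "0"))
    start = end + 4
    if len(buf) < start + length:
        return None                          # body still incomplete
    return headers, buf[start:start + length], buf[start + length:]

msg = b"POST /x HTTP/1.1\r\nHost: a\r\nContent-Length: 5\r\n\r\nhelloGET"
print(read_http_message(msg))
# ({'host': 'a', 'content-length': '5'}, b'hello', b'GET')
```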

That is why this layer is worth the time. When a higher protocol misbehaves, a field comes back as a wrong number, a message arrives split or doubled, a string shows up garbled, the cause is nearly always one of the things here: a byte-order mismatch, a framing assumption that one read is one message, or an encoding that two sides disagree on. The same mental model serves you up and down the stack, from the sockets that hand you the raw bytes to the TCP stream that orders them to the serialization formats that give them meaning. Bytes on the wire is not the bottom of the stack you pass through on the way to the interesting parts. It is the grammar the interesting parts are written in.

A quick reference

What	Bytes
MAC address	6 bytes (e.g. `00:1a:2b:3c:4d:5e`)
IPv4 address	4 bytes
IPv6 address	16 bytes
Port number	2 bytes (16-bit unsigned, 0–65535)
Ethernet frame, minimum	64 bytes (header + 46 payload + 4 FCS)
Ethernet frame, maximum (standard)	1518 bytes (or 1522 with VLAN)
IPv4 header, no options	20 bytes
IPv6 header (fixed)	40 bytes
TCP header, no options	20 bytes
UDP header	8 bytes
Default MTU on Ethernet	1500 bytes
TCP MSS on Ethernet (typical)	1460 bytes (1500 − 20 IP − 20 TCP)

Tools — looking at bytes directly

Tool	Use for
`xxd file.bin`	Pretty hex dump of any file. `xxd -r` reverses; useful for tests.
`hexdump -C file.bin`	The other classic. Slightly different layout; same information.
`od -A x -t x1 -v file.bin`	The POSIX-portable version. Ugly but available everywhere. GNU od adds a `z` suffix (`-t x1z`) that appends an ASCII column.
`tcpdump -nn -X`	Live packet capture with hex + ASCII view. The first thing to run when you need to know what’s actually on the wire.
Wireshark	Click any field in the dissected tree; the matching bytes highlight in the byte view at the bottom. The single best learning tool.
`scapy` (Python)	Build, send, and dissect packets programmatically. The right tool for "what if I sent a malformed X".

Common mistakes

Forgetting htons on the port. The classic. Your server "binds successfully" but listens on a port you didn’t expect, and nc localhost 8080 gets connection-refused.
Calling htonl on something already in network order. inet_addr already returns network order. Passing it through htonl on a little-endian machine corrupts it.
Reading bytes past the buffer. If the spec says the next field is 4 bytes and you only have 3, don’t pad with zeros — the packet is malformed. Decoders that paper over short packets are how memory-safety bugs ship.
Treating a hex dump as ASCII. A hex dump that looks like text only means the bytes happen to fall in the printable range. The ASCII column on the right is a hint, not the data.
Confusing byte order with bit order. Network byte order is about the order of bytes within a multi-byte field. Software never sees the bit order inside a byte: a byte is a value from 0 to 255 on every machine, and a mask like 0x80 means the top bit regardless of host endianness. What does vary is how documents number bits. RFC header diagrams number from the left, so their bit 0 is the most significant bit, the one the flag-decoding code on this page calls bit 7.
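The rule about reading past the buffer can be enforced in one place. A sketch of a hypothetical bounds-checked reader over a received buffer: every read checks that enough bytes remain and raises instead of padding. struct.unpack_from would also raise on a short buffer, but an explicit check gives a clearer error. Use "!" formats so sizes are standard and unpadded.

```python
import struct

class Reader:
    """Bounds-checked cursor over a received buffer."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def take(self, fmt: str):
        size = struct.calcsize(fmt)
        if self.pos + size > len(self.data):
            raise ValueError(
                f"malformed: need {size} bytes at offset {self.pos}, "
                f"have {len(self.data) - self.pos}")
        values = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += size    # only advance after a successful read
        return values[0] if len(values) == 1 else values

r = Reader(bytes.fromhex("01bb00000001ff"))
print(r.take("!H"))   # 443
print(r.take("!I"))   # 1
try:
    r.take("!I")      # only 1 byte left
except ValueError as e:
    print(e)          # malformed: need 4 bytes at offset 6, have 1
```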
