Bytes on the wire
A network carries one thing: an ordered stream of bytes. No types, no strings, no structs, no message boundaries, just bytes, in the order they were sent. Everything you think of as data is something both sides agreed to read into those bytes. This page is about that agreement: how a number turns into bytes and back, why two machines can read the same bytes and disagree, how you find where one message ends and the next begins, and why text is its own hard problem. Get this layer right and every protocol above it stops feeling abstract; get it wrong and nothing higher up can save you.
The wire is a stream of bytes
Start from the only fact the network guarantees: bytes arrive in the order they were sent. That is the whole contract. A byte is eight bits, a number from 0 to 255, and the medium moves a run of those numbers from one machine to another. It has no idea that some of them spell a port, some hold a timestamp, and some are the letters of your name. The meaning lives entirely in an agreement between the two programs, written down as a protocol. The sender encodes its data into bytes by the rules of that protocol; the receiver decodes the same bytes by the same rules. When the rules match, you get your data back. When they differ by even one byte, you get garbage, and the network reports no error, because as far as it is concerned it delivered exactly what it was given.
This is why two questions sit under every protocol. The first is encoding: given a value in memory — an integer, a string, a record with several fields — what exact sequence of bytes represents it? The second is structure: when the bytes arrive, how does the receiver tell which bytes belong to which field, and where one message stops and the next starts? The rest of this page is those two questions, worked out. Encoding brings in endianness and character sets; structure brings in framing. Both are invisible when they work and maddening when they do not.
Endianness — what byte order means
Computers store multi-byte numbers in two possible orders.
Take the 32-bit number 0x12345678:
| Order | Address 0 | Address 1 | Address 2 | Address 3 |
|---|---|---|---|---|
| Big-endian | 12 | 34 | 56 | 78 |
| Little-endian | 78 | 56 | 34 | 12 |
Big-endian stores the most-significant byte first — the way you’d write the number out by hand. Little-endian stores the least-significant byte first, which is more convenient for the CPU when doing arithmetic on numbers of varying width. Both conventions exist for historical reasons; both work; neither is morally superior.
What matters is that they’re different. If a machine that uses one order writes a 32-bit number to a network packet, and a machine that uses the other order reads it back without conversion, the number is wrong by a permutation of its bytes.
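The mismatch can be reproduced in a few lines. This is a minimal sketch using Python's built-in `int.to_bytes` and `int.from_bytes` (the variable names are illustrative, and `bytes.hex` with a separator assumes Python 3.8 or later):

```python
# Encode 0x12345678 in both byte orders, then decode with the wrong one.
value = 0x12345678

big = value.to_bytes(4, "big")        # most-significant byte first
little = value.to_bytes(4, "little")  # least-significant byte first

print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12

# A reader that assumes little-endian but receives big-endian bytes
# sees the same four bytes as a different number.
misread = int.from_bytes(big, "little")
print(hex(misread))     # 0x78563412
```

No byte was lost or altered in transit; only the reading rule changed.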
| Architecture | Order |
|---|---|
| x86 / x86-64 | Little-endian |
| Apple Silicon, modern ARM | Little-endian (configurable, in practice always LE) |
| Older PowerPC, SPARC, MIPS | Big-endian |
| Network protocols (IP, TCP, UDP) | Big-endian — "network byte order" |
The Internet protocols are big-endian by historical accident, from the 1970s, when many of the influential systems (PDP-10, IBM mainframes) were big-endian. It stuck. So today, almost every machine on the network is little-endian internally, and almost every byte they put on the network is big-endian — they convert at the boundary.
Network byte order, host byte order
C and most systems languages give you four functions for the conversion. They’re named after what they do — host to network, network to host, short (16-bit) or long (32-bit):
#include <arpa/inet.h>
uint16_t htons(uint16_t hostshort); // host → network, 16-bit
uint32_t htonl(uint32_t hostlong); // host → network, 32-bit
uint16_t ntohs(uint16_t netshort); // network → host, 16-bit
uint32_t ntohl(uint32_t netlong); // network → host, 32-bit
On a big-endian machine, all four are no-ops, because host order already is network order. On a little-endian machine, they swap bytes. Either way, the calling code is portable: write htons(8080) and the right thing happens.
The most common place beginners hit endianness is the port number in a sockaddr. sin.sin_port = 8080 looks correct, but 8080 is 0x1f90, and a little-endian machine stores it as the bytes 90 1f. The kernel reads those two bytes in network order, so you’ve just told it to use port 0x901f = 36895. The fix is sin.sin_port = htons(8080).
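The port mix-up can be simulated on any machine by forcing the byte order explicitly. The sketch below uses Python's `struct` module in place of a real sockaddr, so the little-endian "host" is an assumption of the example rather than a property of the machine running it:

```python
import struct

port = 8080  # 0x1f90

# What a little-endian host stores when the value is assigned without htons.
stored = struct.pack("<H", port)
# The kernel interprets the field in network (big-endian) order.
seen_by_kernel = struct.unpack("!H", stored)[0]
print(stored.hex(), seen_by_kernel)   # 901f 36895

# With the conversion applied, the bytes are already in network order.
converted = struct.pack("!H", port)
print(converted.hex(), struct.unpack("!H", converted)[0])  # 1f90 8080
```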
inet_addr("192.168.1.1") returns an address that is already in network byte order, and inet_pton(AF_INET, "192.168.1.1", &addr) writes one into addr. You don’t pass either result through htonl; they’re already done.
Serializing numbers, strings, and structs
Turning a value in memory into bytes for the wire is called serialization. Numbers are the
easy case once endianness is settled. A 32-bit integer is always four bytes; you pick an
order, both sides agree, done. The questions you have to answer up front are width and
signedness: is this field 1, 2, 4, or 8 bytes, and can it go negative? A protocol spec
pins both. Signed integers on the wire almost always use two's complement, the same
representation your CPU uses, so minus one in a 32-bit field is ff ff ff ff.
Floating-point numbers are sent as their IEEE 754 bit pattern, which is portable across
essentially every machine you will meet, though it still carries a byte order.
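A short sketch of those number rules, using Python's `struct` module with the `!` (network order) prefix. The specific values are only illustrations:

```python
import struct

# Signed 32-bit, big-endian: -1 in two's complement is all ones.
print(struct.pack("!i", -1).hex(" "))   # ff ff ff ff

# The same four bytes read as unsigned give a different value,
# which is why a spec has to pin signedness as well as width.
print(struct.unpack("!I", b"\xff\xff\xff\xff")[0])  # 4294967295

# Width matters too: 300 does not fit in one unsigned byte.
try:
    struct.pack("!B", 300)
    fits = True
except struct.error:
    fits = False
print(fits)  # False

# IEEE 754 single precision, big-endian: 1.0 is 3f 80 00 00.
print(struct.pack("!f", 1.0).hex(" "))  # 3f 80 00 00
```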
Strings are where it gets interesting, because a string has no fixed width. The bytes
that spell "hello" are five long; the bytes that spell a username could be any length.
The receiver reading a stream of bytes has no way to know where the string ends unless the
protocol tells it. There are two ways to say so, and they are the same two ways you frame
whole messages, which is not a coincidence. You can put the length in front of the
string, a length prefix, say a 2-byte count followed by exactly that many bytes, or you
can mark the end with a sentinel byte the string itself can never contain, the classic
being the C convention of a trailing 0x00. Length-prefixed strings are
safer and faster to read because you know up front how many bytes to grab and you can
allocate exactly the right buffer. Null-terminated strings are compact but force a scan to
find the end and break the moment the data legitimately contains a zero byte.
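The two string conventions can be compared side by side. This is a sketch, not a library API: `encode_prefixed` and `decode_prefixed` are hypothetical helper names, and the 2-byte big-endian count and UTF-8 body are assumptions chosen to match the example in the text.

```python
import struct

def encode_prefixed(text):
    """2-byte big-endian length, then that many UTF-8 bytes."""
    body = text.encode("utf-8")
    return struct.pack("!H", len(body)) + body

def decode_prefixed(buf):
    """Return the decoded string and whatever bytes follow it."""
    (length,) = struct.unpack_from("!H", buf, 0)
    if len(buf) < 2 + length:
        raise ValueError("buffer shorter than declared length")
    return buf[2:2 + length].decode("utf-8"), buf[2 + length:]

# Two strings back to back; the second contains a zero byte.
wire = encode_prefixed("hello") + encode_prefixed("a\x00b")
first, rest = decode_prefixed(wire)
second, rest = decode_prefixed(rest)
print(repr(first), repr(second))  # 'hello' 'a\x00b'

# The sentinel approach cuts the second string short at its zero byte.
truncated = b"a\x00b\x00".split(b"\x00")[0]
print(truncated)  # b'a'
```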
A struct, a record with several fields, is just those rules applied one field after another, in the exact order the spec lists. The IPv4 header you will walk later is a struct: a 1-byte field, then another 1-byte field, then a 2-byte field, and so on, packed with no gaps. That last word matters. In memory, your compiler is free to insert padding between struct fields so each one lands on a convenient address, and the amount of padding depends on the platform. On the wire there is no padding; fields butt up against each other exactly as the spec draws them. This is the trap behind casting a wire buffer straight onto a C struct pointer: the in-memory layout, with its padding and native byte order, is almost never the wire layout. You serialize field by field, or you use a packed struct and convert byte order explicitly, but you do not assume the two layouts match.
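The gap between in-memory and wire layout is easy to observe with `struct.calcsize`. In this sketch the record (one 1-byte field followed by one 4-byte field) is made up for illustration, and the native size is platform-dependent, so no exact value is claimed for it:

```python
import struct

# One 1-byte field followed by one 4-byte field.
wire_size = struct.calcsize("!BI")    # standard sizes, no padding
native_size = struct.calcsize("@BI")  # native sizes and alignment

print(wire_size)    # 5
print(native_size)  # platform-dependent; commonly 8 because of padding

# Packing field by field in network order gives exactly the wire layout.
packed = struct.pack("!BI", 1, 2)
print(packed.hex(" "))  # 01 00 00 00 02
```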
The framing problem
Here is the single most common source of real bugs at this layer, and the one most
tutorials skip. TCP is a byte stream,
not a message stream. When you call send() twice, once with 100 bytes and once
with 50, TCP does not promise the other side gets a 100-byte read followed by a 50-byte
read. It promises only that the 150 bytes arrive in order. The receiver might get all 150
in one recv(), or 30 then 120, or one byte at a time. TCP coalesces and splits
your writes however the network and its buffers see fit. There are no message boundaries on
the wire because TCP never put any there.
So if your application has messages, and almost every protocol does, you have to build the boundaries yourself, on top of the stream. That is framing: a rule the receiver uses to carve the incoming byte stream back into the messages the sender meant. There are two classic rules, and every protocol you know picks one or the other.
Length-prefixing puts a fixed-size count at the front of each message. The
receiver reads the count first, say 4 bytes giving the body length, then reads exactly
that many more bytes, and now it holds one complete message. This is what most binary
protocols do, because it is unambiguous and the body can contain any byte at all, including
the bytes that would otherwise look like a delimiter. Delimiters instead
mark the end of each message with a sentinel sequence the body is guaranteed not to
contain, a newline for line-based text protocols, \r\n\r\n for the end of
HTTP/1 headers. The receiver reads until it sees the delimiter, and everything before it is
one message.
Both rules share a consequence that trips up everyone the first time: you cannot assume one
recv() equals one message. Because TCP chops the stream wherever it likes, the
length prefix can arrive in one read and the body in the next, or two whole messages can
arrive in a single read, or a message can be split straight down the middle. So a correct
reader keeps a buffer. It appends whatever each recv() returns to that buffer,
then repeatedly checks whether the buffer now holds a complete frame, enough bytes for the
declared length, or a delimiter somewhere inside it. When it does, it removes that frame and
processes it, leaving any leftover bytes in the buffer as the start of the next message.
This buffer-and-check loop is the heart of every protocol reader ever written, and skipping
it is the bug behind half of all "works on localhost, fails over the real network" reports,
because a fast local loopback often delivers a whole small message in one read and hides the
problem.
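The buffer-and-check loop described above might look like the following for length-prefixed frames. It is a sketch under stated assumptions: `FrameReader`, `frame`, and `MAX_FRAME` are hypothetical names, the prefix is assumed to be a 4-byte big-endian body length, and the list of pieces stands in for successive recv() results.

```python
import struct

MAX_FRAME = 1 << 20  # assumed cap on body size; choose one that suits the protocol

class FrameReader:
    """Reassembles frames of the form: 4-byte big-endian length, then body."""

    def __init__(self):
        self.buf = bytearray()

    def feed(self, chunk):
        """Append one recv() result; return every complete body now available."""
        self.buf += chunk
        frames = []
        while len(self.buf) >= 4:
            (length,) = struct.unpack_from("!I", self.buf, 0)
            if length > MAX_FRAME:
                raise ValueError("declared length exceeds cap")
            if len(self.buf) < 4 + length:
                break  # body not fully here yet; wait for more bytes
            frames.append(bytes(self.buf[4:4 + length]))
            del self.buf[:4 + length]  # leftover bytes start the next frame
        return frames

def frame(body):
    return struct.pack("!I", len(body)) + body

stream = frame(b"hello") + frame(b"world!")
reader = FrameReader()
received = []
# Simulate TCP delivering the stream in arbitrary pieces.
for piece in [stream[:2], stream[2:7], stream[7:]]:
    received.extend(reader.feed(piece))
print(received)  # [b'hello', b'world!']
```

The first piece splits the length prefix itself and the last piece carries the tail of one message plus all of the next, yet the caller only ever sees whole messages.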
Each rule has its own pitfalls. A length prefix that is too small to hold the real message size truncates large messages; one that is attacker-controlled lets a peer claim a four-gigabyte body and exhaust your memory, so real servers cap the declared length before allocating. Delimiters need escaping or a guarantee the body cannot contain the marker; a line protocol that forgets to handle a newline inside a value will split one message into two. And a partial read of a delimiter-framed stream forces you to scan the buffer again from where you left off each time more data lands, which is why high-throughput systems lean toward length prefixes. The deeper trade-offs of schemas, versioning, and self-describing formats show up once you reach a real serialization format like Protocol Buffers, which uses length-prefixed fields throughout precisely to dodge these problems.
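For comparison, here is a delimiter-framed reader that remembers how far it has already scanned, so a partial message is not searched again from the start each time data arrives. `LineReader` is a hypothetical name, the newline delimiter is an assumption, and the sample commands are invented for the example.

```python
class LineReader:
    """Splits a byte stream into newline-terminated messages."""

    def __init__(self):
        self.buf = bytearray()
        self.scanned = 0  # bytes already searched and known to hold no newline

    def feed(self, chunk):
        self.buf += chunk
        lines = []
        while True:
            end = self.buf.find(b"\n", self.scanned)
            if end == -1:
                self.scanned = len(self.buf)  # resume the search here next time
                return lines
            lines.append(bytes(self.buf[:end]))
            del self.buf[:end + 1]  # drop the message and its delimiter
            self.scanned = 0

reader = LineReader()
out = []
for piece in [b"PING\nSET k", b"ey va", b"lue\nGE", b"T key\n"]:
    out.extend(reader.feed(piece))
print(out)  # [b'PING', b'SET key value', b'GET key']
```

Note that this sketch does no escaping: a value containing a newline would be split in two, which is exactly the pitfall described above.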
Reading a hex dump
Every byte-level tool you’ll use — Wireshark, tcpdump, hexdump, xxd — eventually shows you a hex dump. The format is consistent enough to learn once.
0000: 45 00 00 3c 1c 46 40 00 40 06 b1 e6 c0 a8 00 68 E..<.F@.@......h
0010: ac d9 0e 8e 0d be 01 bb 5b 8b 9d 4c 00 00 00 00 ........[..L....
0020: a0 02 fa f0 91 7c 00 00 02 04 05 b4 04 02 08 0a .....|..........
0030: 00 67 fb 21 00 00 00 00 01 03 03 07 .g.!........
Three columns. The leftmost is the offset into the dump (in hex). The middle is 16 bytes per line, in hex, two characters per byte, separated by spaces. The right is the same bytes shown as ASCII where the byte falls in the printable range (0x20–0x7e), with a dot for everything else.
You read it left to right, top to bottom, exactly like text. Offset 0x10 is the byte 16 positions in (counting from zero); offset 0x20 is byte 32; and so on. To find the byte at offset 22 (0x16), jump to the 0x10 line and skip six bytes, since 22 − 16 = 6: the seventh byte on that line is 01.
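The three-column format is simple enough to reproduce. This sketch rebuilds the dump above from its 60 bytes; `hexdump` is a hypothetical helper written for this page, not the command-line tool of the same name.

```python
PACKET = bytes.fromhex(
    "4500003c1c4640004006b1e6c0a80068"
    "acd90e8e0dbe01bb5b8b9d4c00000000"
    "a002faf0917c0000020405b40402080a"
    "0067fb210000000001030307"
)

def hexdump(data, width=16):
    """Return dump lines: hex offset, hex bytes, printable-ASCII view."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in row).ljust(width * 3 - 1)
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in row)
        lines.append(f"{offset:04x}: {hex_part} {ascii_part}")
    return lines

for line in hexdump(PACKET):
    print(line)

# Indexing the bytes directly agrees with counting across the dump.
print(f"{PACKET[22]:02x}")  # 01
```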
Walking a real packet
The dump above is a real TCP SYN packet. Let’s walk it byte by byte. We’ll skip the Ethernet header: tcpdump -X prints each packet without its link-level header, so the dump starts at the IP layer.
IPv4 header — bytes 0x00 through 0x13
0000: 45 00 00 3c 1c 46 40 00 40 06 b1 e6 c0 a8 00 68 E..<.F@.@......h
0010: ac d9 0e 8e ... ....
| Bytes | Field | Value | Decoded |
|---|---|---|---|
| 0 | Version + IHL | 0x45 | Version 4, header length 5×4 = 20 bytes |
| 1 | DSCP / ECN | 0x00 | No QoS markings |
| 2-3 | Total length | 0x003c | 60 bytes (IP header + payload) |
| 4-5 | Identification | 0x1c46 | For fragmentation reassembly |
| 6-7 | Flags + frag offset | 0x4000 | "Don't Fragment" set, offset 0 |
| 8 | TTL | 0x40 | 64 hops remaining |
| 9 | Protocol | 0x06 | 6 = TCP |
| 10-11 | Header checksum | 0xb1e6 | Verified by routers; recomputed on TTL change |
| 12-15 | Source IP | c0 a8 00 68 | 192.168.0.104 |
| 16-19 | Destination IP | ac d9 0e 8e | 172.217.14.142 |
Notice how the multi-byte fields are big-endian: total length 0x003c is
"00, 3c" on the wire, which reads as the number 60 directly. Source IP
c0 a8 00 68 is 192.168.0.104 — each byte one octet of the dotted-decimal
form, in order.
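The same table can be produced mechanically. This sketch unpacks the 20 header bytes from the dump with a single `struct.unpack` call; the variable names are illustrative, and the format string assumes a header with no options (IHL of 5).

```python
import ipaddress
import struct

IP_HEADER = bytes.fromhex("4500003c1c4640004006b1e6c0a80068acd90e8e")

# ! = network byte order; B = 1 byte, H = 2 bytes, 4s = 4 raw bytes.
(ver_ihl, dscp_ecn, total_len, ident, flags_frag,
 ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", IP_HEADER)

print(ver_ihl >> 4, (ver_ihl & 0x0F) * 4)  # 4 20
print(total_len, ttl, proto)               # 60 64 6
print(bool(flags_frag & 0x4000))           # True (Don't Fragment)
print(ipaddress.IPv4Address(src))          # 192.168.0.104
print(ipaddress.IPv4Address(dst))          # 172.217.14.142
```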
TCP header — bytes 0x14 through 0x3b
0010: ... 0d be 01 bb 5b 8b 9d 4c 00 00 00 00 ....[..L....
0020: a0 02 fa f0 91 7c 00 00 02 04 05 b4 04 02 08 0a .....|..........
0030: 00 67 fb 21 00 00 00 00 01 03 03 07 .g.!........
| Bytes | Field | Value | Decoded |
|---|---|---|---|
| 20-21 | Source port | 0x0dbe | 3518 |
| 22-23 | Destination port | 0x01bb | 443 (HTTPS) |
| 24-27 | Sequence number | 0x5b8b9d4c | 1535876428 |
| 28-31 | ACK number | 0x00000000 | 0 (this is a SYN — no ACK) |
| 32 | Data offset + reserved | 0xa0 | Data offset 10 (× 4 = 40 bytes of TCP header) |
| 33 | Flags | 0x02 | SYN bit set |
| 34-35 | Window size | 0xfaf0 | 64240 bytes the sender can receive |
| 36-37 | Checksum | 0x917c | Covers a pseudo-header, the TCP header, and the payload |
| 38-39 | Urgent pointer | 0x0000 | 0; no urgent data |
| 40-end | TCP options | ... | MSS, SACK-permitted, timestamps, NOP, window scale |
That’s the entire packet: 60 bytes that say "I’m 192.168.0.104, I want to start a TCP connection to 172.217.14.142 port 443, my starting sequence number is 1535876428, here are some options I support". Wireshark turns these bytes into the friendly tree view, but the bytes themselves are the truth.
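The TCP half of the walk can be checked the same way. This sketch unpacks the fixed 20-byte header and then steps through the option bytes; the option-walking loop is a simplified illustration, not a complete or hardened parser.

```python
import struct

TCP_SEGMENT = bytes.fromhex(
    "0dbe01bb5b8b9d4c00000000"
    "a002faf0917c0000020405b40402080a"
    "0067fb210000000001030307"
)

(sport, dport, seq, ack, off_byte,
 flags, window, checksum, urgent) = struct.unpack_from("!HHLLBBHHH", TCP_SEGMENT)
header_len = (off_byte >> 4) * 4
print(sport, dport, seq, ack)          # 3518 443 1535876428 0
print(header_len, hex(flags), window)  # 40 0x2 64240

# Walk the options between the fixed 20-byte header and header_len.
options = []
i = 20
while i < header_len:
    kind = TCP_SEGMENT[i]
    if kind == 0:   # end of option list
        break
    if kind == 1:   # NOP is a single byte with no length field
        options.append((kind, b""))
        i += 1
        continue
    if i + 1 >= header_len:
        raise ValueError("option truncated before its length byte")
    length = TCP_SEGMENT[i + 1]
    if length < 2 or i + length > header_len:
        raise ValueError("malformed TCP option")
    options.append((kind, TCP_SEGMENT[i + 2:i + length]))
    i += length

print([kind for kind, _ in options])  # [2, 4, 8, 1, 3]
mss = int.from_bytes(dict(options)[2], "big")
window_scale = dict(options)[3][0]
print(mss, window_scale)              # 1460 7
```

Option kinds 2, 4, 8, 1, and 3 are MSS, SACK-permitted, timestamps, NOP, and window scale, matching the last row of the table.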
Bit-level fields and bitfields
Some header fields are smaller than a byte. The first byte of the IPv4 header carries
two: the 4-bit version (top half) and the 4-bit header length (bottom half), packed
into one byte as (version << 4) | header_length. So
0x45 = 0100 0101 = version 4, header length 5.
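Shifts and masks are all it takes to split such fields. The sketch below decodes the version and header-length nibbles, then applies the same idea to a field that does not fall on a nibble boundary: the 16-bit IPv4 flags and fragment-offset word (0x4000 in the packet walked above), which is 3 flag bits followed by 13 offset bits.

```python
first_byte = 0x45
version = first_byte >> 4          # top four bits
ihl_words = first_byte & 0x0F      # bottom four bits
print(version, ihl_words)          # 4 5

flags_frag = 0x4000
flags = flags_frag >> 13           # top three bits
frag_offset = flags_frag & 0x1FFF  # bottom thirteen bits
dont_fragment = bool(flags & 0b010)
print(flags, frag_offset, dont_fragment)  # 2 0 True

# Packing is the reverse: shift each value into place and OR them together.
rebuilt = (version << 4) | ihl_words
print(hex(rebuilt))                # 0x45
```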
The TCP flags byte packs eight one-bit flags: CWR, ECE, URG, ACK, PSH, RST, SYN, FIN. (A ninth flag, NS, was once defined in the low bit of the preceding byte; it is no longer in use.) 0x02 = 0000 0010 = SYN only. 0x12 = 0001 0010 = SYN and ACK (the second packet of a handshake).
# decoding TCP flags in Python
flags = 0x12
print("FIN:", bool(flags & 0x01))  # bit 0
print("SYN:", bool(flags & 0x02))  # bit 1
print("RST:", bool(flags & 0x04))  # bit 2
print("PSH:", bool(flags & 0x08))  # bit 3
print("ACK:", bool(flags & 0x10))  # bit 4
print("URG:", bool(flags & 0x20))  # bit 5
print("ECE:", bool(flags & 0x40))  # bit 6
print("CWR:", bool(flags & 0x80))  # bit 7
# → SYN: True, ACK: True, others: False
Building a packet by hand
Once you can read a hex dump, building one is the same operation in reverse. Python’s
struct module turns values into bytes in the right order:
import struct

# Build a minimal TCP-style header by hand.
src_port = 12345
dst_port = 80
seq = 1000
ack = 0
offset = 5       # 5 × 4 = 20 bytes
flags = 0x02     # SYN
window = 65535
checksum = 0     # left at zero; a real sender computes it
urg = 0

# Pack format:
# ! = network byte order (big-endian)
# H = unsigned 16-bit, L = unsigned 32-bit, B = unsigned 8-bit
header = struct.pack(
    "!HHLLBBHHH",
    src_port, dst_port, seq, ack,
    (offset << 4), flags, window, checksum, urg
)
# Group the output four bytes at a time (the separator form needs Python 3.8+).
print(header.hex(" ", 4))
# 30390050 000003e8 00000000 5002ffff 00000000
The ! at the start of the format string says "network byte order". Without it, struct uses the native order (big-endian on a big-endian machine, little-endian on a little-endian one) along with native sizes and alignment, and your packet won’t be portable.
Go has encoding/binary with binary.BigEndian / binary.LittleEndian. Rust has the byteorder crate. C has htonl / htons as covered above. The exact API differs; the underlying operation is the same.
Text protocols versus binary protocols
Once you can put any value on the wire, a design choice opens up: send data as
human-readable text, or as packed binary. HTTP/1, SMTP, Redis's older protocol, and most
of the early Internet are text. You can read them in a terminal, type them by hand into
nc, and debug them with your eyes. The cost is size and speed. The number
1000000 is seven bytes as the ASCII text "1000000" but four bytes as a binary integer, and
the receiver has to parse the digits back into a number rather than just reading four bytes.
Text is also full of ambiguity the parser must resolve: whitespace, case, where a field
ends, how to escape special characters.
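The size difference in the example above is easy to confirm. A minimal sketch, assuming the binary form is an unsigned 32-bit big-endian integer:

```python
import struct

n = 1_000_000
as_text = str(n).encode("ascii")
as_binary = struct.pack("!I", n)

print(len(as_text), as_text)               # 7 b'1000000'
print(len(as_binary), as_binary.hex(" "))  # 4 00 0f 42 40

# Decoding text means parsing digits; decoding binary is a fixed-width read.
from_text = int(as_text)
from_binary = struct.unpack("!I", as_binary)[0]
print(from_text == from_binary == n)       # True
```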
Binary protocols, the kind TCP and IP headers themselves use, pack fields into fixed positions with no separators and no parsing beyond reading the right number of bytes at the right offset. They are smaller and faster and harder to get subtly wrong, at the price of being unreadable without a tool and unforgiving of version drift, since a field that moves by one byte breaks every reader. The rough rule is that control-plane and developer-facing protocols lean text for legibility, while high-volume data-plane protocols lean binary for efficiency. Newer designs often split the difference: HTTP/2 reframes the same text-shaped HTTP semantics as a binary, length-prefixed wire format, getting the speed of binary without throwing away the model people already knew.
Character encodings, and why UTF-8 won
Text is its own serialization problem, because "the letter A" is not a byte until you fix an encoding. ASCII was the first widely agreed answer: 128 characters, each one byte, the top bit always zero. It covers the English alphabet, digits, punctuation, and control codes, and it is the reason the right-hand column of a hex dump is readable at all. But 128 characters cannot hold the world's scripts, and the decades of incompatible 8-bit and multi-byte encodings that tried to extend it (Latin-1, Shift-JIS, the many code pages) meant the same bytes rendered as different characters depending on which encoding the reader guessed. Text that crossed systems turned to mojibake.
Unicode fixed the first half of the problem by giving every character in every script a single number, a code point, with room for over a million of them. UTF-8 fixed the second half by defining how to turn those code points into bytes, and it did so with a design that explains why it took over completely. UTF-8 is variable-width: a code point becomes one to four bytes. The first 128 code points encode as a single byte identical to ASCII, so every ASCII file is already valid UTF-8 and every ASCII-speaking program keeps working unchanged. Code points beyond that use two, three, or four bytes, and the encoding is self-synchronizing: the leading byte of a multi-byte sequence announces how many bytes follow, and continuation bytes are tagged so they can never be mistaken for the start of a character. Drop into the middle of a UTF-8 stream and you can always find the next character boundary.
That backward compatibility is the whole story. UTF-8 needed no flag day, no coordinated switch; an ASCII world could adopt it one file and one program at a time, and the new characters simply used byte values ASCII never touched. It is also endianness-free, since it is defined as a sequence of bytes rather than a sequence of wider code units, which is why it sidesteps the byte-order marks that haunt UTF-16. The practical advice is short: send and store text as UTF-8 unless a protocol forces otherwise, declare it explicitly, and never assume one character equals one byte. A name with an accent, an emoji, or any non-Latin script will quietly be several bytes, and code that confuses byte length with character length will truncate strings mid-character and corrupt the output.
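The byte-versus-character distinction shows up with a five-character string. In this sketch the sample name is arbitrary, written with escapes so the code points are unambiguous, and `is_continuation` is a hypothetical helper:

```python
name = "Zo\u00eb \U0001F642"   # Z, o, e-with-diaeresis, space, an emoji
encoded = name.encode("utf-8")

print(len(name), len(encoded))  # 5 9
print(encoded.hex(" "))         # 5a 6f c3 ab 20 f0 9f 99 82

# Cutting at a byte count can land in the middle of a character.
try:
    encoded[:3].decode("utf-8")
    cut_ok = True
except UnicodeDecodeError:
    cut_ok = False
print(cut_ok)                   # False

# Continuation bytes always look like 10xxxxxx, so character starts
# can be found from any position in the stream.
def is_continuation(b):
    return (b & 0xC0) == 0x80

starts = [i for i, b in enumerate(encoded) if not is_continuation(b)]
print(starts)                   # [0, 1, 2, 4, 5]
```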
MTU and fragmentation
The stream of bytes does not actually travel as one continuous flow; the lower layers chop it into packets, and there is a ceiling on how big each packet can be. That ceiling is the MTU, the maximum transmission unit, and on ordinary Ethernet it is 1500 bytes for the IP packet. Anything larger has to be carried in more than one packet. This is why the reference table lists a TCP MSS of 1460: take the 1500-byte MTU, subtract 20 bytes of IP header and 20 of TCP header, and what is left is the most application data one segment can carry.
When an IP packet is larger than the MTU of a link it must cross, something has to give. In IPv4 a router may fragment the packet, unless the sender set the Don't Fragment flag, splitting it into MTU-sized pieces that the destination reassembles using the identification and fragment-offset fields you saw in the header walk. Fragmentation works but is best avoided: losing any one fragment forces the whole original packet to be resent, reassembly costs the receiver memory and time, and fragments have historically been a rich source of security bugs. IPv6 removed in-flight fragmentation by routers entirely; the sender must size its packets correctly instead.
In practice the sender discovers the largest packet that fits the whole path without fragmenting, the path MTU, and stays under it. TCP does this almost invisibly, which is part of why TCP feels like a clean byte stream even though the bytes are being diced into segments and packets the entire way. You write a megabyte; TCP and IP turn it into hundreds of MTU-sized packets, number them, and the far side stitches the bytes back into the same megabyte, in order, with the packet boundaries erased. The framing problem from earlier is the same idea one layer up: TCP hides the packet boundaries, so your application must invent its own message boundaries on top.
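The arithmetic behind the MSS figure and the "hundreds of packets" claim can be worked through directly. This sketch assumes a 1500-byte MTU, headers without options, and reads "a megabyte" as 1,000,000 bytes:

```python
MTU = 1500
IP_HEADER = 20
TCP_HEADER = 20

mss = MTU - IP_HEADER - TCP_HEADER
print(mss)        # 1460

payload = 1_000_000
segments = (payload + mss - 1) // mss        # integer ceiling division
last_segment = payload - (segments - 1) * mss
print(segments)      # 685
print(last_segment)  # 1360
```

Real connections usually carry TCP options such as timestamps, which shrink the payload per segment slightly and raise the count.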
Why all of this underlies every higher protocol
Everything above the byte stream is built from the pieces on this page. HTTP is text framed
by a delimiter for its headers and a length prefix, the Content-Length, for its
body, sent over a TCP byte stream. DNS is a compact binary format with big-endian fields and
length-prefixed labels. TLS wraps everything in length-prefixed records. Protocol Buffers,
Thrift, and the binary halves of HTTP/2 and gRPC are length-prefixed binary all the way down.
Each one is a different answer to the same two questions: how do we encode our values into
bytes, and how do we frame those bytes into messages.
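HTTP/1 makes a compact illustration because it uses both framing rules in one message. The sketch below is deliberately simplified: `split_http_message` is a hypothetical helper, the response bytes are made up, and it handles only a Content-Length body (no chunked transfer encoding, no header edge cases).

```python
def split_http_message(buf):
    """Return (headers, body, leftover), or None if the message is incomplete."""
    head_end = buf.find(b"\r\n\r\n")       # delimiter framing for the headers
    if head_end == -1:
        return None                         # header block not complete yet
    head = buf[:head_end].decode("ascii")
    headers = {}
    for line in head.split("\r\n")[1:]:     # skip the status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", "0"))  # length prefix for the body
    body_start = head_end + 4
    if len(buf) < body_start + length:
        return None                         # body not fully received
    body_end = body_start + length
    return headers, buf[body_start:body_end], buf[body_end:]

raw = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhelloHTTP/1.1 204"
partial = split_http_message(raw[:20])
print(partial)         # None
headers, body, leftover = split_http_message(raw)
print(body, leftover)  # b'hello' b'HTTP/1.1 204'
```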
That is why this layer is worth the time. When a higher protocol misbehaves (a field comes back as a wrong number, a message arrives split or doubled, a string shows up garbled), the cause is nearly always one of the things here: a byte-order mismatch, a framing assumption that one read is one message, or an encoding that two sides disagree on. The same mental model serves you up and down the stack, from the sockets that hand you the raw bytes to the TCP stream that orders them to the serialization formats that give them meaning. Bytes on the wire is not the bottom of the stack you pass through on the way to the interesting parts. It is the grammar the interesting parts are written in.
A quick reference
| What | Bytes |
|---|---|
| MAC address | 6 bytes (e.g. 00:1a:2b:3c:4d:5e) |
| IPv4 address | 4 bytes |
| IPv6 address | 16 bytes |
| Port number | 2 bytes (16-bit unsigned, 0–65535) |
| Ethernet frame, minimum | 64 bytes (header + 46 payload + 4 FCS) |
| Ethernet frame, maximum (standard) | 1518 bytes (or 1522 with VLAN) |
| IPv4 header, no options | 20 bytes |
| IPv6 header (fixed) | 40 bytes |
| TCP header, no options | 20 bytes |
| UDP header | 8 bytes |
| Default MTU on Ethernet | 1500 bytes |
| TCP MSS on Ethernet (typical) | 1460 bytes (1500 − 20 IP − 20 TCP) |
Tools — looking at bytes directly
| Tool | Use for |
|---|---|
| xxd file.bin | Pretty hex dump of any file. xxd -r reverses; useful for tests. |
| hexdump -C file.bin | The other classic. Slightly different layout; same information. |
| od -A x -t x1z -v file.bin | The POSIX-portable version. Ugly but available everywhere. |
| tcpdump -nn -X | Live packet capture with hex + ASCII view. The first thing to run when the question is "what’s actually on the wire?" |
| Wireshark | Click any field in the dissected tree; the matching bytes highlight in the byte view at the bottom. The single best learning tool. |
| scapy (Python) | Build, send, and dissect packets programmatically. The right tool for "what if I sent a malformed X". |
Common mistakes
- Forgetting htons on the port. The classic. Your server "binds successfully" but listens on a port you didn’t expect, and nc localhost 8080 gets connection-refused.
- Calling htonl on something already in network order. inet_addr already returns network order. Passing it through htonl on a little-endian machine corrupts it.
- Reading bytes past the buffer. If the spec says the next field is 4 bytes and you only have 3, don’t pad with zeros; the packet is malformed. Decoders that paper over short packets are how memory-safety bugs ship.
- Treating a hex dump as ASCII. A hex dump that looks like text means the bytes happen to fall in the printable range. The "..." column on the right is a hint, not the data.
- Confusing byte order with bit order. Network byte order is about bytes within a multi-byte field. The bit order within each byte is the same on every machine you’ll meet (most-significant bit first when described). In the TCP flags byte, bit 7 is the most-significant bit regardless of host endianness.
Further reading
- RFC 1700 — Assigned Numbers (historical) and the IANA registries it points to. Worth knowing as the source of truth for "what does protocol number 6 mean".
- RFC 791 — IPv4 — the original 1981 spec. The header diagram in §3.1 is the one every introductory course shows.
- Beej’s Guide — chapter 3 covers byte order with a specific section on host-vs-network order. Beginner-friendly.
- Julia Evans — Network byte order — a one-page illustrated zine that summarises everything on this page.
- High Performance Browser Networking — Grigorik’s free book covers packet-level details for higher protocols (HTTP/2 framing, TLS records, QUIC packets) and is worth reading once you’re comfortable with TCP/IP headers.
- The TCP/IP Guide — Charles Kozierok — a free online resource that goes byte-by-byte through every header in the common protocol stack. Encyclopedic reference rather than a beginner read.