06 / 11

Protocols / 06

JSON as a wire format

JSON is a small text format with a data model anyone can hold in their head: objects, arrays, strings, numbers, booleans, and null. It won the API serialisation argument of the 2010s by being readable, schemaless, and trivial to debug with tools you already have. This page walks the whole format as engineers actually meet it: the grammar and types, why it spread, the places it bites, how parsers turn bytes into trees, how you stream it, how you validate it, where it gets attacked, and when the honest answer is to reach for a binary format instead.

What JSON actually is

JSON is a way to write a tree of data as text. That is the whole idea. A value is either a scalar — a string, a number, a boolean, or null — or a container that holds more values: an object (a set of name/value pairs) or an array (an ordered list). Containers nest inside containers, and the result is a tree with one value at the root. The format ships that tree as a run of printable characters that a person can read and an editor can diff, which turns out to be most of the reason it spread.

The format is specified twice and the two specs agree. RFC 8259 is the IETF version, and ECMA-404 is the standards-body version that defines only the grammar. Both fit comfortably on a page. That brevity is the point: there is so little to JSON that any language with a string type can write a parser in an afternoon, and most languages ship one in the standard library. When a format is small enough that everyone implements it, it becomes a default, and defaults win.

It helps to separate two things that often get muddled. There is the JSON data model — the abstract set of value types — and there is the JSON text encoding — the rules for writing one of those values as characters. Most of JSON's strengths come from the model being tiny. Most of its costs come from the encoding being text. Keep the two apart as we go and the trade-offs stop looking contradictory.

The data model, in full

There are six value types and that is the complete list. A document is one value, usually an object or an array, with scalars and more containers nested inside it. The grammar below is the entire format: a value is one of the six, an object wraps name/value pairs in braces, an array wraps values in brackets, and whitespace is allowed between tokens.

The complete JSON grammar. Six value types; two of them are containers that hold more values, which is what gives you a tree.

Type	Notes
`object`	Unordered set of string-keyed name/value pairs. Duplicate keys are not forbidden by the grammar but are not portable; most parsers keep the last one, some keep the first, a few error.
`array`	Ordered list of values. May be heterogeneous — nothing stops `[1, "two", true, null]`.
`string`	A sequence of Unicode characters in double quotes. Certain characters must be escaped, and the document itself is almost always UTF-8.
`number`	A decimal, optionally with a fraction and exponent. The spec puts no bound on precision, so what a number means depends entirely on the parser that reads it.
`true` / `false`	The two boolean literals, written lower-case and unquoted.
`null`	The single null literal.

Read that list again for what is missing. There is no integer type separate from floating point, no date or timestamp, no binary blob, no comments, no trailing commas, and no way to reference one part of a document from another. Every one of those gaps becomes a convention each API has to invent: dates as ISO-8601 strings, binary as base64 strings, big integers as quoted strings, and comments left out entirely or smuggled into a field named _comment. The format is small because it pushes those decisions onto you.

Why it won

JSON did not win on a feature checklist. It lost most checklists. It won because of three properties that matter more in practice than they look on paper.

First, it is readable. You can open a JSON response in any text editor, in a browser tab, in a log line, and understand it without a decoder ring. When something breaks at three in the morning, you can curl the endpoint, pipe it through jq, and see exactly what came back. A format you can read with your eyes removes a whole class of debugging tools from the critical path, and that is worth more than wire efficiency to most teams most of the time.

Second, it is everywhere. JSON grew up alongside JavaScript and the browser, where JSON.parse and JSON.stringify are built in and fast. From there it spread to every server language, every HTTP client, every logging stack. When a format is the default in the one runtime that ships on every device on earth, it becomes the lowest-common- denominator interchange format, and the network effect compounds: people pick JSON because everyone else picked JSON.

Third, it is schemaless. You can ship a JSON document without agreeing a contract first, add a field tomorrow without breaking yesterday's clients, and let each consumer read only the parts it cares about. For early-stage products and loosely coupled systems, that flexibility is the feature. The cost — that the structure lives only in your head and your docs, never in the wire format — comes due later, which is why JSON Schema exists. We will get there.

The costs that come with text

The same properties that make JSON pleasant make it expensive in specific ways. None of these are reasons to avoid it; they are reasons to know what you are paying for.

It is verbose. A list of a thousand records repeats every field name a thousand times. The keys "customer_id" and "created_at" appear on every row, even though they carry no per-row information. Compression hides most of this on the wire — repeated keys compress beautifully — but the parser still has to read and discard every copy, so the CPU cost survives even when the byte cost does not.
There is no native binary. The only way to carry raw bytes is to base64- encode them into a string, which inflates the payload by about a third and forces a decode step on both ends. For images, files, or anything binary, JSON is the wrong envelope and you usually want a separate transfer entirely.
Number precision is a trap. The spec allows arbitrary-precision decimals, but the dominant parser — JavaScript's — reads every number into a 64-bit float. Floats hold integers exactly only up to 2⁵³, about 9 quadrillion. A 64-bit database ID like 9007199254740993 silently round-trips through a browser as 9007199254740992, off by one, with no error. The same hole eats fixed-point currency and high-resolution timestamps. The standard fix is to send large integers as quoted strings and parse them deliberately.
No comments, no dates, no trailing commas. You cannot annotate a config file inline, you cannot write a timestamp as a first-class value, and you cannot leave a dangling comma after the last array element without a syntax error. These are small papercuts individually, and together they are why every JSON-for-config format — JSON5, JSONC, HOCON — exists to add the conveniences back.

The int64 rule. If a number can exceed 2⁵³ and a JavaScript client will ever touch it, send it as a string. IDs, balances in the smallest unit, and nanosecond timestamps all qualify. The bug is invisible until a value crosses the threshold, and then it is a data-integrity incident, not a display glitch.

How a parser reads it

Turning JSON text into something a program can use is two steps. A lexer scans the byte stream and emits tokens — a left brace, a string, a colon, a number, a comma. A parser consumes those tokens and builds structure according to the grammar, rejecting anything that does not fit. The output is a tree of native values: a map, a list, a string, a number.

A parse tree. The object holds two pairs; one value is a number leaf, the other is an array of two string leaves.

There are two ways to do the second step, and the choice matters once documents get large. The DOM style — what JSON.parse and almost every convenience API do — reads the whole document and builds the complete tree in memory before handing it back. It is the easy path and the right default, but it needs memory proportional to the document, often several times the byte size once you account for object overhead.

The streaming style, sometimes called SAX after the equivalent for XML, walks the tokens and fires events — "object started," "key seen," "value is 42," "array ended" — without ever building the full tree. Your code reacts to each event and keeps only what it needs. It is more work to write and you give up random access, but it lets you process a document far larger than memory, which is the difference between handling a 50 GB export and falling over on it.

Style	Memory	Use when
DOM (parse to a tree)	Proportional to document size	The document fits comfortably and you want random access. The default.
Streaming (SAX / events)	Proportional to nesting depth, not size	The document is larger than memory, or you only need a few fields from a big payload.
Pull / iterator	Bounded	You want streaming control without inverting your code into callbacks.

Streaming many records: NDJSON

A single JSON document has exactly one root value, which makes it a poor fit for an open-ended stream of records. You cannot keep appending objects to a file and decode them one at a time, because the moment you write a second top-level object the file stops being valid JSON. The whole-document model assumes the document ends.

The convention that fixes this is NDJSON, also called JSON Lines: put one complete JSON value on each line, separated by newlines, and treat the file as a sequence of independent documents rather than one big one. Each line parses on its own, so a reader can process records as they arrive, a writer can append forever, and a crash halfway through costs you at most one partial line instead of the whole file.

{"event":"login","user":"alice","ts":"2026-06-07T09:00:01Z"}
{"event":"view","user":"alice","page":"/pricing"}
{"event":"logout","user":"alice","ts":"2026-06-07T09:14:22Z"}

This is the format behind a lot of infrastructure you already use: log pipelines, the Elasticsearch bulk API, jq's default output, the Kubernetes --watch stream, and most large data exports. It keeps JSON's readability — every line is still a normal object you can eyeball — while side-stepping the single-root limitation. For moderate throughput it is the right tool. For millions of records per second, the per-record parse cost adds up and a binary format with proper framing pulls ahead.

Giving JSON a contract: JSON Schema

JSON describes data, not the shape data is supposed to have. JSON Schema fills that gap. It is a vocabulary, itself written in JSON, for stating what a valid document looks like: which fields are required, what types they hold, what ranges and patterns they must satisfy, and whether extra fields are allowed. The current drafts are 2019-09 and 2020-12, and OpenAPI 3.1 aligned its schema dialect with 2020-12, which is why the two now share a vocabulary.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/schemas/charge.json",
  "type": "object",
  "required": ["id", "amount_cents", "currency"],
  "properties": {
    "id":           { "type": "string", "format": "uuid" },
    "amount_cents": { "type": "integer", "minimum": 0 },
    "currency":     { "type": "string", "pattern": "^[A-Z]{3}$" },
    "status":       { "enum": ["pending", "settled", "failed"] }
  },
  "additionalProperties": false
}

A schema earns its keep in three places. It validates input at the edge of a service, so a malformed request is rejected with a clear message before it touches your business logic. It generates typed clients, so an SDK in another language knows the field names and types without a human transcribing them. And it documents the API, which is what most teams use it for in practice — the schema becomes the single source of truth that the docs site, the mock server, and the code generator all read.

Validation flow. The document and the schema meet at the validator, which either lets the document through or returns a precise list of what failed.

One subtlety worth flagging: "additionalProperties": false changes the contract from "must contain these fields" to "must contain only these fields." It is the difference between a forgiving API that ignores unknown keys and a strict one that rejects them. Strict is safer for write paths where an unexpected field might be a typo or an attack; forgiving is kinder for read paths where a client might send a field a newer version added. Pick on purpose.

Where JSON gets attacked

A parser is an interpreter for untrusted input, and JSON parsers have their own catalogue of ways to go wrong. None of these are exotic; they show up in real incident reports.

Large-payload denial of service. A DOM parser allocates memory in proportion to the document, so an attacker who can post a multi-gigabyte body, or a deeply nested array nested thousands of levels deep, can exhaust memory or blow the stack on the recursive descent. The defence is boring and effective: cap request body size, cap nesting depth, and cap the number of keys before you start parsing in earnest.
The JSON billion-laughs. XML had the classic entity-expansion bomb. JSON has no entities, but the same shape of attack returns through schema features and through clients that resolve $ref or expand templates: a small input that expands into a huge in-memory structure. Treat any expansion step as a place to bound the output, not just the input.
Prototype pollution. In JavaScript, naively merging an attacker-controlled object can let keys like __proto__ or constructor.prototype write onto the base object that every other object inherits from, changing behaviour across the whole program. The fix is to use null-prototype maps, reject those keys explicitly, or use a merge utility that already does. This one is specific to JavaScript's object model, which is exactly where most JSON gets parsed.
Injection on the way out. JSON is safe to parse; it is dangerous to build by hand. Concatenating user input into a JSON string instead of serialising it properly lets a quote or a brace break out of the value, the same class of bug as SQL injection. Always serialise with a real encoder, and when JSON is embedded inside HTML, escape the characters that could close a </script> tag.

Rule of thumb. Set limits before you parse — body size, depth, key count — and never build JSON with string concatenation. Most JSON security incidents are one of those two missing.

When text costs too much: the binary cousins

At high volume the text encoding stops being free. Repeated field names, base64 for binary, and the work of turning ASCII digits back into numbers all add up, and a binary format that carries the same data model can cut both the bytes and the parse time. There are two flavours worth knowing.

The first keeps JSON's self-describing property — every field still carries its name on the wire — but packs everything in binary. CBOR (RFC 8949) is the IETF's version, used in WebAuthn, COSE-signed messages, and constrained IoT devices. MessagePack is an older format with a similar wire size, common in Redis tooling and the Ruby and Python worlds. BSON is MongoDB's variant, less compact but richer in types — it adds ObjectId, dates, and binary as first-class. All three are faster to parse than text and avoid base64, but because they still ship field names, they are not as small as a schema-driven format can be.

The second flavour drops the field names entirely and relies on a shared schema to know what each byte means. That is Protocol Buffers, and it is where the real size win comes from: with the schema agreed ahead of time, the wire carries field numbers instead of field names and packs integers compactly, so a record that is a few hundred bytes of JSON can be a few dozen bytes of Protobuf. The cost is that you cannot read the bytes without the schema, and you give up JSON's open-it-in-an-editor debuggability.

Illustrative, not a benchmark. Self-describing binary formats shave off the text overhead; schema-driven Protobuf wins big by dropping field names from the wire.

The numbers above are illustrative — real ratios depend heavily on your data — but the shape is reliable. If you want to see it move with real payloads, the JSON vs Protobuf simulator lets you paste a record and watch the byte counts. The decision rule is simple: stay on JSON until size or parse cost is a measured problem, then reach for a self-describing binary format if you still want debuggability, or for Protobuf if you can accept a schema in exchange for the smallest wire.

JSON in HTTP APIs

Most JSON you meet rides on HTTP, and a few conventions are worth getting right. The Content-Type for a JSON body is application/json, and sending the wrong type — or none — is a common reason a server rejects an otherwise valid request. There is no charset parameter; JSON is UTF-8 by spec, so application/json; charset=utf-8 is redundant though harmless.

For RPC over HTTP without the machinery of gRPC, JSON-RPC 2.0 is the small option: a request is a JSON object with a method, params, and id, and a response carries either result or error. It powers Ethereum and Bitcoin nodes and the Language Server Protocol, and its appeal is that any language can speak it with no code generator. The cost is everything an interface definition language would give you: no service to discover, no typed client, no streaming.

// → request
{ "jsonrpc": "2.0", "method": "charges.create",
  "params": { "amount_cents": 4200, "currency": "USD" }, "id": 1 }

// ← response
{ "jsonrpc": "2.0", "result": { "id": "ch_001", "status": "pending" }, "id": 1 }

Two more habits pay off. Send a consistent error shape — a stable code, a human message, and optionally a field path — so clients can branch on errors without string-matching. And decide early whether unknown fields are tolerated, because that one choice governs how painful it will be to evolve the API later. JSON makes adding a field free; it makes removing one and tightening validation the hard part.

What this buys you as an engineer

Holding the trade-offs straight makes JSON predictable. It is the right default for public APIs, browser-facing endpoints, config files, and logs — anywhere a human will read the bytes and the volume is moderate, which is most services in most organisations. Reach past it when you have a measured problem: high-throughput service-to-service traffic where the wire size and parse cost show up in profiles, numbers that exceed 2⁵³, or streams of millions of records where even NDJSON's per-line parse starts to dominate. Until then, the readability and ubiquity are worth more than the bytes you would save, and a format you can debug with curl and jq removes a whole tier of tooling from your worst nights.

JSON as a wire format

What JSON actually is

The data model, in full

Why it won

The costs that come with text

How a parser reads it

Streaming many records: NDJSON

Giving JSON a contract: JSON Schema

Where JSON gets attacked

When text costs too much: the binary cousins

JSON in HTTP APIs

What this buys you as an engineer

Further reading