Tool

JSON validate.

Strict RFC 8259 JSON parsing with line + column error locations. Inspects the parsed value's depth, total keys, and array count. Trailing commas, single quotes, and unquoted keys are not legal JSON — this validator will say so.

Status: valid
Bytes: 134
Depth · keys · arrays: 3 · 5 · 1

JSON input
Pretty-printed (2-space indent)
{
  "name": "Ada Lovelace",
  "born": 1815,
  "skills": [
    "analytical engine",
    "first-program"
  ],
  "active": false,
  "address": null
}
Minified (112 chars)
{"name":"Ada Lovelace","born":1815,"skills":["analytical engine","first-program"],"active":false,"address":null}
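
The depth, key, and array counts reported for this sample can be reproduced with a short recursive walk. The sketch below is an assumption about how such a tool might count, not its actual implementation: it treats leaf scalars as their own nesting level, which is one definition that yields 3 · 5 · 1 for the sample. The name document_stats is hypothetical.

```python
import json

def document_stats(value, level=1):
    """Return (depth, keys, arrays) for a parsed JSON value.

    Assumption: every value, including a leaf scalar, occupies a level.
    """
    if isinstance(value, dict):
        depth, keys, arrays = level, len(value), 0
        children = list(value.values())
    elif isinstance(value, list):
        depth, keys, arrays = level, 0, 1
        children = value
    else:
        return level, 0, 0
    for child in children:
        child_depth, child_keys, child_arrays = document_stats(child, level + 1)
        depth = max(depth, child_depth)
        keys += child_keys
        arrays += child_arrays
    return depth, keys, arrays

sample = ('{"name":"Ada Lovelace","born":1815,'
          '"skills":["analytical engine","first-program"],'
          '"active":false,"address":null}')
print(document_stats(json.loads(sample)))
```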

From a 2002 webpage to RFC 8259.

JSON began as an artifact of practice rather than design. Douglas Crockford extracted it from JavaScript object literal syntax around 2001 and formalized it in 2002 at json.org. The first IETF document, RFC 4627, appeared in July 2006 and ran to roughly ten pages, a remarkably short specification for something that would underpin the bulk of internet data exchange a decade later. RFC 4627 was eventually deemed too permissive in places and too ambiguous in others, particularly around top-level scalars, duplicate keys, and Unicode handling.

The IETF JSON working group produced RFC 7159 in March 2014 to address interoperability gaps, and RFC 8259 superseded it in December 2017. (I-JSON, "Internet JSON", is a separate restricted profile published as RFC 7493 in 2015.) RFC 8259 is the current normative reference, and it ratifies a few real-world behaviors: JSON text MUST be UTF-8 when exchanged between systems that are not part of a closed ecosystem, top-level values of any type are permitted, and object names SHOULD be unique, with the behavior on duplicates left unpredictable rather than required to be an error. ECMA-404, first published in 2013 and revised in 2017, captures the syntactic grammar without the interoperability guidance, so RFC 8259 and ECMA-404 together describe what a conforming parser must do.
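
Two of those behaviors are easy to observe with a stock parser. The sketch below uses Python's standard json module, which accepts a top-level scalar and silently keeps the last value for a duplicate key; the reject_duplicates hook is a hypothetical helper showing how a stricter consumer could refuse duplicates instead.

```python
import json

# Top-level values of any type are permitted.
assert json.loads("42") == 42

# Duplicate keys are not a syntax error; CPython keeps the last one.
assert json.loads('{"a": 1, "a": 2}') == {"a": 2}

def reject_duplicates(pairs):
    """object_pairs_hook that raises if an object repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen

def loads_unique(text):
    """Parse JSON, treating duplicate object keys as an error."""
    return json.loads(text, object_pairs_hook=reject_duplicates)
```

Other parsers may keep the first value or report an error, which is why producers should not rely on any particular outcome.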

JSON is routinely called "the de facto data interchange format" because no contemporary serialization scheme (XML, MessagePack, Protocol Buffers, CBOR, Avro) has displaced it for text-oriented APIs, configuration, logs, or browser-to-server traffic. The reasons are historical and ergonomic: every browser parses it natively, nearly every mainstream language has a parser in its standard library or a de facto standard package, and the grammar fits on a single page. The cost of that simplicity is the absence of features developers periodically request, the most famous being comments. Crockford's stated reasoning, that he removed comments because he had observed people using them to hold parsing directives, which broke interoperability, is sometimes mocked, but it reflects a defensible philosophy: a data interchange format should not be a configuration language.

The grammar in detail.

A strict RFC 8259 parser rejects a surprising number of inputs that look reasonable. Keys must be double-quoted strings; single quotes and bare identifiers are syntax errors. Trailing commas in objects and arrays are forbidden. Comments — both line and block forms — are absent from the grammar at every position. The literals NaN, Infinity, and -Infinity are not numbers in JSON, despite being IEEE 754 values; encoders that need to round-trip those must agree out-of-band on a sentinel string or null. String escape sequences are limited to \", \\, \/, \b, \f, \n, \r, \t, and \uXXXX. Anything else is a parse error.
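
A quick way to see these rules in action is to feed borderline inputs to a stock parser. The sketch below uses Python's standard json module, which follows the grammar closely with one notable exception: it accepts NaN and Infinity unless parse_constant is overridden. The helper name is_strict_json is illustrative.

```python
import json

def is_strict_json(text: str) -> bool:
    """Return True if text parses as JSON with non-finite literals rejected."""
    def reject_constant(name):
        # The decoder calls this for NaN, Infinity and -Infinity.
        raise ValueError(f"{name} is not valid JSON")
    try:
        json.loads(text, parse_constant=reject_constant)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True

samples = {
    '{"a": 1}': True,
    "{'a': 1}": False,     # single-quoted key
    '{a: 1}': False,       # unquoted key
    '[1, 2,]': False,      # trailing comma
    '[1] // note': False,  # comment
    '[NaN]': False,        # non-finite literal
    '"\\q"': False,        # unknown escape sequence
}
for text, expected in samples.items():
    print(repr(text), is_strict_json(text) == expected)
```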

Unicode handling is where strictness gets subtle. JSON text exchanged between systems must be UTF-8 under RFC 8259, and generators MUST NOT prepend a byte-order mark (BOM, U+FEFF); parsers MAY ignore one rather than treat it as an error, but a strict validator will flag it. Characters outside the Basic Multilingual Plane must be encoded either as raw UTF-8 bytes or as a UTF-16 surrogate pair using two \u escapes; for example, the emoji U+1F600 becomes \uD83D\uDE00. A lone high or low surrogate escape is a different matter: the grammar permits it, but it does not encode a Unicode character, and RFC 8259 describes the resulting behavior as unpredictable. Some parsers reject it, some pass the unpaired surrogate through, and others substitute a replacement character.
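
The following sketch shows how a validator might surface both Unicode issues on top of Python's standard json module. It relies on two CPython behaviors: json.loads accepts a lone surrogate escape and returns an unpaired surrogate, and such a string cannot be encoded back to UTF-8. The names check_text and iter_strings are hypothetical.

```python
import codecs
import json

def iter_strings(value):
    """Yield every string in a parsed JSON value, including object keys."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield key
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)

def check_text(raw: bytes) -> list:
    """Return Unicode-level complaints about a UTF-8 encoded JSON document."""
    problems = []
    if raw.startswith(codecs.BOM_UTF8):
        problems.append("leading BOM")
        raw = raw[len(codecs.BOM_UTF8):]
    value = json.loads(raw.decode("utf-8"))
    for text in iter_strings(value):
        try:
            text.encode("utf-8")  # fails on an unpaired surrogate
        except UnicodeEncodeError:
            problems.append("unpaired surrogate")
            break
    return problems

# A correctly paired escape decodes to a single code point.
assert json.loads('"\\uD83D\\uDE00"') == "\U0001F600"
```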

JSON5, JSONC, HJSON exist for a reason

The proliferation of relaxed dialects exists because configuration files have different requirements than wire formats. JSON5 permits comments, trailing commas, single-quoted strings, unquoted keys that are valid ECMAScript identifiers, hex numbers, and explicit Infinity/NaN. JSONC adds only comments. These dialects are fine for human-edited config but should never be served from an API endpoint, because a downstream client running a strict parser will reject them with an opaque error.

JSON has no integer type.

The JSON grammar describes numbers as unbounded signed decimals with optional fraction and exponent. Read literally, 9007199254740993 and 0.1 and 1e400 are all valid JSON. In practice, many implementations, including every JavaScript engine, parse numbers into IEEE 754 binary64 (the same double JavaScript exposes) and quietly lose precision past 2^53. The canonical war story is the Twitter ID problem: when Twitter migrated to 64-bit Snowflake IDs in 2010, JavaScript clients that called JSON.parse on the response received tweet IDs rounded to the nearest representable double. Twitter's solution was to add a parallel id_str field, and that pattern of emitting large integers as strings has become standard practice for any 64-bit identifier exposed over a JSON API.
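
The rounding is easy to reproduce. Python's json module keeps integers exact by default, so the sketch below passes parse_int=float to imitate a parser that only has binary64 available; that substitution is an illustration device, not how any particular JavaScript engine is implemented.

```python
import json

text = '{"id": 9007199254740993, "big": 1e400}'

# Imitate a binary64-only parser by routing integers through float().
as_double = json.loads(text, parse_int=float)
# Python's default keeps the integer exact.
exact = json.loads(text)

print(int(as_double["id"]))  # one less than the value on the wire
print(exact["id"])
print(as_double["big"])      # overflows the double range
```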

The deeper problem is that JSON has no integer type at all. The spec says only "number." Languages with richer numeric towers cope by offering escape hatches: Go's encoding/json provides json.Number, which preserves the original string and defers conversion; Python's json module accepts a parse_int and parse_float hook so callers can route to decimal.Decimal; Java's Jackson can be configured with USE_BIG_DECIMAL_FOR_FLOATS; .NET's System.Text.Json exposes JsonElement.GetRawText().
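
As a concrete instance of one of these escape hatches, the sketch below routes every JSON number with a fraction or exponent to decimal.Decimal through the parse_float hook of Python's json module. The wrapper name loads_exact is illustrative.

```python
import json
from decimal import Decimal

def loads_exact(text: str):
    """Parse JSON, keeping non-integer numbers as exact decimals."""
    return json.loads(text, parse_float=Decimal)

doc = loads_exact('{"price": 0.1, "qty": 3, "huge": 1e400}')
print(doc["price"] * 3)  # exact decimal arithmetic, no binary rounding
print(doc["huge"])       # preserved instead of overflowing to infinity
```

Integers need no hook in Python because int is arbitrary precision; the hook matters for fractions and for exponents outside the double range.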

| Language | Default number type | Lossless escape hatch |
| --- | --- | --- |
| Go | float64 | json.Number (string-preserving) |
| Python | int / float | parse_float=Decimal hook |
| JavaScript | Number (binary64) | JSON.parse reviver + BigInt |
| Java | Double / Long | USE_BIG_DECIMAL_FOR_FLOATS |
| Rust serde | i64 / u64 / f64 | arbitrary_precision feature flag |

A useful API design rule

Any integer that could exceed 2^53 (roughly 9.007 quadrillion) should be serialized as a string. This includes Snowflake IDs, ULIDs encoded as numbers, monotonic counters in high-volume systems, and Unix nanosecond timestamps. The cost is one string-to-integer conversion on the consumer, using a type that can hold the value (BigInt in JavaScript, since parseInt returns a double and would reintroduce the rounding); the alternative is silent data corruption that surfaces years later when an ID first crosses the threshold.
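
On the producing side, the rule can be applied mechanically before serialization. The sketch below walks a Python value and converts any integer outside the binary64-safe range to a string; stringify_unsafe_ints and MAX_SAFE are hypothetical names, and a real API would more likely mark specific fields as strings in its schema than convert by magnitude.

```python
import json

MAX_SAFE = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER

def stringify_unsafe_ints(value):
    """Return a copy of value with out-of-range integers turned into strings."""
    if isinstance(value, bool):  # bool is a subclass of int; leave it alone
        return value
    if isinstance(value, int) and abs(value) > MAX_SAFE:
        return str(value)
    if isinstance(value, dict):
        return {key: stringify_unsafe_ints(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [stringify_unsafe_ints(item) for item in value]
    return value

payload = {"id": 1234567890123456789, "count": 42, "ok": True}
wire = json.dumps(stringify_unsafe_ints(payload))
print(wire)
```

A Python consumer recovers the value with int(doc["id"]); a JavaScript consumer would use BigInt(doc.id).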

Tree-loading is a liability at scale.

The default entry point in most JSON libraries (JSON.parse, json.loads, json.Unmarshal, ObjectMapper.readValue) loads the entire document into memory and constructs a tree. For most payloads this is fine. For multi-gigabyte exports, log archives, or untrusted input it is a liability. A 500 MB JSON document parsed into Python dicts can balloon to several gigabytes of resident memory because every value becomes a separate heap object and every container carries its own overhead. Worse, deeply nested structures can blow the parser's call stack: a recursive-descent parser handed enough opening brackets will exhaust stack space, which is why most production parsers cap nesting depth. The defaults vary widely: System.Text.Json stops at 64 levels, Jackson (2.15 and later) at 1000, simdjson at 1024, and Go's encoding/json at 10000.
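
When the parser in use offers no configurable depth limit, a cheap linear pre-scan can enforce one before the tree is built. The sketch below counts container nesting only (scalars do not add a level) and skips brackets inside strings; max_nesting_depth, loads_bounded, and the default limit of 64 are illustrative choices, not part of any library.

```python
import json

def max_nesting_depth(text: str) -> int:
    """Return the deepest object/array nesting in text without recursion."""
    depth = deepest = 0
    in_string = escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False      # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch in "]}":
            depth -= 1
    return deepest

def loads_bounded(text: str, limit: int = 64):
    """Parse text only if its nesting depth is within limit."""
    found = max_nesting_depth(text)
    if found > limit:
        raise ValueError(f"nesting depth {found} exceeds limit {limit}")
    return json.loads(text)
```

The scan does not validate the document; it only bounds the work the real parser will be asked to do.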

Streaming parsers solve both problems by emitting events (start-object, key, value, end-array) without building a tree. Lloyd Hilaiel's yajl (2007) is the C-language reference for this style; ijson is the Python equivalent; Jackson's JsonParser exposes nextToken() directly; jq has a --stream mode that flattens documents into path/value pairs you can filter incrementally. For pipeline-oriented workloads, NDJSON (newline-delimited JSON, sometimes called JSON Lines) is the dominant convention: each line is an independent JSON document, parsers can be reset cheaply between records, and the format is trivially splittable across map-reduce workers. Many modern log shippers, including Fluent Bit, Vector, and Filebeat, can emit NDJSON.
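
NDJSON needs no special library: because each line is a complete document, an ordinary parser can be applied one record at a time. A minimal sketch, assuming UTF-8 text input and treating blank lines as skippable (a leniency chosen here, not mandated by the format); iter_ndjson is a hypothetical name.

```python
import io
import json
from typing import Iterator, TextIO

def iter_ndjson(stream: TextIO) -> Iterator[object]:
    """Yield one parsed value per non-blank line, holding one record at a time."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: {exc.msg}") from exc

log = io.StringIO(
    '{"level": "info", "ms": 12}\n'
    '\n'
    '{"level": "error", "ms": 340}\n'
)
slow = [record for record in iter_ndjson(log) if record["ms"] > 100]
print(slow)
```

Memory use is bounded by the largest single line rather than by the size of the file.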

simdjson and the SIMD revolution.

For decades the conventional wisdom was that JSON parsing was bottlenecked by I/O, not CPU. That changed in 2019 when Geoff Langdale and Daniel Lemire published simdjson, which sustains 2–4 GB/s on a single core by using SIMD instructions (SSE4.2, AVX2, NEON) to validate UTF-8, locate structural characters, and identify string boundaries in parallel across 64-byte blocks. The architectural insight is that most of JSON parsing is branch-heavy character classification, and data-dependent, hard-to-predict branches are exactly what modern superscalar CPUs handle worst; replacing them with branchless bitmask operations turns parsing into a streaming arithmetic kernel. Their two-stage design (stage 1 produces an index of structural characters, stage 2 walks the index to build the document) separates the SIMD-friendly work from the inherently serial tree construction.
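
The bitmask idea behind stage 1 can be illustrated without SIMD. The sketch below is a scalar Python model, not simdjson's code: it builds one bitset for quotes and one for structural characters, turns the quote bitset into an "inside a string" mask with a prefix XOR, and clears structural bits that fall inside strings. It assumes the input contains no backslash-escaped quotes, which the real algorithm handles with additional mask arithmetic.

```python
def structural_indexes(text: str) -> list:
    """Return positions of { } [ ] : , that lie outside string literals."""
    n = len(text)
    full = (1 << n) - 1
    quotes = 0
    structurals = 0
    # Classification: SIMD code does this 64 bytes at a time with compares.
    for i, ch in enumerate(text):
        if ch == '"':
            quotes |= 1 << i
        elif ch in "{}[]:,":
            structurals |= 1 << i
    # Prefix XOR: bit i becomes the parity of quote bits 0..i, so bits are
    # set from each opening quote up to the character before its closing quote.
    in_string = quotes
    shift = 1
    while shift < n:
        in_string ^= (in_string << shift) & full
        shift <<= 1
    structurals &= ~in_string
    return [i for i in range(n) if (structurals >> i) & 1]

print(structural_indexes('{"a,b":[1,2]}'))
```

The comma inside "a,b" is masked out without a single per-character branch on string state, which is the property that lets the real implementation run as straight-line vector arithmetic.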

Other contenders occupy adjacent niches. RapidJSON (Tencent, 2011) was the standard high-performance C++ parser before simdjson and remains popular for its DOM/SAX flexibility. yyjson (2020) is a compact C library (one header and one source file) that beats RapidJSON on most benchmarks while preserving full standards conformance and remaining easy to embed. sonic (ByteDance, 2021) brings JIT-compiled JSON parsing to Go. Despite all of this, the overwhelming majority of JSON in production still flows through stdlib parsers (encoding/json, JSON.parse, json.loads) that are anywhere from several times to tens of times slower than simdjson. The reasons are pragmatic: standard library parsers ship everywhere, report errors with position information, and integrate with reflection-based marshaling.

| Parser | Throughput | Niche |
| --- | --- | --- |
| simdjson 3.x | 2–4 GB/s | bulk ingestion |
| yyjson 0.10 | 1–2 GB/s | embedded, small footprint |
| RapidJSON 1.1 | 0.5–1 GB/s | general-purpose C++ |
| Go encoding/json | 100–200 MB/s | stdlib correctness |
| Python json (C) | 150–300 MB/s | stdlib convenience |
| JSON.parse (V8) | 400–800 MB/s | browser default |

Where the parser stops, the schema starts.

Where the JSON spec stops at grammar, JSON Schema picks up at semantics. The drafts have evolved through Draft 4 (2013), Draft 6 (2017), Draft 7 (2018), 2019-09, and 2020-12, with the version identifier moving from a draft slug to a calendar release as the working group sought stability. The core vocabulary is small: type constrains the value's runtime type, properties describes object members, required lists mandatory keys, additionalProperties toggles whether unknown keys are permitted, patternProperties matches keys by regex, and the combinators anyOf, oneOf, and allOf compose subschemas. References via $ref allow recursive and modular schemas.
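
To make the core vocabulary concrete, the sketch below hand-rolls a checker for a small subset of keywords: type, properties, required, additionalProperties, and anyOf. It is a teaching aid under simplifying assumptions (no $ref, no patternProperties, no array keywords, and integer means a Python int), not a conforming JSON Schema validator; the function validate and its error strings are hypothetical.

```python
import json

TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}

def validate(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value conforms."""
    expected = schema.get("type")
    if expected is not None and not TYPE_CHECKS[expected](value):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(value, dict):
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required key {key!r}")
        for key, item in value.items():
            if key in props:
                errors += validate(item, props[key], f"{path}.{key}")
            elif schema.get("additionalProperties", True) is False:
                errors.append(f"{path}: unexpected key {key!r}")
    branches = schema.get("anyOf")
    if branches and not any(not validate(value, b, path) for b in branches):
        errors.append(f"{path}: no anyOf branch matched")
    return errors

schema = {
    "type": "object",
    "required": ["name", "born"],
    "additionalProperties": False,
    "properties": {
        "name": {"type": "string"},
        "born": {"type": "integer"},
        "skills": {"type": "array"},
        "active": {"type": "boolean"},
        "address": {"anyOf": [{"type": "string"}, {"type": "null"}]},
    },
}
ada = json.loads('{"name": "Ada Lovelace", "born": 1815, "skills": [], '
                 '"active": false, "address": null}')
print(validate(ada, schema))
```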

The de facto standard validator in production JavaScript is AJV (Another JSON Validator, by Evgeny Poberezkin), which compiles a schema into a specialized JavaScript function and routinely validates simple payloads in well under a microsecond. That compilation step is what makes schema validation fast enough to put on every API request. JSON Schema also lives inside OpenAPI 3.x as a constrained dialect: the OpenAPI 3.0 Schema Object was an extended subset of JSON Schema Draft 5 that was not fully compatible with it, and OpenAPI 3.1 (2021) realigned with JSON Schema 2020-12 to end the long-standing incompatibility.

Two recurring debates deserve mention. The format keyword is officially an annotation, not an assertion, meaning conforming validators are not required to reject malformed values. AJV requires opting in via the ajv-formats package, and other validators behave differently, which makes format a portability hazard. The second debate concerns defaults and coercion: schemas often include a default keyword and developers expect validators to inject those defaults during validation, but the spec is explicit that default is annotation-only.

JSON's three famous holes.

In JavaScript, numbers are 64-bit floats. Anything beyond Number.MAX_SAFE_INTEGER (2^53 − 1) loses precision silently. APIs that return Twitter-scale 64-bit IDs return them as strings for this reason. JavaScript strings are sequences of UTF-16 code units, not Unicode scalar values, so surrogates need to be paired correctly to round-trip. Object key order is unspecified by RFC 8259, which defines an object as an unordered collection; many implementations preserve insertion order, and some APIs depend on that, but anything that signs or hashes JSON should use a canonical form with a defined key order rather than rely on it.
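
The key-order point can be demonstrated with Python's standard library. The sketch below serializes with sorted keys and compact separators so that two documents differing only in member order hash identically. This is a simplified stand-in for a canonicalization scheme such as RFC 8785 (JCS), which also fixes number formatting and sorts keys by UTF-16 code units; canonical_bytes is a hypothetical helper.

```python
import hashlib
import json

def canonical_bytes(value) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    return json.dumps(
        value,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,  # refuse NaN/Infinity, which are not JSON
    ).encode("utf-8")

a = json.loads('{"b": 1, "a": [true, null]}')
b = json.loads('{"a": [true, null], "b": 1}')

print(list(a), list(b))    # dicts keep the order seen on the wire
print(canonical_bytes(a))  # identical bytes for both documents
digest_a = hashlib.sha256(canonical_bytes(a)).hexdigest()
digest_b = hashlib.sha256(canonical_bytes(b)).hexdigest()
```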

Streaming JSON exists

JSON.parse reads the whole document into memory. For large feeds, use SAX-style parsers (yajl, ijson) or NDJSON / JSON Lines, where each line holds one object and is parsed independently. Many logging and ML pipelines use NDJSON for exactly this reason.
