JSON diff.
Structural comparison of two JSON documents — not just textual. Reports each added, removed, and changed path with the previous and new values, ignoring incidental key order. Useful for API regression checks, config drift, GraphQL response diffs.
| Op | Path | From | To |
|---|---|---|---|
| + | skills[2] | — | "Bernoulli numbers" |
| ~ | active | false | true |
| + | address | — | null |
Lines aren't structure.
Most engineers reach for diff -u or git diff the first time they need to compare two JSON documents, and the result is almost always disappointing. Classical text diff algorithms (the Myers diff that has powered Git since 2005, the patience diff Bram Cohen popularized in 2009, the Hunt-McIlroy LCS algorithm from 1976) all share an assumption that breaks on structured data: they treat the document as an ordered sequence of lines. JSON has no such constraint. The objects {"a":1,"b":2} and {"b":2,"a":1} are semantically identical under RFC 8259, but a line-oriented diff of their pretty-printed forms will happily report changed lines.
Pretty-printing a document with two-space indentation versus four-space indentation produces a diff where every single line has changed. Re-serializing through a parser that sorts keys alphabetically produces what looks like a complete rewrite. None of these are real changes; they are artifacts of representation.
Structural diff reframes the problem. Instead of comparing byte sequences, it parses both documents into their abstract syntax — objects, arrays, scalars — and compares the resulting trees. Two documents are equal if and only if they have the same shape and the same leaf values at corresponding paths. This is harder to implement, because trees do not admit a single canonical linearization, and because arrays introduce an ordering question that text diffs answer trivially. The payoff is correctness: a structural diff reports exactly the semantic changes a downstream consumer would observe, with zero false positives from formatting drift.
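The tree comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name diff, the MISSING sentinel, and the path notation (dots for keys, brackets for indices, mirroring the sample table at the top) are all assumptions made for the example, and arrays are compared by index only.

```python
from typing import Any, Iterator

MISSING = object()  # hypothetical sentinel: "no value here", distinct from JSON null


def diff(old: Any, new: Any, path: str = "") -> Iterator[tuple]:
    """Yield (op, path, from, to) tuples; op is '+', '-', or '~'."""
    if isinstance(old, dict) and isinstance(new, dict):
        # Iterate over the union of keys in sorted order, so key order
        # in either input never affects the result.
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}" if path else key
            if key not in old:
                yield ("+", child, MISSING, new[key])
            elif key not in new:
                yield ("-", child, old[key], MISSING)
            else:
                yield from diff(old[key], new[key], child)
    elif isinstance(old, list) and isinstance(new, list):
        # Index-based array strategy: position is identity.
        for i in range(max(len(old), len(new))):
            child = f"{path}[{i}]"
            if i >= len(old):
                yield ("+", child, MISSING, new[i])
            elif i >= len(new):
                yield ("-", child, old[i], MISSING)
            else:
                yield from diff(old[i], new[i], child)
    elif type(old) is not type(new) or old != new:
        # The type check keeps true and 1 distinct (Python treats them as
        # equal); as a side effect, 1 and 1.0 are reported as a change.
        yield ("~", path, old, new)
```

Because both inputs are parsed values rather than text, indentation and key order never reach the comparison, which is where the "zero false positives from formatting drift" property comes from.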
A programming language for document edits.
RFC 6902, published by the IETF in April 2013, defines JSON Patch as a sequence of operations expressed as a JSON array. Each operation is itself an object with an op field naming one of six verbs (add, remove, replace, move, copy, and test) plus a path field holding a JSON Pointer (RFC 6901) that locates the target node within the document. Operations execute in order; if any operation fails, evaluation stops and the patch as a whole is deemed unsuccessful, and over HTTP PATCH (RFC 5789) the server must apply the whole set of changes atomically, so the document is left untouched. This atomicity is what makes 6902 suitable as a wire format for state transitions.
A worked example. Suppose the source document is a user object with name "Ada", a single role, and a version number; the target renames the user, adds a second role, and bumps the version. A valid 6902 patch is an array of three ops: a replace on the name path, an add whose pointer ends in the array-append token "-", and another replace for the version. Notice that the patch is not unique in general: there are infinitely many patches that produce the same result, and minimizing patch size is a separate optimization concern from correctness.
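Written out, a patch of that shape might look like the following. The field names (name, roles, version) and the concrete values are assumptions chosen for illustration, and apply_patch is a deliberately tiny sketch that handles only add and replace on non-root paths; it is not a complete RFC 6902 implementation.

```python
import copy

source = {"name": "Ada", "roles": ["admin"], "version": 1}

patch = [
    {"op": "replace", "path": "/name", "value": "Ada Lovelace"},
    {"op": "add", "path": "/roles/-", "value": "auditor"},  # "-" appends
    {"op": "replace", "path": "/version", "value": 2},
]


def resolve_parent(doc, pointer):
    """Split an RFC 6901 pointer into (parent container, final token)."""
    tokens = [t.replace("~1", "/").replace("~0", "~")
              for t in pointer.split("/")[1:]]
    node = doc
    for token in tokens[:-1]:
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node, tokens[-1]


def apply_patch(doc, ops):
    """Apply ops to a deep copy, so a failure leaves the input untouched."""
    work = copy.deepcopy(doc)
    for op in ops:
        parent, token = resolve_parent(work, op["path"])
        if op["op"] == "add":
            if isinstance(parent, list):
                index = len(parent) if token == "-" else int(token)
                if index > len(parent):
                    raise IndexError(op["path"])
                parent.insert(index, op["value"])
            else:
                parent[token] = op["value"]
        elif op["op"] == "replace":
            key = int(token) if isinstance(parent, list) else token
            _ = parent[key]  # raises KeyError/IndexError if the target is absent
            parent[key] = op["value"]
        else:
            raise ValueError(f"unsupported op in this sketch: {op['op']}")
    return work


target = apply_patch(source, patch)
```

Working on a deep copy is the simplest way to get all-or-nothing behavior: any exception propagates before the caller ever sees a partially patched document.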
The test operation is the feature that raises 6902 from a transport encoding to a concurrency primitive. A test op asserts that the value at a given path equals an expected value, failing the entire patch if it does not. This lets a client express "apply this update only if the document is still in the state I last saw," which is compare-and-swap semantics over arbitrary JSON. The pattern is invaluable for audit logs, conflict resolution in offline-first sync, and cross-system replication where a source of truth needs to detect divergence before stamping out a follower. The 6902 IANA media type is application/json-patch+json.
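The compare-and-swap pattern can be illustrated with a two-op patch: a test that asserts the last-seen value, followed by the replace. The helper names below (guarded_replace, apply_test_replace, PatchConflict) are hypothetical, and the evaluator is a sketch that understands only test and replace on top-level keys such as /status.

```python
import copy


class PatchConflict(Exception):
    """Raised when a test op finds a value other than the expected one."""


def guarded_replace(path, expected, value):
    """Build a patch that overwrites path only if it still holds expected."""
    return [
        {"op": "test", "path": path, "value": expected},
        {"op": "replace", "path": path, "value": value},
    ]


def apply_test_replace(doc, ops):
    """Evaluate test/replace ops on top-level keys against a deep copy."""
    work = copy.deepcopy(doc)
    for op in ops:
        key = op["path"][1:]  # strip the leading "/" (top-level keys only)
        if key not in work:
            raise KeyError(op["path"])
        if op["op"] == "test":
            if work[key] != op["value"]:
                raise PatchConflict(op["path"])
        elif op["op"] == "replace":
            work[key] = op["value"]
        else:
            raise ValueError(f"unsupported op in this sketch: {op['op']}")
    return work
```

A client that last saw status "draft" would send guarded_replace("/status", "draft", "published"); if another writer has already moved the document to a different status, the test fails and nothing is written.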
A sketch of desired state.
Where 6902 is precise and verbose, RFC 7396 — published October 2014 — is ergonomic and lossy. A merge patch is itself a JSON document, structurally similar to the target. The algorithm is recursive and trivially simple: for each key in the patch, if the value is null, delete that key from the target; if the value is an object, recurse; otherwise, overwrite. Keys present in the target but absent from the patch are left untouched. This means a merge patch reads almost like a partial document: to update a user's email and clear their phone, you send the email field with the new value and the phone field set to null.
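The recursion is short enough to write out. The sketch below follows the algorithm as described here; the function name merge_patch and the sample user fields are assumptions for illustration.

```python
def merge_patch(target, patch):
    """Return the result of applying an RFC 7396 merge patch to target."""
    if not isinstance(patch, dict):
        # A non-object patch (scalar, array, or null) replaces the target.
        return patch
    # A non-object target is treated as an empty object before merging.
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "delete this key"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


user = {"name": "Ada", "email": "old@example.com", "phone": "555-0100"}
updated = merge_patch(user, {"email": "new@example.com", "phone": None})
```

After the call, updated holds the new email, no phone key, and the untouched name. Note that the function returns a new object rather than mutating user, which is a design choice of this sketch and not something the RFC requires.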
The cost of that simplicity is the null-deletion conflation. There is no way to set a field to literal JSON null using merge patch, because null is reserved as the deletion sentinel. APIs that need to distinguish "field is intentionally null" from "field should be removed" cannot use 7396 verbatim; they must either escape via a sentinel object, fall back to 6902, or constrain the schema so that null is never a legal value. Similarly, merge patch cannot express array element insertion or removal — an array in a merge patch wholesale replaces the array in the target.
| Aspect | RFC 6902 (JSON Patch) | RFC 7396 (Merge Patch) |
|---|---|---|
| Format | array of operation objects | document mirroring target |
| Atomicity | all-or-nothing rollback | best-effort recursive merge |
| Array editing | insert / remove / move at index | whole-array replace only |
| Set-to-null | yes via replace op | no (null means delete) |
| Concurrency | test op for CAS | none built-in |
| Media type | application/json-patch+json | application/merge-patch+json |
6902 is a programming language for document edits; 7396 is a sketch of a desired state. Use 6902 when you need to encode exact transitions, audit them, or replay them. Use 7396 when humans will write patches by hand or when an API needs a low-friction PATCH surface and the target schema can tolerate the null limitation.
Four strategies, none universal.
Arrays are where every JSON diff implementation reveals its philosophy. The same two arrays can yield wildly different diffs depending on what the user means by "the same element." Three strategies dominate. The first is index-based diffing, where position is identity: position 0 in the source aligns with position 0 in the target, and any change at that position is a single edit. This is what naive structural diff does and it produces minimal patches when arrays are append-only or rarely reordered.
The second is LCS-based diffing (Myers, patience, or histogram diff), which finds the longest common subsequence and reports inserts and deletes around it. This is the right choice for an ordered list of comments, log entries, or any sequence where order is meaningful and elements are inserted or removed in the middle. The third is set-based diffing, which ignores order entirely and reports added and removed elements; it is correct for tag lists, role assignments, and other unordered collections that JSON has no native type for.
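To make the contrast concrete, here is a sketch of the second and third strategies. The LCS version uses the textbook O(nm) dynamic-programming table rather than Myers, patience, or histogram diff, purely to keep the example short; the function names lcs_diff and set_diff are assumptions.

```python
def lcs_diff(old, new):
    """Return (op, value) pairs: ' ' kept, '-' deleted, '+' inserted."""
    n, m = len(old), len(new)
    # table[i][j] = length of the LCS of old[i:] and new[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    edits, i, j = [], 0, 0
    while i < n and j < m:
        if old[i] == new[j]:
            edits.append((" ", old[i]))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            edits.append(("-", old[i]))
            i += 1
        else:
            edits.append(("+", new[j]))
            j += 1
    edits += [("-", value) for value in old[i:]]
    edits += [("+", value) for value in new[j:]]
    return edits


def set_diff(old, new):
    """Order-insensitive diff; elements must be hashable (strings, numbers)."""
    return {"added": set(new) - set(old), "removed": set(old) - set(new)}
```

With old ["a", "b", "c"] and new ["a", "x", "b", "c"], an index-based diff would report changes at positions 1 and 2 plus an append, while lcs_diff reports a single insertion of "x". For ["admin", "dev"] versus ["dev", "ops", "admin"], set_diff reports only "ops" as added and ignores the reordering.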
A fourth strategy, key-based or identity-keyed diffing, splits the difference: the consumer declares "elements of this array are identified by the id field," and the differ matches elements across versions by id, reporting moves, in-place updates, additions, and deletions independently. This is what jsondiffpatch (npm) supports through its objectHash option, what Kubernetes' strategic merge patch does with per-field merge keys, and what most production systems eventually evolve toward. A list of order line items keyed by SKU, a list of users keyed by uuid, a list of routes keyed by path: these all want key diff, and forcing them through LCS produces patches that are technically correct but semantically opaque (a reorder shows up as a delete-and-insert pair instead of a move).
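A minimal sketch of identity-keyed matching follows. It assumes every element is an object carrying a unique value under the chosen key, and the function name keyed_diff and the result shape are invented for this example. Moves are detected by comparing each surviving element's rank among the other survivors, which is simple but reports more moves than a minimal move set would.

```python
def keyed_diff(old, new, key="id"):
    """Match array elements by item[key] and classify the differences."""
    old_by_id = {item[key]: item for item in old}
    new_by_id = {item[key]: item for item in new}

    removed = [ident for ident in old_by_id if ident not in new_by_id]
    added = [ident for ident in new_by_id if ident not in old_by_id]
    updated = [ident for ident, item in new_by_id.items()
               if ident in old_by_id and old_by_id[ident] != item]

    # Rank of each surviving id within the old and new orderings.
    old_rank = {ident: r for r, ident in
                enumerate(i for i in old_by_id if i in new_by_id)}
    new_rank = {ident: r for r, ident in
                enumerate(i for i in new_by_id if i in old_by_id)}
    moved = [ident for ident in new_rank if old_rank[ident] != new_rank[ident]]

    return {"added": added, "removed": removed,
            "updated": updated, "moved": moved}
```

Because ranks are computed only over elements present in both versions, a pure insertion at the front of the array is reported as one addition and zero moves, rather than shifting every index.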
The complexity story matters too. Object diff is O(n+m) when keys are already sorted, by merging two sorted streams. LCS array diff is O(nm) in the worst case; Myers' O(nd) algorithm, where d is the edit distance, does far better when the two arrays are mostly similar. Set diff is O(n+m) with hashing. Key diff is O(n+m) with hashing once you have the id extractor. For documents with arrays in the millions of elements, the choice of array strategy dominates total runtime.
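The sorted-stream merge for objects can be sketched as a two-pointer walk. The function name sorted_key_diff is an assumption, values are compared shallowly, and the example sorts the keys itself for convenience; the linear bound only holds when the keys already arrive in sorted order.

```python
def sorted_key_diff(old, new):
    """Two-pointer merge over the sorted keys of two flat objects."""
    a, b = sorted(old), sorted(new)  # O(n log n) here; free if pre-sorted
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            if old[a[i]] != new[b[j]]:
                out.append(("~", a[i]))
            i, j = i + 1, j + 1
        elif a[i] < b[j]:
            out.append(("-", a[i]))  # key exists only in old
            i += 1
        else:
            out.append(("+", b[j]))  # key exists only in new
            j += 1
    out += [("-", k) for k in a[i:]]
    out += [("+", k) for k in b[j:]]
    return out
```

Each key is visited exactly once, so the walk itself does n+m steps at most.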
When two parties both edit.
Two-way diff tells you what changed between A and B. Three-way merge tells you how to reconcile changes when two parties (B and C) both modify a common ancestor (A). Git solved this for text in 2005 with a recursive merge driver that runs LCS on each pair, classifies each hunk as "only B changed," "only C changed," or "both changed," and flags the third case as a conflict. Adapting this to JSON is straightforward in spirit: compute the structural diff from A to B and from A to C, walk both diff trees in lockstep, and for any path that appears in both, either the changes agree (auto-merge) or they disagree (conflict).
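The lockstep classification can be sketched for objects as follows. This is an illustration under narrow assumptions: only objects are merged recursively, arrays and scalars are treated as atomic values, and on a conflict the sketch arbitrarily keeps B's side while recording the path. The names three_way_merge and MISSING are hypothetical.

```python
MISSING = object()  # hypothetical sentinel for "key absent on this side"


def three_way_merge(a, b, c, path=""):
    """Merge b and c against ancestor a; return (merged, conflict_paths)."""
    merged, conflicts = {}, []
    for key in sorted(set(a) | set(b) | set(c)):
        va = a.get(key, MISSING)
        vb = b.get(key, MISSING)
        vc = c.get(key, MISSING)
        child = f"{path}/{key}"
        if vb == vc:
            chosen = vb  # both sides agree (including both deleting)
        elif va == vb:
            chosen = vc  # only C changed
        elif va == vc:
            chosen = vb  # only B changed
        elif all(isinstance(v, dict) for v in (va, vb, vc)):
            chosen, sub = three_way_merge(va, vb, vc, child)
            conflicts += sub
        else:
            conflicts.append(child)  # both changed, differently
            chosen = vb              # placeholder policy: keep B's side
        if chosen is not MISSING:
            merged[key] = chosen
    return merged, conflicts
```

Under this policy, a subtree deleted by B and modified by C is reported as a conflict, and two sides that edit the same array differently conflict as a whole, since the sketch never looks inside arrays.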
The corners are where it falls apart. If B inserts an element at array index 2 and C inserts a different element at array index 2, what is the merged array? Both? In what order? If B changes a field to a string and C changes the same field to an object, there is no semantic merge — only a conflict. If B deletes an entire subtree and C modifies a leaf inside that subtree, is that a conflict (Git says yes) or does B win (some merge tools say yes)?
Open-source merge libraries and various in-house tools at large companies each pick a policy, and the policies disagree. Generic JSON three-way merge with no schema awareness has unsolved corners that schema-aware merge (where the schema declares which arrays are sets, which are keyed, and which are ordered) can resolve cleanly. This is a major reason tools like Kubernetes' strategic merge patch exist: they layer schema metadata on top of a merge-patch-style format to disambiguate exactly these cases.
Where you'll actually use it.
In real systems, JSON diff shows up in places you might not expect. Postgres has no built-in jsonb_diff as of version 16, but user-defined functions such as the widely copied jsonb_diff_val fill the gap; teams often denormalize a diff into a separate audit table for query performance. DynamoDB has no diff primitive at all, but the UpdateItem expression language (SET, REMOVE, ADD, DELETE) is essentially a typed JSON Patch dialect, and any client that wants to issue partial updates ends up generating expressions from a structural diff against a cached previous state.
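As an illustration of generating an update expression from a diff, the sketch below compares the top-level attributes of a cached item with its new state and emits SET and REMOVE clauses with placeholder names and values. The function name build_update and the placeholder scheme (#k0, :v0) are assumptions; nested paths, ADD, and DELETE are out of scope, and how values must be encoded depends on the client library in use.

```python
def build_update(old, new):
    """Return (expression, attribute_names, attribute_values)."""
    names, values, sets, removes = {}, {}, [], []
    for n, key in enumerate(sorted(set(old) | set(new))):
        name_ph, value_ph = f"#k{n}", f":v{n}"
        if key not in new:
            names[name_ph] = key
            removes.append(name_ph)
        elif key not in old or old[key] != new[key]:
            names[name_ph] = key
            values[value_ph] = new[key]
            sets.append(f"{name_ph} = {value_ph}")
    clauses = []
    if sets:
        clauses.append("SET " + ", ".join(sets))
    if removes:
        clauses.append("REMOVE " + ", ".join(removes))
    return " ".join(clauses), names, values
```

The three return values are meant to line up with the UpdateExpression, ExpressionAttributeNames, and ExpressionAttributeValues parameters of UpdateItem. Using name placeholders for every attribute sidesteps collisions with reserved words.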
Stripe's webhook payloads include a previous_attributes object on updated events that is exactly an inverted merge patch: the keys present are the keys that changed, the values are the prior values, and reconstructing the diff is a one-liner. React's reconciliation algorithm is a cousin of JSON diff that solves a different problem under different constraints — it diffs virtual DOM trees to compute a minimal sequence of real DOM mutations, using the key prop as the explicit identity hint that JSON Patch lacks.
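The reconstruction from previous_attributes might look like this. The sketch handles top-level keys only (nested previous values are partial objects and would need a recursive walk), and the function name changes_from_event is an assumption.

```python
def changes_from_event(event):
    """Map each changed top-level key to an (old value, new value) pair."""
    current = event["data"]["object"]
    previous = event["data"].get("previous_attributes", {})
    return {key: (old, current.get(key)) for key, old in previous.items()}
```

For an update event whose object has status "active" and whose previous_attributes holds status "trialing", the result is {"status": ("trialing", "active")}.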
The build-versus-buy question turns on three factors: array strategy, patch format, and performance. If your data has natural keys and you need 6902 output, fast-json-patch (npm) or jsonpatch (PyPI) gets you ninety percent of the way; the last ten percent is custom array comparators. If you need set-aware or schema-aware diffing, you are probably writing it yourself or adopting a schema-aware tool like Kubernetes' strategic merge. Performance budgets matter: O(n+m) is achievable for object diff with sorted keys and for keyed array diff with hashing, but O(nm) is unavoidable for unkeyed ordered arrays.
RFC 6902 vs RFC 7396 — pick your patch.
Two standards turn JSON differences into transferable operations. JSON Patch (RFC 6902) is an array of explicit ops — add / remove / replace / move / copy / test — addressing values by JSON Pointer paths. It's verbose but unambiguous and supports tests for optimistic-concurrency. JSON Merge Patch (RFC 7396) is much simpler: a JSON document where present keys overwrite, missing keys leave alone, and explicit nulls delete. The catch is you can't represent "set this field to null" — null means "delete." Pick Patch for safety-critical flows, Merge Patch for ergonomic config diffs.
HTTP PATCH doesn't define which patch format you send. Set Content-Type: application/json-patch+json for RFC 6902, application/merge-patch+json for RFC 7396. A server that does not support the format you sent should respond with 415 Unsupported Media Type.
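On the client side, selecting the format comes down to one header. The sketch below builds (but does not send) a PATCH request with Python's standard library; the helper name build_patch_request, the kind labels, and the example URL are hypothetical.

```python
import json
import urllib.request

MEDIA_TYPES = {
    "json-patch": "application/json-patch+json",    # RFC 6902
    "merge-patch": "application/merge-patch+json",  # RFC 7396
}


def build_patch_request(url, body, kind):
    """Build an HTTP PATCH request whose Content-Type matches the format."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": MEDIA_TYPES[kind]},
        method="PATCH",
    )


# The same intent expressed in each format (hypothetical endpoint).
as_patch = build_patch_request(
    "https://api.example.com/users/42",
    [{"op": "replace", "path": "/email", "value": "new@example.com"}],
    "json-patch",
)
as_merge = build_patch_request(
    "https://api.example.com/users/42",
    {"email": "new@example.com"},
    "merge-patch",
)
```

Passing either request to urllib.request.urlopen would send it; a 415 response would surface as an HTTPError that the caller can inspect to fall back to the other format.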