Text diff.

Side-by-side comparison of two strings using the classic longest-common-subsequence algorithm. Diff at line, word, or character granularity, optionally ignoring whitespace or case. Everything runs locally, which makes it suitable for log diffs, prompt revisions, and configuration changes you don't want to paste into a public diff tool.

Diff

Roses are red,
+ Violets are pink,
+ Honey is sweet,
- Violets are blue,
- Sugar is sweet,
And so are you.

Hunt-McIlroy, Myers, patience.

The line-oriented diff most engineers use daily descends from a 1976 paper by James Hunt and Doug McIlroy at Bell Labs, written to support the original Unix diff(1). Their algorithm reduced file comparison to the Longest Common Subsequence problem: find the longest sequence of lines that appears in both files in the same relative order, then everything outside that sequence is either an insertion or a deletion. The Hunt-McIlroy approach was elegant but expensive, with worst-case behaviour around O(n·m) in time and memory, which became painful as source files grew past a few thousand lines.
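The LCS reduction can be sketched in a few lines. This is the textbook dynamic-programming table, not Hunt and McIlroy's actual implementation; the function name `lcs_diff` and the tag characters are illustrative choices. It uses O(n·m) time and memory, which is exactly the cost profile that becomes painful on large files.

```python
from typing import List, Tuple


def lcs_diff(a: List[str], b: List[str]) -> List[Tuple[str, str]]:
    """Return an edit script as (tag, line) pairs; tag is ' ', '-', or '+'."""
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table: matching lines are kept, everything else is an edit.
    script = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            script.append((" ", a[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            script.append(("-", a[i]))
            i += 1
        else:
            script.append(("+", b[j]))
            j += 1
    script.extend(("-", line) for line in a[i:])
    script.extend(("+", line) for line in b[j:])
    return script


if __name__ == "__main__":
    old = ["Roses are red,", "Violets are blue,", "Sugar is sweet,", "And so are you."]
    new = ["Roses are red,", "Violets are pink,", "Honey is sweet,", "And so are you."]
    for tag, line in lcs_diff(old, new):
        print(tag, line)
```

Ties between a deletion and an insertion are broken in favour of the deletion here, so removed lines are listed before added ones; a different tie-break gives an equally valid script in a different order.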

Eugene Myers published "An O(ND) Difference Algorithm and Its Variations" in 1986, and that paper is still the bedrock of modern diff tooling. Myers reframed the problem as a shortest-path search through an edit graph, where D is the size of the actual edit script rather than the file size. For two files that are mostly similar — the common case in version control — D is small and the algorithm runs in nearly linear time. Git's default diff engine, GNU diff, and most IDE diff views are Myers variants. The algorithm is fast, deterministic, and produces minimal edit scripts, but "minimal" and "readable" are not the same thing.
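A minimal sketch of the greedy forward search from the Myers paper follows, assuming we only want D (the length of the shortest edit script) and not the script itself. The function name `myers_edit_distance` is illustrative; production implementations add the linear-space refinement and heuristics that are omitted here.

```python
from typing import Sequence


def myers_edit_distance(a: Sequence, b: Sequence) -> int:
    """Length D of the shortest insert/delete script turning a into b."""
    n, m = len(a), len(b)
    max_d = n + m
    # v[k] = furthest x reached so far on diagonal k, where k = x - y.
    v = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]          # move down: take an insertion from b
            else:
                x = v[k - 1] + 1      # move right: take a deletion from a
            y = x - k
            # Follow the diagonal "snake" of matching elements for free.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return max_d


if __name__ == "__main__":
    print(myers_edit_distance("ABCABBA", "CBABAC"))
```

The outer loop runs D + 1 times and each pass touches at most D + 1 diagonals, which is where the near-linear behaviour on similar inputs comes from.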

Bram Cohen, the author of BitTorrent, devised patience diff, which the Bazaar version control system adopted and Git later added as the --patience option. Patience diff is built on a different intuition: instead of finding the longest common subsequence of all lines, it first identifies lines that appear exactly once in each file, keeps the longest run of those lines that occurs in the same order on both sides, anchors the diff there, and recurses on the regions between the anchors. Unique lines tend to be function signatures, imports, or distinctive expressions; repeated lines like closing braces are notoriously bad anchors. Histogram diff, which originated in JGit and is available in Git through --histogram or diff.algorithm=histogram, is a refinement that uses an occurrence histogram to pick anchors, falling back to low-occurrence lines when no unique ones exist. It is typically faster than patience while preserving most of its readability gains. Git's default remains Myers; histogram is opt-in.
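The anchoring step can be sketched as follows, assuming a simplified single pass: collect lines that are unique in both inputs, then keep the longest subsequence of them whose positions increase on both sides. The function name `patience_anchors` is hypothetical, and the recursion on the gaps between anchors is left out.

```python
from bisect import bisect_left
from collections import Counter
from typing import List, Tuple


def patience_anchors(a: List[str], b: List[str]) -> List[Tuple[int, int]]:
    """Return (index_in_a, index_in_b) pairs for unique lines kept in order."""
    count_a, count_b = Counter(a), Counter(b)
    pos_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    # Candidate anchors, already sorted by their position in a.
    pairs = [(i, pos_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and line in pos_b]

    # Longest increasing subsequence over the b-positions (patience sorting).
    tail_values: List[int] = []   # smallest final b-position per run length
    tail_index: List[int] = []    # index into pairs for each entry above
    previous = [None] * len(pairs)
    for idx, (_, j) in enumerate(pairs):
        k = bisect_left(tail_values, j)
        previous[idx] = tail_index[k - 1] if k > 0 else None
        if k == len(tail_values):
            tail_values.append(j)
            tail_index.append(idx)
        else:
            tail_values[k] = j
            tail_index[k] = idx

    # Walk the back-pointers from the end of the longest run.
    anchors = []
    idx = tail_index[-1] if tail_index else None
    while idx is not None:
        anchors.append(pairs[idx])
        idx = previous[idx]
    return anchors[::-1]


if __name__ == "__main__":
    old = ["int f() {", "  return 1;", "}", "int g() {", "  return 2;", "}"]
    new = ["int f() {", "  return 1;", "}", "int h() {", "  return 3;", "}",
           "int g() {", "  return 2;", "}"]
    print(patience_anchors(old, new))
```

Note that the repeated closing braces never become anchors, which is the property the algorithm relies on.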

Minimal isn't readable.

Plain LCS-based diff is correct in the formal sense and frequently nonsensical in the practical sense. The classic failure mode is insertion inside a repeated pattern. Imagine a file with three nearly identical if blocks, and a fourth one is added between the second and third. Myers will often produce a hunk that starts with the closing brace of block two and stops just short of the new block's own closing brace, so the insertion appears to straddle two blocks. Both alignments cost the same number of edits, and the algorithm has no preference for keeping logically related lines together when the cost is equal. The diff is minimal in edit distance but reads like vandalism.
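The ambiguity is easy to reproduce. In the sketch below, the helper `insert_at` and the `run_*` calls are hypothetical; the point is that two different three-line insertions produce the same file, so a minimal-edit algorithm has no formal reason to prefer the readable one.

```python
from typing import List


def insert_at(lines: List[str], index: int, block: List[str]) -> List[str]:
    """Return a copy of lines with block inserted before position index."""
    return lines[:index] + block + lines[index:]


original = [
    "if (a) {", "  run_a();", "}",
    "if (b) {", "  run_b();", "}",
    "if (c) {", "  run_c();", "}",
]

# Reading 1: the whole new block is inserted after block two.
natural = insert_at(original, 6, ["if (x) {", "  run_x();", "}"])

# Reading 2: the insertion starts one line earlier and "borrows" the
# closing brace of block two, ending before the new block's own brace.
shifted = insert_at(original, 5, ["}", "if (x) {", "  run_x();"])

if __name__ == "__main__":
    # Both edit scripts insert three lines and yield the identical file.
    print(natural == shifted)
```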

Function-body reorders are similarly hostile to LCS. Move a helper function fifty lines up in the file, and a Myers diff shows fifty lines deleted from one location and fifty lines added at another, with no indication that this is a move rather than a rewrite. Indentation-only changes — converting tabs to spaces, wrapping a block in a new conditional — produce diffs where every line in the block appears as both deleted and re-added, drowning the actual change. The "shifted block" problem generalizes this: any time a block of code is offset by a small edit at its boundary, plain LCS struggles to align the unchanged interior.

Patience diff improves these cases because its anchors are unique lines, which tend to fall on meaningful boundaries. When you add a new function between two existing ones, the unique signature lines of the surrounding functions anchor the diff cleanly, and the new function appears as a single contiguous insertion. Histogram diff inherits this property and adds speed.

Try histogram diff

If you have ever stared at a Git diff that shows a closing brace deleted on line 47 and added on line 89 with seemingly random churn between them, you have met the shifted-block pathology of Myers diff. Run git config --global diff.algorithm histogram and re-view the same change. The diff will not necessarily be smaller in line count, but it is often substantially more legible.

Tree, structural, semantic.

Text diff treats source code as a sequence of lines, and that abstraction leaks the moment a refactor crosses line boundaries. Rename a variable across a file and a textual diff highlights every occurrence; an AST-aware diff highlights one declaration. Reformat a block with a different brace style and textual diff explodes; a tree diff shows nothing changed semantically. The mismatch between line-oriented diff and the structured nature of code is the reason refactor PRs are painful to review.
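As a small illustration of the reformatting case, the sketch below uses Python's standard `ast` module to compare two sources by their parsed trees rather than their text. The helper name `same_syntax_tree` and the sample `area` function are made up for the example; this is a yes/no comparison, not a full tree diff.

```python
import ast


def same_syntax_tree(source_a: str, source_b: str) -> bool:
    """True when both sources parse to the same AST, ignoring layout."""
    # ast.dump omits line and column attributes by default, so whitespace,
    # blank lines, and redundant parentheses do not affect the result.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


before = "def area(w, h):\n    return w * h\n"
reformatted = "def area(w,h):\n\n    return (w*h)\n"
changed = "def area(w, h):\n    return w + h\n"

if __name__ == "__main__":
    print(same_syntax_tree(before, reformatted))
    print(same_syntax_tree(before, changed))
```

A line-based diff reports every line of the reformatted version as changed, while the tree comparison reports a difference only for the version that swaps the operator.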

Tree diff operates on the parsed abstract syntax tree of each file. Tools like jscodeshift and ts-morph use AST manipulation for codemods, and the diff produced by such transformations is naturally expressed at the node level rather than the line level. Structural diff applies the same idea to data formats: comparing two JSON documents node-by-node tells you which keys changed, which arrays were reordered, and which values were updated, regardless of whitespace or key ordering.
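A structural diff for JSON can be sketched as a recursive walk over parsed values. The function name `json_diff` and the path notation are illustrative. One simplifying assumption: arrays are compared position by position when their lengths match and reported as a single change otherwise, so this sketch does not detect reorders.

```python
import json
from typing import Any, List, Tuple


def json_diff(old: Any, new: Any, path: str = "") -> List[Tuple[str, str]]:
    """Return (kind, path) pairs; kind is 'added', 'removed', or 'changed'."""
    changes: List[Tuple[str, str]] = []
    if isinstance(old, dict) and isinstance(new, dict):
        # Key order is irrelevant: iterate over the union of keys.
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}/{key}"
            if key not in new:
                changes.append(("removed", child))
            elif key not in old:
                changes.append(("added", child))
            else:
                changes.extend(json_diff(old[key], new[key], child))
    elif isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        for index, (left, right) in enumerate(zip(old, new)):
            changes.extend(json_diff(left, right, f"{path}/{index}"))
    elif old != new:
        changes.append(("changed", path or "/"))
    return changes


if __name__ == "__main__":
    a = json.loads('{"name": "svc", "port": 8080, "tags": ["a", "b"]}')
    b = json.loads('{ "tags": ["a", "c"],\n  "port": 8080, "name": "svc", "debug": true }')
    for kind, where in json_diff(a, b):
        print(kind, where)
```

Because the comparison runs on parsed values, the different whitespace and key order in the second document produce no noise.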

Semantic diff goes further by combining tree diff with heuristics for matching nodes across versions even when their structure has changed. GumTree, originating from academic research in 2014, computes a tree edit script between two ASTs and is used as a backend for several research tools. Difftastic, written in Rust by Wilfred Hughes, brings this approach to a polished command-line tool that ships syntax-aware diffs for dozens of languages and integrates as a Git external diff driver. The trade-off is parser dependency: difftastic needs a tree-sitter grammar for the language, and degrades to text diff when one is not available. For code review of refactors and language migrations, the readability improvement justifies the setup.

Diff is two files. Merge is three.

Diff is a two-file problem; merge is a three-file problem, and conflating them is a common source of confusion. A two-way diff can tell you what changed between A and B, but it cannot tell you whether to keep a change when both sides have edited the same region, because there is no notion of which version is the baseline. Three-way merge introduces the common ancestor: given a base version O and two descendants A and B, the merge algorithm computes the diffs O→A and O→B, then attempts to apply both to O. Where the two diffs touch disjoint regions, the merge succeeds automatically. Where they overlap, you get a conflict.
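The decision rule at the heart of three-way merge can be shown without the alignment machinery. The sketch below assumes the three versions have already been split into named regions (here, dictionary keys standing in for aligned hunks), which is a deliberate simplification; a real merge first has to compute those regions from the two diffs. The function name `three_way_merge` is illustrative.

```python
from typing import Dict, List, Tuple

_MISSING = object()  # sentinel for "region absent in this version"


def three_way_merge(base: Dict[str, str], ours: Dict[str, str],
                    theirs: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Merge per region; return (merged, names of conflicting regions)."""
    merged: Dict[str, str] = {}
    conflicts: List[str] = []
    for key in sorted(base.keys() | ours.keys() | theirs.keys()):
        o = base.get(key, _MISSING)
        a = ours.get(key, _MISSING)
        b = theirs.get(key, _MISSING)
        if a == b:        # both sides agree (unchanged, same edit, or both deleted)
            result = a
        elif a == o:      # only theirs changed relative to the base
            result = b
        elif b == o:      # only ours changed relative to the base
            result = a
        else:             # both changed, in different ways
            conflicts.append(key)
            continue
        if result is not _MISSING:
            merged[key] = result
    return merged, conflicts


if __name__ == "__main__":
    base = {"host": "localhost", "port": "8080", "debug": "false"}
    ours = {"host": "localhost", "port": "9090", "debug": "false"}
    theirs = {"host": "example.org", "port": "8080", "debug": "false"}
    print(three_way_merge(base, ours, theirs))
```

Without the base version, the host and port differences above would be indistinguishable from conflicts; with it, each is attributable to exactly one side.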

Git internally supports several merge strategies. Resolve is the simplest three-way merge and works only on two heads. Recursive was the long-time default for two-head merges and handles the complication of multiple common ancestors by recursively merging the ancestors first to produce a virtual base. Ort, short for "Ostensibly Recursive's Twin," replaced recursive as the default in Git 2.34 (released November 2021) and is faster, more correct on edge cases involving renames, and uses less memory. Octopus handles merges of three or more branches when none of them conflict with each other.

The reason merge deserves separate treatment: the diff algorithm choice and the merge strategy choice are independent. You can use histogram diff for review and ort for merging; changing one does not affect the other. Merge also has to reason about content that diff does not — renames, copies, submodule pointers, and binary files with custom merge drivers.

Set diff3 conflict style

If you maintain a long-lived branch and dread the rebase, set git config merge.conflictStyle diff3 or the newer zdiff3 (available in Git 2.35 and later). Both add the common ancestor's version between the conflict markers, which often makes the right resolution obvious without leaving your editor.
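To make the difference concrete, the sketch below renders one conflict region in both marker layouts. The function name `render_conflict` and the labels ours, base, and theirs are placeholders; Git prints its own labels after the markers, and "merge" is the name of its default style.

```python
from typing import List


def render_conflict(ours: List[str], base: List[str], theirs: List[str],
                    style: str = "diff3") -> str:
    """Render a conflict region with Git-style markers."""
    out = ["<<<<<<< ours", *ours]
    if style == "diff3":
        # The extra section shows what the region looked like in the ancestor.
        out += ["||||||| base", *base]
    out += ["=======", *theirs, ">>>>>>> theirs"]
    return "\n".join(out)


if __name__ == "__main__":
    ours = ["timeout = 30"]
    base = ["timeout = 10"]
    theirs = ["timeout = 60"]
    print(render_conflict(ours, base, theirs, style="merge"))
    print()
    print(render_conflict(ours, base, theirs, style="diff3"))
```

With the base visible, a reader can see that both sides raised the same setting from 10, which is information the two-section layout hides.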

The renderer matters as much as the algorithm.

The algorithm produces an edit script; the viewer decides how to render it, and rendering choices change comprehension as much as algorithm choices do. Unified diff, the format Git prints by default, interleaves removed and added lines with - and + prefixes and is dense, scannable, and grep-friendly. Side-by-side diff, which most web review tools offer as a default or an option, places old and new in parallel columns and is better for visual scanning of larger changes but wastes horizontal space on narrow screens. Inline highlighting marks changed runs within a line; block highlighting marks the whole line. Character-level highlighting within a changed line is the difference between knowing a line changed and knowing what changed.
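Intra-line highlighting is a second diff run on the characters of a changed line pair. The sketch below uses Python's `difflib.SequenceMatcher`, whose matching heuristic is not Myers, and marks removed runs as [-...-] and added runs as {+...+}. The function name `highlight_change` and the marker syntax are illustrative choices.

```python
from difflib import SequenceMatcher


def highlight_change(old: str, new: str) -> str:
    """Return one string with removed and added character runs marked."""
    parts = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(old[i1:i2])
        if tag in ("delete", "replace"):
            parts.append(f"[-{old[i1:i2]}-]")
        if tag in ("insert", "replace"):
            parts.append(f"{{+{new[j1:j2]}+}}")
    return "".join(parts)


if __name__ == "__main__":
    print(highlight_change("Violets are blue,", "Violets are pink,"))
```

A block-level renderer would flag the entire line; the character-level pass narrows the reader's attention to the one word that differs.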

Tool	Default layout	Intra-line highlight	Algorithm
GitHub	Unified or split	Word-level	Myers (Histogram opt-in)
GitLab	Inline or parallel	Word-level	Myers
Bitbucket	Side-by-side	Word-level	Myers
difftastic	Side-by-side	Syntax-node level	Tree diff
delta	Unified (side-by-side opt-in)	Word-level	Underlying Git algorithm
vimdiff	Side-by-side	Character-level	Myers or patience

Color choice matters more than people admit. Red and green are the conventional pair, but they fail for the roughly 8% of men with red-green color blindness, which is why GitHub added colorblind-friendly themes in 2021 that use blue and orange. Background-color highlighting reads faster than foreground; bold-on-dark themes need more contrast than light themes. Tools that get visualization right (delta with its line-numbered side-by-side view, difftastic with its syntactic awareness, vimdiff with its tight integration into the editor) treat the diff as a reading experience rather than a data dump.

Why git diff looks the way it does.

Git's default diff algorithm is Myers, a refinement of the LCS approach that is fast in the typical case of small changes in large files. It works on lines and produces unified-diff hunks with three lines of context above and below each change. The --patience and --histogram alternatives choose their anchor lines differently, preferring unique or rare lines, which changes how hunks are aligned when several alignments are plausible. Neither is the default: Myers remains Git's default, and histogram is an opt-in that many developers enable because it tends to pick intuitively aligned hunks.
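The hunk layout can be reproduced with Python's `difflib.unified_diff`, shown here as an approximation: difflib uses its own matching heuristic rather than Myers, but the unified format and the three-line context default are the same. The wrapper name `unified` and the file names are placeholders.

```python
from difflib import unified_diff
from typing import List


def unified(old: List[str], new: List[str], context: int = 3) -> List[str]:
    """Return unified-diff lines with the given amount of context."""
    return list(unified_diff(old, new, fromfile="a/file.txt",
                             tofile="b/file.txt", n=context, lineterm=""))


if __name__ == "__main__":
    old = [f"line {i}" for i in range(1, 21)]
    new = list(old)
    new[9] = "line 10 (edited)"   # change a single line in the middle
    print("\n".join(unified(old, new)))
```

Only lines 7 through 13 of the twenty-line input appear in the hunk: the changed line plus three lines of context on each side.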

Three-way merge ≠ two-way diff

Merging two branches isn't just "diff and apply." Git uses three-way merge: it diffs your branch against the common ancestor, diffs the other branch against the same ancestor, then applies both sets of changes to the ancestor. Conflicts are the regions where both diffs touched the same lines.
