Lex · parse · lower · codegen.
Compilers don\'t turn source into machine code in one leap. They lower it through stages, each of which makes the input a little more abstract or a little more concrete than the one before. Watch x = (a + b) * 3; become 4 x86 instructions in 5 well-defined steps.
You write `x = (a + b) * 3;`. It's a string of 17 characters. The machine has no idea what any of it means yet. Five separate passes are about to turn it into 4 instructions.
- Compiler frontend
- The part that takes language-specific source and produces a language-neutral intermediate representation. Different languages have different frontends targeting the same backend (LLVM's model).
Where optimisation happens
Almost all of it on the IR. Constant folding sees 3 * 4 in IR and rewrites it as 12. Dead code elimination removes assignments whose results aren\'t used. Common subexpression elimination spots two copies of a + b and computes it once. Loop-invariant code motion hoists computations out of loops. None of these care about the source language — they work on the IR abstraction. That\'s why LLVM is used by Rust, Swift, Julia, and many others: same optimiser, many frontends.
Why register allocation is the hard part
You have unlimited IR temporaries (t1, t2…) and a fixed 16 physical registers. Mapping the first to the second is graph coloring: build a graph where temporaries that are alive at the same time are connected, then color it with k colors (k = number of registers). When that\'s impossible (graph isn\'t k-colorable), spill some temporaries to the stack. Heuristics pick which to spill. Modern compilers use linear scan or graph coloring with iterated register coalescing.
Modern wrinkles
Languages with type inference (Rust, Haskell, OCaml) interleave a type-checking pass between parsing and IR generation. Languages with macros (Lisp family, Rust, even C) expand macros before or during parsing — different timing has different semantic implications. JIT compilers run the whole pipeline at runtime, often deferring codegen until they\'ve profiled which paths are hot. WebAssembly is essentially a portable IR designed to be JIT-compiled by browsers.
Compilers and languages →
LLVM IR in depth, parser generators, SSA form, the math behind register allocation, JIT design patterns.
Open the Codex →