How code becomes machine code.

Six stages, five files, and a kernel syscall stand between the C you wrote and a running process. Each stage does one specific job. Watch a single main.c walk through preprocess, compile, assemble, link, and load.


Stage 1. Source. Output: main.c (~150 B). You wrote this.
Stage 2. Preprocess (cpp). Output: main.i (~30 KB). Expand #includes, macros, conditionals.
Stage 3. Compile (cc1). Output: main.s (~2 KB). Source → architecture assembly.
Stage 4. Assemble (as). Output: main.o (~3 KB). Assembly → machine code + relocations.
Stage 5. Link (ld). Output: a.out (~16 KB). Resolve symbols, lay out segments, write ELF.
Stage 6. Execute (loader). Output: running. OS maps segments, jumps to _start.

You write main.c

A handful of lines of C: an #include, a printf, a return 0. The source file is plain text — nothing about it is runnable yet. Everything you see below this point is a transformation by some specific tool, producing a specific file.

Translation units
Each .c file is one translation unit. The compiler processes them independently, then the linker stitches the resulting object files together.

Why the pipeline is split into separate tools

Each tool is reusable. The assembler doesn't care if its input came from C, Rust, Fortran, or hand-written. The linker doesn't care what language produced each .o. This separation is why you can mix C, C++, Rust, and assembly in one binary: they all converge at the .o stage. It's also why language designers don't need to write a new linker; they just need a frontend that emits LLVM IR or assembly.

What you can do at each boundary

Stop at any stage to inspect: cpp main.c shows the post-preprocess source. gcc -S main.c emits assembly. gcc -c main.c stops at the object file. objdump -d main.o disassembles the machine code. readelf -a a.out dumps the final ELF structure. This step-by-step inspection is how compiler bugs get debugged, how exotic optimisations are verified, and how reverse engineers do their work.

Modern variations

Rust skips the .s file by default: the compiler emits LLVM IR and hands it to LLVM directly, which generates object code in memory. Go bypasses most of this with its own toolchain, which handles everything from parsing through linking behind a single go command. JIT languages (Java, JS) defer parts of this pipeline to runtime: the JVM's bytecode is roughly equivalent to .o, and the JIT does the assembly + link step on hot methods. WebAssembly compresses several stages into a single .wasm file that's halfway between bytecode and machine code.

Go deeper

Languages Codex →

LLVM IR internals, linker scripts, ELF format, dynamic loader behaviour, JIT design across V8 / HotSpot / .NET.
