9 min read · Guide · Runtime
How it works · Runtime · Memory

Garbage collection, reclaiming what your program stopped using.

Find every object the program might still want; free everything else. Three algorithm shapes, two cost dimensions, one ancient idea — and yet still the most-debated subsystem in any runtime.

Parts01–08 InteractiveMark + sweep PrereqHeap / pointers

What is garbage collection?

Find what's reachable, free the rest.

Garbage collection (GC) automatically reclaims memory the program can no longer reach. The classical algorithm — mark-and-sweep — was published by John McCarthy in 1960 for Lisp. Modern collectors (G1, ZGC, Shenandoah in the JVM; Go's concurrent GC; V8's Orinoco) layer on generational, concurrent, and compacting variants to bound pause time.

The garbage collector's entire job is to answer one question: which objects might the program still want? Anything reachable from a "root" — a stack variable, a global, a CPU register holding a pointer — is potentially live. Anything not reachable is garbage; its memory can go back to the free pool.

The algorithm flavours all do this; they differ in how they trace, when they pause, and how they organise survivors — all built on top of a hash table mapping addresses to object metadata. Three families, three trade-offs.


Watch mark and sweep run on a small heap

Mark everything reachable, then free the rest.

Below: a tiny heap of objects across three generations. Each block has random references to other blocks. Click a block to make it (or unmake it) a root. Run the collector — watch the mark phase find reachable blocks, then sweep the rest. Survivors move up a generation.

idle — click blocks to toggle root
Gen 0 · 0 live · 0 units
— empty —
Gen 1 · 0 live · 0 units
— empty —
Gen 2 · 0 live · 0 units
— empty —
live reachable (mark) garbage (sweep) ★ root

Stop-the-world GC: pausing the whole program

The simplest collectors freeze every thread.

The textbook algorithm. Mark: from each root, traverse references and mark every reachable object. Sweep: walk the heap; everything unmarked goes back to the free list. Both phases pause every mutator thread. Simple, correct, and devastating to latency on a heap of any size.

Modern collectors replace this naive scheme with various tricks — concurrent marking, incremental sweeping, region-based heaps — but the underlying logic is unchanged. Find reachable, reclaim the rest. A long mark phase shows up directly in the event loop as a frame budget overrun.


Generational GC: most objects die young

Collect the young generation far more often.

The most important empirical fact in garbage collection: most objects die young. A short-lived loop variable, a request-scoped buffer, a temporary string concatenation — these are the bulk of allocations, and almost all are unreachable within milliseconds.

Generational collectors exploit this. The young (Eden) generation is collected often, fast; the few survivors get promoted to an older generation; old generations are collected rarely. This lets the collector spend almost all its time on the small fraction of the heap where most garbage lives. Throughput improves by an order of magnitude — not unlike how a cache hot tier holds the working set, or how the CPU cache hierarchy exploits temporal locality. The young generation typically fits in L2; old generations live in DRAM and pay the TLB-miss tax during marking sweeps.


Concurrent GC: collecting without stopping

Mark alongside the running program.

Stopping the world is unacceptable for many workloads. Concurrent collectors run the mark phase in parallel with the application — but the application keeps mutating references while marking is in progress, which can create races. The trick is the tricolour invariant: black objects (already marked) must not point to white objects (not yet marked) without going through a grey one (queued for marking).

Modern concurrent collectors — Go's, ZGC, Shenandoah — use write barriers to maintain this invariant. Mutator stores intercept and either re-grey the target or shade the new pointer. The application sees a few extra instructions per pointer write; it never stops for marking. Pause times drop from hundreds of ms to single-digit ms.


Compacting GC: defragmenting the heap

Slide survivors together to reclaim free space.

Mark-and-sweep leaves the heap fragmented — free space scattered between live objects. Allocation has to find a fitting hole; over time large allocations may fail despite plenty of total free space. Compacting collectors fix this by moving live objects together and updating references.

Two flavours: copying collectors (Cheney) split the heap in two and move all live objects from one half to the other on each collection; mark-compact moves live objects in-place to one end. Both produce a contiguous free region. The cost is that updating references means walking the entire reachable graph again — and any non-relocatable references (JNI handles, foreign pointers) must be pinned.


How JVM, Go, V8, and .NET collect garbage

The same ideas, tuned for different goals.

JVM

G1 / ZGC / Shenandoah.

G1 (Garbage First) is the JDK 9+ default — generational, regional, partially concurrent. ZGC and Shenandoah aim for sub-10ms pauses on heaps in the terabytes. Hotspot is now mostly all of these, configurable.

Go

Concurrent, non-generational.

Mark phase is concurrent; sweep is concurrent and lazy. Sub-millisecond pauses. Not generational by design — the team chose to optimise for low pause time over allocation throughput.

V8

Generational, incremental.

Two-generation Cheney for young-gen, mark-compact for old-gen, both with concurrent and incremental phases. The Chrome tab's GC is V8's GC.


Tune GC only after you measure

Premature GC tuning usually backfires.

GC tuning is a notoriously deep rabbit hole and almost always premature. The right starting point is always measure — log GC pauses, allocation rate, retained set size — and then identify which knob to turn.

Common observations: pause times too long? Switch to a low-pause collector (ZGC, Shenandoah). Allocation rate too high? Reduce per-request churn (object pools, ring buffers, fewer string concatenations). Old-gen filling too fast? Likely a memory leak — find the retained set growing across collections.


What GC pauses actually look like at scale

What to expect at scale.

Java G1GC · default 11+
Concurrent, generational. Tunable target pause via -XX:MaxGCPauseMillis=200. Real pauses on 16-GB heap: ~100-300ms for full collections, <50ms for young-gen. Default for OpenJDK and most Spring services.
Java ZGC · production since JDK 15 (2020)
Pauseless. Sub-millisecond pauses regardless of heap size — Netflix runs ZGC on 100+ GB heaps with p99 GC pauses under 1ms. Higher CPU overhead than G1 (~5-10% more). The default for any latency-sensitive Java service in 2024.
Go GC · concurrent tri-color mark-sweep
Pauses ~100-500µs regardless of heap size, since Go 1.5 (2015). Tradeoffs: ~25% CPU overhead during collection, no compaction (so heap fragmentation is real but bounded). Twitch and Cloudflare have published GC profiles; both report sub-1ms pauses on multi-GB heaps.
V8 (Node.js, Chrome)
Generational, concurrent marking ("Orinoco"). Young-gen pauses ~1-5ms; old-gen full pauses 50-200ms on a 1-GB heap. The reason Node's old "single thread" warning bites — a long GC pause stalls the event loop directly.
.NET CLR GC · server GC
Three generations + LOH. Server GC mode (multi-thread) pauses ~30-100ms on a 4-GB heap. .NET 5+ added "concurrent server GC" reducing pauses to ~10ms. Workstation GC is for desktop apps and is single-threaded.
CPython · reference counting + cycle collector
Reference counting is immediate (no pauses for non-cyclic garbage). The cycle collector runs periodically and can pause tens of ms on large object graphs. The GIL makes CPython's GC trivial to reason about; the cost is single-threaded execution.

When garbage collection took the site down

Real outages, real lessons.

LinkedIn — JVM GC pauses caused 100ms tail latency. 2014, public engineering blog: a single full GC on a 32-GB heap stalled the whole node for ~3 seconds, blowing p99 SLOs. Fix: switched to G1GC, tuned region size, eventually moved latency-sensitive services to ZGC after 2020.

Discord — went from Go to Rust partly for GC. Discord's 2020 blog post "Why Discord is switching from Go to Rust" documents GC-induced 100ms+ tail latencies on the Read States service. Rust's no-GC ownership model brought p99 from 100ms to 5ms. The point: at extreme scale, even Go's 1ms-class GC can be too much.

Twitter — Scala GC tuning as a discipline. Twitter's Finagle stack famously published GC tuning playbooks; their "Scala school" articulated the discipline of writing low-allocation Scala (avoiding closures, reusing buffers, preferring arrays over Lists). Twitter's old timeline service ran on tuned G1GC with under 50ms p999 pauses on 24-GB heaps.

Netflix — ZGC adoption. Netflix moved from G1GC to ZGC in 2020-21 across most services. Public KubeCon talks: ZGC dropped p99 GC pauses from ~50ms to ~1ms; the additional 5-10% CPU was worth it for the latency win. Currently runs ZGC on heap sizes ranging from 4 GB to over 100 GB.

Pattern across all four: GC is invisible until tail latency matters. If your service has p99 SLOs under ~50ms, GC choice and tuning are first-class concerns; below that latency budget, it's usually background noise.



A closing note

Garbage collection is one of those topics that's simple at the conceptual level and bottomless at the implementation level. Find what's reachable, free the rest. The decade of work that goes into making that fast on terabyte heaps is what separates a well-loved runtime from a tolerated one.

Found this useful?