Garbage collection
Go ships a concurrent, non-generational, non-moving, tri-color mark-and-sweep collector. It
runs alongside the mutator, pauses the world for two short windows per cycle, and decides
when to start by watching how fast the heap is growing. The knobs that matter most are
GOGC, GOMEMLIMIT, and how much you allocate in the first place.
What Go's GC is
Go's collector is concurrent, non-generational, non-moving, and uses tri-color mark-and-sweep. Concurrent means it does most of its work while application goroutines keep running. Non-generational means it walks the entire reachable heap on every cycle; there are no young/old generations. Non-moving means a live heap object stays at its allocated address; this keeps raw addresses held by cgo and unsafe code valid, and it removes the need to fix up references after collection.
A short history. Go 1.5 (2015) introduced the concurrent collector and ended the
long stop-the-world pauses of earlier releases. Go 1.8 (2017) reworked the write barrier and
brought typical STW pauses below 100 µs. Go 1.19 (2022) added
GOMEMLIMIT, a soft memory ceiling that gives the pacer something to aim at
beyond pure heap-growth heuristics. The shape of the algorithm hasn't changed much
since; later work has gone into pacer refinements and allocator improvements.
The tri-color invariant
The collector colours every heap object one of three shades. White means not yet visited — at the end of marking, white objects are garbage. Grey means visited but its outgoing pointers haven't been scanned yet. Black means visited and fully scanned. Marking proceeds by picking a grey object, blackening it, and greying any white objects it points to. The cycle ends when there are no grey objects left.
The strong tri-color invariant is the rule that makes concurrent marking safe: no black object points to a white object. As long as that holds, the collector can sweep white objects without worrying that a black (already-scanned) object still has a live reference into them. The mutator, of course, runs concurrently and can rewrite pointers at any moment — which is exactly the job of the write barrier.
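The marking loop described above can be sketched as a toy, single-threaded program. This is an illustration only: the `object` type, the `mark` function, and the slice used as a grey worklist are hypothetical names, and the real runtime marks spans and bitmaps concurrently rather than walking Go structs.

```go
package main

import "fmt"

type color int

const (
	white color = iota // not yet visited
	grey               // visited, outgoing pointers not scanned yet
	black              // visited and fully scanned
)

// object is a toy heap object: a name plus outgoing pointers.
type object struct {
	name string
	refs []*object
	c    color
}

// mark runs a stop-the-world tri-color mark from the given roots and
// returns the names of the objects that are still white, i.e. garbage.
func mark(heap, roots []*object) []string {
	var greyList []*object
	for _, r := range roots {
		if r.c == white {
			r.c = grey
			greyList = append(greyList, r)
		}
	}
	for len(greyList) > 0 {
		// Pick a grey object and blacken it.
		obj := greyList[len(greyList)-1]
		greyList = greyList[:len(greyList)-1]
		obj.c = black
		// Grey every white object it points to.
		for _, child := range obj.refs {
			if child.c == white {
				child.c = grey
				greyList = append(greyList, child)
			}
		}
	}
	var garbage []string
	for _, o := range heap {
		if o.c == white {
			garbage = append(garbage, o.name)
		}
	}
	return garbage
}

func main() {
	a := &object{name: "a"}
	b := &object{name: "b"}
	c := &object{name: "c"}
	d := &object{name: "d"}
	a.refs = []*object{b}
	b.refs = []*object{a} // a cycle between a and b is handled fine
	c.refs = []*object{d} // c and d are unreachable from the root
	heap := []*object{a, b, c, d}
	fmt.Println(mark(heap, []*object{a})) // prints [c d]
}
```

Because nothing mutates the graph while `mark` runs, the invariant holds trivially here; the interesting case is when the mutator runs between steps.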
Write barriers
During marking, every pointer write goes through a small compiler-inserted hook. Go uses a Dijkstra-style insertion barrier: when the mutator writes a pointer into a heap slot, the runtime greys the new target. That way, even if the slot lives inside an already-black object, the target won't be missed.
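To see why the insertion barrier matters, here is a toy sketch of the classic lost-object scenario: the mutator stores a pointer to a white object into a black one and then removes the only other path to it. The `collector` type and the `write`, `shade`, `step`, and `survives` names are hypothetical; the sketch models only the insertion half of the barrier and is not how the runtime is structured.

```go
package main

import "fmt"

type color int

const (
	white color = iota
	grey
	black
)

// object has a single outgoing pointer, which is enough for the demonstration.
type object struct {
	ref *object
	c   color
}

// collector is a toy marker with a grey worklist and an optional barrier.
type collector struct {
	greyList []*object
	barrier  bool
}

// shade greys a white object and queues it for scanning.
func (gc *collector) shade(o *object) {
	if o != nil && o.c == white {
		o.c = grey
		gc.greyList = append(gc.greyList, o)
	}
}

// write performs the mutator's pointer store owner.ref = ptr.
func (gc *collector) write(owner, ptr *object) {
	if gc.barrier {
		gc.shade(ptr) // insertion barrier: grey the new target
	}
	owner.ref = ptr
}

// step blackens one grey object and reports whether it did any work.
func (gc *collector) step() bool {
	if len(gc.greyList) == 0 {
		return false
	}
	o := gc.greyList[0]
	gc.greyList = gc.greyList[1:]
	o.c = black
	gc.shade(o.ref)
	return true
}

// survives reports whether a reachable object w ends up marked when the
// mutator moves the only pointer to it into a black object mid-cycle.
func survives(barrier bool) bool {
	w := &object{}
	holder := &object{ref: w}
	root := &object{}
	gc := &collector{barrier: barrier}
	gc.shade(root)
	gc.shade(holder)
	gc.step() // root is now black, holder is still grey

	gc.write(root, w)     // a black object now points at white w
	gc.write(holder, nil) // the grey object's path to w disappears

	for gc.step() {
	}
	return w.c != white
}

func main() {
	fmt.Println("no barrier:", survives(false)) // prints no barrier: false
	fmt.Println("barrier:", survives(true))     // prints barrier: true
}
```

Without the barrier, `w` stays white even though `root` still points at it, so a sweep would free a live object.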
The barrier costs around 7–10 ns per pointer write while a cycle is active; when no cycle is running it reduces to a check of a runtime flag. Since Go 1.8 the collector uses a hybrid barrier (insertion plus a small deletion component) that lets the runtime skip the stack rescan at mark termination. Before 1.8 the rescan was the dominant STW cost; eliminating it is what got pauses under 100 µs.
// Conceptually, every pointer write becomes:
//     *slot = ptr
// is rewritten to:
//     writeBarrier(slot, ptr)
//
// writeBarrier(slot, ptr):
//     if gcphase == _GCmark {
//         shade(ptr)   // Dijkstra insertion
//         shade(*slot) // deletion (hybrid, since 1.8)
//     }
//     *slot = ptr
The phases of a GC cycle
A cycle has four phases. Two are stop-the-world but tiny; two run concurrently with your goroutines.
| Phase | What happens | Stops the world? |
|---|---|---|
| Sweep termination | Finish any leftover sweep from last cycle, enable write barriers | Yes, tens of µs |
| Mark (concurrent) | Walk the heap from roots, grey-to-black, mutator runs with barrier on | No |
| Mark termination | Drain remaining work, disable write barriers, prepare for sweep | Yes, tens of µs |
| Sweep (concurrent) | Reclaim white spans lazily as the allocator asks for memory | No |
Total stop-the-world time is the sum of the two small windows — usually well under 100 µs on a healthy workload. Everything else overlaps with mutator execution. The cost the application pays during marking is the write barrier plus mark assist: a goroutine that allocates faster than the background marker can keep up is conscripted into doing a proportional share of marking work itself.
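One way to check the stop-the-world figure on your own workload is to read it back from `runtime.MemStats`. The sketch below forces a cycle so there is something to report; the `lastPause` helper is a hypothetical name. Note that `runtime.ReadMemStats` itself briefly stops the world, so it is something to sample occasionally, not on a hot path.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// lastPause forces one GC cycle and returns the number of completed cycles
// together with the stop-the-world time of the most recent one.
func lastPause() (cycles uint32, pause time.Duration) {
	runtime.GC() // blocks until a full cycle has completed

	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// PauseNs is a circular buffer; the most recent entry lives at
	// (NumGC+255)%256 and is the sum of all pauses in that cycle.
	return m.NumGC, time.Duration(m.PauseNs[(m.NumGC+255)%256])
}

func main() {
	cycles, pause := lastPause()
	fmt.Printf("cycles=%d last STW pause=%v\n", cycles, pause)
}
```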
The pacer
The pacer is the part of the runtime that decides when to start a cycle. Its goal is simple: finish marking before the heap fills up to its target. If it starts too late, allocating goroutines have to do more mark assist and latency can balloon. If it starts too early, you waste CPU collecting a heap that hasn't grown much.
By default the target is set by GOGC, which expresses heap growth as a
percentage of the live set after the previous cycle. GOGC=100 (the default)
means the next cycle will aim to complete by the time the heap has doubled. Since Go
1.19 the pacer also respects GOMEMLIMIT: a soft total-memory ceiling that
pulls cycles forward as the process approaches the limit, trading CPU for staying under
the cap.
The pacer uses a feedback loop over recent cycles to estimate marking rate versus allocation rate, then starts the next cycle at a heap size it predicts will let it finish on time. It usually converges quickly. Workloads with bursty allocation are the ones that give it trouble.
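The GOGC arithmetic is easy to work through by hand. The sketch below follows the formula given in the official GC guide for Go 1.18 and later, where the growth term also counts GC roots (stacks and globals); `heapGoal` is a hypothetical helper, and the real pacer layers GOMEMLIMIT and its own feedback on top of this number.

```go
package main

import "fmt"

// heapGoal returns the target heap size for the next cycle:
//
//	goal = live + (live + roots) * GOGC / 100
//
// liveBytes is the heap found live by the previous cycle, rootBytes is the
// size of the GC roots, and gogc is the GOGC percentage.
func heapGoal(liveBytes, rootBytes, gogc uint64) uint64 {
	return liveBytes + (liveBytes+rootBytes)*gogc/100
}

func main() {
	const MiB = 1 << 20
	for _, gogc := range []uint64{50, 100, 200} {
		goal := heapGoal(64*MiB, 0, gogc)
		fmt.Printf("GOGC=%d live=64MiB goal=%dMiB\n", gogc, goal/MiB)
	}
	// With roots ignored, a 64 MiB live heap gives goals of
	// 96, 128, and 192 MiB respectively.
}
```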
GOGC and GOMEMLIMIT
GOGC is the primary knob. Raising it (GOGC=200) lets the heap
grow further between cycles: fewer collections, more memory, lower CPU spent on GC.
Lowering it (GOGC=50) does the opposite: more frequent collections, smaller
heap, more CPU. Setting GOGC=off on its own disables the collector entirely, which is
almost never what you want outside of short-lived batch jobs; with GOMEMLIMIT also set,
the limit still triggers cycles.
GOMEMLIMIT is the knob for containerised environments. It takes a byte
value (4GiB, 500MiB) and treats it as a soft total-memory
ceiling for the Go runtime — heap, stacks, runtime overhead. The pacer will run cycles
more aggressively as the process approaches it. It does not hard-cap memory; the
runtime will exceed it rather than crash, but it will spend a lot of CPU doing so.
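Both knobs can also be set from code through `runtime/debug`, which is handy when the right value is only known at startup. A minimal sketch, assuming Go 1.19 or later; `configureGC` is a hypothetical wrapper and the values in `main` are purely illustrative.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// configureGC applies a GOGC percentage and a soft memory limit in bytes,
// returning whatever settings were in force before the call.
func configureGC(gcPercent int, limitBytes int64) (prevPercent int, prevLimit int64) {
	prevPercent = debug.SetGCPercent(gcPercent)  // same meaning as GOGC
	prevLimit = debug.SetMemoryLimit(limitBytes) // same meaning as GOMEMLIMIT
	return prevPercent, prevLimit
}

func main() {
	const MiB = 1 << 20
	prevPercent, prevLimit := configureGC(200, 3500*MiB)
	fmt.Println("previous GOGC:", prevPercent)
	fmt.Println("previous limit:", prevLimit)

	// A negative argument leaves the limit unchanged and just reports it.
	fmt.Println("current limit:", debug.SetMemoryLimit(-1))
}
```

Settings made this way override the environment variables for the rest of the process lifetime.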
A containerised service running without
GOMEMLIMIT can be OOM-killed even with plenty of headroom, because
GOGC=100 will happily let the heap reach 2× the live set, and the Go runtime
doesn't know about the cgroup limit. Setting a limit such as
GOMEMLIMIT=3500MiB gives the pacer a number to respect and usually prevents
the kill. Keep some margin below the cgroup limit for non-heap allocations and other
processes in the container.
Reading gctrace
Setting GODEBUG=gctrace=1 in the environment prints a one-line summary on
stderr at the end of every cycle. The format is dense but worth learning to read.
gc 14 @2.345s 3%: 0.018+1.2+0.024 ms clock, 0.14+0.50/2.3/0.0+0.19 ms cpu, 120->128->64 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P
Reading left to right: gc 14 is the cycle number. @2.345s is
seconds since program start. 3% is the share of CPU spent on GC since
start. 0.018+1.2+0.024 ms clock is wall time for sweep-termination + mark +
mark-termination; the first and third numbers are STW windows and the middle is
concurrent. The cpu figures cover the same three phases summed across all Ps, with the
mark term split by slashes into assist, background, and idle time (0.50/2.3/0.0 here).
120->128->64 MB is heap size at cycle start, at cycle end, and
the live heap. 130 MB goal is what the pacer was aiming
for. 0 MB stacks and 0 MB globals are the scannable stack and global sizes.
8 P is the number of processors used.
Numbers that suggest trouble: STW phases consistently above a millisecond (usually means
a very large stack or root set), a large assist time, which is the first of the
slash-separated values in the cpu breakdown (the pacer is behind and allocating goroutines
are being made to help), live set climbing cycle over cycle (a leak), or cycles happening
every few hundred milliseconds (allocation rate too high for the current GOGC).
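If you want to watch these numbers over time, the trace lines are regular enough to parse. The sketch below pulls the heap sizes and the combined STW wall time out of a line shaped like the sample above; the `cycle` type and `parseTrace` function are hypothetical, and the regular expressions assume the line format shown here, which the runtime does not promise to keep stable.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var (
	// sweep-termination + mark + mark-termination wall-clock times.
	clockRe = regexp.MustCompile(`([\d.]+)\+([\d.]+)\+([\d.]+) ms clock`)
	// heap at start -> heap at end -> live heap, followed by the goal.
	heapRe = regexp.MustCompile(`(\d+)->(\d+)->(\d+) MB, (\d+) MB goal`)
)

// cycle holds the fields extracted from one gctrace line.
type cycle struct {
	StartMB, EndMB, LiveMB, GoalMB int
	STWMillis                      float64 // sweep termination + mark termination
}

// parseTrace extracts heap sizes and total STW wall time from a gctrace line.
func parseTrace(line string) (cycle, error) {
	c := clockRe.FindStringSubmatch(line)
	h := heapRe.FindStringSubmatch(line)
	if c == nil || h == nil {
		return cycle{}, fmt.Errorf("not a gctrace line: %q", line)
	}
	sweepTerm, err := strconv.ParseFloat(c[1], 64)
	if err != nil {
		return cycle{}, err
	}
	markTerm, err := strconv.ParseFloat(c[3], 64)
	if err != nil {
		return cycle{}, err
	}
	var mb [4]int
	for i := range mb {
		if mb[i], err = strconv.Atoi(h[i+1]); err != nil {
			return cycle{}, err
		}
	}
	return cycle{
		StartMB:   mb[0],
		EndMB:     mb[1],
		LiveMB:    mb[2],
		GoalMB:    mb[3],
		STWMillis: sweepTerm + markTerm,
	}, nil
}

func main() {
	line := "gc 14 @2.345s 3%: 0.018+1.2+0.024 ms clock, 0.14+0.50/2.3/0.0+0.19 ms cpu, 120->128->64 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P"
	c, err := parseTrace(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", c)
}
```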
Common pitfalls
- Allocations in hot loops. Every make, append that grows, or string concatenation in a tight loop adds GC pressure. sync.Pool is the usual fix for fixed-size buffers, but only for objects with a clear reset story.
- Interface conversions that escape. Passing a stack-allocated value as an interface{} usually forces it to the heap, because the interface header needs a stable pointer. Escape analysis catches some of these; many slip through.
- Very large heaps without GOMEMLIMIT. In containers, the runtime has no idea what your memory limit is. OOM kills with idle GC headroom are the result. Set GOMEMLIMIT.
- Depending on finalizers. A finalizer set with runtime.SetFinalizer is not guaranteed to run, may run late, and won't run at all on program exit. Treat finalizers as a debugging aid, not a resource-management strategy.
- Assuming runtime.GC() is a fix. An explicit runtime.GC() call is useful for benchmarks and tests where you want a known starting state. In production it just adds a forced cycle on top of the ones the pacer was already going to do.
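For the first pitfall, a pooled buffer with an explicit reset might look like the sketch below. The `bufPool` variable and `greet` function are hypothetical names; the pattern is the standard `sync.Pool` one, and it assumes Go 1.18 or later for `any`.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so hot paths avoid a fresh allocation
// per call. The GC may drop pooled objects at any time, so New must be set.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// greet builds a string in a pooled buffer.
func greet(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // the reset story: never trust the state of a pooled object
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so the result does not alias the buffer
}

func main() {
	for _, name := range []string{"gopher", "world"} {
		fmt.Println(greet(name))
	}
}
```

Whether pooling pays off depends on the workload, so it is worth confirming with an allocation profile before and after.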
Production checklist
- Set GOMEMLIMIT in containers. A value 10–15% below the cgroup limit is a reasonable starting point.
- Tune GOGC to the workload. Low-allocation services can often raise it to 200 or 300 with no harm; allocation-heavy services sometimes benefit from lowering it to keep heap size predictable.
- Sample gctrace=1 in a canary. Not on every production instance, because the lines pile up, but on a canary instance for long enough to know what normal looks like.
- Ship pprof allocs profiles to CI. A diff in the top allocation sites between commits will catch regressions before they hit production.
- Use sync.Pool for fixed-size hot allocations. JSON decoder buffers, byte slices for hashing, scratch space for serialisation.
- Watch for mark-assist time. In an execution trace (go tool trace), large MARK ASSIST bands mean the pacer is behind and goroutines are paying for it directly.
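For the last item, an execution trace can be captured from inside the program with `runtime/trace`. The sketch below records into memory while a small allocation-heavy function runs; `captureTrace`, `churn`, and `sink` are hypothetical names, and a real service would more likely write to a file and open it with `go tool trace`.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/trace"
)

// sink keeps the last allocation reachable so the compiler cannot elide it.
var sink []byte

// captureTrace records an execution trace while fn runs and returns the
// raw trace data.
func captureTrace(fn func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err // for example, tracing is already enabled
	}
	fn()
	trace.Stop() // returns once all trace data has been written to buf
	return buf.Bytes(), nil
}

// churn allocates enough short-lived garbage to give the GC something to do.
func churn() {
	for i := 0; i < 10000; i++ {
		sink = make([]byte, 1<<10)
	}
	runtime.GC()
}

func main() {
	data, err := captureTrace(churn)
	if err != nil {
		fmt.Println("trace failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of trace data\n", len(data))
}
```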
Further reading
- Knyszek — The Go Memory Manager — talks from the runtime team on how the GC and allocator fit together.
- A guide to the Go garbage collector — the official guide; the single best place to start for tuning.
- runtime/mgc.go — the source comments are a careful walkthrough of phases, pacing, and barriers.
- Hudson — Getting to Go (ISMM 2018) — the talk and paper behind the 1.5 / 1.8 collector design.
- Garbage collection — generic explainer — mark-and-sweep, generational, copying, reference counting, side-by-side.
- Escape analysis — the compiler pass that decides what gets to the heap in the first place.