
Garbage collection

Go ships a concurrent, non-generational, non-moving, tri-color mark-and-sweep collector. It runs alongside the mutator, pauses the world for two short windows per cycle, and decides when to start by watching how fast the heap is growing. The knobs that matter most are GOGC, GOMEMLIMIT, and what you allocate in the first place.


What Go's GC is

Go's collector is concurrent, non-generational, non-moving, and uses tri-color mark-and-sweep. Concurrent means it does most of its work while application goroutines keep running. Non-generational means it walks the entire reachable heap on every cycle; there are no young and old generations. Non-moving means a live heap object stays at its allocated address, which keeps addresses held by cgo and unsafe code valid and removes the need to fix up references after collection.

A short history. Go 1.5 (2015) introduced the concurrent collector and replaced the long stop-the-world collections of earlier releases with short pauses around a concurrent mark phase. Go 1.8 (2017) reworked the write barrier and brought typical STW pauses below 100 µs. Go 1.19 (2022) added GOMEMLIMIT, a soft memory ceiling that gives the pacer something to aim at beyond pure heap-growth heuristics. The shape of the algorithm has changed little since then; later work has gone into pacer refinements and allocator improvements.

The tri-color invariant

The collector colours every heap object one of three shades. White means not yet visited — at the end of marking, white objects are garbage. Grey means visited but its outgoing pointers haven't been scanned yet. Black means visited and fully scanned. Marking proceeds by picking a grey object, blackening it, and greying any white objects it points to. The cycle ends when there are no grey objects left.
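The marking loop described above can be sketched on a toy object graph. This is an illustration only, not the runtime's implementation: the object type, the grey worklist, and the demoHeap helper are hypothetical names made up for the example, and the real collector works on spans and bitmaps rather than per-object colour fields.

```go
package main

import "fmt"

type color int

const (
	white color = iota // not yet visited
	grey               // visited, outgoing pointers not yet scanned
	black              // visited and fully scanned
)

// object is a toy heap object: a name plus its outgoing pointers.
type object struct {
	name string
	refs []*object
	c    color
}

// mark runs the tri-color loop from the given roots and returns
// how many objects it blackened.
func mark(roots []*object) int {
	var greyList []*object
	shade := func(o *object) {
		if o != nil && o.c == white {
			o.c = grey
			greyList = append(greyList, o)
		}
	}
	for _, r := range roots {
		shade(r)
	}
	marked := 0
	for len(greyList) > 0 {
		// Pick a grey object, grey its white children, then blacken it.
		o := greyList[len(greyList)-1]
		greyList = greyList[:len(greyList)-1]
		for _, child := range o.refs {
			shade(child)
		}
		o.c = black
		marked++
	}
	return marked
}

// demoHeap builds a -> b -> c -> a (a cycle) plus d, which points
// at a but is itself unreachable from the root.
func demoHeap() (roots, all []*object) {
	a := &object{name: "a"}
	b := &object{name: "b"}
	c := &object{name: "c"}
	d := &object{name: "d"}
	a.refs = []*object{b}
	b.refs = []*object{c}
	c.refs = []*object{a}
	d.refs = []*object{a}
	return []*object{a}, []*object{a, b, c, d}
}

func main() {
	roots, all := demoHeap()
	fmt.Println("marked:", mark(roots))
	for _, o := range all {
		fmt.Printf("%s garbage=%v\n", o.name, o.c == white)
	}
}
```

In this graph a, b, and c end up black and d stays white, which is the condition for being swept. The cycle between a and c causes no trouble because an object is only greyed while it is still white.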

The strong tri-color invariant is the rule that makes concurrent marking safe: no black object points to a white object. As long as that holds, the collector can sweep white objects without worrying that a black (already-scanned) object still has a live reference into them. The mutator, of course, runs concurrently and can rewrite pointers at any moment — which is exactly the job of the write barrier.

Why this matters. The invariant is the contract between mutator and collector. Everything else — the barrier code, the pacer's deadlines, the STW windows — exists to maintain it cheaply.

Write barriers

During marking, every pointer write goes through a small compiler-inserted hook. Go uses a Dijkstra-style insertion barrier: when the mutator writes a pointer into a heap slot, the runtime greys the new target. That way, even if the slot lives inside an already-black object, the target won't be missed.

The barrier costs around 7–10 ns per pointer write while a cycle is active. It is not compiled out between cycles: the compiler emits a check of a runtime flag before each barriered write, so when no cycle is running the cost is one predictable branch. Since Go 1.8 the collector uses a hybrid barrier, insertion plus a small deletion component, that lets the runtime skip the stack rescan at mark termination. Before 1.8 that rescan was the dominant STW cost; eliminating it is what got pauses under 100 µs.

// Conceptually, every pointer write becomes:
//   *slot = ptr
// is rewritten to:
//   writeBarrier(slot, ptr)
//
// writeBarrier(slot, ptr):
//   if gcphase == _GCmark {
//       shade(ptr)             // Dijkstra insertion
//       shade(*slot)           // deletion (hybrid, since 1.8)
//   }
//   *slot = ptr
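To see why the barrier is needed, the pseudocode above can be turned into a small runnable model. It is a sketch under simplifying assumptions: each object has a single pointer field, the heap and scenario names are hypothetical, and the barrier flag is a parameter so both outcomes can be compared. The scenario is the classic "hidden object" case: the mutator copies the only reference to a white object into an already-black object and then erases the original.

```go
package main

import "fmt"

type color int

const (
	white color = iota
	grey
	black
)

// object has a single outgoing pointer to keep the model small.
type object struct {
	ref *object
	c   color
}

// heap holds the marking state of the toy collector.
type heap struct {
	marking  bool
	greyList []*object
}

// shade greys a white object and queues it for scanning.
func (h *heap) shade(o *object) {
	if o != nil && o.c == white {
		o.c = grey
		h.greyList = append(h.greyList, o)
	}
}

// writePointer models "*slot = ptr" with the hybrid barrier.
func (h *heap) writePointer(slot **object, ptr *object, barrier bool) {
	if barrier && h.marking {
		h.shade(ptr)   // insertion: the new target cannot be missed
		h.shade(*slot) // deletion: the old target cannot be missed
	}
	*slot = ptr
}

// drain scans grey objects until none remain.
func (h *heap) drain() {
	for len(h.greyList) > 0 {
		o := h.greyList[len(h.greyList)-1]
		h.greyList = h.greyList[:len(h.greyList)-1]
		h.shade(o.ref)
		o.c = black
	}
}

// scenario returns the final colour of w, which stays reachable
// through the black object a for the whole run.
func scenario(barrier bool) color {
	h := &heap{marking: true}
	w := &object{}                // white, live
	a := &object{c: black}        // already scanned
	g := &object{ref: w, c: grey} // holds the only reference to w
	h.greyList = append(h.greyList, g)

	h.writePointer(&a.ref, w, barrier)   // black object now points at w
	h.writePointer(&g.ref, nil, barrier) // original reference erased
	h.drain()
	return w.c
}

func main() {
	fmt.Println("without barrier, w is white:", scenario(false) == white)
	fmt.Println("with barrier, w is black:", scenario(true) == black)
}
```

Without the barrier, w finishes the cycle white even though a still points at it, so a sweep would free live memory. With the barrier, w is greyed at the moment of the write and is scanned normally.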

The phases of a GC cycle

A cycle has four phases. Two are stop-the-world but tiny; two run concurrently with your goroutines.

Phase              | What happens                                                          | Stops the world?
Sweep termination  | Finish any leftover sweep from the last cycle, enable write barriers  | Yes, tens of µs
Mark (concurrent)  | Walk the heap from roots, grey-to-black, mutator runs with barrier on | No
Mark termination   | Drain remaining work, disable write barriers, prepare for sweep       | Yes, tens of µs
Sweep (concurrent) | Reclaim white spans lazily as the allocator asks for memory           | No

Total stop-the-world time is the sum of the two small windows — usually well under 100 µs on a healthy workload. Everything else overlaps with mutator execution. The cost the application pays during marking is the write barrier plus mark assist: a goroutine that allocates faster than the background marker can keep up is conscripted into doing a proportional share of marking work itself.

The pacer

The pacer is the part of the runtime that decides when to start a cycle. Its goal is simple: finish marking before the heap fills up to its target. If it starts too late, allocators have to do more mark assist and pauses can balloon. If it starts too early, you waste CPU collecting a heap that hadn't grown much.

By default the target is set by GOGC, which expresses allowed heap growth as a percentage of the live set after the previous cycle. GOGC=100 (the default) means the next cycle aims to complete by the time the heap reaches twice that live set. Since Go 1.18 the target also counts GC roots such as goroutine stacks and globals, so the real figure sits slightly above a plain doubling. Since Go 1.19 the pacer also respects GOMEMLIMIT: a soft total-memory ceiling that pulls cycles forward as the process approaches the limit, trading CPU for staying under the cap.
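The GOGC arithmetic is easy to check by hand. The sketch below uses the target formula from the Go GC guide, target = live heap + (live heap + GC roots) × GOGC / 100; the heapGoal function name is hypothetical, and GOMEMLIMIT, which can pull the goal lower, is deliberately ignored.

```go
package main

import "fmt"

// heapGoal returns the heap size at which the next cycle should be
// finished, given the live heap and GC root sizes in bytes and a
// GOGC percentage. It ignores GOMEMLIMIT.
func heapGoal(liveBytes, rootBytes uint64, gogc uint64) uint64 {
	return liveBytes + (liveBytes+rootBytes)*gogc/100
}

func main() {
	const MiB = 1 << 20
	live := uint64(64 * MiB)
	for _, gogc := range []uint64{50, 100, 200} {
		goal := heapGoal(live, 0, gogc)
		fmt.Printf("GOGC=%d live=%d MiB goal=%d MiB\n", gogc, live/MiB, goal/MiB)
	}
}
```

With a 64 MiB live set and negligible roots, GOGC=100 gives a goal of 128 MiB, GOGC=50 gives 96 MiB, and GOGC=200 gives 192 MiB.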

The pacer uses a feedback loop over recent cycles to estimate marking rate versus allocation rate, then starts the next cycle at a heap size it predicts will let it finish on time. It usually converges quickly. Workloads with bursty allocation are the ones that give it trouble.

GOGC and GOMEMLIMIT

GOGC is the primary knob. Raising it (GOGC=200) lets the heap grow further between cycles: fewer collections, more memory, lower CPU spent on GC. Lowering it (GOGC=50) does the opposite: more frequent collections, smaller heap, more CPU. Setting GOGC=off disables collection triggered by heap growth, which is almost never what you want outside of short-lived batch jobs. The exception is GOGC=off combined with GOMEMLIMIT, where the collector runs only as the process nears the limit.

GOMEMLIMIT is the knob for containerised environments. It takes a byte count with an optional unit suffix (4GiB, 500MiB) and treats it as a soft ceiling on the memory the Go runtime manages: heap, stacks, and runtime overhead. Memory allocated outside the runtime, for example by C code called through cgo, is not counted. The pacer runs cycles more aggressively as the process approaches the limit. It is not a hard cap: if the live set does not fit, the runtime exceeds the limit rather than crash, at the price of much more CPU spent on GC.

Worked example. A service in a 4 GB container with no GOMEMLIMIT can be OOM-killed even though its live set fits comfortably, because GOGC=100 lets the heap grow to 2× the live set before the next cycle completes and the pacer knows nothing about the cgroup limit. Setting GOMEMLIMIT=3500MiB gives the pacer a number to respect and usually prevents the kill. Keep some margin below the cgroup limit for non-heap allocations and other processes in the container.
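One way to derive the limit instead of hard-coding it is to read the container's memory ceiling and subtract a margin. The sketch below assumes cgroup v2, where the limit is typically exposed in /sys/fs/cgroup/memory.max as either a byte count or the word "max"; parseMemoryMax and softLimit are hypothetical helpers, and the file read itself is left out so the logic stays testable.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryMax parses the contents of a cgroup v2 memory.max
// file. ok is false when the file says "max" (no limit) or cannot
// be parsed as a positive byte count.
func parseMemoryMax(contents string) (limit int64, ok bool) {
	s := strings.TrimSpace(contents)
	if s == "max" {
		return 0, false
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n <= 0 {
		return 0, false
	}
	return n, true
}

// softLimit leaves marginPercent of the container limit free for
// non-heap memory and other processes. Dividing first avoids
// overflow at the cost of rounding down by under 100 bytes.
func softLimit(containerBytes, marginPercent int64) int64 {
	return containerBytes / 100 * (100 - marginPercent)
}

func main() {
	// Contents as they might appear for a 4 GiB container.
	if limit, ok := parseMemoryMax("4294967296\n"); ok {
		fmt.Println("suggested GOMEMLIMIT in bytes:", softLimit(limit, 15))
	}
}
```

The result can be passed to debug.SetMemoryLimit at startup. When parseMemoryMax reports no limit, leaving GOMEMLIMIT unset is the safer default.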

Reading gctrace

Setting GODEBUG=gctrace=1 in the environment prints a one-line summary on stderr at the end of every cycle. The format is dense but worth learning to read.

gc 14 @2.345s 3%: 0.018+1.2+0.024 ms clock, 0.14+0.50/2.3/0.0+0.19 ms cpu, 120->128->64 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P

Reading left to right: gc 14 is the cycle number. @2.345s is seconds since program start. 3% is the percentage of time spent in GC since program start. 0.018+1.2+0.024 ms clock is wall time for sweep termination + mark + mark termination; the first and third numbers are STW windows and the middle one is concurrent. The cpu figures give CPU time for the same three phases, with the mark phase split by slashes into assist/background/idle time. 120->128->64 MB is heap size at cycle start, heap size at cycle end, and the live heap that marking found reachable. 130 MB goal is the heap size the pacer was aiming for. 0 MB stacks and 0 MB globals are the estimated scannable stack and global sizes. 8 P is the number of processors used.

Numbers that suggest trouble: STW phases consistently above a millisecond (usually a very large stack or root set), a large first value in the slash-separated mark term of the cpu breakdown (that value is mark assist, so the pacer is behind and allocating goroutines are being made to help), a live heap that climbs cycle over cycle (a leak), or cycles arriving every few hundred milliseconds (allocation rate too high for the current GOGC).
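For a canary or a log pipeline it can help to pull the key fields out of each line. The following is a rough sketch that only handles lines shaped like the sample above; gcLine, gcRe, and parseGCLine are hypothetical names, and the regular expression is an assumption about the format rather than a stable contract, since the runtime documents gctrace output as subject to change.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// gcLine holds the fields of a gctrace line that this sketch extracts.
type gcLine struct {
	Cycle                           int
	SweepTermMs, MarkMs, MarkTermMs float64 // wall-clock phase times
	HeapStartMB, HeapEndMB, LiveMB  int
	GoalMB                          int
}

// gcRe matches the cycle number, the three clock times, the three
// heap sizes, and the goal. The cpu section is skipped.
var gcRe = regexp.MustCompile(
	`^gc (\d+) @[\d.]+s \d+%: ([\d.]+)\+([\d.]+)\+([\d.]+) ms clock, .*? (\d+)->(\d+)->(\d+) MB, (\d+) MB goal`)

// parseGCLine returns ok=false for lines that do not match.
func parseGCLine(line string) (gcLine, bool) {
	m := gcRe.FindStringSubmatch(line)
	if m == nil {
		return gcLine{}, false
	}
	// The regexp guarantees digits and dots only; a malformed
	// number such as "1..2" would parse as zero here.
	f := func(s string) float64 { v, _ := strconv.ParseFloat(s, 64); return v }
	i := func(s string) int { v, _ := strconv.Atoi(s); return v }
	return gcLine{
		Cycle:       i(m[1]),
		SweepTermMs: f(m[2]),
		MarkMs:      f(m[3]),
		MarkTermMs:  f(m[4]),
		HeapStartMB: i(m[5]),
		HeapEndMB:   i(m[6]),
		LiveMB:      i(m[7]),
		GoalMB:      i(m[8]),
	}, true
}

func main() {
	line := "gc 14 @2.345s 3%: 0.018+1.2+0.024 ms clock, 0.14+0.50/2.3/0.0+0.19 ms cpu, 120->128->64 MB, 130 MB goal, 0 MB stacks, 0 MB globals, 8 P"
	if g, ok := parseGCLine(line); ok {
		fmt.Printf("cycle %d: STW %.3f ms, live %d MB, goal %d MB\n",
			g.Cycle, g.SweepTermMs+g.MarkTermMs, g.LiveMB, g.GoalMB)
	}
}
```

Summing the first and third clock values gives the total STW time for the cycle, which is usually the number worth alerting on.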

Common pitfalls

  • Allocations in hot loops. Every make, append that grows, or string concatenation in a tight loop adds GC pressure. sync.Pool is the usual fix for fixed-size buffers — but only for objects with a clear reset story.
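  • Interface conversions that escape. Passing a non-pointer value as an interface{} often forces it to the heap, because the interface holds a pointer to the value and that pointer may outlive the caller's stack frame. Escape analysis keeps some of these on the stack; many still escape. Building with go build -gcflags=-m prints the compiler's escape decisions.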
  • Very large heaps without GOMEMLIMIT. In containers, the runtime has no idea what your memory limit is. OOM kills with idle GC headroom are the result. Set GOMEMLIMIT.
  • Depending on finalizers. runtime.SetFinalizer is not guaranteed to run, may run late, and won't run at all on program exit. Treat finalizers as a debugging aid, not a resource-management strategy.
  • Assuming runtime.GC() is a fix. An explicit runtime.GC() call is useful for benchmarks and tests where you want a known starting state. In production it just adds a forced cycle on top of the ones the pacer was already going to do.
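The sync.Pool advice from the first pitfall can be sketched as follows. The example is illustrative: bufPool and render are hypothetical names, and the "reset story" here is simply calling Reset on a bytes.Buffer before use.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so a hot path does not
// allocate a fresh one on every call.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render builds a greeting using a pooled buffer.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // never trust contents left by a previous user
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	// String copies the bytes, so the result stays valid after
	// the buffer goes back to the pool.
	return buf.String()
}

func main() {
	fmt.Println(render("gopher"))
	fmt.Println(render("gc"))
}
```

The pool may drop its contents at any collection, so it only reduces allocation pressure; it is not a cache with guaranteed retention.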

Production checklist

  • Set GOMEMLIMIT in containers. A value 10–15% below the cgroup limit is a reasonable starting point.
  • Tune GOGC to the workload. Low-allocation services can often raise it to 200 or 300 with no harm; allocation-heavy services sometimes benefit from lowering it to keep heap size predictable.
  • Sample gctrace=1 in a canary. Not in production everywhere — the lines pile up — but in a canary instance long enough to know what normal looks like.
  • Ship pprof allocs profiles to CI. A diff in the top allocation sites between commits will catch regressions before they hit production.
  • Use sync.Pool for fixed-size hot allocations. JSON decoder buffers, byte slices for hashing, scratch space for serialisation.
  • Watch for mark-assist time. In an execution trace (go tool trace), large MARK ASSIST bands mean the pacer is behind and goroutines are paying for it directly.
