The runtime package
The runtime package is the standard-library bridge to the scheduler and the heap.
It exposes a small, carefully-curated subset of internals: a few knobs you can turn, a few
counters you can read, and a handful of hooks you probably shouldn't touch. Knowing which is
which is most of the value.
What runtime is
Most of what the Go runtime does — scheduling, garbage collection, stack growth, memory
allocation — happens behind the language. The runtime package is the small
doorway the standard library leaves open. Through it you can read a handful of counters,
turn a few knobs, and register a few callbacks.
The API divides roughly into three groups. Diagnostic:
ReadMemStats, Stack, NumGoroutine,
NumCgoCall — read-only views into runtime state. Operational:
GOMAXPROCS, GC, SetCPUProfileRate — actually change
runtime behaviour. Dangerous: SetFinalizer,
KeepAlive, Goexit, LockOSThread — correct uses exist
but most code that reaches for them shouldn't.
When something in runtime has an obvious
equivalent elsewhere (defer file.Close() instead of SetFinalizer,
a metric library instead of a hand-rolled ReadMemStats loop), prefer the
equivalent. The runtime package is a fallback, not a default.

GOMAXPROCS
GOMAXPROCS controls the number of P structures the scheduler
creates, which caps how many goroutines can be executing user code in parallel. Since Go 1.5
the default is runtime.NumCPU(). You can override it at startup with the
GOMAXPROCS environment variable, or change it at runtime by calling
runtime.GOMAXPROCS(n).
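As a small illustration of the call semantics: runtime.GOMAXPROCS returns the previous setting, and an argument below 1 only queries the value. The helper names currentProcs and setProcs are made up for this sketch.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs reports the current GOMAXPROCS value without changing it:
// any argument below 1 is a pure query.
func currentProcs() int {
	return runtime.GOMAXPROCS(0)
}

// setProcs changes GOMAXPROCS and returns the previous value so the caller
// can restore it later.
func setProcs(n int) (prev int) {
	return runtime.GOMAXPROCS(n)
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:  ", currentProcs())

	prev := setProcs(2) // override at runtime
	fmt.Println("after override:", currentProcs())
	setProcs(prev) // restore the startup value
}
```

Changing the value at runtime is rarely needed outside tests and experiments; setting it once at startup is the usual pattern.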
The catch in containers: NumCPU reports the CPUs visible to the process, which is normally the host's count, not the cgroup CPU
quota. A process running with a 2-core quota on a 64-core host will create 64 Ps, oversubscribe
the cgroup, and pay the latency tax of constant throttling. Go 1.25 and later take the cgroup
CPU limit into account by default on Linux; on older toolchains the standard fix is uber-go/automaxprocs:
a blank import that reads the cgroup CPU limit and sets GOMAXPROCS accordingly.
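For readers curious what "reads the cgroup CPU limit" involves, here is a rough sketch for cgroup v2 only. It is not how automaxprocs is implemented. It assumes the unified hierarchy is mounted at /sys/fs/cgroup and that the process sees its own cgroup there, and the helper name parseCPUMax and the round-down policy are choices made for this example.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// parseCPUMax interprets the contents of a cgroup v2 cpu.max file, which
// holds "<quota> <period>" in microseconds, or "max <period>" when there is
// no limit. It returns the number of Ps to use and whether a limit was found.
func parseCPUMax(contents string) (procs int, limited bool) {
	fields := strings.Fields(contents)
	if len(fields) != 2 || fields[0] == "max" {
		return 0, false
	}
	quota, err1 := strconv.ParseFloat(fields[0], 64)
	period, err2 := strconv.ParseFloat(fields[1], 64)
	if err1 != nil || err2 != nil || quota <= 0 || period <= 0 {
		return 0, false
	}
	// This sketch rounds down, but never below one P.
	return int(math.Max(1, math.Floor(quota/period))), true
}

func main() {
	// Assumed path: cgroup v2 mounted at /sys/fs/cgroup.
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
	if err != nil {
		fmt.Println("no cgroup v2 CPU limit readable:", err)
		return
	}
	if n, ok := parseCPUMax(string(data)); ok {
		runtime.GOMAXPROCS(n)
		fmt.Println("GOMAXPROCS set to", n)
	}
}
```

A production version would also need to handle cgroup v1 and nested cgroups, which is a good reason to use an existing library instead.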
import _ "go.uber.org/automaxprocs"
// or set it explicitly through the environment, for example from the
// orchestrator's CPU limit:
// GOMAXPROCS=2
//
// or programmatically from the cgroup (cgroupCPUQuota is a hypothetical helper):
// runtime.GOMAXPROCS(cgroupCPUQuota())

runtime.NumGoroutine
NumGoroutine returns the number of goroutines that currently exist. It's
cheap: a few counter reads, with no stop-the-world pause, so it is safe to call in a hot path. Most
metric exporters publish it as a gauge.
The useful diagnostic pattern is leak detection by delta. Snapshot the count when the process reaches steady state, run a workload that should fully drain, snapshot again. If the count climbs and never falls back, you have a leak — every iteration spawned a goroutine that never exited. A goroutine pprof profile then gives you the parked stacks; the ones with the highest counts are usually the culprits.
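A self-contained sketch of that workflow, using a deliberately leaky workload so the delta is visible. The names leakyWorkload and goroutineDelta and the slack value are invented for the example. Writing the goroutine profile with debug=1 groups identical stacks and prints a count per group, which is what makes the culprits stand out.

```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
)

// leakyWorkload is a hypothetical workload that starts n goroutines, each
// blocked forever on a channel nobody writes to.
func leakyWorkload(n int) {
	block := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() { <-block }()
	}
}

// goroutineDelta runs fn and reports how many more goroutines exist afterwards.
func goroutineDelta(fn func()) int {
	before := runtime.NumGoroutine()
	fn()
	return runtime.NumGoroutine() - before
}

func main() {
	const slack = 2 // illustrative tolerance for background goroutines
	if d := goroutineDelta(func() { leakyWorkload(5) }); d > slack {
		// debug=1: one entry per distinct stack, prefixed with its count.
		_ = pprof.Lookup("goroutine").WriteTo(os.Stderr, 1)
	}
}
```

With a real workload, goroutines exit asynchronously, so the second snapshot should only be taken after a settle period; otherwise healthy goroutines that are still shutting down look like leaks.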
before := runtime.NumGoroutine()
runWorkload()
waitForDrain()
after := runtime.NumGoroutine()
if after-before > slack {
	// leak suspected: dump the pprof goroutine profile
}

runtime.Stack and stack traces
runtime.Stack(buf, all) writes a formatted stack trace into buf.
With all = false, it dumps the calling goroutine. With all = true,
it walks every live goroutine and writes them all — the same output you get from a SIGQUIT
(kill -3) and from an unrecovered panic with GOTRACEBACK=all.
The all-goroutines variant is heavy. The runtime briefly stops the world, walks every goroutine's stack, formats each frame. On a process with tens of thousands of goroutines this is hundreds of milliseconds of pause. Useful for crash dumps and one-off investigations; not something to wire into a /healthz endpoint.
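runtime.Stack returns the number of bytes it wrote and truncates the trace when buf is too small, so callers usually retry with a larger buffer. A minimal sketch of that loop follows; the helper name stackDump is made up here.

```go
package main

import (
	"fmt"
	"runtime"
)

// stackDump returns a formatted trace of the calling goroutine (all=false)
// or of every goroutine (all=true), doubling the buffer until the trace fits.
func stackDump(all bool) []byte {
	buf := make([]byte, 64<<10) // start at 64 KB
	for {
		n := runtime.Stack(buf, all)
		if n < len(buf) {
			return buf[:n] // the trace fit with room to spare
		}
		buf = make([]byte, 2*len(buf)) // possibly truncated: retry larger
	}
}

func main() {
	fmt.Printf("%s", stackDump(false))
}
```

Each retry with all=true repeats the stop-the-world walk, so a generous starting size is worth it on processes with many goroutines.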
Size the buffer generously: the trace is complete only when Stack returns less than its length, so
start at 64 KB and grow. Or use runtime/pprof.Lookup("goroutine").WriteTo(w, 2),
which handles the buffering for you and gives the same format.

MemStats and gctrace
runtime.ReadMemStats(&m) fills a MemStats struct with about
thirty allocator and GC counters. The fields that matter most:
| Field | Meaning |
|---|---|
| HeapAlloc | bytes of live heap objects right now |
| HeapSys | bytes obtained from the OS for the heap |
| HeapInuse | bytes in spans currently in use |
| HeapIdle | bytes in idle spans, returnable to OS |
| NextGC | target HeapAlloc for next GC |
| NumGC | cumulative count of completed GC cycles |
| PauseNs | circular buffer of recent stop-the-world pauses |
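A short sketch of reading these fields. The helpers snapshot and lastPause are names chosen for the example; the index arithmetic follows the MemStats documentation, which places the most recent pause at PauseNs[(NumGC+255)%256].

```go
package main

import (
	"fmt"
	"runtime"
)

// snapshot returns a freshly filled MemStats struct.
func snapshot() runtime.MemStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m
}

// lastPause returns the most recent GC stop-the-world pause in nanoseconds,
// or 0 if no GC cycle has completed yet. PauseNs is a circular buffer of
// 256 entries indexed by cycle number.
func lastPause(m *runtime.MemStats) uint64 {
	if m.NumGC == 0 {
		return 0
	}
	return m.PauseNs[(m.NumGC+255)%256]
}

func main() {
	m := snapshot()
	fmt.Printf("HeapAlloc=%d KiB HeapSys=%d KiB HeapInuse=%d KiB HeapIdle=%d KiB\n",
		m.HeapAlloc/1024, m.HeapSys/1024, m.HeapInuse/1024, m.HeapIdle/1024)
	fmt.Printf("NextGC=%d KiB NumGC=%d lastPause=%dns\n",
		m.NextGC/1024, m.NumGC, lastPause(&m))
}
```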
GODEBUG=gctrace=1 is the human-readable version: one line per GC cycle on
stderr, with wall-clock duration, CPU fraction, and heap sizes before and after. Useful
during local development; in production, prefer scraping MemStats into your
metric system.
pprof
runtime/pprof exposes the standard profilers: CPU, heap, goroutine, block,
mutex, threadcreate, and a few others. net/http/pprof wraps them as HTTP
handlers under /debug/pprof. Importing the latter as a blank import is enough
to register the routes on http.DefaultServeMux:
import _ "net/http/pprof"
// /debug/pprof/profile?seconds=30   CPU profile
// /debug/pprof/heap                 heap snapshot
// /debug/pprof/goroutine?debug=2    all goroutine stacks
// /debug/pprof/block                blocking profile (needs SetBlockProfileRate)
// /debug/pprof/mutex                mutex contention (needs SetMutexProfileFraction)

In production these endpoints should sit behind auth or on a separate admin port: they
leak source paths and can be expensive to serve. The analyser is go tool pprof,
which can read either a file or a URL directly, and renders flame graphs in a browser via
-http=:8080.
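One way to keep the profiler off the public listener is to register the handlers on a dedicated mux. This is a sketch under assumptions: newAdminMux and statusOf are names invented here, the listen address in the comment is illustrative, and the httptest check only confirms the routing without opening a socket.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newAdminMux registers the pprof handlers on a private mux. Note that
// importing net/http/pprof still runs its init, which registers the same
// routes on http.DefaultServeMux, so the public server should use its own
// mux rather than the default one.
func newAdminMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves heap, goroutine, block, mutex
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// statusOf issues an in-memory GET against h and returns the status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	admin := newAdminMux()
	fmt.Println("index:  ", statusOf(admin, "/debug/pprof/"))
	fmt.Println("unknown:", statusOf(admin, "/healthz"))
	// In a real service, serve it on a loopback or internal address, e.g.:
	//   go http.ListenAndServe("127.0.0.1:6060", admin)
}
```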
runtime/trace
runtime/trace.Start(w) turns on the execution tracer. Unlike pprof, which
samples, the tracer captures every goroutine state transition, every GC phase, every
scheduler event, every network poll, every syscall — a complete timeline. The format is
binary; read it with go tool trace trace.out, which opens a browser view
of the goroutine timeline, syscall and GC bands, and per-goroutine flame graphs.
The price is volume. A few minutes of trace from a busy server can produce hundreds of megabytes. Tracing also adds noticeable overhead — usually under 10% CPU but enough to matter under contention. Use it for one-off investigations of latency anomalies and scheduler weirdness, not as a continuous profile.
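// needs: log, os, runtime/trace
f, err := os.Create("trace.out")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
if err := trace.Start(f); err != nil {
	log.Fatal(err)
}
defer trace.Stop()
// run the workload under investigation
// then: go tool trace trace.out

SetFinalizer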
runtime.SetFinalizer(obj, fn) arranges for fn(obj) to be called
at some point after the GC determines obj is unreachable. The intended use is
cleanup of non-Go resources — file descriptors, C handles, mmap'd regions — when explicit
Close can't be guaranteed.
The pitfalls are unusual enough that most code should avoid finalizers entirely:
- Not guaranteed to run. On normal exit, pending finalizers are skipped. If the program crashes or is killed, none run. Don't depend on a finalizer for correctness.
- Runs on a goroutine you don't control. A single runtime-owned goroutine runs all finalizers sequentially, with no particular context, no caller, and no recovery. A panic in a finalizer crashes the program, and a slow finalizer delays every other one.
- Resurrects the object briefly. The finalizer holds a reference, so the object survives one extra GC cycle. Cycles that include an object with a finalizer are not guaranteed to be collected at all.
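Given those pitfalls, a finalizer works best in a supporting role. The sketch below shows one possible shape: Close does the real work and clears the finalizer, which only exists to catch handles that were dropped without being closed. The handle type, openHandle, and errClosed are hypothetical, and the type is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// handle is a hypothetical wrapper around a non-Go resource.
type handle struct {
	fd     int
	closed bool
}

var errClosed = errors.New("handle: already closed")

// openHandle wraps fd and installs a finalizer as a safety net only.
func openHandle(fd int) *handle {
	h := &handle{fd: fd}
	// Runs at some point after h becomes unreachable, if Close was never called.
	runtime.SetFinalizer(h, func(h *handle) { _ = h.Close() })
	return h
}

// Close releases the resource. It is the primary cleanup path.
func (h *handle) Close() error {
	if h.closed {
		return errClosed
	}
	h.closed = true
	runtime.SetFinalizer(h, nil) // closed explicitly: drop the finalizer
	// release h.fd here
	return nil
}

func main() {
	h := openHandle(3)
	defer h.Close()
	fmt.Println("using fd", h.fd)
}
```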
The idiomatic alternative is an explicit Close() method invoked by
defer, plus a check in tests that callers actually call it. The finalizer is
a safety net for things like os.File in the standard library, not a primary
lifecycle mechanism.

Common pitfalls and operational gotchas
- Calling runtime.GC() in production. It forces a full GC synchronously. Almost always wrong: the runtime's pacer is better at choosing when. The one legitimate use is right before ReadMemStats in benchmarks, to get deterministic numbers.
- Reading MemStats too often. ReadMemStats briefly stops the world to take a consistent snapshot, so a tight polling loop adds pauses of its own. Once per second is plenty for metrics.
- Assuming GOTRACEBACK=all in production. The default is single: on crash, only the panicking goroutine's stack is printed. For postmortem debugging you almost always want all or system, set via env var on the container.
- Forgetting that trace files are huge. Leaving trace.Start on for an hour on a busy service will fill the disk. Cap the duration and rotate.
- Exposing /debug/pprof without auth. The blank import registers handlers on http.DefaultServeMux, which is often the same mux serving public traffic. Bind pprof to an admin port or wrap in middleware.
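The trace-size pitfall is the easiest of these to guard against in code. Here is a sketch of a duration-capped capture; traceFor and sampleTraceSize are names invented for the example, and rotation of the output is left out.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"runtime/trace"
	"time"
)

// traceFor records an execution trace into w for at most d, then stops.
// trace.Stop returns only after all trace data has been written.
func traceFor(w io.Writer, d time.Duration) error {
	if err := trace.Start(w); err != nil {
		return err // for example, tracing is already enabled
	}
	defer trace.Stop()
	time.Sleep(d)
	return nil
}

// sampleTraceSize captures a short trace in memory and returns its size.
func sampleTraceSize() (int, error) {
	var buf bytes.Buffer
	if err := traceFor(&buf, 50*time.Millisecond); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	n, err := sampleTraceSize()
	if err != nil {
		fmt.Println("trace failed:", err)
		return
	}
	fmt.Println("captured", n, "bytes")
}
```

In a service, the writer would typically be a file and the trigger an admin endpoint or signal, with the duration bounded the same way.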
Production checklist
| Setting | Recommendation |
|---|---|
| GOMAXPROCS | Set from cgroup quota: automaxprocs or env from orchestrator |
| /debug/pprof | Exposed on a separate admin port, or behind auth middleware |
| Goroutine count + MemStats | Scraped periodically as metrics (per minute is fine) |
| runtime/trace | One-off investigations only; bounded duration; rotate files |
| GOTRACEBACK | all in production for crash diagnostics |
| Cleanup | Explicit Close() via defer; finalizers as a last resort |
| GOMEMLIMIT | Set to ~90% of container memory to cap heap growth |
Most of these are one-line decisions made at service-template time. Getting them right once means every service inherits sane runtime behaviour without anyone thinking about it.
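Two of the checklist rows can also be applied from code when environment variables are awkward to manage. This sketch uses debug.SetTraceback and debug.SetMemoryLimit from runtime/debug. The helper names and the 512 MiB container size are assumptions for illustration; a real template would read the limit from the orchestrator or the cgroup.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// memoryLimitFor returns 90% of the container's memory limit, the ratio
// suggested in the checklist.
func memoryLimitFor(containerBytes int64) int64 {
	return containerBytes / 10 * 9
}

// applyRuntimeDefaults sets the traceback level and the soft memory limit,
// mirroring GOTRACEBACK=all and GOMEMLIMIT.
func applyRuntimeDefaults(containerBytes int64) {
	debug.SetTraceback("all")
	debug.SetMemoryLimit(memoryLimitFor(containerBytes))
}

func main() {
	const containerBytes = 512 << 20 // assumed: a 512 MiB container limit
	applyRuntimeDefaults(containerBytes)
	// A negative argument only queries the current limit.
	fmt.Println("memory limit:", debug.SetMemoryLimit(-1))
}
```

Environment variables remain the simpler choice when the deployment system can set them, since they need no code change per service.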
Further reading
- pkg.go.dev, runtime: the official API surface, with the small set of guarantees each function provides.
- pkg.go.dev, runtime/pprof and net/http/pprof: the profiler API and the HTTP wrapper.
- uber-go/automaxprocs: blank-import package that sets GOMAXPROCS from the cgroup quota.
- Go blog, Profiling Go programs: the original walkthrough of go tool pprof; still the clearest intro.
- Go blog, Execution traces in 2024: the modern tracer, with notes on the lower-overhead implementation introduced in 1.21.
- runtime/mfinal.go: the finalizer implementation, including the resurrection mechanics.