Day 5 · Concept 12

Goroutines

A goroutine is a function running concurrently. Starting cost: a 2KB stack and one keyword (go). The runtime scheduler multiplexes thousands of them onto a handful of OS threads. You can spawn 100,000 goroutines without breaking a sweat — that's the headline feature.


1 · The intuition

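An OS thread reserves on the order of megabytes of stack (the default varies by platform) and takes several microseconds to start, so 100,000 threads is out of reach on most machines. A goroutine starts with a 2KB stack that grows on demand: when it runs out of room, the runtime moves it to a larger stack. The scheduler multiplexes goroutines onto a small pool of OS threads; by default, at most one thread per logical CPU runs Go code at any moment (the GOMAXPROCS setting).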

The 2KB number. Since Go 1.4, every goroutine starts with a 2KB stack. When a function call needs more room, the runtime allocates a new stack twice the size, copies the old stack's contents into it, and adjusts any pointers into the stack. Stack growth is invisible to your code.
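To make the stack growth tangible, here is a minimal sketch that recurses far deeper than a 2KB stack could hold. The function names depth and deepInGoroutine are made up for this illustration; the point is only that the call succeeds without any stack-size configuration.

```go
package main

import "fmt"

// depth recurses n levels deep; every call adds a frame to the goroutine's stack.
func depth(n int) int {
    if n == 0 {
        return 0
    }
    return 1 + depth(n-1)
}

// deepInGoroutine (hypothetical helper) runs depth(n) on a fresh goroutine,
// which starts with the small initial stack, and returns the result.
func deepInGoroutine(n int) int {
    result := make(chan int)
    go func() { result <- depth(n) }()
    return <-result
}

func main() {
    // 100,000 frames need far more than 2KB; the runtime grows the stack as we go.
    fmt.Println(deepInGoroutine(100_000)) // prints 100000
}
```

Growth is not unlimited: the runtime enforces a maximum stack size, so runaway recursion still ends in a fatal stack overflow.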

2 · Try it — the bare minimum

go main.go · go keyword
package main

import (
    "fmt"
    "time"
)

func say(name string) {
    for i := 0; i < 3; i++ {
        fmt.Println(name, i)
        time.Sleep(100 * time.Millisecond)
    }
}

func main() {
    go say("alice")
    go say("bob")
    say("carol")
}
// Output interleaves all three — exact order varies.

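Three goroutines run concurrently: alice, bob, and the main goroutine running carol. Note: when main returns, the entire program ends, even if other goroutines are still working. Here alice and bob usually get to finish only because carol takes just as long; nothing guarantees it. Use synchronization primitives (next sections) to wait for them.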

3 · Wait for completion — sync.WaitGroup

go main.go · WaitGroup
package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup

    for i := 0; i < 5; i++ {
        wg.Add(1)             // increment counter
        go func(n int) {
            defer wg.Done()    // decrement on exit
            fmt.Printf("worker %d\n", n)
        }(i)                   // pass i — don't capture the loop var
    }

    wg.Wait()                  // block until counter is 0
    fmt.Println("all done")
}

The loop-variable trap. Go 1.22 fixed the classic for-loop closure trap by giving each iteration its own copy of the loop variable (in modules whose go.mod declares go 1.22 or later). Pre-1.22 codebases pass the loop variable explicitly as an argument, which is still good practice for clarity.
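The other common pre-1.22 idiom is to shadow the loop variable inside the body. The sketch below uses it; the squares function is an illustrative name, and each goroutine writes to its own slice index, so no extra locking is needed.

```go
package main

import (
    "fmt"
    "sync"
)

// squares computes i*i for every i in [0, n), one goroutine per element.
func squares(n int) []int {
    out := make([]int, n)
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        i := i // per-iteration copy: required before Go 1.22, redundant after
        wg.Add(1)
        go func() {
            defer wg.Done()
            out[i] = i * i // each goroutine touches a distinct index
        }()
    }
    wg.Wait()
    return out
}

func main() {
    fmt.Println(squares(5)) // prints [0 1 4 9 16]
}
```

Without the shadowing line, a pre-1.22 build would let every closure share one i, and most goroutines would read a later value or index past the end of the slice.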

4 · The scheduler — what the runtime is doing

The Go scheduler follows the M:P:G model. M = OS thread. P = processor (a logical scheduling context); there are GOMAXPROCS of them, one per logical CPU by default. G = goroutine. Each P has a local run queue of Gs; when a P's queue runs dry, it checks the global queue and then steals work from a busier P.

Goroutines hand the CPU back when they block on channel sends/receives, system calls, locks, or sleeps, and at GC synchronisation points. Since Go 1.14 the runtime can also preempt a goroutine that has been running too long (roughly 10ms), even inside a tight loop with no function calls. You rarely need to think about any of this, but it's why Go scales.

See it. runtime.NumGoroutine() reports the current count. GOMAXPROCS=4 go run main.go caps the number of Ps at 4, so at most 4 goroutines execute Go code at once. The scheduler internals live in runtime/proc.go: several thousand lines, worth a skim once you're comfortable.
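A small sketch of those inspection calls in action. The helper countWhileBlocked is a made-up name: it parks n goroutines on a channel and reports how much runtime.NumGoroutine() rose while they were parked. The CPU and GOMAXPROCS values depend on your machine, so no output is shown for them.

```go
package main

import (
    "fmt"
    "runtime"
    "sync"
)

// countWhileBlocked (hypothetical helper) starts n goroutines that block on a
// channel and returns the increase in runtime.NumGoroutine() while they wait.
func countWhileBlocked(n int) int {
    before := runtime.NumGoroutine()
    release := make(chan struct{})
    var started, done sync.WaitGroup

    for i := 0; i < n; i++ {
        started.Add(1)
        done.Add(1)
        go func() {
            defer done.Done()
            started.Done() // signal "I am running"
            <-release      // park until released
        }()
    }

    started.Wait() // all n goroutines exist and none can have exited yet
    during := runtime.NumGoroutine()

    close(release) // wake everyone up
    done.Wait()
    return during - before
}

func main() {
    fmt.Println("logical CPUs:", runtime.NumCPU())
    fmt.Println("GOMAXPROCS:  ", runtime.GOMAXPROCS(0)) // 0 queries without changing
    fmt.Println("extra goroutines while parked:", countWhileBlocked(100))
}
```

Run it once normally and once with GOMAXPROCS=1 set in the environment: the second line changes, the goroutine count does not.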

5 · The cost of one goroutine

go main.go · spawn 100k
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

func main() {
    start := time.Now()
    var wg sync.WaitGroup
    n := 100_000

    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            time.Sleep(10 * time.Millisecond)
        }()
    }

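    // Snapshot taken right after the spawn loop. Early goroutines may already
    // have finished their 10ms sleep, so this is a lower bound on the true peak.
    alive := runtime.NumGoroutine()
    wg.Wait()
    fmt.Printf("spawned %d goroutines\n", n)
    fmt.Printf("alive after spawn: %d\n", alive)
    fmt.Printf("total time: %v\n", time.Since(start))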
}

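100,000 goroutines typically finish in tens of milliseconds of wall time on a modern machine; the exact figure depends on your hardware. The same workload on OS threads would either exhaust memory or take seconds. This is the lever that lets Go HTTP servers handle on the order of 100k concurrent connections per box.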
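Wall time is one cost; memory is the other. As a rough sketch, the program below parks goroutines and divides the growth in runtime.MemStats.StackInuse by their count. The helper name stackBytesPerGoroutine is invented here, it assumes n is greater than zero, and the figure it reports is only an approximation (expect low kilobytes, varying with Go version and platform).

```go
package main

import (
    "fmt"
    "runtime"
    "sync"
)

// stackBytesPerGoroutine (hypothetical helper, requires n > 0) parks n
// goroutines and returns the growth in StackInuse divided by n.
func stackBytesPerGoroutine(n int) uint64 {
    var before, during runtime.MemStats
    release := make(chan struct{})
    var started, done sync.WaitGroup

    runtime.ReadMemStats(&before)
    for i := 0; i < n; i++ {
        started.Add(1)
        done.Add(1)
        go func() {
            defer done.Done()
            started.Done()
            <-release // stay parked while we measure
        }()
    }
    started.Wait()
    runtime.ReadMemStats(&during)

    close(release)
    done.Wait()

    if during.StackInuse < before.StackInuse {
        return 0 // guard against unsigned underflow
    }
    return (during.StackInuse - before.StackInuse) / uint64(n)
}

func main() {
    fmt.Printf("~%d bytes of stack per parked goroutine\n",
        stackBytesPerGoroutine(100_000))
}
```

This counts stack memory only. Each goroutine also carries a small runtime descriptor on the heap, so the true per-goroutine cost is somewhat higher.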

6 · The patterns you'll write

go patterns.go · fan-out / fan-in
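// Illustrative package name and Job interface: the snippet leaves both undefined.
package patterns

import "sync"

// Job is any unit of work with a Process method (assumed shape).
type Job interface{ Process() }

// Fan-out: distribute work across N goroutines.
// workers must be at least 1, otherwise the feed loop below blocks forever.
func fanOut(jobs []Job, workers int) {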
    var wg sync.WaitGroup
    in := make(chan Job, workers)

    // Start workers
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for job := range in {
                job.Process()
            }
        }()
    }

    // Feed
    for _, j := range jobs {
        in <- j
    }
    close(in)

    wg.Wait()
}

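// Bounded worker pool (preferred over "go for every request").
// A buffered channel acts as a counting semaphore: at most size calls to f run at once.
type Pool struct {
    sem chan struct{}
}

func NewPool(size int) *Pool { return &Pool{sem: make(chan struct{}, size)} }

// Run blocks until a slot is free, then runs f on its own goroutine.
// It does not wait for f to finish; pair it with a sync.WaitGroup if you need that.
func (p *Pool) Run(f func()) {
    p.sem <- struct{}{} // acquire
    go func() {
        defer func() { <-p.sem }() // release
        f()
    }()
}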
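The listing above covers fan-out; the fan-in half merges the workers' results back onto one channel. Here is one possible sketch, using a made-up sumSquares function and assuming workers is at least 1. The key move is a separate goroutine that closes the output channel once every worker has returned.

```go
package main

import (
    "fmt"
    "sync"
)

// sumSquares fans nums out to `workers` goroutines, fans the squared values
// back in on a single channel, and returns their sum. Assumes workers >= 1.
func sumSquares(nums []int, workers int) int {
    in := make(chan int)
    out := make(chan int)
    var wg sync.WaitGroup

    // Fan-out: each worker squares whatever it receives.
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for n := range in {
                out <- n * n
            }
        }()
    }

    // Feed from a separate goroutine so this function can drain out meanwhile.
    go func() {
        for _, n := range nums {
            in <- n
        }
        close(in)
    }()

    // Fan-in: close out only after every worker has finished sending.
    go func() {
        wg.Wait()
        close(out)
    }()

    total := 0
    for sq := range out {
        total += sq
    }
    return total
}

func main() {
    fmt.Println(sumSquares([]int{1, 2, 3, 4, 5}, 3)) // prints 55
}
```

Results arrive in arbitrary order, which is harmless for a sum. If order matters, send an index alongside each value and reassemble on the receiving side.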

7 · From the wild

go net/http · server.go (paraphrased)
func (srv *Server) Serve(l net.Listener) error {
    // The main accept loop — one goroutine here.
    for {
        rw, err := l.Accept()
        if err != nil { return err }

        // Each connection becomes its own goroutine. Cheap.
        c := srv.newConn(rw)
        go c.serve(srv.connCtx())
    }
}

// 100k concurrent HTTP connections = 100k goroutines.
// Without goroutines, you'd need a thread pool + epoll-style event loop.
// With goroutines, the scheduler turns blocking I/O into "just yield to another G".
From the wild: go standard library · BSD-3-Clause

8 · Coming from another language?

If you know…      | The bridge
Python            | asyncio.create_task, but without async/await: every function is implicitly "async". No GIL, so goroutines run in parallel by default.
JavaScript / Node | Roughly a promise whose body can block without stalling anything else, and which is not confined to one thread. You write no event loop; the runtime scheduler plays that role.
Java              | Used like a Thread but tiny in cost, much like virtual threads (Project Loom). Goroutines predate Loom by about a decade.
Rust              | tokio::spawn, but with no async/await. The Go runtime is the executor; in Rust you choose one.
Erlang            | The closest match. Lightweight processes, scheduled by a runtime, communicating via messages (channels in Go).

9 · Common mistakes

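  • Goroutine leaks. A goroutine that blocks receiving from a channel nobody ever sends on (or closes) never exits. Always have a cancellation path: context.Context, a closed signal channel, or a timeout.
  • Capturing the loop variable. Pre-1.22, for i := range xs { go func() { use(i) }() } lets every goroutine share one i, so most of them see a later value, often the final one. On older toolchains, pass it as an argument or shadow it with i := i.
  • Sharing variables without synchronization. Two goroutines writing to the same variable without synchronization is a data race. Run with -race to catch it.
  • Unbounded goroutine spawning. 10M goroutines need roughly 20GB for stacks alone at 2KB each, and well before memory runs out the scheduler and garbage collector start to struggle. Use a worker pool with bounded concurrency.
  • Assuming order. Goroutine output interleaves arbitrarily. If you need a specific order, impose it explicitly, for example by sending results through a channel to a single goroutine that emits them. A mutex alone prevents simultaneous access but does not order anything.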
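For the first mistake, a cancellation path can be as small as one extra select case. The sketch below is illustrative (worker and demo are names chosen here): the goroutine waits on a channel nobody sends to, yet still exits once the context's timeout fires.

```go
package main

import (
    "context"
    "fmt"
    "time"
)

// worker waits for a value on ch but gives up when ctx is cancelled.
// It reports how it exited on done.
func worker(ctx context.Context, ch <-chan int, done chan<- string) {
    select {
    case v := <-ch:
        done <- fmt.Sprintf("got %d", v)
    case <-ctx.Done():
        done <- "cancelled: " + ctx.Err().Error()
    }
}

// demo starts a worker on a channel that never receives a value and
// returns the worker's exit report.
func demo() string {
    ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
    defer cancel() // always release the context's resources

    ch := make(chan int)         // nobody ever sends on this
    done := make(chan string, 1) // buffered so the worker never blocks on exit
    go worker(ctx, ch, done)
    return <-done
}

func main() {
    fmt.Println(demo()) // prints "cancelled: context deadline exceeded"
}
```

Remove the ctx.Done() case and the worker blocks forever: that is exactly the leak described in the first bullet.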

10 · Exercises (~15 min)

  1. Race detector. Write two goroutines writing to the same int. Run with go run -race main.go. Watch it fire.
  2. Goroutine count. Spawn 10000 goroutines that sleep for 1 second. Use runtime.NumGoroutine() before, during, and after. What does the peak look like?
  3. Bounded pool. Adapt the pattern from section 6 to limit a workload of 1000 tasks to 10 concurrent. Measure: how does total time compare to "spawn all 1000 at once"?
  4. Leak it. Start a goroutine that blocks on <-ch for a channel that is never sent to or closed. Print runtime.NumGoroutine() after main's sleep. The leak is visible.

11 · When it clicks

  • You spawn a goroutine for any concurrent work without thinking about cost.
  • You instinctively pair every go func() with a "how does this exit" plan.
  • You reach for a bounded pool over unbounded spawning.
  • You run go test -race as part of every test pass.