How do I size a thread pool?

For purely CPU-bound work, size the pool to NumCPU (or NumCPU + 1). For purely I/O-bound work, size much higher; a common formula is NumCPU × (1 + waitTime/computeTime). For mixed workloads, use Little's Law (concurrency = throughput × latency). Production guidance: start small, monitor queue depth and CPU utilisation, and scale up when the queue keeps growing while CPU stays under-utilised.

Thread pools, the boring workhorse.

A queue. Some workers. A dispatcher. The pattern that runs most servers — once you've sized the pool well, you barely need to look at it again.

What is a thread pool?

N workers, one queue.

A thread pool is a fixed set of worker threads that pull tasks off a shared queue and run them. The pattern dates to the 1990s; modern implementations include Java's ExecutorService, Go's goroutine scheduler, Rust's Tokio runtime, and the Linux kernel's workqueue subsystem. Sizing the pool is the dominant lever: CPU-bound work sizes near NumCPU, I/O-bound work sizes much higher.

A thread pool decouples how work arrives from how work runs. Producers push tasks onto a queue; a fixed number of long-lived worker threads loop, dequeue, run, repeat. No thread is created or destroyed per task — that cost (typically ~100 μs to spawn, plus stack allocation) is paid once, at startup.

The trick is bounding. With unbounded workers your server keeps making more threads under load until it OOMs or thrashes. With unbounded queues you reply slowly and never push back. Both bounds are mandatory if you want a server that fails predictably — the same shape as a ring buffer with backpressure.
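As a rough illustration of the loop described above, here is a minimal Python sketch of a fixed set of workers draining one bounded queue. The class name `TinyPool` and its methods are hypothetical names chosen for this example, not any library's API, and a production pool would also need futures, timeouts, and richer error reporting.

```python
import queue
import threading


class TinyPool:
    """Fixed worker threads draining one bounded queue (illustrative only)."""

    def __init__(self, workers: int, queue_size: int):
        self._tasks = queue.Queue(maxsize=queue_size)  # the queue bound
        self.errors = []                               # exceptions raised by tasks
        self._threads = [
            threading.Thread(target=self._loop, daemon=True)
            for _ in range(workers)                    # the worker bound
        ]
        for t in self._threads:
            t.start()

    def _loop(self):
        # Long-lived worker: dequeue, run, repeat.
        while True:
            task = self._tasks.get()
            try:
                if task is None:       # sentinel: stop this worker
                    return
                task()
            except Exception as exc:   # keep the worker alive on task failure
                self.errors.append(exc)
            finally:
                self._tasks.task_done()

    def submit(self, fn) -> bool:
        """Enqueue fn; return False instead of blocking when the queue is full."""
        try:
            self._tasks.put_nowait(fn)
            return True
        except queue.Full:
            return False

    def shutdown(self):
        self._tasks.join()             # wait for all queued work to finish
        for _ in self._threads:
            self._tasks.put(None)      # one sentinel per worker
        for t in self._threads:
            t.join()
```

With `TinyPool(workers=4, queue_size=64)`, `submit` returns `False` once 64 tasks are waiting, which is the push-back signal an unbounded queue never gives.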

How a thread pool works, watched live

Workers and a queue, live.

Submit jobs of varying size. The dispatcher hands them to free workers; busy workers count down. Notice the queue grows when arrivals exceed pool capacity — that is back-pressure forming, in real time. (See rate limiting for the upstream version.)

[Interactive simulation: a pool of four workers (W0 to W3, initially idle) fed by one queue, with counters for pool size, queued, completed, and sim time.]

Sizing a thread pool: CPU-bound vs I/O-bound work

CPU-bound vs I/O-bound.

The size formula depends on what the workers do. For CPU-bound work (compression, parsing, math) you want N ≈ number of cores. More than that and threads time-share, adding context-switch overhead without finishing more work.

For I/O-bound work (database, HTTP, disk) the worker spends most of its time waiting. You want N to be much higher; Brian Goetz's classic formula is N = cores × (1 + wait/compute). A worker that waits 90 ms for every 10 ms of compute gives 1 + 90/10 = 10, so the pool can hold 10 × cores threads before the CPUs saturate.

# Quick rule of thumb (Brian Goetz, "Java Concurrency in Practice")
N = cores × U × (1 + W/C)
 cores = available CPU cores
 U = target utilization (0..1)
 W/C = ratio of wait time to compute time per task

# Example: 8-core box, 80% util, tasks wait 50ms, compute 10ms
N = 8 × 0.8 × (1 + 50/10) = 38.4 → pool of 38–40 threads
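The formula above translates directly into a small helper. This is a sketch in Python; the function name `pool_size` and its parameter names are illustrative, and rounding to the nearest integer is an assumption, since the text only gives a range of 38 to 40 threads.

```python
def pool_size(cores: int, utilization: float, wait_ms: float, compute_ms: float) -> int:
    """N = cores x U x (1 + W/C), rounded to the nearest whole thread."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if compute_ms <= 0 or wait_ms < 0:
        raise ValueError("compute_ms must be positive and wait_ms non-negative")
    n = cores * utilization * (1 + wait_ms / compute_ms)
    return max(1, round(n))  # never size a pool below one thread
```

`pool_size(8, 0.8, 50, 10)` reproduces the worked example (38.4, rounded to 38), and with no wait time at full utilisation the result collapses to the core count, matching the CPU-bound rule. Treat the output as a starting point to validate against measured queue depth and CPU utilisation.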

What to do when the queue fills up

When the queue is full, choose deliberately.

An unbounded queue is the most common production landmine. It quietly absorbs load until you OOM, or until your tail latency stretches into minutes because the queue has accumulated minutes of work. Bound the queue, and pick a rejection policy.

Abort

Throw and tell.

RejectedExecutionException. The producer learns about the back-pressure synchronously. Most servers should default to this and translate to a 429 / 503 — pair with a retry strategy at the caller.

Caller-runs

Push back to producer.

If the queue is full, the calling thread runs the task itself. Slows arrivals naturally — elegant for asymmetric producer/consumer.

Discard

Drop quietly.

Drop oldest or drop newest. Useful for telemetry where missed updates beat backed-up queues. Always log the drop count.
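The three policies above can be sketched against a plain bounded queue. This is an illustrative Python version, not Java's actual handler API: the `Rejected` exception, the `offer` function, the policy strings, and the `stats` dictionary are all hypothetical names, and the discard-oldest branch assumes a single producer.

```python
import queue


class Rejected(Exception):
    """Raised by the 'abort' policy (stand-in for RejectedExecutionException)."""


def offer(tasks: queue.Queue, task, policy: str, stats: dict) -> None:
    """Try to enqueue task; apply the rejection policy if the queue is full."""
    try:
        tasks.put_nowait(task)
        return
    except queue.Full:
        pass

    if policy == "abort":
        stats["rejected"] = stats.get("rejected", 0) + 1
        raise Rejected("queue full")   # caller translates this to 429 / 503
    elif policy == "caller_runs":
        task()                         # the producer does the work itself
    elif policy == "discard_newest":
        stats["dropped"] = stats.get("dropped", 0) + 1
    elif policy == "discard_oldest":
        try:
            tasks.get_nowait()         # evict the task at the head
        except queue.Empty:
            pass                       # a worker drained it first
        tasks.put_nowait(task)         # single-producer assumption: a slot is free
        stats["dropped"] = stats.get("dropped", 0) + 1
    else:
        raise ValueError(f"unknown policy: {policy}")
```

Keeping the drop and rejection counts in `stats` follows the advice above to always log the drop count; in a real service these would be exported as metrics.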

Work stealing: each worker keeps its own queue

Per-worker queues, help neighbours.

A single shared queue creates contention as N grows: every dispatch touches the same lock or atomic. Work-stealing pools (Java's ForkJoinPool, Go's runtime, Rust's Tokio multi-thread scheduler) give each worker its own deque. A worker pushes and pops at its own end, which is uncontended in the common case; when it runs dry, it steals from a busy neighbour, in the classic design from the opposite end of that neighbour's deque.

The shape works because most tasks come from local recursion (a worker submitting a child of its own work). The shared-queue case becomes the rare path. ForkJoinPool, Go's scheduler, and Rust's Tokio all use this trick to scale to dozens of cores without lock contention dominating.
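The per-worker deque idea can be shown in a few lines. The sketch below is a simplified Python model of the classic design (owner works LIFO at one end, thieves take the oldest task from the other); the names `StealingWorker` and `next_task` are made up for this example, and real runtimes use specialised lock-free queues with details that differ from this.

```python
from collections import deque
from typing import Callable, Iterable, Optional

Task = Callable[[], object]


class StealingWorker:
    def __init__(self, name: str):
        self.name = name
        self.local: deque = deque()   # this worker's own task deque

    def push(self, task: Task) -> None:
        self.local.append(task)       # owner end

    def pop_local(self) -> Optional[Task]:
        try:
            return self.local.pop()   # owner end: newest task first
        except IndexError:
            return None

    def steal(self) -> Optional[Task]:
        """Called by other workers: take the oldest task from the far end."""
        try:
            return self.local.popleft()
        except IndexError:
            return None


def next_task(me: StealingWorker, peers: Iterable[StealingWorker]) -> Optional[Task]:
    """Prefer local work; only when it runs dry, try to steal from a peer."""
    task = me.pop_local()
    if task is not None:
        return task
    for peer in peers:
        if peer is me:
            continue
        task = peer.steal()
        if task is not None:
            return task
    return None
```

Because the owner and the thieves touch opposite ends, they only collide when a deque is nearly empty, which is why the shared path stays rare.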

One thread per request scales to thousands, not millions

Scales to thousands, not millions.

The classic Tomcat-style server gives each request a thread, blocking on I/O. Fine up to a few thousand concurrent requests; beyond that the per-thread stack (commonly 1 MB reserved by default on a 64-bit JVM) and context-switching cost dominate. To go higher you either move to async I/O (one thread serves many connections via an event loop on epoll/kqueue) or to user-space threads.

Java 21's virtual threads, Go's goroutines, and Kotlin's coroutines all sit between these worlds: code looks blocking, but the runtime parks the lightweight task while a small pool of OS threads carries the work. Same dispatch model — different unit of work.

Three thread pool metrics to export

Three metrics you must export.

Active workers tells you how saturated the pool is. Steady-state at N is bad — there is no headroom for spikes. Queue depth tells you whether you are absorbing or shedding. Above zero for any sustained interval is back-pressure forming. Rejection rate is the canary; if your rejection policy is "abort", every spike pages someone.

Add p99 latency for the dispatch step itself (queue time, not just task time) and you have the full picture. The Little's Law identity — concurrency = arrival rate × latency — falls out of these metrics directly.
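To make the Little's Law connection concrete, here is a small Python sketch that derives saturation and mean time in the pool from one metrics snapshot. The `PoolSnapshot` fields and function names are hypothetical, and the latency estimate assumes steady-state averages and an arrival rate that counts only accepted (not rejected) tasks.

```python
from dataclasses import dataclass


@dataclass
class PoolSnapshot:
    pool_size: int        # N, configured worker count
    active_workers: int   # workers currently running a task
    queue_depth: int      # tasks waiting in the queue
    rejected_per_s: float # rejection rate


def saturation(s: PoolSnapshot) -> float:
    """Fraction of workers busy; steady state at 1.0 means no headroom."""
    return s.active_workers / s.pool_size


def mean_time_in_pool_s(s: PoolSnapshot, accepted_per_s: float) -> float:
    """Little's Law rearranged: latency = concurrency / arrival rate."""
    if accepted_per_s <= 0:
        raise ValueError("arrival rate must be positive")
    concurrency = s.active_workers + s.queue_depth  # tasks in the system
    return concurrency / accepted_per_s
```

For example, 12 busy workers plus 8 queued tasks at 200 accepted tasks per second implies roughly 20 / 200 = 0.1 s spent in the pool per task, queue time included.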

Thread pools across Java, .NET, Rust, Go, and Python

The same idea, five different default sizes.

Java ExecutorService: Doug Lea's ThreadPoolExecutor (j.u.concurrent, 2004) is the canonical reference. Configurable core/max threads, work queue, rejection policy. Executors.newFixedThreadPool(N) ships with an unbounded queue — a famously dangerous default that masks back-pressure until the JVM OOMs. Production: configure explicitly with new ThreadPoolExecutor and a bounded queue.
Java ForkJoinPool / virtual threads: ForkJoinPool (Java 7) does work-stealing. Java 21's virtual threads (Project Loom) make a thread-per-request model viable again — millions of cheap virtual threads on a small carrier-thread pool, removing the historical reason to size a thread pool tightly.
.NET ThreadPool: Auto-scales from Environment.ProcessorCount up to a high ceiling (~32k by default). The CLR injects threads at ~500ms intervals when the queue is non-empty — the famous "ThreadPool starvation" stall. Modern .NET prefers Task.Run + async to manual pool sizing.
Rust Tokio runtime: Multi-threaded by default, work-stealing across N worker threads (defaults to NumCPU). Each worker has a local queue (256 entries) plus a shared injection queue. The numbers in Tokio's blog posts (2020-2024) are the benchmark anyone designing a worker pool should read.
Go runtime (GMP): Not a thread pool exactly — goroutines are user-mode tasks scheduled onto OS threads via the GMP model (see the goroutine scheduler simulator). Effectively a thread pool of GOMAXPROCS workers, each draining a per-P run queue with work stealing.
Python concurrent.futures: ThreadPoolExecutor(max_workers=N). Defaults to min(32, os.cpu_count() + 4). The GIL means CPU-bound code doesn't benefit from threads — for that, use ProcessPoolExecutor instead.
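The unbounded-queue concern raised for Java applies to Python too: `ThreadPoolExecutor.submit` accepts work without a queue limit. One common workaround is to gate submissions with a semaphore. The sketch below assumes that approach; `BoundedExecutor` and `PoolSaturated` are hypothetical names for this example, not part of `concurrent.futures`.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class PoolSaturated(Exception):
    """Raised when running plus pending tasks would exceed the configured bound."""


class BoundedExecutor:
    def __init__(self, max_workers: int, max_pending: int):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # One slot per task that is either running or waiting in the queue.
        self._slots = threading.BoundedSemaphore(max_workers + max_pending)

    def submit(self, fn, *args, **kwargs) -> Future:
        if not self._slots.acquire(blocking=False):
            raise PoolSaturated("pool and queue are full")   # abort policy
        try:
            future = self._pool.submit(fn, *args, **kwargs)
        except BaseException:
            self._slots.release()      # do not leak the slot on failure
            raise
        # Free the slot when the task finishes, fails, or is cancelled.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

Raising on saturation mirrors the abort policy; acquiring the semaphore with `blocking=True` instead would make the producer wait, which is a different kind of push-back.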

The four ways a thread pool breaks in production

Real outages, real lessons.

Unbounded queue → OOM (the most common). Many Java services have died because Executors.newFixedThreadPool(N) defaults to an unbounded LinkedBlockingQueue. Producer rate exceeds consumer rate, the queue grows without limit, the JVM dies with an OutOfMemoryError, restart, repeat. Postmortems from Google's old SRE book and Spotify's early years all cite versions of this.

Pool starvation by recursive submit. Take a pool of N threads in which a task submits another task to the same pool and waits for its result. With N concurrent root tasks, all N threads are blocked waiting for sub-tasks that can never run. Solution: use a separate pool, or use ForkJoinPool, whose join() lets a waiting worker run other queued tasks instead of simply blocking.

Slow downstream dragging the whole pool. One downstream service goes slow; threads in the pool block on downstream calls; queue depth grows; new requests get rejected even though the local CPU is idle. Solution: per-downstream bulkhead pools (Hystrix's signature pattern), or async I/O with a small thread pool.

Leaks on task exceptions and rejections. Tasks that throw an exception don't always release pool resources cleanly, particularly when the rejection handler itself blocks. Production Java stacks have shipped CVEs for exactly this. Always wrap submitted tasks in a try/finally that releases any held resource.

The unifying theme: thread pools fail by silently growing or silently blocking, not by loud explosions. The three metrics from the metrics section above (active workers, queue depth, rejection rate) are exactly the early-warning signals for these four failure modes.
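The recursive-submit starvation described in this section is easy to reproduce. The Python sketch below uses a hypothetical helper, `nested_submit`, in which a parent task submits a child to the same pool and waits for it. The timeout exists only so the demonstration terminates; without it the single-thread case would block forever.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def nested_submit(pool_size: int, timeout_s: float = 0.5) -> str:
    """Run one parent task that waits on a child submitted to the same pool."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:

        def child() -> str:
            return "child done"

        def parent() -> str:
            inner = pool.submit(child)   # queued behind the running parent
            try:
                return inner.result(timeout=timeout_s)
            except FutureTimeout:
                return "starved"         # no free worker could run the child

        return pool.submit(parent).result()
```

With `pool_size=1` the parent occupies the only worker, so the child cannot start until the parent gives up; with `pool_size=2` a second worker picks up the child and the parent completes normally. Giving child tasks their own pool removes the dependency on a free slot in the parent's pool.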

A closing note

Thread pools are dull, which is why most servers run on them. The interesting work is the sizing, the queue bound, and the rejection policy — three numbers that decide whether your server fails gracefully under load or melts into goo.