What is the best book to learn operating systems?

Two foundational books. Operating Systems: Three Easy Pieces (Arpaci-Dusseau, free online at OSTEP) is the modern undergraduate text: accessible, well-paced, and organised around virtualisation, concurrency, and persistence. The Linux Programming Interface (Kerrisk, 2010) is the comprehensive Linux syscall reference at roughly 1,500 pages. Read OSTEP first.

Do I need to know assembly to learn operating systems?

Some, but not deep. Understanding what registers and the stack are, how function calls translate to assembly, and what a syscall instruction does at the CPU level is enough for most OS study. Reading kernel source (Linux is open) helps; writing assembly from scratch usually doesn't. Computer Systems: A Programmer's Perspective (Bryant & O'Hallaron) is the bridge between assembly and OS concepts if you want one.

What is the difference between a process and a thread?

A process owns its own virtual address space — separate page tables, separate file descriptor table, separate signal handlers. A thread shares those with sibling threads in the same process. Switching between processes is expensive (TLB flush, full context save); switching between threads in the same process is cheap. Linux's clone() syscall actually creates both — a process is just clone() with no shared resources, a thread is clone() with everything shared.
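The sharing difference can be observed directly. The sketch below is a minimal illustration, assuming a POSIX system where os.fork is available; the function names are made up for this example. A sibling thread appends to a list and the change is visible, while a forked child appends to its own copy-on-write copy and the parent never sees it.

```python
import os
import threading


def mutate_in_thread(data):
    """Append from a sibling thread; the thread shares our address space."""
    worker = threading.Thread(target=data.append, args=("thread",))
    worker.start()
    worker.join()


def mutate_in_child_process(data):
    """Append from a forked child; the child works on a copy-on-write copy."""
    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the child's private copy of the page.
        data.append("child")
        os._exit(0)  # leave immediately, skipping interpreter cleanup
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie


def demo():
    data = []
    mutate_in_thread(data)
    mutate_in_child_process(data)
    return data  # the thread's write is visible, the child's is not


if __name__ == "__main__":
    print(demo())
```

The parent's list ends up holding only the thread's entry, which is the whole distinction in one line of output.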

Operating System Internals Study Path — Processes, Memory, Filesystems

Why the kernel matters.

Three problems, repeated forever. Sharing: many programs, one set of CPUs, one slab of RAM, one disk. The OS multiplexes all of them across time and space. Isolation: each program must believe it owns the machine, and one program's bug must not corrupt another. Abstraction: programs should not have to know whether the disk is a platter, an SSD, or a network filesystem. The file descriptor is the unifying layer.

Unix solved all three with a small set of primitives: process, file, fork, exec, the stream of bytes. More than fifty years later the same primitives run hyperscale clouds. Linux added namespaces and cgroups for stronger isolation; io_uring and eBPF for cheaper abstraction; PREEMPT_RT for harder real-time. The shape stayed.

Read the kernel and you'll think differently about your code. Most of what you can change in user-space looks small once you've seen what the kernel does on every syscall. Profilers stop being mysterious. Latency budgets stop being arbitrary. "The OS is slow" becomes "I am holding the OS wrong".

Twelve mental models.

Twelve concepts cover ~95% of OS surface. Get these in your bones in the first month. Every kernel feature you meet (containers, eBPF, io_uring, KVM) is a recombination of them.

01 Process Day-zero

A program in execution: an address space, one or more threads, file descriptors, identity, and accounting. The OS unit of isolation. fork() creates one; exec() replaces its image; the PID stays.

02 Thread Day-zero

A flow of execution sharing the address space with siblings. Cheap to create (10–50 µs), but raises the contract from "this code runs" to "this code interleaves". 1:1 (Linux) vs M:N (Go, Erlang).

03 Virtual memory Day-zero

Each process sees a private 128-TB address space. The MMU translates page-by-page via four-level page tables, cached in the TLB. Demand-paged: pages aren't real RAM until you touch them.

04 Page table & TLB Practitioner

CR3 → PML4 → PDPT → PD → PT → frame. Four memory accesses per translation, cached in a tiny on-chip TLB. PCIDs let TLB entries survive process switches; huge pages let one entry cover 2 MB.
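The walk is just bit slicing. As a small sketch, assuming 48-bit x86-64 addresses with 4 KiB pages (the function name is illustrative), each of the four tables is indexed by nine bits of the virtual address and the low twelve bits are the offset inside the page:

```python
def split_virtual_address(vaddr):
    """Split a 48-bit x86-64 virtual address into table indices and page offset."""
    return {
        "pml4": (vaddr >> 39) & 0x1FF,    # bits 47..39 index the PML4
        "pdpt": (vaddr >> 30) & 0x1FF,    # bits 38..30 index the PDPT
        "pd": (vaddr >> 21) & 0x1FF,      # bits 29..21 index the page directory
        "pt": (vaddr >> 12) & 0x1FF,      # bits 20..12 index the page table
        "offset": vaddr & 0xFFF,          # bits 11..0 are the byte within the page
    }


# One page-directory entry used as a huge page spans a whole page table's worth
# of 4 KiB pages: 512 entries * 4 KiB = 2 MiB.
HUGE_PAGE_BYTES = 512 * 4096

if __name__ == "__main__":
    print(split_virtual_address(0x00007FFFFFFFF000))
```

Feeding it an address from /proc/PID/maps shows which slot of each level the MMU would consult for that mapping.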

05 Scheduling Practitioner

Many runnable threads, few CPUs. Linux ran the O(1) scheduler 2003–07, CFS 2007–23, and EEVDF since kernel 6.6. CFS and EEVDF track per-task vruntime; all of them balance per-CPU runqueues. Real-time classes (FIFO, RR, DEADLINE) sit above.
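The vruntime idea fits in a few lines. This is a toy model, not kernel code: it assumes fixed time slices, a single runqueue, and a heap where the kernel uses a red-black tree, and it ignores the eligibility and deadline rules EEVDF adds. The names simulate and NICE_0_WEIGHT are chosen for this sketch; 1024 is the weight the kernel assigns to a nice-0 task.

```python
import heapq

NICE_0_WEIGHT = 1024  # load weight of a nice-0 task


def simulate(weights, slices, slice_ns=1_000_000):
    """Run `slices` fixed slices, always picking the task with the smallest vruntime.

    weights: dict of task name -> load weight.
    Returns a dict of task name -> number of slices that task received.
    """
    # Heap entries are (vruntime, name); ties break on name, which keeps runs deterministic.
    queue = [(0, name) for name in sorted(weights)]
    heapq.heapify(queue)
    ran = {name: 0 for name in weights}
    for _ in range(slices):
        vruntime, name = heapq.heappop(queue)
        ran[name] += 1
        # Heavier tasks accumulate virtual time more slowly, so they are picked more often.
        vruntime += slice_ns * NICE_0_WEIGHT // weights[name]
        heapq.heappush(queue, (vruntime, name))
    return ran


if __name__ == "__main__":
    print(simulate({"a": 1024, "b": 2048}, 300))
```

A task with twice the weight ends up with twice the CPU time, without the scheduler ever computing a share explicitly.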

06 Syscall boundary Practitioner

The single doorway from ring 3 to ring 0. ~100–1000 ns each on modern x86 via the SYSCALL instruction. The vDSO maps read-only kernel data into user-space so clock_gettime is 50× faster.

07 File descriptor Day-zero

A small integer indexing into the per-process FD table. Files, sockets, pipes, devices, signalfd, eventfd, epollfd — all reach you through one. The most successful abstraction in Unix.
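A short sketch makes the "small integer" concrete. It uses only os.pipe, os.dup, os.write and os.read from the standard library; fd_demo is a name invented here. Two different integers can refer to the same open pipe end, and a write through either one arrives at the reader.

```python
import os


def fd_demo():
    """Show that descriptors are small integers and that dup() aliases one open file."""
    read_fd, write_fd = os.pipe()     # two new descriptors for one kernel pipe
    alias_fd = os.dup(write_fd)       # a second integer for the same write end
    try:
        os.write(alias_fd, b"via alias")
        data = os.read(read_fd, 64)   # the bytes arrive whichever alias wrote them
        return (read_fd, write_fd, alias_fd), data
    finally:
        for fd in (read_fd, write_fd, alias_fd):
            os.close(fd)


if __name__ == "__main__":
    print(fd_demo())
```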

08 Page cache Practitioner

Files live in RAM until the kernel evicts them. read() copies from cache; write() dirties cache pages flushed later. mmap maps cache pages directly. Tuned via vm.dirty_ratio + vm.swappiness.
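That read(), write() and mmap all touch the same cached pages can be checked from user space. The sketch below assumes a Unix system with a unified page cache (as on Linux) and uses a throwaway temporary file; page_cache_demo is an illustrative name. Bytes written with write() show up through a shared mapping, and bytes changed through the mapping show up in a later pread().

```python
import mmap
import os
import tempfile


def page_cache_demo():
    """Write through write(), read through mmap, then the reverse, on one file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello page cache")        # lands in the page cache
        with mmap.mmap(fd, 0) as view:           # shared mapping of the same cached pages
            seen_via_mmap = bytes(view[:5])
            view[:5] = b"HELLO"                  # dirty the cached page through the mapping
            seen_via_read = os.pread(fd, 5, 0)   # the read path sees the same page
        return seen_via_mmap, seen_via_read
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(page_cache_demo())
```

Nothing here forces the data to disk; both paths are simply looking at the same memory.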

09 Synchronization Practitioner

Mutexes (futex-backed), condition variables, semaphores, atomics with memory ordering. Race conditions, deadlocks, livelocks, priority inversion — the four hazards. Lock ordering or try-with-backoff.
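Lock ordering is easiest to see on the classic two-account transfer. This is a minimal sketch with hypothetical Account and transfer names; ordering by id() is an arbitrary choice that happens to give one consistent global order within a process. Two threads transfer in opposite directions, which deadlocks quickly if each takes "its own" lock first, and never deadlocks here.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move money while holding both locks, always acquired in one global order."""
    # Sorting by id() is arbitrary but consistent, which is all lock ordering needs.
    # Assumes src and dst are different accounts.
    first, second = sorted((src, dst), key=id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def stress(rounds=2000):
    a, b = Account(1000), Account(1000)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance


if __name__ == "__main__":
    print(stress())
```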

10 IPC Operator

Pipes, FIFOs, Unix domain sockets (with SCM_RIGHTS for FD passing!), shared memory, signalfd / eventfd, POSIX message queues. Cross-host = TCP / QUIC / NATS / Kafka.
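FD passing is the least obvious item in that list, so here is a hedged sketch of it. It assumes Python 3.9 or later, where socket.send_fds and socket.recv_fds wrap sendmsg/recvmsg with an SCM_RIGHTS control message. For brevity both ends of the socket pair live in one process; in real use the receiver is normally another process, which gets its own descriptor number for the same open pipe.

```python
import os
import socket


def pass_fd_demo():
    """Send a pipe's read end over a Unix socket pair and read through the received copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_fd, write_fd = os.pipe()
    try:
        # The kernel duplicates the descriptor into the receiver as part of the message.
        socket.send_fds(left, [b"fd"], [read_fd])
        msg, fds, _flags, _addr = socket.recv_fds(right, 16, 1)
        received_fd = fds[0]              # a new descriptor number, same open pipe
        os.write(write_fd, b"ping")
        data = os.read(received_fd, 16)
        os.close(received_fd)
        return msg, data
    finally:
        os.close(read_fd)
        os.close(write_fd)
        left.close()
        right.close()


if __name__ == "__main__":
    print(pass_fd_demo())
```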

11 epoll & io_uring Operator

epoll: register FDs once, get back only the ready ones — O(active). io_uring: submit hundreds of operations through a shared-memory ring, harvest completions later. The substrate of every modern high-throughput server.
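The "register once, get back only the ready ones" contract looks like this in practice. The sketch uses the standard selectors module, whose DefaultSelector picks epoll on Linux and another readiness mechanism elsewhere; it does not cover io_uring, which has no standard-library binding. The function name is illustrative.

```python
import os
import selectors


def ready_fds_demo():
    """Register three pipes once, write to one, and get back only that one."""
    sel = selectors.DefaultSelector()     # epoll on Linux
    pipes = [os.pipe() for _ in range(3)]
    try:
        for index, (read_fd, _write_fd) in enumerate(pipes):
            sel.register(read_fd, selectors.EVENT_READ, data=index)
        os.write(pipes[1][1], b"x")       # make only the middle pipe readable
        events = sel.select(timeout=1)
        return [key.data for key, _mask in events]
    finally:
        sel.close()
        for read_fd, write_fd in pipes:
            os.close(read_fd)
            os.close(write_fd)


if __name__ == "__main__":
    print(ready_fds_demo())
```

The two idle pipes cost nothing per call, which is why the work scales with active descriptors rather than registered ones.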

12 Namespaces & cgroups Operator

Namespaces give each container its own view of PIDs, mounts, network, UTS, IPC, users. cgroups v2 enforces CPU, memory, and IO quotas hierarchically. Together: the building blocks of every container runtime.

Day zero — first hour.

One hour. Read OSTEP chapters 4 (the abstraction: process), 13 (address spaces), and 26 (concurrency, an introduction). Then strace a simple program, watch every syscall, and follow one major page fault. The bar is muscle: read the right OSTEP chapters, then watch the kernel react in real time.

# 1. Read OSTEP ch. 4, 13, 26 (≈ 60 minutes)
#    https://pages.cs.wisc.edu/~remzi/OSTEP/

# 2. Pick a tiny C program (or write one)
cat > hello.c <<'EOF'
#include <stdio.h>
#include <unistd.h>
int main(void) { printf("hi from pid %d\n", getpid()); sleep(1); return 0; }
EOF
cc hello.c -o hello

# 3. Watch every syscall
strace -e trace=execve,openat,mmap,brk,write,exit_group ./hello

# 4. Watch page faults on a heavier program
/usr/bin/time -v ./your-actual-program 2>&1 | grep -E 'page faults|context switches'

# 5. Read your own /proc — pick a long-running process (e.g. your shell)
cat /proc/$$/status     # state, threads, RSS, VSZ
cat /proc/$$/maps       # virtual address layout
cat /proc/$$/limits     # rlimits

# 6. Trace a system-wide event for 30s with bpftrace (optional)
#    The interval probe ends the trace after 30 seconds.
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat
  { printf("%s -> %s\n", comm, str(args->filename)); }
  interval:s:30 { exit(); }'

Done. You have read the right three chapters, watched a real syscall trace, inspected /proc, and (optionally) used eBPF to watch the system in flight. Everything below builds from here.

Week 1 to Month 3 — pick a track.

After the first hour you can read OS writing without bouncing off it. Spend the next three months on one track at a time, depth-first. Don't try to learn schedulers and file systems in the same fortnight. Pick the one that maps to your job and finish it.

The process model

fork, exec, wait, signals, zombies, orphans, setsid. Read Stevens APUE chapters 7–10; trace a shell pipeline with strace. Build a tiny shell as a weekend project — once you have it, fork/exec/dup2/wait stop being abstractions.
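Before writing the shell in C, the core of a two-stage pipeline can be sketched with the same syscalls through Python's os module. This is a minimal sketch under stated assumptions: POSIX only, no signal handling or job control, and the helper names are invented here. It runs the equivalent of `left | right` and captures what the right-hand command prints.

```python
import os
import sys


def _exec_or_die(argv):
    try:
        os.execvp(argv[0], argv)    # replaces the process image on success
    finally:
        os._exit(127)               # only reached if exec failed


def run_pipeline(left_argv, right_argv):
    """Run `left | right` and return the bytes the right-hand command wrote."""
    pipe_r, pipe_w = os.pipe()      # connects left's stdout to right's stdin
    out_r, out_w = os.pipe()        # lets the parent capture right's stdout

    left = os.fork()
    if left == 0:
        os.dup2(pipe_w, 1)          # stdout -> pipe
        for fd in (pipe_r, pipe_w, out_r, out_w):
            os.close(fd)
        _exec_or_die(left_argv)

    right = os.fork()
    if right == 0:
        os.dup2(pipe_r, 0)          # stdin <- pipe
        os.dup2(out_w, 1)           # stdout -> capture pipe
        for fd in (pipe_r, pipe_w, out_r, out_w):
            os.close(fd)
        _exec_or_die(right_argv)

    # Parent: close every write end it holds, or the readers never see EOF.
    for fd in (pipe_r, pipe_w, out_w):
        os.close(fd)
    chunks = []
    while chunk := os.read(out_r, 4096):
        chunks.append(chunk)
    os.close(out_r)
    os.waitpid(left, 0)             # reap both children
    os.waitpid(right, 0)
    return b"".join(chunks)


if __name__ == "__main__":
    upper = "import sys; sys.stdout.write(sys.stdin.read().upper())"
    print(run_pipeline([sys.executable, "-c", "print('hello')"],
                       [sys.executable, "-c", upper]))
```

Forgetting one of the close() calls in the parent is the usual first bug: the right-hand command then waits forever for an EOF that cannot arrive.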

Memory & the MMU

Address spaces, page tables, the TLB, demand paging, COW, swap, NUMA. Read OSTEP "Virtualization" part. Inspect /proc/PID/maps, /proc/PID/smaps, /proc/PID/pagemap on a running process. Watch the TLB hit rate with perf stat -e dTLB-loads,dTLB-load-misses.

Schedulers

CFS, EEVDF, real-time classes, cgroup throttling. Read the kernel's sched-design-CFS doc; profile a Kubernetes pod under quota. The classic production pathology is CFS throttling at the cgroup boundary; the fix is rarely "more CPU".
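Why "more CPU" is rarely the fix shows up in simple arithmetic. The function below is a deliberately crude model, not the kernel's accounting: it assumes every thread runs in parallel on a free core, the cgroup may burn at most quota_ms of CPU time per period_ms, and once the quota is gone everything waits for the next period. The name wall_time_ms and its parameters are made up for this sketch.

```python
def wall_time_ms(cpu_demand_ms, threads, quota_ms, period_ms=100):
    """Wall-clock time to finish `cpu_demand_ms` of CPU work spread over `threads`."""
    # A period cannot consume more CPU time than the threads can physically burn.
    per_period_cap = min(quota_ms, threads * period_ms)
    remaining = float(cpu_demand_ms)
    elapsed = 0.0
    while remaining > 0:
        budget = min(remaining, per_period_cap)
        remaining -= budget
        if remaining > 0:
            elapsed += period_ms          # quota exhausted: throttled until the period ends
        else:
            elapsed += budget / threads   # finished part-way through this period
    return elapsed


if __name__ == "__main__":
    # 8 threads needing 50 ms each, under a 2-CPU limit (200 ms per 100 ms period)
    print(wall_time_ms(400, 8, 200))
    # the same work with a quota large enough never to throttle
    print(wall_time_ms(400, 8, 800))
```

In this model a request that needs 50 ms of parallel work takes 125 ms under the 2-CPU limit, because the eight threads exhaust the quota in the first 25 ms and then sit throttled for 75 ms.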

File systems & I/O

Inodes, dentries, the page cache, fsync, journaling vs COW (ext4 vs ZFS/btrfs). Read the LFS paper; mount a few filesystems and compare. fio for benchmarks, strace -e openat for visibility.

Concurrency primitives

Mutexes, condvars, semaphores, atomics, memory ordering, futex. Read the C++20 atomics reference; read Paul McKenney's Is Parallel Programming Hard. Build a SPSC ring buffer; once it's correct, you understand acquire/release.
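The index discipline of a SPSC ring can be sketched before worrying about memory ordering. Python cannot express acquire and release, so this sketch only shows the structure: each index has exactly one writer, and the payload is stored before the index that publishes it. The comments mark where a C++ or Rust version would need a release store and an acquire load; the class and function names are hypothetical, and None is reserved here to mean "empty".

```python
import threading
import time


class SpscRing:
    """Bounded single-producer/single-consumer queue over a fixed list."""

    def __init__(self, capacity):
        self._slots = [None] * (capacity + 1)   # one slot stays empty to tell full from empty
        self._head = 0                          # next slot to read; only the consumer writes it
        self._tail = 0                          # next slot to write; only the producer writes it

    def try_push(self, item):
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False                        # full
        self._slots[self._tail] = item          # 1. write the payload
        self._tail = nxt                        # 2. publish it (release store in C++/Rust)
        return True

    def try_pop(self):
        if self._head == self._tail:            # (acquire load of tail in C++/Rust)
            return None                         # empty
        item = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        return item


def transfer(count, capacity=8):
    """Push 0..count-1 from one thread and pop them in another; returns what arrived."""
    ring, received = SpscRing(capacity), []

    def produce():
        for i in range(count):
            while not ring.try_push(i):
                time.sleep(0)                   # yield until the consumer frees a slot

    producer = threading.Thread(target=produce)
    producer.start()
    while len(received) < count:
        item = ring.try_pop()
        if item is None:
            time.sleep(0)                       # yield until the producer publishes
        else:
            received.append(item)
    producer.join()
    return received


if __name__ == "__main__":
    print(transfer(20))
```

Porting this to a language with real atomics, and working out which two operations need ordering and why, is the part of the exercise that teaches acquire/release.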

Networking inside the kernel

Sockets, TCP state machine, epoll, io_uring, eBPF, XDP, DPDK. Read Beej's sockets guide as a refresher; then the kernel networking docs and one Cloudflare engineering post on XDP.

Containers & cgroups

Namespaces (PID, mount, net, UTS, IPC, user, cgroup, time); cgroups v2 (CPU, memory, IO). Read Liz Rice's "Containers from Scratch" talk; build your own container with unshare / clone / pivot_root.

Books worth reading.

2018 · free online

Arpaci-Dusseau — Operating Systems: Three Easy Pieces (OSTEP)

The right modern OS textbook. Three parts — virtualization (CPU and memory), concurrency, persistence. Free online; print-friendly PDF; the homework is real. Start here.

2010 · Addison-Wesley

Robert Love — Linux Kernel Development (3rd ed.)

The most readable book on the Linux kernel. Slightly dated (kernel 2.6) but the core abstractions haven't moved. The book that turns "I know syscalls" into "I read kernel source for fun".

2013 · Addison-Wesley

Stevens, Rago — Advanced Programming in the UNIX Environment (3rd ed.)

APUE. The C-and-Unix bible. Files, processes, signals, threads, terminals, sockets — at the level of "here is the syscall, here is the man page, here are five real examples". Reach for it whenever a syscall surprises you.

2020 · Addison-Wesley

Brendan Gregg — Systems Performance (2nd ed.)

The methodology book for performance work on Linux. USE method, RED method, flame graphs, off-CPU analysis. The book that turns "the server is slow" into a finite checklist.

2021 · self-published

Paul E. McKenney — Is Parallel Programming Hard, And, If So, What Can You Do About It?

McKenney's perfbook. Free online. The most rigorous treatment of concurrency outside academic textbooks — RCU, hazard pointers, memory ordering, locklessness. Re-read every year.

2010 · No Starch Press

Kerrisk — The Linux Programming Interface

TLPI. Michael Kerrisk — maintainer of the Linux man-pages project — writes the book on Linux syscalls. Encyclopaedic; pairs with APUE for "Linux specifically" detail.

2022 · Pearson

Tanenbaum, Bos — Modern Operating Systems (5th ed.)

The canonical undergraduate textbook. Drier than OSTEP but broader — distributed OS, security, virtualization, mobile OS. Reach for it when OSTEP's cheerful tone wears thin.

Honourable mentions: Understanding the Linux Kernel (Bovet, Cesati — older but still excellent on the VM subsystem); Linux System Programming (Robert Love — the user-space companion to LKD); The Design of the UNIX Operating System (Bach — historical but still beautifully clear on the original architecture).

Courses and references.

Free

Paid (worth it)

Bradfield CS
Computer Architecture & Operating Systems
Cohort-based, instructor-led; pricey. The OS chunk pairs with Bradfield’s networking and DB courses for the full systems education. Watch first; pay if the structure helps you.

Papers worth reading.

Twelve papers, roughly 1965 → 2020. Read them in order. The field is dense and citation-heavy. Most are 10–25 pages.

01
1965 · Dennis & Van Horn
Programming Semantics for Multiprogrammed Computations
The paper that gave us the process abstraction. Read it for the historical frame: "what is a process" was a research question once. Twenty pages.
02
1973 · Lampson
A Note on the Confinement Problem
Lampson's covert-channel paper. The first formal treatment of "what does the OS keep secret, and from whom". The vocabulary of seven covert-channel categories is still the right one in 2026.
03
1974 · Ritchie & Thompson
The UNIX Time-Sharing System
The Bell Labs CACM paper. Twelve pages explaining files, processes, the shell, the directory tree, pipes. Every Unix design decision in the field traces back to this; read it before any other OS paper.
04
1992 · Rosenblum & Ousterhout
The Design and Implementation of a Log-Structured File System
LFS — write the entire disk as a circular log. The architectural ancestor of every modern flash-aware filesystem (F2FS, the FTL inside SSDs themselves). Read it for the disk-as-log mental model.
05
1994 · Bonwick
The Slab Allocator: An Object-Caching Kernel Memory Allocator
Bonwick's slab allocator from Solaris. The Linux SLUB allocator descends directly. Read for the "object pools matter" insight that drives every kernel memory allocator since.
06
1996 · Cao, Felten, Karlin, Li
Implementation and Performance of Application-Controlled File Caching
The classic paper on letting applications hint at the page cache. madvise, posix_fadvise — they all trace here. Read before tuning vm.* anywhere in production.
07
1993 · Satyanarayanan et al
Lightweight Recoverable Virtual Memory
The mmap-and-checkpoint pattern. The intellectual ancestor of every modern persistent-memory and copy-on-write database. Worth reading before reaching for kernel byte-addressable persistent memory APIs.
08
2010 · McKenney
Memory Barriers: A Hardware View for Software Hackers
McKenney is the author of RCU. This paper explains memory barriers, store buffers, and cache coherence from the hardware side. Read it once and the C++/Rust acquire/release vocabulary stops being mysterious.
09
2014 · Anderson, Dahlin
Operating Systems: Principles & Practice (the OS:PP textbook)
Not a paper, but the text. Modern OS textbook from Tom Anderson; pairs with OSTEP. Particularly strong on synchronisation and threads. Free PDF chapters online.
10
2017 · Bonifaci, Brandenburg, Stiller, Wieder
Counting on Fast Userspace Mutexes (futex revisited)
A revisit of futex semantics with priority inheritance. If you operate latency-critical code, this is required reading; if you don't, it's a beautiful narrow paper to feel smart about.
11
2019 · Axboe et al
io_uring: An Introduction (kernel docs)
Jens Axboe's introduction to io_uring. Submission and completion rings, polling mode, fixed buffers. The substrate every new high-throughput Linux server in 2020+ is built on.
12
2020 · Gregg
BPF Performance Tools
Brendan Gregg's comprehensive book on eBPF for production observability. opensnoop, execsnoop, tcpconnect, profile — every diagnostic tool reduced to a one-liner. Pair with his earlier Systems Performance book.

Going further: Lampson’s "Hints for Computer System Design"; the Mach paper (Accetta et al, 1986); the seL4 verification paper (Klein et al, 2009); the Singularity OS papers from MSR; Solaris’ DTrace (Cantrill, 2004) — the conceptual ancestor of eBPF.

Talks worth watching.

Hands-on tools.

Theory without something you can run is fragile. Each of these is a manageable way to make the kernel push back when you make a mistake.

Environment	Cost	Best for
xv6	Free, open-source	MIT 6.S081’s teaching Unix. ~10k lines of C. Compile, boot in QEMU, modify the scheduler / shell / file system. The most direct way to feel a kernel respond to your changes.
strace + ltrace	Free	strace traces syscalls; ltrace traces library calls. The tools every Linux engineer reaches for first when "why is this program slow / failing / weird". Read the man page once; you’ll use them forever.
perf	Free, in-tree	The kernel’s native profiler. perf top for live; perf record + perf report for post-hoc; perf stat for hardware counters. The tool that turns "the box is slow" into "this function is the bottleneck".
bpftrace + BCC	Free, open-source	eBPF for the rest of us. Brendan Gregg’s tools: opensnoop, execsnoop, tcpconnect, runqlat. Production-safe, zero-instrumentation observability. Run as root; learn five tools; replace half your debugging.
QEMU + your own kernel	Free	Build a custom kernel from the source tree (make defconfig + make -j$(nproc)); boot it in QEMU. The "I made one change and watched it work" loop is half a day of setup and infinite reps after.
cyclictest + stress-ng	Free, open-source	Latency benchmarks for real-time scheduling. cyclictest measures interrupt latency; stress-ng generates load. The right tools to evaluate PREEMPT_RT, isolated CPUs, or any "is this kernel real-time enough" claim.

Latency, at a glance.

Twelve numbers, calibrated for modern hardware. Print this and tape it next to the monitor. The ones that surprise people most: a syscall costs ~100× a function call, a main-memory reference costs ~100× an L1 hit, and a major page fault costs ~10,000× a minor one.

Operation	Latency	Notes
L1 cache reference	~1 ns	Cache hit, no stall.
Branch misprediction	~3 ns	Pipeline flush; small but in tight loops it dominates.
L2 cache reference	~4 ns	Still on-chip.
Mutex lock/unlock (uncontended)	~25 ns	Modern Linux futex fast path.
Main memory reference	~100 ns	Hundreds of cycles; feels free, isn't.
Empty syscall (getpid)	~100–300 ns	The mode-switch cost; vDSO bypasses it for clock_gettime.
Context switch (same process)	~1 µs	No CR3 change; cheap.
Context switch (cross-process)	~3–5 µs	TLB flush amortised by PCID.
NVMe random read (4 KB)	~10–100 µs	Fast SSDs; same order as a context switch.
Local network round trip	~50 µs – 1 ms	Same datacenter, modern NICs.
Page fault from disk (major)	~ms	Major page fault; visible in /proc/PID/stat.
Cross-region network	~30–300 ms	Architecturally expensive; design around.

Numbers are order-of-magnitude on a 2024-class x86 server. Always measure on your own hardware. Jeff Dean’s "Latency Numbers Every Programmer Should Know" — last updated by colinscott — is the canonical scaffold.
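The table becomes useful once it is turned into a budget. As a rough sketch, the dictionary below takes single representative values from the ranges in the table (assumptions picked for illustration, not measurements) and adds up what one hypothetical request would cost.

```python
# Representative values in nanoseconds, chosen from the table's ranges (assumptions).
LATENCY_NS = {
    "l1": 1,
    "main_memory": 100,
    "syscall": 200,
    "ctx_switch_cross": 4_000,
    "nvme_read": 50_000,
    "cross_region": 100_000_000,
}


def per_request_budget(ops):
    """Sum the latency of a request described as {operation: count}, in nanoseconds."""
    return sum(LATENCY_NS[name] * count for name, count in ops.items())


if __name__ == "__main__":
    # A request that makes 20 syscalls, crosses processes twice and reads one 4 KB block.
    request = {"syscall": 20, "ctx_switch_cross": 2, "nvme_read": 1}
    print(per_request_budget(request) / 1000, "µs")
```

With these values the single NVMe read outweighs the twenty syscalls and two context switches combined, which is the kind of conclusion the table is meant to make quick.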

Common mistakes.

Patterns every team writes at least once. Read these now and you'll recognise the shape later, when something on-call is misbehaving and the dashboard is no help.

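Forgetting that fork() is COW: A 4 GB process forks; the engineer thinks "we just doubled RAM". No: the child shares pages until one side writes. Fork copies page tables, not page contents, so it costs milliseconds rather than the seconds a full copy would take. The actual mistake is calling exec() too late while lots of pages are still being dirtied, because every first write to a shared page triggers a copy.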
Treating threads as free: A thread is 8 MB of virtual address space, ~16 KB of kernel state, plus context-switch cost. 100k threads on Linux works (it's designed for it), but unbounded thread pools deadlock under load. Cap the pool; queue the rest.
Blocking the runtime's event loop: A synchronous read inside an async runtime (Node, Tokio, Go without proper isolation) can stall the whole event loop. Either move the work to a worker pool or use the runtime's blocking-syscall offload.
Ignoring CFS throttling: Kubernetes CPU limits are enforced by cgroup CFS bandwidth control. A burst hits the quota; the cgroup is frozen for the rest of the period (default 100 ms). Symptoms: tail-latency spikes correlated with the period. Fix: raise the limit, remove it, or use SCHED_DEADLINE.
fsync paranoia (or lack thereof): A successful write() puts bytes in the page cache, not on disk. fsync flushes them. fsync on the parent directory after rename is required for the rename to survive a crash. Most "we lost data after a power cut" bugs trace here.
Misusing /dev/random vs getrandom: Old code reads /dev/random and blocks at boot waiting for entropy. Modern code calls getrandom(2) with default flags, which does the right thing on every kernel since 3.17. Don't ship blocking entropy reads in 2026.
Storing config in env without size limits: execve caps each argv or envp string at 128 KB on Linux (MAX_ARG_STRLEN) and the combined size at roughly a quarter of the stack rlimit (about 2 MB by default). Container orchestrators happily inject a 200 KB env var, and your fork/exec starts failing with E2BIG. Cap env-var sizes in the config layer.
Treating mmap as faster I/O: mmap looks like memory; underneath it's page faults that issue disk I/O. For sequential reads of large files, read() with a sane buffer is often faster (no fault-per-page overhead). Profile.
Writing to /tmp without considering tmpfs: /tmp is RAM-backed (tmpfs) on many distros now. A 50 GB write to /tmp can OOM the box. Use /var/tmp or an explicit on-disk path for large temp files.
Reaching for a thread when a process would be safer: A bug in a thread takes down its siblings; a bug in a process takes down only itself. For untrusted code, hot-reload, or "I want a hard isolation boundary", processes are the right tool — even at their higher cost.
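The fsync item in the list above is the one most often gotten half right, so here is a sketch of the full sequence: write a temporary file, fsync it, rename it over the target, then fsync the parent directory. It assumes a POSIX filesystem where a directory can be opened and fsynced (as on Linux); the function name and the ".tmp" suffix are conventions invented for this example.

```python
import os
import tempfile


def durable_replace(path, data):
    """Atomically replace `path` with `data` so the change survives a crash."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"                     # hypothetical temp-name convention
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                              # write() may accept only part of the buffer
            view = view[os.write(fd, view):]
        os.fsync(fd)                             # push the file contents to stable storage
    finally:
        os.close(fd)
    os.replace(tmp_path, path)                   # atomic rename over the old file
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)                         # persist the directory entry for the rename
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "config.json")
    durable_replace(target, b'{"version": 1}')
    print(open(target, "rb").read())
```

Skipping the first fsync risks an empty file after a crash; skipping the second risks the old file reappearing.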

Quick test.

Ten cards: the questions interviewers ask, the things that bite operators in production, and the trivia that separates "I run Linux" from "I understand it".

Card 1 of 10

Why does fork() of a 4 GB process cost milliseconds, not seconds?

Suggested sequences

Reading progressions

Three ordered paths through this material. Pick the one that matches where you are.

Path 01 · Processes

Processes, threads & scheduling

How the OS multiplexes CPU time between tasks, from process lifecycle to thread pools.

Path 02 · Memory

Virtual memory & allocation

From the MMU to malloc: how memory is virtualised, managed, and eventually collected.

Path 03 · I/O

I/O, storage & networking

How the kernel handles blocking I/O, file systems, and the interface to the network stack.

What's next.

Operating systems reward re-reading. OSTEP, read on day 30 and again on day 300, will give you different things. So will Robert Love’s LKD. So will every Brendan Gregg talk. The field is not large. It is dense, and it has been compounding for fifty years.

Pick one real kernel subsystem and read its source for an afternoon. The Linux scheduler (kernel/sched/), the page-cache (mm/filemap.c), the futex implementation (kernel/futex/) are all open. Pair what you read with the paper that inspired it. Then come back to your own code, your own profiles, your own slow paths. You will rewrite some of them.


Operating systems, from the ground up.
