CPU Scheduler Simulator: four ways to share one core.

A CPU scheduler decides which ready job runs next on a core and for how long. Here five jobs share one CPU under four policies. FCFS runs them in arrival order to completion. RR rotates them in slices. Priority picks the most important one and starves the rest. CFS tracks how much CPU each one has used and always picks the one that's used least. Same workload, different stories.

tick

avg wait

0.0

done

0/5

algorithm: quantum: 2

Jobs

id	arrival	burst	rem	prio	nice	done
A	0	6	6	3	0	—
B	1	3	3	1	-5	—
C	2	8	8	4	5	—
D	3	4	4	2	0	—
E	6	5	5	5	10	—

Gantt — CPU timeline

— click Run to start —

RR ready queue (head → tail)

— empty —

Try this

Run FCFS. Watch B (a 3-tick job) wait 5 ticks behind A. The classic convoy effect.
Switch to RR with quantum=2. Now B finishes early and average wait drops sharply — but the timeline gets noisy.
Switch to Priority. Jobs B and D (prio 1 and 2) finish fast; E (prio 5) waits forever if you add more high-prio arrivals. That's starvation.
Switch to CFS. B (nice -5) gets weighted slightly more CPU than the rest. No starvation; the gantt rotates by vruntime.
Slide RR quantum to 1. The chart becomes a checkerboard — fairness rises, context-switch overhead would dominate in real life.

What CFS actually does

Linux's scheduler since 2.6.23 (2007) is built on one rule: each task has a vruntime that ticks up while it runs, scaled by the inverse of its weight (higher nice = lighter weight = vruntime climbs faster, so it picks slower). The scheduler picks the task with the smallest vruntime from a red-black tree. That's it. Fairness emerges; starvation is impossible by construction.

Adjacent

What you're looking at

The jobs table is the workload — five tasks (A–E) with fixed arrival times and burst lengths, plus a priority and a nice value the relevant algorithms read. The Gantt strip below it is the single CPU's history: one coloured cell per tick, showing which job held the processor at that moment. The meta row tracks the running average wait, and under Round Robin you also get the live ready queue with the current slice counter.

Run FCFS first and watch the convoy effect: short job B sits idle for five ticks behind the long job A, dragging the average wait up. Switch to RR and the strip turns into stripes as jobs trade the CPU every quantum, and B finishes far sooner. The one to sit with is Priority versus CFS on the same five jobs. Priority lets the low-priority job E wait while higher ranks keep cutting in; CFS, picking whichever job has accumulated the least weighted runtime, never lets anyone starve. Same workload, opposite outcomes for the unlucky job.

Throughput, latency, fairness — pick two

Every scheduler in history is a different point on this triangle.

A scheduler exists because you have more runnable processes than CPUs. The question it answers — "which one next, and for how long?" — has no universally right answer. The trade is between three goals that conflict in pairs. Throughput wants long uninterrupted runs that keep caches hot and the pipeline full. Latency wants short bursts so interactive tasks respond quickly. Fairness wants no task ignored for too long. Optimise for any one and you lose the other two.

FCFS optimises throughput at the cost of everything else — once a long job is running, short jobs queue behind it (the "convoy effect"). Round Robin optimises latency by preempting every quantum ticks, but pays in context-switch overhead and loses cache locality. Strict priority guarantees response time for important tasks and lets everything else starve. CFS picks a fourth corner: weighted fairness, with throughput recovered by tunable batch parameters and latency capped by a minimum sched granularity. Every real-world scheduler — Windows NT, FreeBSD ULE, macOS XNU, mainframe lottery schedulers — is a variant on the same dial-tuning exercise.

vruntime is the only number that matters

How Linux replaced fifteen heuristics with one tree.

Before CFS, Linux had the O(1) scheduler — separate active and expired arrays, dynamic priority adjustment for interactivity, sleep-bonus heuristics, and a fistful of magic constants tuned empirically. Ingo Molnár's CFS, merged in 2007, threw that out and replaced it with a single data structure: a red-black tree of runnable tasks keyed by vruntime, the virtual runtime each task has accumulated.

Each tick a task runs, its vruntime advances by delta_exec / weight, where weight is derived from the task's nice value through a lookup table. A nice-0 task has weight 1024; nice -5 has roughly 3,121; nice +5 has roughly 335. A nice-5 task's vruntime climbs three times as fast as a nice-0 task's per tick of real runtime, so the scheduler will pick the nice-0 task three times as often once they're both equally "behind." The leftmost node in the tree is always next to run.

Starvation becomes impossible: every task accumulates vruntime monotonically when it runs, so the longest-waiting task is always the smallest in the tree. Fairness becomes tunable: cgroup cpu.weight scales the effective weight of an entire group, letting you say "this container gets 4× the CPU of that one" with one number. The downside, mostly cosmetic: deciding "is this task interactive?" is harder than under the old heuristic-driven scheduler, so very latency-sensitive workloads sometimes still pick SCHED_FIFO or SCHED_RR (the real-time classes) to bypass CFS entirely.

The bugs the scheduler causes

Three production-debugging patterns whose root cause lives here.

Priority inversion. Low-priority task holds a lock. High-priority task waits on the lock. Medium-priority task preempts the low-priority one and runs forever. The high-priority task is effectively blocked by the medium one. Fixed by priority inheritance (POSIX mutexes can request it) or priority ceilings. Mars Pathfinder famously rebooted itself in 1997 because of this.

Thundering herd at scheduler granularity. 1000 worker threads all sleep on the same condition variable. The producer wakes "all" of them. They all wake, all race for the lock, all but one lose, all go back to sleep. CPU spends most of its time on the wake-and-sleep round trip. Fix: use pthread_cond_signal (wake one) or build your own queue with explicit dispatch.

Noisy neighbour in containers. Two containers on the same node, both using the default CFS class. One container is CPU-bound and well-behaved; the other goes into an emergency state and spawns 10,000 threads. The CFS weights still favour each cgroup equally, but the bad container's 10,000 threads collectively use far more CPU than the good container's 4 threads do. Fix: cap with cpu.max (hard ceiling) not just cpu.weight (relative share).

Found this useful?

id	arrival	burst	rem	prio	nice	done
A	0	6	6	3	0	—
B	1	3	3	1	-5	—
C	2	8	8	4	5	—
D	3	4	4	2	0	—
E	6	5	5	5	10	—

id	arrival	burst	rem	prio	nice	done
A	0	6	6	3	0	—
B	1	3	3	1	-5	—
C	2	8	8	4	5	—
D	3	4	4	2	0	—
E	6	5	5	5	10	—

id	arrival	burst	rem	prio	nice	done
A	0	6	6	3	0	—
B	1	3	3	1	-5	—
C	2	8	8	4	5	—
D	3	4	4	2	0	—
E	6	5	5	5	10	—