Knowing what a system is actually doing.
Monitoring tells you the things you already knew to watch. Observability is being able to ask a new question of a running system at 3am, without shipping new code first. The whole field comes down to three signals used well: logs, metrics, and traces. Distributed tracing stitches them together so one request can be followed across a dozen services, and SLOs point them at goals you have written down. Get those right and most incidents turn from a guessing game into a query.
All four sub-pages are live. Practical mental models for the people who get paged, not a vendor tour.
Start here.
Logs, metrics & traces
The three telemetry signals, what each is actually good at, and the costly mistake of reaching for the wrong one. Why metrics answer "is it broken", traces answer "where", and logs answer "why".
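The division of labour is easier to see when one request emits all three signals. The sketch below is hand-rolled and library-free: `handle_request`, `requests_total`, `spans`, and `log_lines` are hypothetical names, and "upstream timeout" is an invented failure reason. The metric is keyed only by low-cardinality labels (route and status), so it stays cheap to aggregate. The span and the log line carry the trace ID, which lets you move from "is it broken" to "where" to "why".

```python
import json
import secrets
import time
from collections import Counter

# Metric: cheap aggregate counts, good for "is it broken?" dashboards and alerts.
# Keyed only by (route, status); a per-request trace ID here would explode cardinality.
requests_total = Counter()
# Trace spans: one record per unit of work, good for "where did the time go?".
spans = []
# Logs: detailed per-event context, good for "why did this request fail?".
log_lines = []


def handle_request(route: str, fail: bool) -> str:
    """Hypothetical handler that emits all three signals for a single request."""
    trace_id = secrets.token_hex(16)
    start = time.perf_counter()
    status = 500 if fail else 200

    requests_total[(route, status)] += 1
    spans.append({
        "trace_id": trace_id,
        "name": f"GET {route}",
        "duration_s": time.perf_counter() - start,
        "status": status,
    })
    if fail:
        log_lines.append(json.dumps({
            "level": "error",
            "trace_id": trace_id,       # lets you jump from this log line to the trace
            "msg": "upstream timeout",  # hypothetical failure reason
        }))
    return trace_id
```

A dashboard would alert on the ratio of `500`s in `requests_total`; the on-call engineer then opens the matching span, and finally reads the log line that shares its `trace_id`.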
OpenTelemetry & distributed tracing
How a single request is followed across a dozen services. Trace and span context, propagation, the OpenTelemetry model, and why tracing is the one signal that survives a microservice rewrite.
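To make "propagation" concrete, here is a hand-rolled sketch of the W3C Trace Context `traceparent` header, the format OpenTelemetry propagates by default. It is not the OpenTelemetry API: `SpanContext`, `new_root`, `inject`, and `extract_child` are hypothetical names, and the parser accepts only version `00` of the header. The key idea is that the trace ID crosses every service boundary unchanged, while each hop mints a fresh span ID.

```python
import re
import secrets
from dataclasses import dataclass

# traceparent = version "-" trace-id "-" parent-id "-" trace-flags, lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 hex chars, shared by every span in the request
    span_id: str   # 16 hex chars, unique to this span
    sampled: bool


def new_root() -> SpanContext:
    """Start a brand-new trace at the edge of the system."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), True)


def inject(ctx: SpanContext, headers: dict) -> None:
    """Write the current span's context into outgoing request headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract_child(headers: dict) -> SpanContext:
    """Read the caller's context and start a child span in the same trace."""
    m = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if m is None:
        return new_root()  # missing or malformed header: begin a fresh trace
    trace_id, parent_id, flags = m.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return new_root()  # all-zero IDs are invalid per the spec
    sampled = (int(flags, 16) & 0x01) == 1  # lowest bit is the "sampled" flag
    # Same trace ID, new span ID: this is what links the hops into one trace.
    return SpanContext(trace_id, secrets.token_hex(8), sampled)
```

In use, service A calls `inject(ctx, outgoing_headers)` before making a request, and service B calls `extract_child(incoming_headers)` when it receives one. A real SDK does the same thing inside its HTTP and RPC instrumentation, which is why you rarely write this by hand.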
SLOs & error budgets
Turn "be reliable" into a number you can act on. SLIs, SLOs, error budgets, burn-rate alerts, and how a budget changes the conversation between the people who ship and the people who get paged.
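The arithmetic behind an error budget is small enough to show in full. A 99.9% SLO over a 30-day window leaves a budget of 0.1%: with 10 million requests in the window, about 10,000 may fail. Burn rate is the observed error ratio divided by that budget, so a burn rate of 1 spends the budget exactly by the end of the window. The sketch below assumes the multi-window paging rule popularized by the Google SRE Workbook (burn rate above 14.4 over both a 1-hour and a 5-minute window). At 14.4x for one hour you consume 14.4 / 720 hours = 2% of a 30-day budget. The function names are hypothetical, and the threshold is a starting point, not a rule.

```python
from typing import Tuple


def error_budget(slo: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo


def burn_rate(bad: int, total: int, slo: float) -> float:
    """Observed error ratio divided by the budget. 1.0 means the budget would be
    used up exactly at the end of the SLO window; 10.0 means ten times faster."""
    if total == 0:
        return 0.0  # no traffic, nothing burned
    return (bad / total) / error_budget(slo)


def should_page(long_window: Tuple[int, int], short_window: Tuple[int, int],
                slo: float, threshold: float = 14.4) -> bool:
    """Multi-window burn-rate alert. Each window is a (bad, total) request count.
    Page only if both windows burn faster than the threshold: the long window
    shows the problem is significant, the short one that it is still happening."""
    return (burn_rate(*long_window, slo) > threshold
            and burn_rate(*short_window, slo) > threshold)
```

For example, 200 failures out of 10,000 requests in the last hour is a 2% error ratio, a burn rate of about 20 against a 99.9% SLO. Requiring the short window to agree keeps the page from firing long after the problem has already stopped.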
eBPF observability
See what a running kernel is doing without changing a line of application code. How eBPF safely runs sandboxed programs in the kernel, and what that unlocks for zero-instrumentation tracing, profiling, and networking.