
K8s probe generator.

Compose livenessProbe, readinessProbe, and startupProbe YAML — HTTP, TCP, exec, or gRPC handlers — with the timing knobs that matter (initial delay, period, timeouts, thresholds). Copy a Deployment fragment ready to paste.

Generated YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: ghcr.io/example/app:latest
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080

Liveness, readiness, startup.

Kubernetes probes look superficially identical. They all run a check against the container, they all share the same timing parameters, and they all return success or failure. The semantic differences are everything. When a liveness probe fails failureThreshold times in a row, the kubelet restarts the container, on the theory that something went wrong and a fresh start might fix it. When a readiness probe fails failureThreshold times in a row, the pod is marked not ready and removed from the endpoints of every Service it backs, so traffic stops flowing to it; the container keeps running. A startup probe gates the other two: until it succeeds, liveness and readiness probes are not run. If it fails failureThreshold times in a row, the kubelet kills and restarts the container, just as it would for a liveness failure.
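Below is a minimal sketch of one container that carries all three probes, with comments on what a failure of each one triggers. The /healthz and /ready paths, port 8080, and the timing values are assumptions for illustration. The startup probe reuses the liveness endpoint, which is a common pattern rather than a requirement.

```yaml
# Fragment of spec.template.spec.containers; paths, port, and values are assumed.
- name: app
  image: ghcr.io/example/app:latest
  ports:
    - containerPort: 8080
  # Runs first. Liveness and readiness stay paused until it succeeds.
  # After failureThreshold consecutive failures (here 10s * 30 = 300s),
  # the container is killed and restarted per the pod's restartPolicy.
  startupProbe:
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
    failureThreshold: 30
  # After failureThreshold consecutive failures: container restarted.
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
  # After failureThreshold consecutive failures: pod marked not ready and
  # removed from Service endpoints; the container keeps running.
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
```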

The startup probe is the newer addition (stable since Kubernetes 1.20, December 2020). Before startup probes, slow-starting applications had two bad options on the liveness probe. One was to guess a high initialDelaySeconds. The other was to stretch failureThreshold × periodSeconds to cover the worst-case boot, which also slowed deadlock detection for the rest of the pod's lifetime. The startup probe lets you set a generous budget for boot, say ten minutes for a JVM with a large warmup, and then hand off to a fast-cycle liveness probe (every 10 seconds) for the steady state. The startup probe is purely about boot; once it succeeds, it is not run again for the life of that container.
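For a slow-booting JVM service, the hand-off might look like the sketch below. It assumes a /healthz endpoint on port 8080, both placeholders. The startup probe gets a ten-minute budget, and liveness then takes over on a 10-second cycle.

```yaml
# Fragment for one container; endpoint, port, and budget are illustrative.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 60    # boot budget: 10s * 60 = 600s, about 10 minutes
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3     # steady-state detection: roughly 10s * 3 = 30s
```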

The most common configuration mistake is conflating liveness and readiness. They almost always need different endpoints. Liveness should test whether the process is alive — usually, "is the HTTP server responding at all?" — without checking dependencies. Readiness should test whether the process can serve traffic — including database connectivity, cache warm-up, feature-flag service availability. If liveness checks dependencies, a transient downstream outage will trigger pod restarts, which only makes the outage worse.

HTTP, TCP, exec, gRPC.

Each probe carries one handler. httpGet is the workhorse: the kubelet makes an HTTP GET to the configured path and port, considers anything in the 200–399 range a success, and anything else (including timeouts) a failure. Headers can be customised, and HTTPS is supported (with no certificate verification — the probe is local to the node). Most well-designed services expose a small unauthenticated /healthz endpoint specifically for this.
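Here is a sketch of an HTTPS httpGet probe with a custom header. The port 8443, the /healthz path, and the X-Probe header are hypothetical. The scheme and httpHeaders fields are the standard way to ask for HTTPS and to send extra headers.

```yaml
livenessProbe:
  httpGet:
    scheme: HTTPS         # kubelet connects over TLS but skips cert verification
    path: /healthz        # assumed endpoint
    port: 8443            # assumed TLS port
    httpHeaders:
      - name: X-Probe     # hypothetical header; its meaning is up to the app
        value: kubelet
  periodSeconds: 10
  timeoutSeconds: 3
```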

tcpSocket opens a TCP connection to the port and considers a successful handshake a success. Useful for protocols that don't speak HTTP (Postgres, Redis, MongoDB, JMX) where any HTTP-layer probe would require a sidecar. The semantics are weak — a process that accepts the TCP connection but is otherwise broken still passes — but it's better than nothing.
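For example, a TCP readiness check on a Redis container might look like this. Port 6379 is Redis's default; the timing values are illustrative.

```yaml
# Passes once anything accepts a TCP connection on 6379.
# It does not prove that Redis actually answers commands.
readinessProbe:
  tcpSocket:
    port: 6379
  periodSeconds: 10
  timeoutSeconds: 3
```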

exec runs a command inside the container; exit code 0 is success, anything else is failure. It is the most flexible option, and it suits legacy applications that ship a CLI for health-checking. The cost is real: each exec spawns a process inside the container, which adds CPU and memory pressure on busy nodes. Reserve exec for checks that HTTP and TCP can't express.
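Below is a sketch of an exec readiness check for a Postgres container. It assumes the image ships pg_isready, as the official postgres images do. pg_isready exits 0 only when the server is accepting connections, which maps directly onto exec's success rule.

```yaml
# The command is exec'd directly, not through a shell. Wrap it in
# ["sh", "-c", "..."] if you need pipes or variable expansion.
readinessProbe:
  exec:
    command: ["pg_isready", "-h", "127.0.0.1", "-p", "5432"]
  periodSeconds: 15
  timeoutSeconds: 5
```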

grpc is the newest handler, stable in 1.27 (April 2023). It uses the gRPC Health Checking Protocol — the kubelet calls grpc.health.v1.Health/Check with an optional service name, and considers SERVING a success. Before native gRPC, services had to expose a separate HTTP endpoint or run grpc-health-probe as an exec; native gRPC removes that overhead. If you're writing a new gRPC service today, implement the standard health protocol and use the gRPC handler directly.
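The sketch below shows native gRPC probes. It assumes the server registers the standard grpc.health.v1.Health service on port 9090. The service name "ready" on the readiness probe is hypothetical and only works if the application reports a status under that name.

```yaml
livenessProbe:
  grpc:
    port: 9090            # must be a number; gRPC probes don't accept named ports
  periodSeconds: 10
readinessProbe:
  grpc:
    port: 9090
    service: ready        # hypothetical; sent as the service in HealthCheckRequest
  periodSeconds: 5
```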

Five knobs, tightly coupled.

initialDelaySeconds delays the first probe after the container starts. Default 0 — the first probe runs immediately. For applications with predictable boot time, this is fine; for slow-starting applications, prefer a startup probe over a high initialDelaySeconds.

periodSeconds is the interval between consecutive probes. Default 10. Tightening this gives you faster detection but more CPU spent on probes; loosening it lets failures linger. Reasonable values: 5–15 for HTTP, 10–30 for exec.

timeoutSeconds is how long to wait for a single probe to respond. Default 1, which is often too aggressive — a busy node can take more than a second to round-trip even a fast HTTP request. Set to 3–5 unless you have a specific reason to be tight. Hitting timeout repeatedly during steady state is a smell.

successThreshold is consecutive successes required to flip from failing to passing. Default 1, and must be 1 for liveness and startup probes (you can't require multiple successes to "still be alive"). Useful only on readiness, where you might want to confirm the pod is consistently healthy before sending traffic.
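A readiness probe that must pass twice in a row before traffic returns might look like this. The path, port, and values are assumptions.

```yaml
readinessProbe:
  httpGet:
    path: /ready          # assumed endpoint
    port: 8080
  periodSeconds: 5
  successThreshold: 2     # only allowed on readiness; liveness/startup require 1
  failureThreshold: 2
```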

failureThreshold is the number of consecutive failures before the probe is considered failed. Default 3. Multiplied by periodSeconds, this gives the detection window: at default settings (period 10, threshold 3), a real failure takes up to about 30 seconds to detect. For startup probes, set failureThreshold high. The boot budget is roughly initialDelaySeconds + periodSeconds × failureThreshold, and you want it to comfortably exceed worst-case startup time.
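Here is a liveness probe with all five knobs spelled out. It uses the defaults except for a wider timeout, and the path and port are placeholders. The trailing comment works through the detection-window arithmetic.

```yaml
livenessProbe:
  httpGet:
    path: /healthz        # assumed endpoint
    port: 8080
  initialDelaySeconds: 0  # default; prefer a startupProbe for slow boots
  periodSeconds: 10       # default
  timeoutSeconds: 3       # raised from the default of 1
  successThreshold: 1     # must be 1 for liveness
  failureThreshold: 3     # default
  # Detection window: periodSeconds * failureThreshold = 10 * 3 = 30s,
  # a little longer if the failing attempts each run to timeoutSeconds.
```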

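| Probe | Period | Timeout | Failure threshold | Detection window |
|---|---|---|---|---|
| Liveness (web) | 30s | 3s | 3 | ~90s |
| Liveness (slow exec) | 60s | 10s | 3 | ~3 min |
| Readiness (web) | 5s | 2s | 2 | ~10s |
| Readiness (gRPC) | 10s | 3s | 3 | ~30s |
| Startup (JVM warm-up) | 10s | 3s | 60 | 10 min boot allowed |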

How probes go wrong.

A liveness probe that takes longer than its timeout under normal load is a failure waiting to happen. Sooner or later the kubelet counts failureThreshold failures in a row (three by default) and restarts the container, and the pod's restart count keeps climbing. The fix is either to make the endpoint faster (it should always be fast; that's the whole point) or to widen the timeout. Widening it past periodSeconds defeats the purpose. If your probe takes 15 seconds and is scheduled every 10, the kubelet is probing back-to-back and the period no longer means anything.

A readiness probe coupled to a shared downstream service turns a dependency blip into a full outage. Suppose your service depends on Redis and your readiness probe checks Redis connectivity. Any Redis flap then removes every pod from the Service endpoints at once, and your service goes 100% unavailable within seconds. The fix is to decouple readiness from external dependencies and put a circuit breaker on the actual application calls instead. That way you fail fast on sick downstreams without removing yourself from the Service.

A startup probe that's too lenient hides real boot-time problems. If you set failureThreshold: 100 with periodSeconds: 30, your application has 50 minutes to start before the kubelet gives up. That's almost always too long: a pod that hangs during boot will sit there for most of an hour without tripping any alarm. Set the threshold to a realistic worst case for a healthy boot, and let the restart back-off (CrashLoopBackOff) surface pods that are actually broken.

For long-running, single-replica deployments — databases, queues, anything stateful — be conservative with liveness. The cost of an unnecessary restart is high (state may take minutes to recover). Many production database operators (Postgres-Operator, Patroni, etc.) explicitly recommend disabling liveness probes and relying on operator-level health management. The probe configurations on this page assume stateless workloads; stateful workloads warrant case-by-case review.
