
Service discovery in Kubernetes.

A pod boots up. A second later, requests flow to it. Between those two events sit the kubelet, the API server, etcd, the EndpointSlice controller, kube-proxy on every node, and an iptables rewrite. Watch the chain: eight steps, about 1 second of wall time.


STEP 1 · kubelet: pod started · status update
STEP 2 · API server: receives status · writes etcd
STEP 3 · etcd: persists pod IP + readiness
STEP 4 · EndpointSlice controller: watches pods + services
STEP 5 · EndpointSlice: updated with new pod IP
STEP 6 · kube-proxy: watches EndpointSlices
STEP 7 · iptables rules: rewritten to include new pod
STEP 8 · client request: hits service IP · DNAT to pod
pod-ready → traffic flowing · ~1 second wall time
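
Steps 7 and 8 can be made concrete with a small sketch. In iptables mode, kube-proxy spreads new connections across a Service's endpoints with a sequence of rules that each match with a fixed probability (the iptables `statistic` match in random mode), and the last rule matches unconditionally. The Python below only computes those per-rule probabilities and simulates the selection. The function names and every pod IP except 10.244.3.7 are assumptions for illustration, not kube-proxy code.

```python
import random
from collections import Counter


def rule_probabilities(n: int) -> list[float]:
    """Probability attached to each of n sequential rules so that every
    endpoint ends up equally likely overall: 1/n, 1/(n-1), ..., 1/1."""
    return [1.0 / (n - i) for i in range(n)]


def pick_endpoint(endpoints: list[str], rng: random.Random) -> str:
    """Walk the rules top to bottom, as a packet would, and return the
    endpoint of the first rule that matches."""
    if not endpoints:
        raise ValueError("service has no endpoints")
    for endpoint, p in zip(endpoints, rule_probabilities(len(endpoints))):
        if rng.random() < p:
            return endpoint
    return endpoints[-1]  # not reached: the last probability is 1.0


# Hypothetical endpoint set; the last entry is the newly ready pod.
endpoints = ["10.244.1.4", "10.244.2.9", "10.244.3.7"]
probs = rule_probabilities(len(endpoints))

# Simulate many new connections to check the spread is roughly even.
rng = random.Random(0)
hits = Counter(pick_endpoint(endpoints, rng) for _ in range(30_000))
```

The cascade is needed because the rules are evaluated in order: the second rule only sees the connections the first one let through, so it has to match half of those to end up with a third of the total.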
New pod starts · kubelet reports ready

The kubelet on the node sees its newly scheduled pod pass its readiness probes. It updates the pod's status through the API server: status.podIP=10.244.3.7, conditions.Ready=True. The pod is now eligible to receive traffic.
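
A minimal sketch of what that status looks like, written as a plain Python dict shaped like the `status` field of a Pod object. The helper name `is_ready` is an assumption for illustration. The field names (`podIP`, `conditions`, `type`, `status`) follow the Pod API, where a condition's status is the string "True" rather than a boolean.

```python
def is_ready(pod_status: dict) -> bool:
    """Return True when the pod has an IP and its Ready condition is "True"."""
    if not pod_status.get("podIP"):
        return False
    for cond in pod_status.get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False  # no Ready condition reported yet


# Status while the readiness probe has not yet succeeded.
status_before_probe = {
    "podIP": "10.244.3.7",
    "conditions": [
        {"type": "PodScheduled", "status": "True"},
        {"type": "Ready", "status": "False"},
    ],
}

# Status after the kubelet has seen the probe pass.
status_after_probe = {
    "podIP": "10.244.3.7",
    "conditions": [
        {"type": "PodScheduled", "status": "True"},
        {"type": "Ready", "status": "True"},
    ],
}
```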

Readiness probe
A health check that gates traffic. Until the probe succeeds, the pod is listed in the Service's endpoints but marked not ready, so kube-proxy won't route to it.
Endpoint
A single (pod IP, port, ready?) tuple. A Service points at a set of endpoints managed by the EndpointSlice controller.
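
The two definitions above fit in a few lines of Python. This is a sketch of the idea only: the `Endpoint` type and the `routable` helper are hypothetical names, not the EndpointSlice API, and the port and the first pod IP are made up for the example.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    ip: str
    port: int
    ready: bool


def routable(endpoints: list[Endpoint]) -> list[Endpoint]:
    """Endpoints a proxy may send traffic to: only the ready ones."""
    return [e for e in endpoints if e.ready]


# The new pod is listed as soon as it has an IP, but it is not ready yet.
before = [
    Endpoint("10.244.1.4", 8080, True),
    Endpoint("10.244.3.7", 8080, False),
]

# Once the readiness probe passes, only the flag changes.
after = [before[0], before[1]._replace(ready=True)]
```

Keeping unready pods in the list with a flag, rather than leaving them out, lets watchers distinguish "exists but not serving" from "gone".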

Why DNS doesn't cut it alone

Kubernetes Services could be just DNS: round-robin A records for each backend pod IP. But DNS answers are cached with TTLs you don't fully control (client libraries, resolvers, the OS). A pod that crashed 10 seconds ago might still get traffic because a cached answer hasn't expired. The iptables/IPVS layer adds a much faster update path: when a pod becomes unready, kube-proxy on every node drops it from rotation within about a second, regardless of any DNS cache. DNS resolves the Service name to a stable ClusterIP; the dataplane picks the backend pod for each new connection.
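
The stale-cache problem is easy to reproduce in a toy model. The sketch below simulates the hypothetical DNS-only design (one A record per pod) with a caching resolver and an explicit clock. `CachingResolver`, the service name, the 30-second TTL, and the first pod IP are all assumptions for illustration; this is not how any real resolver is implemented.

```python
class CachingResolver:
    """Toy resolver that caches each answer until its TTL expires."""

    def __init__(self, authoritative: dict, ttl: float):
        self.authoritative = authoritative  # name -> list of pod IPs
        self.ttl = ttl
        self._cache = {}  # name -> (expiry time, cached answer)

    def resolve(self, name: str, now: float) -> list:
        cached = self._cache.get(name)
        if cached is not None and now < cached[0]:
            return cached[1]  # served from cache, possibly stale
        answer = list(self.authoritative[name])  # copy the current records
        self._cache[name] = (now + self.ttl, answer)
        return answer


name = "web.default.svc.cluster.local"  # hypothetical service name
records = {name: ["10.244.1.4", "10.244.3.7"]}
resolver = CachingResolver(records, ttl=30.0)

at_t0 = resolver.resolve(name, now=0.0)    # fills the cache
records[name].remove("10.244.3.7")         # pod crashes at t=5
at_t15 = resolver.resolve(name, now=15.0)  # 10 s after the crash
at_t31 = resolver.resolve(name, now=31.0)  # cache entry has expired
```

Ten seconds after the crash the client is still handed the dead pod's IP; only after the TTL runs out does it see the updated record set. With a ClusterIP, the name keeps resolving to the same address and the endpoint change happens behind it.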

Service mesh alternative

Istio and Linkerd push service discovery into a sidecar proxy running alongside each pod (Envoy for Istio, linkerd2-proxy for Linkerd). The proxy receives endpoint updates from the mesh control plane (over xDS in Istio's case) and does L7 routing (HTTP/gRPC paths, retries, circuit breaking) per request. kube-proxy still exists but is mostly bypassed. The trade-off: more CPU and latency overhead per pod (an extra hop), but much richer routing and observability.
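
To make "per-request L7 routing" concrete, here is a deliberately small sketch of what a sidecar does with the state pushed to it: match the request path to a backend set by longest prefix, pick a backend round-robin, and retry on failure. The `Sidecar` class, its method names, the route table, and the pod IPs are all hypothetical. It does not speak xDS, omits circuit breaking, and is not how Envoy or linkerd2-proxy are implemented.

```python
from typing import Callable, Optional


class Sidecar:
    def __init__(self, max_retries: int = 2):
        self.routes = []    # (path prefix, cluster name), longest prefix first
        self.clusters = {}  # cluster name -> list of backend IPs
        self.max_retries = max_retries
        self._next = {}     # per-cluster round-robin counter

    def on_push(self, routes: list, clusters: dict) -> None:
        """Replace routing state with a snapshot from the control plane."""
        self.routes = sorted(routes, key=lambda r: len(r[0]), reverse=True)
        self.clusters = {name: list(ips) for name, ips in clusters.items()}

    def handle(self, path: str, send: Callable[[str, str], bool]) -> Optional[str]:
        """Route one request; return the backend that served it, or None."""
        cluster = next((c for prefix, c in self.routes if path.startswith(prefix)), None)
        backends = self.clusters.get(cluster) if cluster else None
        if not backends:
            return None
        for _ in range(self.max_retries + 1):
            i = self._next.get(cluster, 0)
            self._next[cluster] = i + 1
            backend = backends[i % len(backends)]
            if send(backend, path):  # send() stands in for the real network call
                return backend
        return None


attempts = []

def send(backend: str, path: str) -> bool:
    attempts.append(backend)
    return backend != "10.244.2.9"  # pretend this one pod is failing


sidecar = Sidecar()
sidecar.on_push(
    routes=[("/", "web"), ("/api", "api")],
    clusters={"web": ["10.244.1.4"], "api": ["10.244.2.9", "10.244.3.7"]},
)
api_backend = sidecar.handle("/api/users", send)
web_backend = sidecar.handle("/index.html", send)
```

Because the decision is made for every request with the path in hand, the failed call to one pod can be retried against another without the client noticing, which a connection-level DNAT rule cannot do.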

Service discovery outside Kubernetes

Consul, ZooKeeper, etcd, and AWS Cloud Map all do roughly the same thing: a place to register services, a way to discover them. Consul gained popularity with HashiCorp's tooling and ships with health checking. On AWS, Cloud Map plus ALBs handle the same problem for ECS/Fargate. The pattern is consistent: "registry + watchers + dataplane". Only the implementations differ.
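
The "registry + watchers + dataplane" pattern can be sketched in memory in a few dozen lines. Everything here (`Registry`, `Dataplane`, their methods, the service name and the first pod IP) is a hypothetical illustration of the pattern, not the API of Consul, ZooKeeper, etcd, or Cloud Map.

```python
class Registry:
    """Stores service -> addresses and notifies watchers on every change."""

    def __init__(self):
        self._services = {}  # service name -> set of addresses
        self._watchers = []  # callbacks taking (service, frozenset of addresses)

    def watch(self, callback) -> None:
        self._watchers.append(callback)

    def register(self, service: str, address: str) -> None:
        self._services.setdefault(service, set()).add(address)
        self._notify(service)

    def deregister(self, service: str, address: str) -> None:
        self._services.get(service, set()).discard(address)
        self._notify(service)

    def _notify(self, service: str) -> None:
        snapshot = frozenset(self._services.get(service, set()))
        for callback in self._watchers:
            callback(service, snapshot)


class Dataplane:
    """Keeps a local routing table in sync and routes without asking the registry."""

    def __init__(self):
        self.table = {}  # service name -> sorted list of addresses

    def on_change(self, service: str, addresses: frozenset) -> None:
        self.table[service] = sorted(addresses)

    def route(self, service: str, request_id: int) -> str:
        backends = self.table.get(service)
        if not backends:
            raise LookupError(f"no backends for {service}")
        return backends[request_id % len(backends)]


registry = Registry()
dataplane = Dataplane()
registry.watch(dataplane.on_change)

registry.register("web", "10.244.1.4")
registry.register("web", "10.244.3.7")
table_after_register = list(dataplane.table["web"])

registry.deregister("web", "10.244.1.4")
table_after_deregister = list(dataplane.table["web"])
```

In the Kubernetes chain above, the API server and etcd play the registry, kube-proxy is the watcher, and the iptables rules are the local table the dataplane consults per connection.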

Go deeper

Service mesh + discovery patterns →

How Istio/Envoy actually route, xDS protocol, sidecar vs ambient mode, DNS-based discovery for non-k8s systems.
