Pods Are Cattle: How Kubernetes Self-Healing Actually Works

April 9, 2026


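There is a phrase that gets thrown around: "Kubernetes restarts your pod when it crashes." That phrase is wrong in a way that matters. When a container crashes, the kubelet can restart that container inside the same pod, according to the pod's restartPolicy. But when the pod itself is lost (deleted, evicted, or stranded on a failed node), Kubernetes does not bring it back. It reconciles a desired replica count by creating a new pod. The lost pod is gone forever. The replacement is a different pod with a different name, a different IP, and possibly a different node.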

The mechanism is a control loop, not a recovery routine. You write a Deployment that says "I want three replicas of this image." That Deployment owns a ReplicaSet, which owns the Pods. The ReplicaSet controller wakes up on every relevant event, counts the active pods that match its selector (pods that have failed or are being deleted do not count), and compares against the desired count. If actual is less than desired, it creates pods. If actual is more, it deletes the surplus. It does this forever. There is no special "a pod just died" code path. There is just a loop that closes a gap.
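The loop described above can be sketched in a few lines. This is a toy model, not the real controller: the Pod and ReplicaSet classes, the reconcile function, and the counter-based name suffix are all hypothetical stand-ins chosen for illustration (real pod names get a random suffix, and real failed pods linger until they are garbage collected).

```python
import itertools
from dataclasses import dataclass

# Hypothetical stand-ins for API objects; not the real Kubernetes types.
@dataclass
class Pod:
    name: str
    labels: dict
    phase: str = "Running"  # simplified to "Running" or "Failed"

@dataclass
class ReplicaSet:
    name: str
    selector: dict
    replicas: int

_suffix = itertools.count(1)  # assumption: a counter instead of a random suffix

def matches(pod: Pod, selector: dict) -> bool:
    # A pod belongs to the ReplicaSet if every selector label matches.
    return all(pod.labels.get(k) == v for k, v in selector.items())

def reconcile(rs: ReplicaSet, pods: list) -> list:
    """One pass of the loop: close the gap between actual and desired."""
    owned = [p for p in pods if matches(p, rs.selector)]
    others = [p for p in pods if not matches(p, rs.selector)]
    # Dead pods are simply dropped here; nothing tries to revive them.
    active = [p for p in owned if p.phase != "Failed"]
    gap = rs.replicas - len(active)
    if gap > 0:
        # Too few: create brand-new pods with brand-new names.
        for _ in range(gap):
            active.append(Pod(f"{rs.name}-{next(_suffix):05d}", dict(rs.selector)))
    elif gap < 0:
        # Too many: delete the surplus.
        active = active[:rs.replicas]
    return others + active

rs = ReplicaSet("web", {"app": "web"}, replicas=3)
pods = reconcile(rs, [])    # creates web-00001, web-00002, web-00003
pods[0].phase = "Failed"    # web-00001 dies
pods = reconcile(rs, pods)  # the gap is closed with a new pod
print([p.name for p in pods])  # ['web-00002', 'web-00003', 'web-00004']
```

Note that reconcile has no branch for "a pod died." The dead pod just stops being counted, and the same code that handles a scale-up creates its replacement.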

When a pod dies, the loop fires. Typically within seconds, a new pod object appears in the API. The scheduler binds it to a node based on current cluster state, which may or may not be the original node. The kubelet pulls the image (often cached, sometimes not), starts the container, and runs the readiness probe. Once the probe passes, the new pod's IP is added to the Service's endpoints (the Endpoints object, or EndpointSlices on current clusters). Traffic flows again.
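The readiness gate in that sequence is worth seeing on its own. Below is a minimal sketch, assuming a made-up Pod record and a hypothetical service_endpoints helper; the pod names and IPs are invented. It only models the rule that a Service routes to ready pods and nothing else.

```python
from dataclasses import dataclass

# Hypothetical record; real pods carry readiness in their status conditions.
@dataclass
class Pod:
    name: str
    ip: str
    ready: bool = False

def service_endpoints(pods):
    """IPs a Service would route to: ready pods only."""
    return sorted(p.ip for p in pods if p.ready)

pods = [Pod("web-a", "10.0.1.5", ready=True), Pod("web-b", "10.0.2.7", ready=True)]

# web-a is lost; its replacement arrives with a new name and a new IP.
pods = [p for p in pods if p.name != "web-a"]
replacement = Pod("web-c", "10.0.3.9")  # not ready yet
pods.append(replacement)
during = service_endpoints(pods)  # only web-b's IP

# The readiness probe passes, and the new IP starts receiving traffic.
replacement.ready = True
after = service_endpoints(pods)
print(during, after)
```

The old IP never comes back. Anything that cached it is now pointing at nothing.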

This is why the "cattle, not pets" line keeps showing up in Kubernetes talks. The system was designed around the assumption that pods are fungible. Anything that breaks fungibility breaks Kubernetes' core abstraction.

Here is where it bites in production. A team builds a service that holds user session state in memory. Sticky sessions through the load balancer ensure a user always lands on the same pod. Works fine in staging. Then a deploy rolls the pods. Pods get new names and IPs. The load balancer's sticky cookies still point at pods that no longer exist, and the in-memory sessions died with them. Every active user gets logged out at once. The team blames the deploy. The actual cause is that they assumed pod identity was stable, and Kubernetes never promised that.

For the cases where identity does matter, Kubernetes has StatefulSet. Pods get stable names (mydb-0, mydb-1, mydb-2), stable persistent volumes that follow the name, and ordered startup. Use StatefulSets for databases, leader-elected systems, anything that needs a known peer list, anything that mounts a specific volume.
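To make "stable" concrete, here is a small sketch of the naming conventions a StatefulSet follows by default: pod names are the StatefulSet name plus an ordinal starting at 0, each PersistentVolumeClaim is named after its claim template plus the pod name, and each pod gets a DNS name under the governing headless Service. The helper functions are hypothetical, and the claim template name "data", the Service name "mydb", and the "default" namespace are assumptions for the example.

```python
def statefulset_pod_names(name: str, replicas: int) -> list:
    # Ordinals are assigned 0..replicas-1 and survive rescheduling.
    return [f"{name}-{i}" for i in range(replicas)]

def pvc_name(claim_template: str, pod_name: str) -> str:
    # The claim follows the pod name, so mydb-1 always gets the same volume.
    return f"{claim_template}-{pod_name}"

def pod_dns(pod_name: str, service: str, namespace: str = "default",
            cluster_domain: str = "cluster.local") -> str:
    # Per-pod DNS record under the headless Service.
    return f"{pod_name}.{service}.{namespace}.svc.{cluster_domain}"

for pod in statefulset_pod_names("mydb", 3):
    print(pod, pvc_name("data", pod), pod_dns(pod, "mydb"))
```

Compare this with a Deployment, where pod names end in a generated suffix that changes with every replacement. With a StatefulSet, a peer list like mydb-0, mydb-1, mydb-2 can be written down once and stays valid.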

The general rule: if your design depends on a specific pod existing, you are fighting the platform. Either move the state out (Redis, Postgres, S3) or move to a StatefulSet. Deployments are for workloads that genuinely do not care which instance you talk to. Most workloads are that. The ones that are not need a different tool, not a different mental model.
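Here is what "move the state out" buys you, as a toy sketch. Both classes and the pod names are hypothetical, and a plain dict stands in for an external store such as Redis or Postgres; no real client library is used.

```python
class InMemoryPod:
    """Keeps sessions inside the process; they vanish with the pod."""
    def __init__(self, name):
        self.name = name
        self._sessions = {}

    def login(self, user):
        self._sessions[user] = True

    def is_logged_in(self, user):
        return user in self._sessions

class StatelessPod:
    """Keeps sessions in a store that outlives any single pod."""
    def __init__(self, name, store):
        self.name = name
        self._store = store  # assumption: a dict standing in for Redis/Postgres

    def login(self, user):
        self._store[user] = True

    def is_logged_in(self, user):
        return user in self._store

# A rollout replaces the pod. In-memory sessions do not survive it.
old = InMemoryPod("web-abc")
old.login("ana")
new = InMemoryPod("web-xyz")
lost = not new.is_logged_in("ana")

# Same rollout, but the session lives outside the pod.
shared = {}
old = StatelessPod("web-abc", shared)
old.login("ana")
new = StatelessPod("web-xyz", shared)
kept = new.is_logged_in("ana")

print(lost, kept)  # True True
```

With the state outside the pod, any replica can serve any user, so sticky sessions become an optimization rather than a correctness requirement.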

Key takeaway

Kubernetes does not heal pods. It reconciles a desired replica count by creating new pods, with new names, new IPs, and possibly new nodes. Any code that assumes pod identity will break the first time a pod dies.

Originally posted on LinkedIn.

