Three Kubernetes Probes, Three Different Jobs

May 10, 2026


Kubernetes ships three probes that all look like HTTP health checks at a glance. They are not interchangeable. Each one answers a different question, and the failure modes show up when teams treat them as one thing.

The startup probe runs first. It exists for slow-booting applications: large JVMs, services that warm a cache from disk, frameworks that take 30 seconds to scan a classpath. Until the startup probe succeeds, the other two probes are disabled. That is the whole point. Without it, a liveness probe would happily start killing your container before it ever finished booting, and you would have a pod stuck in a restart loop that never gets past initialization. Use the startup probe whenever your app needs more time to become usable than your liveness probe's tolerance allows.
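A rough way to size that window: the time a startup probe grants is approximately failureThreshold times periodSeconds, both real probe fields. The sketch below ignores initialDelaySeconds for simplicity. The helper names and the 20% headroom factor are my own assumptions, not Kubernetes settings.

```python
import math


def startup_budget_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Approximate time the container has to pass its startup probe once."""
    return period_seconds * failure_threshold


def min_failure_threshold(worst_boot_seconds: float, period_seconds: int,
                          headroom: float = 1.2) -> int:
    """Smallest failureThreshold that covers the worst observed boot time.

    headroom is an assumed safety factor, not a Kubernetes field.
    """
    return math.ceil(worst_boot_seconds * headroom / period_seconds)


# Example: a JVM that has taken up to 90 s to boot, probed every 10 s.
threshold = min_failure_threshold(90, period_seconds=10)
print(threshold, startup_budget_seconds(10, threshold))
```

Measure the worst boot you have actually seen, not the typical one. The cold-cache boot on a busy node is the one that ends up in a restart loop.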

The readiness probe is about traffic, not life. It controls whether the Service routes requests to this pod. A readiness failure means "do not send me traffic right now," not "kill me." Use it for backpressure: the pod is temporarily overloaded, a queue is full, a downstream is briefly unreachable, or a config reload is in progress. The pod is fine; it just should not get new work. When the readiness probe passes again, traffic resumes without a restart.
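Here is a minimal sketch of the decision a readiness endpoint could make for those backpressure cases. PodState and its fields are hypothetical names for illustration; a real service would read them from its own queue and client objects.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PodState:
    # Hypothetical snapshot of the conditions named above.
    queue_depth: int
    queue_capacity: int
    downstream_reachable: bool
    reloading_config: bool


def readiness_status(state: PodState) -> Tuple[int, str]:
    """Return an HTTP status and a reason for a readiness endpoint.

    Kubernetes counts any status from 200 to 399 as a probe success,
    so 503 takes the pod out of rotation without restarting it.
    """
    if state.reloading_config:
        return 503, "config reload in progress"
    if state.queue_depth >= state.queue_capacity:
        return 503, "work queue full"
    if not state.downstream_reachable:
        return 503, "downstream unreachable"
    return 200, "ready"
```

Returning a reason string alongside the status is a convenience for whoever reads the probe failure events later. Kubernetes itself only looks at the status code.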

The liveness probe is the nuclear option. It answers one question: is this container broken in a way that only a restart will fix? A deadlocked thread pool. A wedged event loop. Memory in a state the process cannot recover from. When liveness fails, Kubernetes restarts the container. That is destructive and correct. It is also a footgun.
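One way to detect a wedged loop from inside the process is a heartbeat: the work loop records a timestamp on every iteration, and the liveness endpoint fails once that timestamp goes stale. This is a sketch of that idea, not the only approach. The Heartbeat class and its parameter names are assumptions for illustration.

```python
import time
from typing import Callable


class Heartbeat:
    """Tracks the last time the main work loop made progress."""

    def __init__(self, max_silence_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_silence = max_silence_seconds
        self._clock = clock  # injectable so the logic can be tested
        self._last_beat = clock()

    def beat(self) -> None:
        """Call from inside the work loop on every iteration."""
        self._last_beat = self._clock()

    def liveness_status(self) -> int:
        """200 while the loop is beating, 500 once it has gone silent."""
        silent_for = self._clock() - self._last_beat
        return 200 if silent_for <= self._max_silence else 500
```

Note what this check does not touch: no database, no cache, no network. It only asks whether this process is still making progress, which is the one thing a restart can fix.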

Here is the production failure I have seen more than once. A team writes a liveness probe that calls /healthz, and /healthz checks the database connection and pings Redis. Looks thorough. Looks like best practice. Then Redis does a four-second failover during a routine cluster operation. Every pod's liveness probe starts failing at once, and with a tight period and failure threshold those failures add up within seconds. Kubernetes does what it was told and restarts the containers in every pod on every node. The site is now in a rolling outage caused by a dependency that has already recovered.
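Whether a four-second blip actually crosses the restart line depends on probe timing. The sketch below is a simplified model under stated assumptions: it counts probes that land inside the outage window and compares that count to failureThreshold. It ignores timeoutSeconds and any extra time a client needs to reconnect after the dependency returns, both of which make real incidents worse. The function names are hypothetical.

```python
def consecutive_failures(outage_seconds: float, period_seconds: float,
                         first_probe_at: float = 0.0) -> int:
    """Count probes that land inside an outage spanning [0, outage_seconds).

    first_probe_at is the (non-negative) offset of the first probe
    relative to the start of the outage.
    """
    count = 0
    t = first_probe_at
    while t < outage_seconds:
        count += 1
        t += period_seconds
    return count


def would_restart(outage_seconds: float, period_seconds: float,
                  failure_threshold: int, first_probe_at: float = 0.0) -> bool:
    """True if the outage produces enough consecutive failures to restart."""
    failures = consecutive_failures(outage_seconds, period_seconds, first_probe_at)
    return failures >= failure_threshold


# Aggressive tuning: probe every second, restart after 3 failures.
print(would_restart(4, period_seconds=1, failure_threshold=3))
# Kubernetes defaults: probe every 10 seconds, restart after 3 failures.
print(would_restart(4, period_seconds=10, failure_threshold=3))
```

In this model the aggressive settings restart the container and the defaults do not. Loosening the thresholds buys tolerance, but it does not remove the underlying problem: the probe is still checking something a restart cannot fix.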

The rule that comes out of those incidents: a liveness probe should never depend on anything outside the pod itself. If the failure cannot be fixed by restarting this container, do not check it in the liveness probe. Check it in the readiness probe instead. The pod still gets pulled from rotation, but it is not killed, and the recovery happens on its own.
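That rule can be sketched as two endpoints built from two separate sets of checks. The function names, the check names, and the lambdas standing in for real database and Redis pings are all hypothetical.

```python
from typing import Callable, Dict

Check = Callable[[], bool]


def _passes(check: Check) -> bool:
    """Treat an exception from a check the same as a failed check."""
    try:
        return bool(check())
    except Exception:
        return False


def livez(in_process_checks: Dict[str, Check]) -> int:
    """Liveness: only state that restarting this container could fix."""
    ok = all(_passes(check) for check in in_process_checks.values())
    return 200 if ok else 500


def readyz(in_process_checks: Dict[str, Check],
           dependency_checks: Dict[str, Check]) -> int:
    """Readiness: in-process state plus the shared dependencies."""
    checks = {**in_process_checks, **dependency_checks}
    ok = all(_passes(check) for check in checks.values())
    return 200 if ok else 503


# Stand-ins for real checks; Redis is simulated as mid-failover.
local = {"event_loop": lambda: True}
deps = {"database": lambda: True, "redis": lambda: False}
print(livez(local), readyz(local, deps))  # prints: 200 503
```

During the simulated failover the pod reports live but not ready. It drops out of rotation, nothing restarts, and it rejoins once the Redis check passes again.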

Startup gets your app to the starting line. Readiness decides when it can race. Liveness only fires when the runner is unconscious.

Key takeaway

Startup protects slow-boot apps, readiness gates traffic, liveness triggers restarts. Wire a liveness probe to a shared dependency and one blip restarts the entire fleet.

Originally posted on LinkedIn.

