Kubernetes: liveness probe is failing but pod is in Running state
Introduction
A pod can stay in Running even while its liveness probe is failing because pod phase and container health are not the same thing. Running only means the pod has been bound to a node, all of its containers have been created, and at least one container is still running or is in the process of starting or restarting. A failing liveness probe usually causes container restarts inside that running pod, not an immediate pod phase change to Failed.
Understand What Liveness Actually Controls
A liveness probe asks one question: should Kubernetes restart this container because it appears unhealthy?
If the liveness probe fails failureThreshold times in a row, the kubelet kills the container and restarts it according to the pod's restart policy. Pods managed by a Deployment always use restartPolicy: Always, so the container restarts inside the same pod object.
That is why the pod often remains Running while the container repeatedly restarts.
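The restart rule counts consecutive failures: a successful probe resets the count. The Python sketch below models only that counting rule, as a simplified illustration. It is not the kubelet's actual code, it does not model what happens after the restart, and the function name first_restart_index is hypothetical.

```python
from typing import Iterable, Optional


def first_restart_index(results: Iterable[bool], failure_threshold: int = 3) -> Optional[int]:
    """Return the index of the probe result that would trigger a restart.

    Simplified model (an assumption, not kubelet source): each item in
    `results` is one liveness probe outcome (True = success). Failures are
    counted consecutively, a success resets the counter, and the container
    is restarted once the counter reaches `failure_threshold`.
    Returns None if the threshold is never reached.
    """
    consecutive_failures = 0
    for index, succeeded in enumerate(results):
        if succeeded:
            consecutive_failures = 0  # any success clears the streak
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            return index
    return None


if __name__ == "__main__":
    # Two failures, one success, then three failures in a row.
    outcomes = [False, False, True, False, False, False]
    print(first_restart_index(outcomes, failure_threshold=3))  # prints 5
```

Note that the two early failures never lead to a restart on their own, because the success in between resets the streak.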
Pod Phase Versus Container State
This distinction is the source of most confusion.
- pod phase Running: the pod is bound to a node and active there
- container state: each individual container is Waiting, Running, or Terminated, and a container that keeps restarting cycles through these states
- liveness probe: a health signal that can trigger a restart of the container
So it is completely possible to have:
- pod phase = Running
- liveness probe failures in events
- restart count increasing
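To make the two layers concrete, here is a small Python sketch that reads the JSON form of a pod (as printed by kubectl get pod <name> -o json) and reports the pod phase next to each container's state and restart count. The status.phase and status.containerStatuses fields come from the Pod API; the sample document is hypothetical and trimmed to the fields used, and the helper name describe_layers is made up for illustration.

```python
import json
from typing import List

# Hypothetical, heavily trimmed `kubectl get pod <name> -o json` output for a
# pod whose container keeps being restarted by its liveness probe.
SAMPLE_POD = json.loads("""
{
  "status": {
    "phase": "Running",
    "containerStatuses": [
      {
        "name": "app",
        "restartCount": 4,
        "state": {"waiting": {"reason": "CrashLoopBackOff"}}
      }
    ]
  }
}
""")


def describe_layers(pod: dict) -> List[str]:
    """Report the pod phase and each container's state as separate lines."""
    status = pod.get("status", {})
    lines = [f"pod phase: {status.get('phase', 'Unknown')}"]
    for cs in status.get("containerStatuses", []):
        # `state` has a single key: "waiting", "running", or "terminated".
        current = next(iter(cs.get("state", {})), "unknown")
        lines.append(
            f"container {cs['name']}: state={current}, "
            f"restartCount={cs.get('restartCount', 0)}"
        )
    return lines


if __name__ == "__main__":
    for line in describe_layers(SAMPLE_POD):
        print(line)
    # pod phase: Running
    # container app: state=waiting, restartCount=4
```

The pod-level line says Running while the container-level line shows a container that is waiting to be restarted for the fourth time.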
Check the Real Symptoms
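Use kubectl describe pod to inspect probe failures and restart activity. Liveness failures show up in the Events section as Unhealthy events with messages such as "Liveness probe failed", and each container's Restart Count is listed in its status block.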
The output of kubectl logs --previous is especially useful after a liveness-triggered restart because it shows the logs from the last container instance that was killed.
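These inspection steps are easy to script. The sketch below is a hypothetical Python helper, diagnostic_commands, that builds kubectl describe pod, kubectl logs, and kubectl logs --previous for one pod and runs them with subprocess. It assumes kubectl is on the PATH and pointed at the right cluster; the "default" namespace is also just an assumed default.

```python
import subprocess
import sys
from typing import List, Optional


def diagnostic_commands(pod: str, namespace: str = "default",
                        container: Optional[str] = None) -> List[List[str]]:
    """Build the kubectl commands used to inspect a restarting container."""
    base = ["kubectl", "-n", namespace]
    logs = base + ["logs", pod]
    if container:
        logs += ["-c", container]  # needed when the pod runs several containers
    return [
        base + ["describe", "pod", pod],  # events: probe failures, restart count
        logs,                             # logs from the current container instance
        logs + ["--previous"],            # logs from the instance that was killed
    ]


if __name__ == "__main__":
    pod_name = sys.argv[1]  # pod name passed on the command line
    for cmd in diagnostic_commands(pod_name):
        print("$", " ".join(cmd))
        # check=False: `--previous` errors out if the container never restarted.
        subprocess.run(cmd, check=False)
```

Keeping the current and previous logs side by side makes it easier to see whether the killed instance was stuck, slow, or still initializing.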
Also inspect the RESTARTS column in the output of kubectl get pods.
If RESTARTS keeps increasing and the events show liveness failures, the liveness probe is actively causing container restarts even though the pod object still shows Running.
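To confirm that the count is still climbing, rather than left over from an earlier incident, you can sample it twice. This Python sketch assumes kubectl is available, reads status.containerStatuses[].restartCount from kubectl get pod <name> -o json, and reports which containers restarted between two samples. The 60-second interval and the function names are arbitrary choices for illustration.

```python
import json
import subprocess
import sys
import time
from typing import Dict


def restart_counts(pod_json: str) -> Dict[str, int]:
    """Map container name -> restartCount from `kubectl get pod -o json` output."""
    pod = json.loads(pod_json)
    return {
        cs["name"]: cs.get("restartCount", 0)
        for cs in pod.get("status", {}).get("containerStatuses", [])
    }


def increased(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return containers whose restart count went up, with the size of the increase."""
    return {
        name: count - before.get(name, 0)
        for name, count in after.items()
        if count > before.get(name, 0)
    }


def fetch(pod: str) -> str:
    """Fetch the pod as JSON via kubectl (assumes kubectl is configured)."""
    return subprocess.run(
        ["kubectl", "get", "pod", pod, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout


if __name__ == "__main__":
    pod_name = sys.argv[1]
    first = restart_counts(fetch(pod_name))
    time.sleep(60)  # sampling interval is an arbitrary choice
    second = restart_counts(fetch(pod_name))
    print(increased(first, second) or "no new restarts")
```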
Example Probe Configuration
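Consider a probe configured with initialDelaySeconds: 30, periodSeconds: 10, timeoutSeconds: 2, and failureThreshold: 3. Kubernetes starts probing 30 seconds after the container starts, probes every 10 seconds, waits up to 2 seconds for each response, and restarts the container after 3 consecutive failures.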
If the application needs more startup time than you configured, the probe can kill a container that was still initializing normally.
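A simplified model makes this arithmetic explicit. It assumes the first probe fires exactly initialDelaySeconds after the container starts and then every periodSeconds, and it ignores probe jitter and the termination grace period, so treat the results as approximations. ProbeSettings and survives_startup are illustrative names, not Kubernetes APIs.

```python
from dataclasses import dataclass


@dataclass
class ProbeSettings:
    """Liveness probe values from the example configuration."""
    initial_delay_seconds: int = 30
    period_seconds: int = 10
    timeout_seconds: int = 2
    failure_threshold: int = 3

    def last_chance_seconds(self) -> int:
        """Approximate time after container start of the last probe before a restart."""
        return self.initial_delay_seconds + (self.failure_threshold - 1) * self.period_seconds

    def restart_decision_seconds(self) -> int:
        """Approximate time the restart is decided if every probe times out."""
        return self.last_chance_seconds() + self.timeout_seconds


def survives_startup(startup_seconds: float, probe: ProbeSettings) -> bool:
    """True if the app is ready in time for at least one of the first
    failure_threshold probes, which resets the failure streak."""
    return startup_seconds <= probe.last_chance_seconds()


if __name__ == "__main__":
    probe = ProbeSettings()
    print(probe.last_chance_seconds())       # 50
    print(probe.restart_decision_seconds())  # 52
    print(survives_startup(45, probe))       # True
    print(survives_startup(60, probe))       # False
```

With these settings the last probe before a restart lands at about 50 seconds, so an app that needs 60 seconds to start is killed on every attempt and never leaves the restart loop.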
Common Reasons This Happens
The usual causes are:
- the liveness probe starts too early
- the endpoint or command is too strict
- timeouts are too short
- the app is healthy enough to start but occasionally slow under load
- a startup probe should have been used instead of an aggressive liveness probe
A slow-starting app is a classic case. If kubelet probes too early, it may create a restart loop even though the process could have become healthy given more time.
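A startupProbe gives the app a startup budget of roughly failureThreshold multiplied by periodSeconds, and liveness and readiness probes do not run until the startup probe succeeds, so the liveness probe can stay strict for steady-state operation. The Python sketch below picks a startupProbe failureThreshold for a given worst-case startup time. The 1.5x headroom factor and the function name are assumptions, not Kubernetes defaults.

```python
import math


def startup_failure_threshold(worst_case_startup_s: float, period_s: int = 10,
                              headroom: float = 1.5) -> int:
    """Smallest startupProbe failureThreshold whose budget
    (failureThreshold * periodSeconds) covers the worst-case startup time
    multiplied by a safety margin (the margin is an arbitrary assumption)."""
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    return max(1, math.ceil(worst_case_startup_s * headroom / period_s))


if __name__ == "__main__":
    threshold = startup_failure_threshold(120, period_s=10)
    print(threshold, threshold * 10)  # 18 180
```

For an app that takes up to 120 seconds to start with a 10-second period, this suggests failureThreshold: 18, which is a 180-second budget.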
Fix the Probe, Not Just the Pod Status
A good troubleshooting sequence is:
- inspect pod events and restart count
- read current and previous container logs
- test the probe command or HTTP endpoint manually
- increase initialDelaySeconds or use a startupProbe if startup is slow
- relax timeoutSeconds or failureThreshold if the app is intermittently slow but not deadlocked
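To test the probe manually, it helps to apply the same pass rules the kubelet uses: an HTTP probe passes when the response status is at least 200 and below 400, and an exec probe passes when the command exits with status 0. The Python sketch below applies those rules with your configured timeout and also records how long each HTTP call took, which helps when deciding whether timeoutSeconds is too tight. The URL, port 8080, and /healthz path are hypothetical, and checking from your machine (for example through kubectl port-forward pod/<name> 8080:8080) is not identical to the kubelet probing from the node.

```python
import subprocess
import time
import urllib.error
import urllib.request
from typing import List, Tuple


def http_success(status: int) -> bool:
    # HTTP probe rule: status codes in [200, 400) count as success.
    return 200 <= status < 400


def check_http(url: str, timeout_s: float) -> Tuple[bool, float]:
    """Call `url` once with the probe timeout; return (ok, elapsed seconds).

    Note: urllib follows redirects itself, which may differ from the kubelet.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            ok = http_success(resp.status)
    except urllib.error.HTTPError as err:  # 4xx/5xx responses are raised
        ok = http_success(err.code)
    except (urllib.error.URLError, OSError):
        ok = False  # connection refused, DNS failure, or timeout
    return ok, time.monotonic() - start


def check_exec(argv: List[str], timeout_s: float) -> bool:
    """Run a probe command; exit code 0 means success, as for exec probes.

    To run it inside the container, prefix argv with
    ["kubectl", "exec", "<pod>", "--"] (adds kubectl's own latency).
    """
    try:
        result = subprocess.run(argv, timeout=timeout_s, capture_output=True)
        return result.returncode == 0
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False


if __name__ == "__main__":
    # Hypothetical endpoint exposed locally with kubectl port-forward.
    for _ in range(5):
        ok, elapsed = check_http("http://127.0.0.1:8080/healthz", timeout_s=2)
        print(f"ok={ok} elapsed={elapsed:.2f}s")
        time.sleep(1)
```

If the elapsed times regularly approach the timeout under normal load, the probe is likely too strict rather than the app being unhealthy.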
The goal is not to make probe failures disappear cosmetically. The goal is to align the probe with real application behavior.
Common Pitfalls
- Assuming Running means the container is healthy and stable.
- Looking only at pod phase and ignoring restart count.
- Making the liveness probe check an endpoint that is too expensive or fragile.
- Using a liveness probe to solve a slow-start problem that should use startupProbe.
- Reading only current logs and missing the --previous logs from the restarted container.
Summary
- A pod can remain Running while liveness probe failures restart its container.
- Pod phase and container health are different layers of state.
- Use kubectl describe pod, the restart count, and kubectl logs --previous to see what is happening.
- Many liveness failures are probe-configuration problems rather than true crashes.
- Tune startup, timeout, and failure thresholds so the probe reflects real application health.

