Kubernetes: liveness probe is failing but pod is in Running state

Introduction

A pod can stay in Running even while its liveness probe is failing because pod phase and container health are not the same thing. Running only means the pod has been bound to a node, all of its containers have been created, and at least one container is still running or is in the process of starting or restarting. A failing liveness probe usually causes container restarts inside that running pod, not an immediate change of the pod phase to Failed.

Understand What Liveness Actually Controls

A liveness probe asks one question: should Kubernetes restart this container because it appears unhealthy?

If the liveness probe fails failureThreshold times in a row, the kubelet kills the container and restarts it according to the pod's restart policy. For a Deployment-managed pod the restart policy is Always (the only value a Deployment's pod template accepts), so the container restarts inside the same pod object.

That is why the pod often remains Running while the container repeatedly restarts.
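A minimal sketch of where these settings live in a Deployment, assuming a hypothetical app named `my-app` with a hypothetical image `example.com/my-app:1.0` serving `/healthz` on port 8080. `restartPolicy: Always` is spelled out only for visibility, since it is already the default.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                       # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      restartPolicy: Always          # default; failed containers are restarted in place
      containers:
      - name: app
        image: example.com/my-app:1.0   # hypothetical image
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 10
          failureThreshold: 3        # 3 consecutive failures trigger a container restart
```

When the probe fails three times in a row, the kubelet restarts container `app`, but the pod object itself is kept, so `kubectl get pods` keeps listing the same pod name.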

Pod Phase Versus Container State

This distinction is the source of most confusion.

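  • pod phase Running: the pod is bound to a node and at least one of its containers is running, starting, or restarting
  • container state: each container is individually Waiting, Running, or Terminated; when restarts repeat, it may show Waiting with reason CrashLoopBackOff while the kubelet backs off
  • liveness probe: a health signal that can trigger a restart of the container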

So it is completely possible to have:

  • pod phase = Running
  • liveness probe failures in events
  • restart count increasing

Check the Real Symptoms

Use kubectl describe to inspect probe failures and restart activity.

bash
kubectl describe pod my-pod
kubectl get pod my-pod -o wide
kubectl logs my-pod --previous

The --previous logs are especially useful after a liveness-triggered restart because they show the logs from the last container instance that died.

Also inspect the restart count:

bash
kubectl get pod my-pod

If RESTARTS keeps increasing, the liveness probe is actively causing container restarts even though the pod object still shows Running.
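Both layers are recorded on the same pod object. A trimmed, illustrative excerpt of `kubectl get pod my-pod -o yaml` for a pod in this situation might look like the following; the container name and the count are hypothetical, and real output carries more fields.

```yaml
status:
  phase: Running            # pod-level phase, unaffected by container restarts
  containerStatuses:
  - name: app               # hypothetical container name
    restartCount: 4         # hypothetical value; grows with each liveness-triggered restart
    state:
      running: {}           # current instance; real output also includes startedAt
    lastState:
      terminated: {}        # previous instance, the one `kubectl logs --previous` reads;
                            # real output includes exitCode, reason, and timestamps
```

The STATUS column of `kubectl get pod` reflects the pod's state, while the RESTARTS column is derived from the containers' `restartCount` values, which is why the two can tell different stories.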

Example Probe Configuration

yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3

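With these settings, the kubelet waits 30 seconds after the container starts before the first probe, probes every 10 seconds, allows 2 seconds for each response, and restarts the container after 3 consecutive failures. Once the app is up, a hang is therefore detected roughly 30 seconds (3 × 10 s) after it begins.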

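If the application needs more startup time than this allows, the probe can kill a container that was still initializing normally. Here the first probe runs at about 30 seconds and the third consecutive failure lands at about 50 seconds, so an app that takes longer than roughly 50 seconds to become healthy is restarted before it is ready, again and again.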

Common Reasons This Happens

The usual causes are:

  • the liveness probe starts too early
  • the endpoint or command is too strict
  • timeouts are too short
  • the app is healthy enough to start but occasionally slow under load
  • a startup probe should have been used instead of an aggressive liveness probe

A slow-starting app is a classic case. If the kubelet probes too early, it may create a restart loop even though the process could have become healthy given more time.
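A sketch of the startupProbe alternative, assuming the same `/healthz` endpoint on port 8080 and an app that may need up to about five minutes to start (an assumed figure). Until the startup probe succeeds, the kubelet does not run the liveness probe; after it succeeds, the liveness probe takes over with its normal, tighter settings.

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30      # allows up to 30 x 10 = 300 s for startup
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3       # only starts probing after the startup probe has succeeded
```

Because the startup probe absorbs the slow start, the liveness probe no longer needs a long initialDelaySeconds and can stay quick to detect a real hang. If the startup probe itself fails 30 times in a row, the container is restarted as usual.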

Fix the Probe, Not Just the Pod Status

A good troubleshooting sequence is:

  1. inspect pod events and restart count
  2. read current and previous container logs
  3. test the probe command or HTTP endpoint manually
  4. increase initialDelaySeconds or use a startupProbe if startup is slow
  5. relax timeoutSeconds or failureThreshold if the app is intermittently slow but not deadlocked
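For step 5, a sketch of a more tolerant liveness probe for an app that is occasionally slow but not deadlocked. The specific numbers are assumptions to tune against your own response-time data, not recommended defaults.

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5         # up from 2; tolerates slower responses under load
  failureThreshold: 6       # up from 3; needs ~60 s of consecutive failures to restart
```

The trade-off is detection time: with these values a genuinely hung container is restarted only after roughly periodSeconds × failureThreshold, about 60 seconds here instead of about 30.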

The goal is not to make probe failures disappear cosmetically. The goal is to align the probe with real application behavior.

Common Pitfalls

  • Assuming Running means the container is healthy and stable.
  • Looking only at pod phase and ignoring restart count.
  • Making the liveness probe check an endpoint that is too expensive or fragile.
  • Using a liveness probe to solve a slow-start problem that should use startupProbe.
  • Reading only current logs and missing the --previous logs from the restarted container.

Summary

  • A pod can remain Running while liveness probe failures restart its container.
  • Pod phase and container health are different layers of state.
  • Use kubectl describe pod, restart count, and kubectl logs --previous to see what is happening.
  • Many liveness failures are probe-configuration problems rather than true crashes.
  • Tune startup, timeout, and failure thresholds so the probe reflects real application health.
