Kubernetes Pod fails with CrashLoopBackOff

Introduction

CrashLoopBackOff means Kubernetes started a container, the container exited, and the kubelet is waiting before it restarts the container again. The wait grows exponentially with each crash (10s, 20s, 40s, and so on), capped at five minutes. The backoff is a symptom, not the root cause. To fix it, you need to find why the process inside the container keeps terminating.

Start with Logs and Events

The first commands should be:

bash
kubectl describe pod my-pod
kubectl logs my-pod
kubectl logs my-pod --previous

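kubectl describe shows events, restart counts, probe failures, and exit reasons. kubectl logs --previous is especially important because the current container instance may be too new to have logged the crash that triggered the restart. In a multi-container pod, add -c <container> to select the container whose logs you want.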
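For repeated triage, the commands above can be bundled into one small helper. This is a sketch rather than a standard tool: the function name `triage_pod` and its namespace argument are illustrative, and the `kubectl get events` call is an addition that lists only the events belonging to the pod.

```shell
# Hypothetical helper: collect the usual first-look output for one pod.
# Usage: triage_pod <pod-name> [namespace]
triage_pod() {
  pod="$1"
  ns="${2:-default}"

  # Events, restart counts, probe failures, and the last exit reason.
  kubectl describe pod "$pod" -n "$ns"

  # Events for this pod only, sorted by creation time.
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.metadata.creationTimestamp

  # Logs from the current container instance.
  kubectl logs "$pod" -n "$ns"

  # Logs from the previous (crashed) instance; ignore the error if there is none.
  kubectl logs "$pod" -n "$ns" --previous || true
}
```

Calling `triage_pod my-pod` runs everything against the default namespace. Redirecting the output to a file keeps a record of the state before the next restart overwrites it.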

Check the Container Exit Reason

Common patterns include:

  • the application process throws an exception and exits
  • the command or entrypoint is wrong
  • a required file, secret, or environment variable is missing
  • the liveness probe kills the container repeatedly
  • the process is OOM-killed because memory limits are too low

kubectl describe pod usually shows the clues in the container's Last State (reason and exit code) and in the Events list: OOMKilled, probe failures, or non-zero exit codes.
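The same exit information can be read directly from the pod status. The sketch below assumes the crashing container is the first one in the pod, and the helper names `last_exit` and `explain_exit_code` are made up for illustration. The interpretation relies on the common convention that an exit code above 128 means the process was killed by a signal (code minus 128).

```shell
# Hypothetical helper: print the reason and exit code of the last terminated
# instance of the first container in the pod.
last_exit() {
  kubectl get pod "$1" -o \
    jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\t"}{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
}

# Hypothetical helper: rough interpretation of a container exit code.
explain_exit_code() {
  code="$1"
  if [ "$code" -eq 0 ]; then
    echo "exited cleanly (exit code 0)"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "application error (exit code $code)"
  fi
}

explain_exit_code 137   # prints: killed by signal 9
```

Signal 9 is SIGKILL, which is what an OOM kill delivers. A clean exit with code 0 still causes a crash loop under the default restart policy, because the main process is expected to keep running.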

Example of a Probe-Induced Crash Loop

A misconfigured liveness probe can restart a healthy process before it finishes startup.

yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 2
  periodSeconds: 5
  timeoutSeconds: 1

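If the application needs 20 seconds to boot, this probe is too aggressive: probing starts after 2 seconds, and with the default failureThreshold of 3 the kubelet restarts the container after three consecutive failures, well before startup finishes. A safer version might be: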

yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 2

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

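startupProbe prevents premature liveness failures during slow startup, because liveness checks do not begin until the startup probe has succeeded. With failureThreshold: 30 and periodSeconds: 2, the application gets up to 30 × 2 = 60 seconds to start.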
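The time budgets of the two configurations above can be estimated with simple arithmetic. The sketch below uses the approximation initialDelaySeconds + failureThreshold × periodSeconds as an upper bound, ignores probe timeouts and kubelet scheduling jitter, and assumes the default failureThreshold of 3 for the first liveness probe, which does not set one. The function name `probe_budget` is made up for illustration.

```shell
# Approximate upper bound, in seconds, on how long a container may keep
# failing a probe before the kubelet gives up on it.
# Usage: probe_budget <initialDelaySeconds> <periodSeconds> <failureThreshold>
probe_budget() {
  echo $(( $1 + $3 * $2 ))
}

probe_budget 2 5 3    # aggressive liveness probe: prints 17
probe_budget 0 2 30   # startupProbe: prints 60
```

A budget of about 17 seconds is shorter than a 20-second boot, so the first configuration restarts the container before it is ever ready. The 60-second startup budget leaves comfortable headroom.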

Verify the Command, Image, and Config

If the container exits instantly, inspect:

  • command and args
  • image tag correctness
  • mounted config files and secrets
  • required environment variables
  • application working directory assumptions

A pod can crash forever because of one missing environment variable.

yaml
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: app-secret
        key: database_url

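If the Secret app-secret or the key database_url does not exist, the container is not started at all and the pod reports CreateContainerConfigError instead of CrashLoopBackOff. If the reference resolves but the value is wrong, the app may crash on startup before it ever serves traffic.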
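A quick check that the referenced Secret and key actually exist can rule this out early. The sketch below is a hypothetical helper named `secret_has_key`. It only tests for presence and never prints the value, and it assumes the key name contains no dots, which would need escaping in the JSONPath expression.

```shell
# Hypothetical helper: succeed only if the Secret exists and the key is non-empty.
# Usage: secret_has_key <secret-name> <key> [namespace]
secret_has_key() {
  # The stored value is base64-encoded; it is only tested for presence here.
  value="$(kubectl get secret "$1" -n "${3:-default}" \
    -o jsonpath="{.data.$2}" 2>/dev/null)" || return 1
  [ -n "$value" ]
}
```

For the manifest above, the call would be `secret_has_key app-secret database_url && echo present || echo missing`.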

Watch for Resource Problems

If the container is terminated with reason OOMKilled, raise the memory limit, reduce memory usage, or both.

yaml
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"

Do not guess here. Check metrics and application behavior. If Java, Node.js, or Python workloads are close to the memory limit, brief spikes can trigger repeated crashes.
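One way to put a number on "close to the limit" is to compare observed usage against the configured limit. The sketch below assumes both values have already been read off as plain MiB numbers. The function name `mem_used_pct` and the sample figure of 470 MiB are illustrative, and `kubectl top` only works when a metrics server is installed in the cluster.

```shell
# Hypothetical helper: percentage of the memory limit currently in use,
# rounded down. Both arguments are plain MiB numbers.
# Usage: mem_used_pct <used-mib> <limit-mib>
mem_used_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Where the inputs come from:
#   current usage per container (needs metrics-server):
#     kubectl top pod my-pod --containers
#   configured limit of the first container:
#     kubectl get pod my-pod -o jsonpath='{.spec.containers[0].resources.limits.memory}'

mem_used_pct 470 512   # prints 91
```

A steady-state figure above 90 percent leaves little room for a garbage-collection or request spike.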

Use kubectl exec Only After the Container Stays Up Long Enough

Sometimes the natural instinct is to kubectl exec into the pod and inspect the filesystem or environment. That is useful only if the container remains alive long enough to attach. If it crashes instantly, use logs, pod events, and a debug container or a temporary command override instead of waiting for a shell that never becomes available.

That distinction saves time during incident response.
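Both alternatives are available through `kubectl debug`. The wrappers below are a sketch: the function names, the `busybox` image, and the `-debug` suffix are illustrative choices, and the second approach assumes the application image contains a shell.

```shell
# Hypothetical wrappers around `kubectl debug`.

# Attach an ephemeral debug container that targets the given container's
# process namespace. Useful when the application image has no shell.
# Usage: debug_ephemeral <pod-name> <container-name>
debug_ephemeral() {
  kubectl debug -it "$1" --image=busybox --target="$2"
}

# Create a copy of the pod with the container's command replaced by a shell,
# so the crashing entrypoint never runs.
# Usage: debug_copy <pod-name> <container-name>
debug_copy() {
  kubectl debug "$1" -it --copy-to="$1-debug" --container="$2" -- sh
}
```

The copied pod is a separate object, so delete it once the investigation is finished.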

Common Pitfalls

  • Treating CrashLoopBackOff as the root cause instead of the restart symptom.
  • Looking only at current logs and missing kubectl logs --previous.
  • Misconfiguring liveness probes for slow-starting apps.
  • Ignoring OOMKilled and focusing only on application logs.
  • Shipping an image with the wrong entrypoint or missing runtime dependency.
  • Assuming Kubernetes is broken when the real problem is application exit behavior.

Summary

  • CrashLoopBackOff means the container keeps exiting and Kubernetes is backing off.
  • Start with describe, current logs, and previous logs.
  • Check exit reasons, probes, env vars, secrets, and entrypoints.
  • Use startupProbe when boot time is longer than the liveness probe tolerates.
  • Fix the process exit cause, not the backoff symptom.
