Kubernetes - Pod which encapsulates DB is crashing

Introduction

When a Kubernetes pod that runs a database keeps crashing, the useful question is not "Why is Kubernetes unstable?" but "What is the database process, or its environment, doing at startup?" Crash loops in stateful workloads usually trace back to storage, permissions, resource limits, probes, or database configuration mismatches.

Databases are more sensitive than stateless web containers because they depend on durable storage, startup ordering, write permissions, and clean shutdown. Debugging them is therefore closer to infrastructure diagnosis than to ordinary application log inspection.

Start with the Fastest Signals

The first commands should be:

```bash
kubectl get pods
kubectl describe pod mydb-0
kubectl logs mydb-0
kubectl logs mydb-0 --previous
```

These tell you:

  • whether the pod is in CrashLoopBackOff,
  • whether it was OOM-killed,
  • whether a probe is failing,
  • what the database process logged before exiting.

For stateful workloads, --previous is especially useful: it prints the logs of the previous, already terminated container instance, whereas the current container may have restarted so recently that it has not yet logged the original failure.
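Restart counts and pod events can be pulled out directly rather than read from the full describe output. The following is a minimal sketch, not a standard tool: the function names pod_restarts and pod_events are hypothetical, and mydb-0 is the example pod name used above.

```shell
# pod_restarts: hypothetical helper that prints the restart count of each
# container in the pod, one "name<TAB>count" line per container.
pod_restarts() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'
}

# pod_events: hypothetical helper that prints the events recorded for the pod,
# oldest first. Failed probes and volume mount errors are reported as events.
pod_events() {
  kubectl get events --field-selector "involvedObject.name=$1" --sort-by=.lastTimestamp
}

# Usage:
#   pod_restarts mydb-0
#   pod_events mydb-0
```

Both helpers query the current namespace; add -n with a namespace to the kubectl calls if the database runs elsewhere.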

Common Root Causes

The most common causes of database pod crashes are:

  • missing or corrupted persistent volume data,
  • filesystem permission problems on the mounted volume,
  • resource limits that are too small,
  • bad startup flags or environment variables,
  • liveness probes that kill the database before it is ready.

Each of these leaves a different trace in the logs and pod events, but together they are the first places to look.

Storage and Permissions

Many database images need write access to a specific data directory. If the mounted volume is owned by the wrong user or mounted read-only, the database may fail immediately.

Typical symptom:

  • the pod starts,
  • the entrypoint tries to initialize or open the data directory,
  • the process exits with a permissions or filesystem error.

That is why the pod's securityContext (for example runAsUser and fsGroup), file ownership on the volume, and PVC health matter so much for stateful pods.
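One way to spot this class of failure quickly is to filter the previous container's logs for OS-level storage errors. The sketch below is an assumption-laden starting point: storage_errors is a hypothetical helper, and the patterns are the standard operating-system error messages, not strings specific to any database.

```shell
# storage_errors: hypothetical filter that reads log text on stdin and keeps
# only lines containing common OS-level storage error messages.
storage_errors() {
  grep -iE 'permission denied|read-only file system|no space left on device'
}

# Usage against the example pod from this article:
#   kubectl logs mydb-0 --previous | storage_errors
#
# Related checks on the volume claim and the pod-level security settings:
#   kubectl get pvc
#   kubectl get pod mydb-0 -o jsonpath='{.spec.securityContext}{"\n"}'
```

If the filter prints nothing, the failure is probably not a plain filesystem error, and the full previous log is still the next thing to read.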

Resource Limits and OOMKills

Databases are memory-hungry. If the container exceeds its memory limit, the kernel's OOM killer terminates it and Kubernetes reports the container as OOMKilled.

Check the pod description for signals like:

  • a Reason of OOMKilled in the container's Last State,
  • restart count climbing rapidly,
  • termination reason pointing to memory pressure.

A database that needs time to warm caches or replay logs can also look unhealthy if CPU limits are too strict and startup becomes too slow.
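The termination reason and the configured memory limit can also be read straight from the pod object. This is a rough sketch with hypothetical helper names (last_termination, memory_limits, was_oom_killed). An OOM-killed container normally reports exit code 137, which is 128 plus the SIGKILL signal number 9.

```shell
# last_termination: hypothetical helper that prints, for each container,
# "name<TAB>reason<TAB>exitCode" of its last terminated state.
last_termination() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# memory_limits: hypothetical helper that prints each container's memory limit.
memory_limits() {
  kubectl get pod "$1" -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.limits.memory}{"\n"}{end}'
}

# was_oom_killed: succeeds if any container was last terminated as OOMKilled.
was_oom_killed() {
  last_termination "$1" | grep -q 'OOMKilled'
}

# Usage:
#   if was_oom_killed mydb-0; then memory_limits mydb-0; fi
```

Comparing the printed limit with the memory the database is configured to use (buffer pools, caches, per-connection memory) usually shows whether the limit or the database settings need to change.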

Probe Configuration

Bad liveness probes are a classic cause of database crash loops. If the database needs 40 seconds to initialize but the liveness probe starts killing it after 10, the pod never gets a chance to become healthy.

A safer pattern is:

  • use a generous startup probe for slow database initialization,
  • use readiness to gate traffic,
  • keep liveness conservative, with a long enough period and failure threshold that only a genuinely hung process is restarted.

That way Kubernetes does not mistake "still starting" for "broken forever."
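The three-probe pattern can be sketched as a container spec fragment. Everything specific here is an assumption for illustration: the TCP check, port 5432, and all timing values are placeholders, and the helper names probe_fragment and startup_budget are hypothetical. The time a startup probe allows is roughly failureThreshold multiplied by periodSeconds, and liveness and readiness checks do not begin until the startup probe has succeeded.

```shell
# probe_fragment: hypothetical helper that prints an illustrative probe
# fragment. It belongs under a container entry in the pod template.
probe_fragment() {
  cat <<'EOF'
startupProbe:
  tcpSocket:
    port: 5432            # assumed database port
  periodSeconds: 5
  failureThreshold: 30    # up to 30 * 5 = 150 s to finish starting
readinessProbe:
  tcpSocket:
    port: 5432
  periodSeconds: 10
livenessProbe:
  tcpSocket:
    port: 5432
  periodSeconds: 20
  failureThreshold: 3     # restart only after 3 consecutive failures
EOF
}

# startup_budget: seconds allowed for startup = failureThreshold * periodSeconds.
startup_budget() {
  echo $(( $1 * $2 ))
}

# Usage:
#   probe_fragment            # print the fragment for review
#   startup_budget 30 5       # prints 150
```

With these example values, a database that needs 40 seconds to initialize has ample room, because the startup probe tolerates up to 150 seconds before the container is restarted.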

StatefulSet and Volume Design

For production databases, a StatefulSet is usually more appropriate than a plain Deployment. It gives each pod a stable identity and a stable storage attachment, which match database expectations much better.

A simple debugging rule is: if your database pod is attached to persistent data, inspect the PVC and StatefulSet behavior just as seriously as the container logs.
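That rule can be sketched as a pair of small helpers. A PVC created from a StatefulSet volumeClaimTemplate is named after the template and the pod, in the form template-name, hyphen, pod-name. The helper names are hypothetical, and the StatefulSet name mydb and template name data are assumptions chosen to match the example pod mydb-0.

```shell
# pvc_name: builds the PVC name that a StatefulSet derives from a
# volumeClaimTemplate: <template-name>-<pod-name>.
pvc_name() {
  printf '%s-%s\n' "$1" "$2"
}

# inspect_storage: hypothetical helper. Arguments: StatefulSet name, pod name,
# volumeClaimTemplate name. Shows the StatefulSet status and the pod's PVC.
inspect_storage() {
  kubectl get statefulset "$1"
  kubectl describe pvc "$(pvc_name "$3" "$2")"
}

# Usage (names are assumptions):
#   inspect_storage mydb mydb-0 data
```

A claim stuck in Pending, or events on the claim about failed provisioning or attachment, points at storage rather than at the database process.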

Common Pitfalls

  • Looking only at kubectl get pods and never checking logs or pod events.
  • Running a database in a plain Deployment without thinking through stable storage needs.
  • Using aggressive liveness probes that kill slow startup sequences.
  • Forgetting volume permissions and ownership requirements.
  • Treating a database container like a stateless app and underallocating memory or disk.

Summary

  • A crashing database pod is usually failing because of storage, permissions, probes, resources, or startup configuration.
  • Start with describe, current logs, and previous logs.
  • Check for OOM kills, PVC issues, and filesystem access problems early.
  • Probe timing is critical for slow-starting databases.
  • Stateful databases fit StatefulSet and persistent-volume patterns better than generic stateless deployment patterns.