Kubernetes, simple SpringBoot app OOMKilled

Introduction

OOMKilled in Kubernetes means the Linux kernel terminated your container because its memory usage exceeded the container's memory limit. In Spring Boot services, this often happens because the JVM's total footprint includes both heap and non-heap regions, not just the heap capped by -Xmx. A durable fix combines Kubernetes resource sizing, JVM container tuning, and application-level memory discipline.

Confirm the OOM Signal First

Start with evidence, not assumptions. Check pod events, restart reason, and previous container logs.

bash
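# Restart counts and current status
kubectl get pods -n app
# Last state, reason, and exit code for each container
kubectl describe pod my-service-abc123 -n app
# Logs from the container instance that was killed
kubectl logs my-service-abc123 --previous -n app
# Current memory usage (requires metrics-server in the cluster)
kubectl top pod my-service-abc123 -n app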

Look for:

  • 'Reason: OOMKilled'
  • restarts after traffic spikes
  • memory usage near or above limit before restart

If the process exits for another reason, JVM tuning alone will not fix it.

Understand Java Memory Inside Containers

A common mistake is setting the heap too close to the container limit. The memory of a Java process includes:

  • heap
  • metaspace
  • thread stacks
  • JIT code cache
  • direct buffers and other native allocations

If the pod limit is 768Mi and the heap can grow close to that value, non-heap overhead pushes the process over the limit and the kernel kills the container.
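
The arithmetic behind that statement can be sketched as a rough budget. This is a minimal illustration, not a measurement: the class name `MemoryBudget` is hypothetical, and the thread count, stack size, code cache, and direct buffer figures are assumed values chosen only to show the shape of the calculation. Real numbers should come from your own service under load.

```java
// Rough memory budget for a JVM running in a container.
// All figures passed in below are illustrative assumptions, not measurements.
final class MemoryBudget {

    /**
     * Returns the container limit minus the sum of all memory regions, in MiB.
     * A negative result means the budget already exceeds the limit.
     */
    static long headroomMiB(long limitMiB, long heapMiB, long metaspaceMiB,
                            int threads, long stackMiBPerThread,
                            long codeCacheMiB, long directMiB) {
        long threadStacksMiB = threads * stackMiBPerThread;
        long total = heapMiB + metaspaceMiB + threadStacksMiB + codeCacheMiB + directMiB;
        return limitMiB - total;
    }

    public static void main(String[] args) {
        // Heap allowed to grow to 700 MiB under a 768Mi limit: overdrawn.
        System.out.println(headroomMiB(768, 700, 128, 50, 1, 48, 16)); // prints -174
        // Heap capped at 384 MiB under the same limit: headroom remains.
        System.out.println(headroomMiB(768, 384, 128, 50, 1, 48, 16)); // prints 142
    }
}
```

The point of the exercise is that the heap is only one term in the sum, so the heap cap has to be chosen after the other terms are estimated.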

Set Requests and Limits Explicitly

Avoid relying on defaults. Define both requests and limits in deployment manifests.

yaml
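apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 2
  # apps/v1 requires a selector that matches the pod template labels;
  # the label value used here is an assumed example
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: app
          image: myrepo/my-service:1.0.0
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "768Mi"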

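Requests tell the scheduler how much capacity to reserve when placing the pod. The memory limit is the hard cap the kernel enforces: a container that exceeds it is killed, whereas exceeding the CPU limit only causes throttling.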

Tune JVM for Container Budgets

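Use container-aware JVM options so the heap leaves space for non-heap memory. -XX:+UseContainerSupport is enabled by default on JDK 10 and later (and on 8u191 and later), so listing it there is optional.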

yaml
env:
  - name: JAVA_TOOL_OPTIONS
    value: >-
      -XX:+UseContainerSupport
      -XX:InitialRAMPercentage=30
      -XX:MaxRAMPercentage=65
      -XX:MaxMetaspaceSize=128m

An alternative is fixed heap sizing:

yaml
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-Xms256m -Xmx384m -XX:MaxMetaspaceSize=128m"

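Choose one strategy and measure under realistic load; when both are set, an explicit -Xmx takes precedence over MaxRAMPercentage. For simple services, percentage-based sizing is often easier to maintain across environments because it follows the container limit.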
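
Whichever strategy you pick, it helps to confirm what the JVM actually resolved inside the container. The sketch below is one way to do that with `Runtime.getRuntime().maxMemory()`. The class name `HeapSettingsCheck` is hypothetical, and the 768Mi limit is hard-coded as an assumption taken from the example manifest; the reported value can differ slightly from the configured one because the JVM aligns heap sizes.

```java
// Prints the max heap the running JVM resolved, relative to an assumed container limit.
final class HeapSettingsCheck {
    static final long MIB = 1024L * 1024L;

    /** Max heap expressed as a percentage of the container memory limit. */
    static double percentOfLimit(long maxHeapBytes, long limitBytes) {
        return 100.0 * maxHeapBytes / limitBytes;
    }

    public static void main(String[] args) {
        long limitBytes = 768 * MIB; // assumption: matches the 768Mi limit in the manifest
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("max heap: %d MiB (%.1f%% of the assumed limit)%n",
                maxHeapBytes / MIB, percentOfLimit(maxHeapBytes, limitBytes));
    }
}
```

As a cross-check between the two strategies, a fixed -Xmx384m under a 768Mi limit corresponds to 50 percent of the limit.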

Reduce Application Memory Pressure

Sometimes the app itself drives memory spikes. Common Spring Boot causes:

  • large JSON payload buffering
  • unbounded caches
  • loading entire datasets into memory
  • high thread counts
  • expensive object mapping on hot paths

Practical mitigations:

  • stream large responses where possible
  • configure cache size and eviction policy
  • process batch jobs in chunks
  • tune Tomcat thread pool for expected concurrency
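
The chunking mitigation can be sketched with plain Java. This is a minimal illustration under stated assumptions: `ChunkedProcessor` and `processInChunks` are hypothetical names, and the source is any `Iterator`, standing in for a database cursor or file reader that yields records lazily.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Processes a lazily produced sequence in fixed-size chunks so that at most
// one chunk of records is held in memory at a time.
final class ChunkedProcessor {

    /** Feeds the handler lists of up to chunkSize items; returns the number of chunks handled. */
    static <T> int processInChunks(Iterator<T> source, int chunkSize, Consumer<List<T>> handler) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        List<T> buffer = new ArrayList<>(chunkSize);
        while (source.hasNext()) {
            buffer.add(source.next());
            if (buffer.size() == chunkSize) {
                handler.accept(buffer);
                buffer = new ArrayList<>(chunkSize); // drop the reference to the old chunk
                chunks++;
            }
        }
        if (!buffer.isEmpty()) { // final partial chunk
            handler.accept(buffer);
            chunks++;
        }
        return chunks;
    }
}
```

Peak memory then depends on the chunk size rather than on the size of the whole dataset, provided the handler does not retain the chunks it receives.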

A small code or configuration change can often eliminate kills without increasing pod limits.
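
As one example of such a change, an unbounded map used as a cache can be given a size cap and least-recently-used eviction. The sketch below relies only on `java.util.LinkedHashMap`; the class name `BoundedLruCache` is hypothetical, the class is not thread-safe, and a production service would more likely use a dedicated cache library with its own size and expiry settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded LRU cache: once maxEntries is exceeded, the least recently
// accessed entry is evicted. Not thread-safe; illustration only.
final class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true keeps entries in LRU order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the eldest entry.
        return size() > maxEntries;
    }
}
```

With a capacity of 2, inserting "a" and "b", reading "a", then inserting "c" evicts "b", because "a" was touched more recently.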

Add Observability Before and After Changes

Without metrics, tuning is guesswork. Track at minimum:

  • JVM heap used and max
  • non-heap used
  • GC pause duration and frequency
  • container memory working set
  • pod restart count

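Example with Spring Boot Actuator and Prometheus (the prometheus endpoint also requires the micrometer-registry-prometheus dependency on the classpath):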

properties
management.endpoints.web.exposure.include=health,info,prometheus
management.metrics.tags.application=my-service

After each change, compare restart frequency and memory headroom over similar traffic windows.
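
For a quick look at the JVM-side numbers from the list above without a metrics stack, the standard `java.lang.management` beans expose them directly. This is a minimal sketch; the class name `JvmMemorySnapshot` and the output format are hypothetical. It does not cover the container working set or restart count, which come from Kubernetes rather than the JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// One-line snapshot of heap, non-heap, and cumulative GC activity for the current JVM.
final class JvmMemorySnapshot {

    static String describe() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();

        long gcCount = 0;
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            gcCount += Math.max(0, gc.getCollectionCount());
            gcMillis += Math.max(0, gc.getCollectionTime());
        }
        // heap.getMax() is -1 when no maximum is defined.
        return String.format("heapUsed=%d heapMax=%d nonHeapUsed=%d gcCount=%d gcMillis=%d",
                heap.getUsed(), heap.getMax(), nonHeap.getUsed(), gcCount, gcMillis);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```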

Diagnose Leak Versus Bad Sizing

Not every OOM is a memory leak. Leak patterns usually show monotonic growth with poor recovery after GC. Bad sizing patterns often show oscillation near the limit and kills during bursts.
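
The distinction can be expressed as a crude heuristic over heap-used samples taken right after garbage collections. This is only a sketch under loud assumptions: the class name `HeapTrend`, the verdict names, and the growth factor are all hypothetical, and a real diagnosis needs longer observation windows than a handful of samples.

```java
// Crude leak-versus-sizing heuristic over post-GC heap-used samples.
// Illustration only; thresholds are assumptions, not recommended values.
final class HeapTrend {
    enum Verdict { LEAK_SUSPECTED, SIZING_OR_BURST }

    /**
     * Flags a suspected leak when post-GC usage never drops between samples
     * and the last sample is at least growthFactor times the first.
     */
    static Verdict classify(long[] postGcHeapUsed, double growthFactor) {
        if (postGcHeapUsed.length < 3) {
            return Verdict.SIZING_OR_BURST; // too few samples to call it a trend
        }
        boolean neverRecovers = true;
        for (int i = 1; i < postGcHeapUsed.length; i++) {
            if (postGcHeapUsed[i] < postGcHeapUsed[i - 1]) {
                neverRecovers = false; // GC reclaimed memory at least once
            }
        }
        long first = postGcHeapUsed[0];
        long last = postGcHeapUsed[postGcHeapUsed.length - 1];
        boolean grew = last >= first * growthFactor;
        return (neverRecovers && grew) ? Verdict.LEAK_SUSPECTED : Verdict.SIZING_OR_BURST;
    }
}
```

For example, samples of 100, 120, 140, 160 with a growth factor of 1.2 are flagged as a suspected leak, while 300, 450, 310, 460, 305 are not, because usage falls back after each peak.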

If a leak is suspected:

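  • capture a heap dump before the container hits its limit (-XX:+HeapDumpOnOutOfMemoryError fires only on a Java OutOfMemoryError, not on a kernel OOM kill, which ends the process with SIGKILL)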
  • inspect dominant object retainers
  • check cache eviction behavior
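
One way to capture a dump on demand, for instance from an admin-only endpoint or a scheduled check, is the HotSpot diagnostic bean. The sketch below assumes a HotSpot-based JVM; the class name `HeapDumper` is hypothetical, and in Kubernetes the target path would need to sit on a volume that outlives the container, which is not shown here.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

// Writes a heap dump of the current JVM to the given file (HotSpot JVMs only).
final class HeapDumper {

    /** The target must not exist yet and should end in ".hprof". */
    static void dump(Path target) {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // live=true dumps only reachable objects, which keeps the file smaller.
            diagnostics.dumpHeap(target.toString(), true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting file can then be opened in a heap analysis tool to inspect the dominant retainers.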

If sizing is the issue, adjust pod limit and JVM percentages with load-test feedback.

Rollout Strategy

Apply memory changes gradually. Start with a canary or a single replica, observe it for one full traffic cycle, then roll out to the rest.

bash
kubectl rollout restart deployment/my-service -n app
kubectl rollout status deployment/my-service -n app

A fast global rollout of untested memory settings can turn a partial problem into a full outage.

Common Pitfalls

  • Setting heap too high relative to container memory limit.
  • Increasing pod limits without checking application memory hotspots.
  • Diagnosing OOM from current logs only and missing previous-container evidence.
  • Ignoring non-heap memory when estimating Java process footprint.
  • Rolling memory config changes to all replicas without staged validation.

Summary

  • OOMKilled is a container memory cap event, not just a JVM exception.
  • Size Kubernetes requests and limits explicitly for your workload.
  • Leave headroom for non-heap memory when tuning JVM options.
  • Reduce app-level memory spikes through streaming, bounded caches, and batch chunking.
  • Validate each change with metrics and staged rollout before full deployment.
