Kubernetes, simple SpringBoot app OOMKilled

Introduction

OOMKilled in Kubernetes means the Linux kernel terminated your container because its memory usage exceeded the container's memory limit. In Spring Boot services, this often happens because the JVM's total footprint includes both heap and non-heap regions, not just the heap capped by -Xmx. A durable fix combines Kubernetes resource sizing, JVM container tuning, and application-level memory discipline.

Confirm the OOM Signal First

Start with evidence, not assumptions. Check pod events, restart reason, and previous container logs.

bash

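# List pods and their restart counts
kubectl get pods -n app
# Show events and the last termination reason
kubectl describe pod my-service-abc123 -n app
# Read logs from the container instance that was killed
kubectl logs my-service-abc123 --previous -n app
# Current memory usage (requires metrics-server in the cluster)
kubectl top pod my-service-abc123 -n app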

Look for:

Reason: OOMKilled (exit code 137) under Last State in the describe output
restarts after traffic spikes
memory usage near or above limit before restart

If the process exits for another reason, JVM tuning alone will not fix it.

Understand Java Memory Inside Containers

A common mistake is setting the heap too close to the container limit. Java process memory includes:

heap
metaspace
thread stacks
JIT code cache
direct buffers and other native allocations

If the pod limit is 768Mi and the heap can grow close to that value, non-heap overhead pushes the process over the limit and the kernel kills the container.
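The arithmetic behind that statement can be sketched in plain Java. This is a rough budget, not a measurement: the non-heap figures in `main` (metaspace, thread stacks, code cache, native allocations) are illustrative assumptions, and the class and method names are hypothetical. The JVM's real heap choice also involves alignment, so treat the result as approximate.

```java
// Rough memory budget for a JVM running under a container memory limit.
// All non-heap figures used in main() are illustrative assumptions.
class MemoryBudget {

    static final long MIB = 1024L * 1024L;

    /** Approximate max heap for a given limit and MaxRAMPercentage value. */
    static long maxHeapBytes(long limitBytes, double maxRamPercentage) {
        return (long) (limitBytes * maxRamPercentage / 100.0);
    }

    /** True when heap plus the estimated non-heap usage stays within the limit. */
    static boolean fits(long limitBytes, long heapBytes, long nonHeapEstimateBytes) {
        return heapBytes + nonHeapEstimateBytes <= limitBytes;
    }

    public static void main(String[] args) {
        long limit = 768 * MIB;                 // container memory limit
        long heap = maxHeapBytes(limit, 65.0);  // heap at MaxRAMPercentage=65

        // Assumed non-heap usage; replace with values measured on your service.
        long metaspace = 128 * MIB;
        long threadStacks = 50 * MIB;           // e.g. 50 threads at 1 MiB each
        long codeCache = 48 * MIB;
        long directAndNative = 32 * MIB;
        long nonHeap = metaspace + threadStacks + codeCache + directAndNative;

        System.out.println("Heap budget (MiB): " + heap / MIB);
        System.out.println("Non-heap estimate (MiB): " + nonHeap / MIB);
        System.out.println("Fits within limit: " + fits(limit, heap, nonHeap));

        // The same non-heap estimate with a 700 MiB heap no longer fits.
        System.out.println("Fits with 700 MiB heap: " + fits(limit, 700 * MIB, nonHeap));
    }
}
```

Under these assumed numbers the heap is the only term you control directly, which is why lowering it is usually the first lever when the sum exceeds the limit.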

Set Requests and Limits Explicitly

Avoid relying on defaults. Define both requests and limits in deployment manifests.

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 2
  # apps/v1 requires a selector that matches the pod template labels;
  # the label value here is an example
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: app
          image: myrepo/my-service:1.0.0
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "768Mi"

The scheduler uses requests to place the pod on a node with enough free capacity. The memory limit is the hard cap the kernel enforces; a container that exceeds it is OOM-killed.

Tune JVM for Container Budgets

Use container-aware JVM options so heap leaves space for non-heap memory.

yaml

env:
  - name: JAVA_TOOL_OPTIONS
    value: >-
      -XX:+UseContainerSupport
      -XX:InitialRAMPercentage=30
      -XX:MaxRAMPercentage=65
      -XX:MaxMetaspaceSize=128m

An alternative is fixed heap sizing:

yaml

env:
  - name: JAVA_TOOL_OPTIONS
    value: "-Xms256m -Xmx384m -XX:MaxMetaspaceSize=128m"

Choose one strategy and measure under realistic load. For simple services, percentage-based heap sizing is often easier to maintain across environments because it follows the container limit.
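One way to confirm that the chosen flags actually took effect is to ask the running JVM what it settled on. The following is a minimal sketch using only standard JDK APIs; the class name is hypothetical, and in a real service you might log these values once at startup instead of running a separate program.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Prints the limits the JVM is actually running with, so they can be
// compared against the container limit and the intended flags.
class JvmSettingsCheck {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Available processors: "
                + Runtime.getRuntime().availableProcessors());

        // Arguments as reported by the JVM itself.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
    }
}
```

If the reported max heap is far from what the percentage or -Xmx setting implies, the options are probably not reaching the JVM, or the JVM is not seeing the container limit.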

Reduce Application Memory Pressure

Sometimes the app itself drives memory spikes. Common Spring Boot causes:

large JSON payload buffering
unbounded caches
loading entire datasets into memory
high thread counts
expensive object mapping on hot paths

Practical mitigations:

stream large responses where possible
configure cache size and eviction policy
process batch jobs in chunks
tune Tomcat thread pool for expected concurrency

Small code and config adjustments like these can often eliminate kills without increasing pod limits.
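Two of the mitigations above, bounded caches and chunked batch processing, can be sketched with the JDK alone. The names `MemoryBounds`, `BoundedCache`, and `processInChunks` are hypothetical. The cache is a simple LRU built on `LinkedHashMap` and is not thread-safe, so a production service would more likely wrap it or use a dedicated caching library with size limits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class MemoryBounds {

    /** LRU cache that never holds more than maxEntries entries. Not thread-safe. */
    static final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        BoundedCache(int maxEntries) {
            super(16, 0.75f, true); // accessOrder=true keeps least recently used first
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries; // evict once the cap is exceeded
        }
    }

    /**
     * Hands items to the handler in chunks of at most chunkSize, so only one
     * chunk is held in memory at a time. Returns the number of chunks handled.
     */
    static <T> int processInChunks(Iterable<T> source, int chunkSize,
                                   Consumer<List<T>> handler) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<T> chunk = new ArrayList<>(chunkSize);
        int chunks = 0;
        for (T item : source) {
            chunk.add(item);
            if (chunk.size() == chunkSize) {
                handler.accept(chunk);
                chunk = new ArrayList<>(chunkSize); // drop the reference to the old chunk
                chunks++;
            }
        }
        if (!chunk.isEmpty()) { // trailing partial chunk
            handler.accept(chunk);
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        BoundedCache<String, Integer> cache = new BoundedCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes most recently used
        cache.put("c", 3); // evicts "b"
        System.out.println("Cache keys: " + cache.keySet());

        int chunks = processInChunks(List.of(1, 2, 3, 4, 5), 2,
                chunk -> System.out.println("Processing chunk " + chunk));
        System.out.println("Chunks processed: " + chunks);
    }
}
```

The chunk helper only helps if `source` is itself lazy, for example a database cursor or a paged query; passing it a fully loaded list keeps the whole dataset in memory anyway.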

Add Observability Before and After Changes

Without metrics, tuning is guesswork. Track at minimum:

JVM heap used and max
non-heap used
GC pause duration and frequency
container memory working set
pod restart count

Example with Spring Boot Actuator and Prometheus (this assumes the micrometer-registry-prometheus dependency is on the classpath):

properties

management.endpoints.web.exposure.include=health,info,prometheus
management.metrics.tags.application=my-service

After each change, compare restart frequency and memory headroom over similar traffic windows.
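For a quick look at the JVM-side numbers without a metrics stack, the standard management beans expose heap, non-heap, and GC counters directly. This is a minimal sketch with a hypothetical class name; it is meant for ad hoc checks, not as a replacement for exported metrics.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// One-shot snapshot of JVM memory and GC counters from the platform MXBeans.
class MemorySnapshot {

    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    static String describe() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        StringBuilder out = new StringBuilder();
        out.append("heap used: ")
           .append(memory.getHeapMemoryUsage().getUsed()).append('\n');
        // getMax() returns -1 when the maximum is undefined
        out.append("heap max: ")
           .append(memory.getHeapMemoryUsage().getMax()).append('\n');
        out.append("non-heap used: ")
           .append(memory.getNonHeapMemoryUsage().getUsed()).append('\n');

        // Cumulative collection count and time per collector since JVM start
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append("gc ").append(gc.getName())
               .append(": count=").append(gc.getCollectionCount())
               .append(", total ms=").append(gc.getCollectionTime()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

The non-heap figure here covers JVM-managed pools such as metaspace and the code cache, not thread stacks or other native allocations, and the container working set comes from the container runtime, not from the JVM. Expect the JVM totals to be lower than what Kubernetes reports.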

Diagnose Leak Versus Bad Sizing

Not every OOM is a memory leak. Leak patterns usually show monotonic growth with poor recovery after GC. Bad sizing patterns often show oscillation near the limit and kills during bursts.
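The distinction can be expressed as a rough heuristic over heap-used samples taken after garbage collections. The sketch below is an illustration only: the class name, the enum, and the thresholds (80% rising samples, 20% overall growth, 90% of max heap) are arbitrary assumptions, and real diagnosis should rely on longer observation windows and heap analysis.

```java
// Heuristic classification of post-GC heap samples (oldest first).
// Thresholds are illustrative assumptions, not established rules.
class HeapTrend {

    enum Pattern { LIKELY_LEAK, LIKELY_UNDERSIZED, INCONCLUSIVE }

    static Pattern classify(long[] postGcHeapBytes, long maxHeapBytes) {
        int n = postGcHeapBytes.length;
        if (n < 3) {
            return Pattern.INCONCLUSIVE; // too few samples to see a trend
        }

        int rises = 0;
        long peak = postGcHeapBytes[0];
        for (int i = 1; i < n; i++) {
            if (postGcHeapBytes[i] > postGcHeapBytes[i - 1]) {
                rises++;
            }
            peak = Math.max(peak, postGcHeapBytes[i]);
        }

        // Leak-like: the post-GC floor keeps climbing and rarely recovers.
        double riseRatio = rises / (double) (n - 1);
        boolean grewOverall = postGcHeapBytes[n - 1] > postGcHeapBytes[0] * 1.2;
        if (riseRatio >= 0.8 && grewOverall) {
            return Pattern.LIKELY_LEAK;
        }

        // Sizing-like: usage oscillates but peaks close to the maximum.
        if (peak >= maxHeapBytes * 0.9) {
            return Pattern.LIKELY_UNDERSIZED;
        }
        return Pattern.INCONCLUSIVE;
    }

    public static void main(String[] args) {
        long max = 1000;
        System.out.println(classify(new long[] {100, 120, 150, 180, 220}, max));
        System.out.println(classify(new long[] {900, 700, 950, 720, 940}, max));
        System.out.println(classify(new long[] {100, 90, 110}, max));
    }
}
```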

If a leak is suspected:

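capture a heap dump, either on a Java OutOfMemoryError or on demand before the kill (a kernel OOM kill gives the JVM no chance to write one)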
inspect dominant object retainers
check cache eviction behavior

If sizing is the issue, adjust pod limit and JVM percentages with load-test feedback.

Rollout Strategy

Apply memory changes gradually. Use a canary or a single replica first, observe it for one full traffic cycle, then roll out to the rest.

bash

kubectl rollout restart deployment/my-service -n app
kubectl rollout status deployment/my-service -n app

A fast global rollout of untested memory settings can turn a partial problem into a full outage.

Common Pitfalls

Setting heap too high relative to container memory limit.
Increasing pod limits without checking application memory hotspots.
Diagnosing OOM from current logs only and missing previous-container evidence.
Ignoring non-heap memory when estimating Java process footprint.
Rolling memory config changes to all replicas without staged validation.

Summary

OOMKilled is a container memory cap event enforced by the kernel, not a JVM exception.
Size Kubernetes requests and limits explicitly for your workload.
Leave headroom for non-heap memory when tuning JVM options.
Reduce app-level memory spikes through streaming, bounded caches, and batch chunking.
Validate each change with metrics and staged rollout before full deployment.