Configuring Kubernetes Restart Policy

Introduction

Kubernetes restart policy controls what the kubelet does when a container in a Pod exits. The confusing part is that Pod-level restart policy and controller behavior are related but not identical: a container restart inside one Pod is different from a Deployment or Job deciding whether to create or keep Pods.

The Three Restart Policies

At the Pod spec level, restartPolicy accepts three values, and defaults to Always when the field is omitted:

  • Always
  • OnFailure
  • Never

They mean:

  • Always: restart the container whenever it exits
  • OnFailure: restart only when the exit status is nonzero
  • Never: do not restart the container inside that Pod

A simple Pod example:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  restartPolicy: OnFailure
  containers:
    - name: worker
      image: busybox
      command: ["sh", "-c", "exit 1"]

With OnFailure, the kubelet restarts the container every time it exits with a nonzero status, waiting longer between attempts as the failures repeat.
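For contrast, here is a minimal sketch of the same Pod with a command that succeeds. The name demo-pod-success is made up for illustration. Because the exit status is 0, OnFailure has nothing to retry, so the container should not be restarted and the Pod should end in the Succeeded phase.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod-success   # hypothetical name, not from the original example
spec:
  restartPolicy: OnFailure
  containers:
    - name: worker
      image: busybox
      # Exit status 0 counts as success, so OnFailure does not trigger a restart.
      command: ["sh", "-c", "echo done && exit 0"]
```

Under Always, the same container would be restarted even after this clean exit, which is the practical difference between the two policies.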

Pod Restarts Versus Controller Replacement

This is the distinction many people miss.

restartPolicy tells the kubelet what to do with containers inside the same Pod. Controllers such as Deployments, StatefulSets, Jobs, and CronJobs decide whether Pods themselves should exist and how many of them there should be.

So there are two layers:

  • kubelet restarts containers inside a Pod
  • controllers create, replace, or stop Pods according to higher-level rules

That is why restart behavior can look different depending on whether you are running a naked Pod, a Job, or a Deployment.

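Deployments Require Always

For workloads managed by a Deployment, the Pod template must use restartPolicy: Always. It is the only value the API accepts there, and it is also the default when the field is omitted. That matches the normal expectation for long-running services.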

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      restartPolicy: Always
      containers:
        - name: web
          image: nginx:1.27

If the container crashes, the kubelet restarts it inside the same Pod. If the Pod itself goes away, for example because it is deleted or its node is lost, the Deployment (through its ReplicaSet) creates a replacement Pod to maintain the desired replica count.

Jobs and OnFailure or Never

Batch workloads are different. A Job is meant to run to completion, so its Pod template accepts only OnFailure or Never. Always is rejected.

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: report-job
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: report
          image: busybox
          command: ["sh", "-c", "echo generating report && exit 0"]

With Jobs:

  • OnFailure restarts failed containers inside the Pod
  • Never leaves the failed Pod as-is and lets the Job controller decide about new Pods

Never is useful when you want failed batch work to be visible and analyzable rather than repeatedly retried in place.
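As a sketch of the Never variant, the Job below fails on purpose so the behavior is easy to observe. The name report-job-never and the backoffLimit value of 3 are assumptions chosen for illustration. backoffLimit is the Job field that caps how many times the Job controller retries before it marks the Job as failed.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: report-job-never   # hypothetical name
spec:
  # Illustrative cap on retries; the Job is marked failed once it is reached.
  backoffLimit: 3
  template:
    spec:
      # Never: a failed container is not restarted in place.
      # The Job controller creates a new Pod for each retry instead.
      restartPolicy: Never
      containers:
        - name: report
          image: busybox
          command: ["sh", "-c", "echo generating report && exit 1"]
```

Each failed attempt leaves its own Pod behind, so the logs and exit code of every attempt stay available for inspection.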

How Crash Loops Relate to Restart Policy

If a container repeatedly exits under Always or OnFailure, the kubelet waits an exponentially increasing delay between restarts (10s, 20s, 40s, and so on, capped at five minutes), and the Pod may show CrashLoopBackOff while it waits.

That status is not itself a restart policy. It is the observable result of a failing container being restarted repeatedly under a policy that allows restarts.

So when you see CrashLoopBackOff, the real debugging question is usually:

  • why is the container exiting?

not:

  • how do I get Kubernetes to stop reporting the symptom?
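To answer the first question, the container status usually holds the clue. The excerpt below is an illustrative sketch of what the status section of kubectl get pod <pod-name> -o yaml can look like for a crash-looping container. The container name and all values are hypothetical, not output from a real cluster.

```yaml
# Illustrative excerpt only; names and values are hypothetical.
status:
  containerStatuses:
    - name: worker
      restartCount: 5
      state:
        waiting:
          reason: CrashLoopBackOff   # the kubelet is waiting out the back-off delay
      lastState:
        terminated:
          exitCode: 1                # how the previous attempt ended
          reason: Error
```

The lastState.terminated block describes the previous attempt, and kubectl logs <pod-name> --previous shows the output of that terminated container.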

Choosing the Right Policy

A practical rule is:

  • long-running services: Always
  • retryable batch work: OnFailure
  • one-shot runs where you want the terminated state preserved: Never

Do not choose based only on whether restarting sounds nice. Choose based on workload intent.
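For the one-shot case, a minimal sketch might look like the Pod below. The name one-shot-check and the exit code 3 are arbitrary choices for illustration. With Never, the container is not restarted after it exits, so its terminated state, including the exit code, remains on the Pod for inspection.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: one-shot-check   # hypothetical name
spec:
  # Never: the kubelet leaves the container terminated after it exits.
  restartPolicy: Never
  containers:
    - name: check
      image: busybox
      # Arbitrary nonzero exit code, used here only to make the failure visible.
      command: ["sh", "-c", "echo running one-shot check; exit 3"]
```

Because the exit status is nonzero and nothing restarts the container, this Pod should end in the Failed phase.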

Common Pitfalls

The biggest mistake is assuming restartPolicy controls everything about workload recovery. It only governs container restarts within a Pod; controllers add another layer of behavior.

Another mistake is using service-style thinking for Jobs. A batch task that should finish successfully is not the same as a web server that should run forever.

People also confuse CrashLoopBackOff with a configurable restart policy value. It is a status that appears when the actual process inside the container keeps failing.

Finally, avoid setting Never on a standalone Pod just to stop restarts while debugging and then forgetting to restore the correct production behavior. That can leave a service dead after its first real crash.

Summary

  • Kubernetes restart policy applies at the Pod level and controls container restarts.
  • Always, OnFailure, and Never are the three valid values, and Always is the default.
  • Deployments require Always, which suits long-running services.
  • Jobs accept only OnFailure or Never, which gives them batch semantics.
  • Controller behavior and Pod restart behavior are related, but they are not the same thing.
