CUDA_ERROR_OUT_OF_MEMORY in TensorFlow

Introduction

CUDA_ERROR_OUT_OF_MEMORY in TensorFlow means the runtime could not allocate enough GPU memory for an operation, model, or batch. The immediate fix is often to reduce memory pressure, but the durable fix is understanding where that memory is going and configuring TensorFlow so it behaves well on your hardware.

Why It Happens

GPU memory is consumed by several things at once:

model parameters
optimizer state
activations kept for backpropagation
input batches
temporary tensors created during ops

That is why seemingly small changes can push a run from stable to failing: activation memory grows roughly linearly with batch size, and doubling both the width and height of an image input roughly quadruples it.
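
To see how these pieces add up, here is a rough back-of-the-envelope estimator. It is a sketch under stated assumptions, not a profiler: the function name `estimate_training_memory_mib` and its parameters are hypothetical, it assumes 4-byte float32 values, one gradient per parameter, and two optimizer slots per parameter (as Adam keeps), and it ignores temporary tensors and allocator overhead.

```python
def estimate_training_memory_mib(num_params, activations_per_example, batch_size,
                                 bytes_per_value=4, optimizer_slots=2):
    """Rough lower bound on training memory in MiB.

    Assumptions (illustrative only): every value uses bytes_per_value bytes,
    one gradient is stored per parameter, and the optimizer keeps
    optimizer_slots extra values per parameter. Temporaries are ignored.
    """
    weights = num_params * bytes_per_value
    gradients = num_params * bytes_per_value
    optimizer_state = num_params * optimizer_slots * bytes_per_value
    # Activations are the only term that scales with the batch size.
    activations = activations_per_example * batch_size * bytes_per_value
    return (weights + gradients + optimizer_state + activations) / 2**20


# A 784 -> 512 -> 10 dense network: weights plus biases for both layers.
num_params = (784 * 512 + 512) + (512 * 10 + 10)
# Values kept per example: the input plus the output of each layer.
activations_per_example = 784 + 512 + 10

for batch_size in (32, 256, 2048):
    mib = estimate_training_memory_mib(num_params, activations_per_example, batch_size)
    print(f"batch_size={batch_size}: about {mib:.1f} MiB")
```

For a small dense network the parameter-related terms dominate, but for convolutional or transformer models the activation term usually grows fastest, which is why batch size is the first thing to adjust.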

In TensorFlow, the corresponding Python-side exception is often tf.errors.ResourceExhaustedError, even though the underlying CUDA message mentions out-of-memory conditions.

Start with the Simplest Fixes

The first things to try are usually the most effective:

reduce batch size
reduce input resolution
use a smaller model
stop other GPU workloads
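
Before changing the model, it can be worth checking whether another process already holds GPU memory. The sketch below assumes the `nvidia-smi` tool is installed and on the PATH. `parse_gpu_memory` and `query_gpu_memory` are hypothetical helper names, and the parser assumes every field is a plain integer in MiB.

```python
import subprocess

# nvidia-smi query that prints one "index, used, total" row per GPU, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' CSV rows into a list of dicts (values in MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append({"gpu": index, "used_mib": used,
                     "total_mib": total, "free_mib": total - used})
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return per-GPU memory usage (needs the NVIDIA tools)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Parsing a made-up sample so the example runs without a GPU.
sample = "0, 10240, 24576\n1, 312, 24576\n"
for row in parse_gpu_memory(sample):
    print(row)
```

On a machine with an NVIDIA driver, calling `query_gpu_memory()` before starting TensorFlow shows how much memory is already taken by other workloads.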

Example with a smaller batch size:

python

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10)
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
)

x = tf.random.normal((1024, 784))
y = tf.random.uniform((1024,), maxval=10, dtype=tf.int32)

model.fit(x, y, batch_size=32, epochs=1)

If batch_size=256 fails and batch_size=32 works, you have confirmed the main source of pressure.
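
That comparison can be automated by halving the batch size until a trial step succeeds. The following is a minimal sketch, not a TensorFlow API: `find_workable_batch_size` and `fake_train_step` are hypothetical names, and the exception type to catch is passed in so the example runs without a GPU (here `MemoryError` stands in for the out-of-memory exception).

```python
def find_workable_batch_size(train_step, start_batch_size, oom_errors, min_batch_size=1):
    """Halve the batch size until train_step(batch_size) stops raising oom_errors."""
    batch_size = start_batch_size
    while batch_size >= min_batch_size:
        try:
            train_step(batch_size)
            return batch_size
        except oom_errors:
            # The step did not fit: try again with half the batch.
            batch_size //= 2
    raise RuntimeError(f"no batch size >= {min_batch_size} fit in memory")


def fake_train_step(batch_size):
    """Stand-in for a real training step: pretends anything above 48 runs out of memory."""
    if batch_size > 48:
        raise MemoryError(f"simulated OOM at batch_size={batch_size}")


print(find_workable_batch_size(fake_train_step, 256, MemoryError))
```

With TensorFlow, `oom_errors` would typically be `tf.errors.ResourceExhaustedError` and `train_step` would run one short fit at the given batch size. Recovery inside a process that has already hit an out-of-memory error is not guaranteed, so it is safer to treat this as a diagnostic and restart with the value it finds.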

Enable Memory Growth When Appropriate

By default, TensorFlow maps nearly all of the memory on every visible GPU as soon as it initializes the device. On development machines, enabling memory growth makes TensorFlow start with a small allocation and extend it as the process needs more.

python

import tensorflow as tf

# Memory growth must be set before any GPU has been initialized,
# so run this at the top of the program.
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

This does not create more memory, but it lets TensorFlow coexist better with other processes on the same GPU, and monitoring tools then show usage closer to what the process actually needs instead of the whole GPU being claimed at startup.
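
Where gradual growth is not appropriate, for example on a GPU shared under a fixed budget, a hard cap is a related option. The sketch below assumes a recent TensorFlow 2 release that exposes `tf.config.set_logical_device_configuration`. The helper name `cap_gpu_memory` and the 4096 MiB value are illustrative only, and the cap is best treated as an alternative to memory growth on a given GPU rather than something to combine with it.

```python
import tensorflow as tf


def cap_gpu_memory(limit_mib):
    """Limit TensorFlow to limit_mib of memory on each visible GPU; return the GPU count."""
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Like memory growth, this must run before the GPU is initialized.
        tf.config.set_logical_device_configuration(
            gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=limit_mib)]
        )
    return len(gpus)


num_capped = cap_gpu_memory(4096)  # 4096 MiB is an arbitrary example value
print("GPUs capped:", num_capped)
```

On a machine without a GPU the loop does nothing and the function returns 0, so the same startup code can run everywhere.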

Mixed Precision Can Help

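If your hardware supports it, mixed precision can reduce memory usage and often improve throughput. Under the mixed_float16 policy, layers compute in float16 while variables stay in float32, so the savings come mainly from activations. NVIDIA GPUs with compute capability 7.0 or higher see the largest speedups.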

python

import tensorflow as tf
from tensorflow.keras import mixed_precision

# Compute in float16 while keeping variables in float32.
mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(512, activation="relu"),
    # Keep the output layer in float32 for numerically stable logits.
    tf.keras.layers.Dense(10, dtype="float32")
])

This is not a universal fix, but it is a common next step after reducing obvious memory costs.

Clear State Between Experiments

Notebook sessions and repeated model creation can leak memory in practice because old graphs and variables remain alive longer than expected. Calling tf.keras.backend.clear_session() between runs resets the global state Keras has accumulated so those objects can be freed:

python

import tensorflow as tf

tf.keras.backend.clear_session()

If you are iterating in Jupyter or a long-running Python process, this matters more than many people expect.
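
In a loop over experiments, the same call can sit at the top of each iteration. This is a sketch of one common pattern rather than a guaranteed fix: `build_model` is a hypothetical helper, and dropping the Python reference and running the garbage collector are additions beyond the single call above.

```python
import gc

import tensorflow as tf


def build_model(units):
    """Hypothetical model factory used only for this illustration."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(784,)),
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(10),
    ])


param_counts = {}
for units in (128, 256, 512):
    # Reset Keras global state left over from the previous iteration.
    tf.keras.backend.clear_session()
    model = build_model(units)
    param_counts[units] = model.count_params()
    # Drop the last Python reference so the old model can be collected.
    del model
    gc.collect()

print(param_counts)
```

If memory still climbs across iterations, running each experiment in its own process is the more reliable way to get a clean slate.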

When the Real Problem Is the Workflow

Sometimes the issue is not the single training step but the training design:

very large validation batches
caching huge datasets on GPU unintentionally
keeping multiple models alive at once
loading a model and then cloning another copy

So do not stop at "reduce batch size." Look at the full lifecycle of the process.
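
As one illustration of the first item, Keras lets you set the batch size for validation, evaluation, and prediction separately from training. The sketch below uses random stand-in data. The shapes and the value 64 are arbitrary choices for the example.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Random stand-in data for training and validation.
x_train = tf.random.normal((1024, 784))
y_train = tf.random.uniform((1024,), maxval=10, dtype=tf.int32)
x_val = tf.random.normal((256, 784))
y_val = tf.random.uniform((256,), maxval=10, dtype=tf.int32)

model.fit(
    x_train, y_train,
    batch_size=32,
    epochs=1,
    validation_data=(x_val, y_val),
    validation_batch_size=64,  # bound the validation pass explicitly
    verbose=0,
)

# Evaluation and prediction take their own batch_size as well.
val_loss = model.evaluate(x_val, y_val, batch_size=64, verbose=0)
preds = model.predict(x_val, batch_size=64, verbose=0)
print(preds.shape)
```

Validation and prediction do not store activations for backpropagation, so they can often use a larger batch than training, but the size still needs an explicit bound.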

Common Pitfalls

Assuming the model weights are the only big memory cost and forgetting activations or optimizer state.
Restarting the same notebook cell repeatedly without clearing old TensorFlow state.
Treating memory growth as a cure-all when the model simply does not fit.
Ignoring other processes already using the GPU.
Reducing training batch size but leaving validation or prediction batch sizes huge.

Summary

TensorFlow GPU OOM errors usually come from batch size, model size, activations, or competing workloads.
Start by reducing the easiest pressure sources such as batch size and input size.
Enable memory growth if you want TensorFlow to allocate GPU memory more gradually.
Mixed precision can reduce memory usage on supported hardware.
Clear TensorFlow state between experiments and inspect the whole workflow, not just a single training step.