Limit number of cores used in Keras

Introduction

Limiting CPU cores for Keras and TensorFlow is useful when running multiple workloads on shared machines, benchmarking fairly, or preventing noisy-neighbor effects in production services. Thread usage is controlled by more than one layer, so setting a single parameter is rarely enough. A stable setup aligns TensorFlow runtime threads, data pipeline parallelism, and backend library thread counts.

Core Sections

Understand where CPU threads come from

In a typical Keras training process, threads can come from:

TensorFlow intra-op parallelism
TensorFlow inter-op scheduling
tf.data map and prefetch workers
BLAS or OpenMP backend libraries such as MKL

If you only cap one layer, others may still consume many cores.
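
Before changing anything, it can help to see what the process starts with. The following is a minimal sketch using only the standard library; the helper name `cpu_snapshot` and the choice of environment variables to report are illustrative assumptions, not part of Keras or TensorFlow.

```python
import os

# Environment variables commonly read by OpenMP, MKL, and OpenBLAS.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def cpu_snapshot():
    """Report visible cores and thread-related environment variables."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: the set of cores this process is allowed to run on.
        usable = len(os.sched_getaffinity(0))
    else:
        # Other platforms: fall back to the logical core count.
        usable = os.cpu_count()
    return {
        "logical_cores": os.cpu_count(),
        "usable_cores": usable,
        "env": {name: os.environ.get(name) for name in THREAD_ENV_VARS},
    }


snapshot = cpu_snapshot()
print(snapshot)
```

A value of `None` under `env` means the variable is unset, in which case the library usually picks its own default, often one thread per visible core.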

Set TensorFlow thread limits early

Set thread limits before creating models or datasets that initialize runtime internals.

python

import tensorflow as tf

# Must run early in process startup
tf.config.threading.set_intra_op_parallelism_threads(2)
tf.config.threading.set_inter_op_parallelism_threads(2)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])

model.compile(optimizer="adam", loss="mse")

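Call these before TensorFlow executes any operation. In TensorFlow 2.x, calling them after the runtime has been initialized raises a RuntimeError instead of changing the existing thread pools.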
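
To confirm the caps were accepted, the values can be read back through the matching getters in `tf.config.threading`. This is a sketch under assumptions: the wrapper `apply_thread_limits` is a hypothetical helper, and the guarded import exists only so the snippet still runs on a machine without TensorFlow.

```python
try:
    import tensorflow as tf
except ImportError:  # lets the sketch run where TensorFlow is not installed
    tf = None


def apply_thread_limits(intra_op, inter_op):
    """Set TensorFlow thread caps and return the values TensorFlow reports back."""
    if tf is None:
        return None
    try:
        tf.config.threading.set_intra_op_parallelism_threads(intra_op)
        tf.config.threading.set_inter_op_parallelism_threads(inter_op)
    except RuntimeError as err:
        # Expected when the runtime was already initialized by earlier work.
        raise RuntimeError(
            "thread limits must be set before any TensorFlow op runs"
        ) from err
    return {
        "intra_op": tf.config.threading.get_intra_op_parallelism_threads(),
        "inter_op": tf.config.threading.get_inter_op_parallelism_threads(),
    }


limits = apply_thread_limits(2, 2)
print(limits)
```

Calling the helper at the very top of the entry-point script, before any model or dataset code is imported, keeps the ordering requirement easy to audit.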

Control `tf.data` parallelism intentionally

AUTOTUNE can exceed your intended CPU budget in constrained environments. Use explicit values when strict limits matter.

python

import tensorflow as tf

# Synthetic data standing in for a real training set
x = tf.random.normal((10000, 20))
y = tf.random.normal((10000, 1))

ds = tf.data.Dataset.from_tensor_slices((x, y))
ds = ds.shuffle(1000)
# Fixed parallelism instead of tf.data.AUTOTUNE
ds = ds.map(lambda a, b: (a, b), num_parallel_calls=2)
# Small, fixed prefetch buffer
ds = ds.batch(64).prefetch(1)

This keeps pipeline concurrency closer to your thread budget.
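
Beyond per-transformation values, `tf.data` also exposes pipeline-wide threading options. The sketch below assumes the `threading` attribute on `tf.data.Options` is available, which holds for recent TensorFlow 2.x releases. The helper `capped_dataset` and the pool size of 2 are illustrative choices, and the guarded import only keeps the snippet runnable without TensorFlow.

```python
try:
    import tensorflow as tf
except ImportError:  # lets the sketch run where TensorFlow is not installed
    tf = None


def capped_dataset(ds, pool_size=2):
    """Attach pipeline-wide threading options to a dataset."""
    options = tf.data.Options()
    # Give this pipeline its own small thread pool.
    options.threading.private_threadpool_size = pool_size
    # Keep individual ops inside the pipeline single-threaded.
    options.threading.max_intra_op_parallelism = 1
    return ds.with_options(options)


if tf is not None:
    ds = tf.data.Dataset.range(256).map(lambda v: v * 2, num_parallel_calls=2)
    ds = capped_dataset(ds.batch(64).prefetch(1))
    n_batches = sum(1 for _ in ds)  # iterate once to exercise the pipeline
else:
    n_batches = None

print("batches:", n_batches)
```

Whether a private pool or explicit `num_parallel_calls` values matter more depends on the pipeline, so it is worth measuring both before settling on a configuration.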

Coordinate with BLAS or OpenMP settings

Even with TensorFlow thread limits, numeric libraries may create additional threads. Set environment variables before Python starts.

bash

export OMP_NUM_THREADS=2
export MKL_NUM_THREADS=2
export OPENBLAS_NUM_THREADS=2
python train.py

This is especially relevant on CPU-heavy linear algebra workloads.
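
When the shell environment is not under your control, the same variables can be set from Python, provided this happens before the numeric libraries are imported. These libraries typically read the variables once at load time, so setting them later generally has no effect. The helper name `cap_numeric_threads` is an illustrative assumption.

```python
import os


def cap_numeric_threads(n):
    """Set common BLAS/OpenMP thread variables for this process and its children."""
    for name in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[name] = str(n)


cap_numeric_threads(2)

# Import NumPy, TensorFlow, and similar libraries only after this point,
# so they see the capped values when they initialize their thread pools.
print({k: v for k, v in os.environ.items() if k.endswith("_NUM_THREADS")})
```

Direct assignment overrides anything inherited from the shell. Using `os.environ.setdefault` instead would let an externally provided value win.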

Verify real behavior, not only config values

Measure the process thread count and host CPU usage under load. Configuration can look correct while the runtime still oversubscribes cores.

python

import psutil
import os

p = psutil.Process(os.getpid())
print("threads:", p.num_threads())

Pair this with host tools such as top or container metrics to confirm effective limits.
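
A single reading taken at startup can miss threads that are created only once work begins. The sketch below samples the thread count while a workload runs and reports the peak. It uses only the standard library: it reads `/proc/self/task` on Linux and otherwise falls back to Python-level threads, which undercounts native library threads. The function names are illustrative, and the `time.sleep` workload stands in for a real training or inference call.

```python
import os
import threading
import time


def native_thread_count():
    """Count OS-level threads of this process on Linux, else Python threads only."""
    task_dir = "/proc/self/task"
    if os.path.isdir(task_dir):
        return len(os.listdir(task_dir))
    return threading.active_count()  # fallback: misses native library threads


def peak_threads_during(workload, interval=0.05):
    """Run workload() while sampling the thread count; return the peak observed."""
    peak = native_thread_count()
    done = threading.Event()

    def sample():
        nonlocal peak
        while not done.is_set():
            peak = max(peak, native_thread_count())
            time.sleep(interval)

    sampler = threading.Thread(target=sample, daemon=True)
    sampler.start()
    try:
        workload()
    finally:
        done.set()
        sampler.join()
    return peak


# Placeholder workload; replace with a training step or inference call.
peak = peak_threads_during(lambda: time.sleep(0.2))
print("peak threads:", peak)
```

The sampler thread counts itself, so subtract one when comparing the peak against a thread budget.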

Reproducible benchmarking setup

When comparing model variants, keep CPU limits fixed. Otherwise, a measured improvement may come from different thread scheduling rather than from the model change itself.

Recommended benchmark checklist:

pin thread settings
pin batch size
warm up model once
run multiple repetitions
report median and variance

This makes performance claims credible and repeatable.
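
The checklist above can be sketched as a small timing harness. This is a minimal illustration rather than a full benchmarking tool: the `benchmark` helper, the default repetition count, and the placeholder workload are assumptions. Thread settings and batch size are expected to be pinned by the caller before the harness runs.

```python
import statistics
import time


def benchmark(fn, warmup=1, repeats=5):
    """Time fn() after warm-up runs and summarize the repetitions."""
    for _ in range(warmup):
        fn()  # warm-up: excludes one-time setup cost from the timings
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "runs": repeats,
        "median_s": statistics.median(timings),
        "variance_s2": statistics.variance(timings),  # needs repeats >= 2
    }


# Placeholder workload; replace with model.predict(...) or a training step.
result = benchmark(lambda: sum(i * i for i in range(100_000)))
print(result)
```

The median is less sensitive than the mean to a single slow run, which is why the checklist asks for it alongside the variance.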

Production deployment guidance

For inference services, combine process-level CPU limits and framework-level thread caps. Container CPU quotas alone do not guarantee predictable latency if thread pools are oversized.
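
One way to add a process-level limit from inside Python on Linux is CPU affinity, sketched below with the standard library. This is an assumption about how the limit could be applied, not the only option: container quotas are configured outside the process and are not shown. The helper name `pin_process_to_cores` is illustrative, and it returns `None` when the platform or sandbox does not allow the change.

```python
import os


def pin_process_to_cores(max_cores):
    """Restrict this process to at most max_cores of its currently allowed cores."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # not available on this platform (e.g. macOS, Windows)
    try:
        allowed = sorted(os.sched_getaffinity(0))
        os.sched_setaffinity(0, allowed[:max_cores])
        return len(os.sched_getaffinity(0))
    except OSError:
        return None  # the sandbox or container refused the change


pinned = pin_process_to_cores(2)
print("cores after pinning:", pinned)
```

Affinity limits where threads may run but not how many are created, so it complements the framework-level thread caps rather than replacing them.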

For training clusters, keep default high parallelism for dedicated nodes and apply strict limits only where co-tenancy exists. One policy rarely fits all workloads. If you run experiments with hyperparameter search, apply the same core cap to each worker process. Otherwise one trial may starve others and create misleading comparisons between model configurations. Keep this policy enforced in job templates, not only local scripts.

For reproducible experiments, log every threading variable and TensorFlow thread setting with each run artifact so benchmark comparisons stay defensible.
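
A run record of this kind could look like the sketch below. The field names and the helper `threading_record` are illustrative assumptions. The TensorFlow values are passed in by the caller so the record reflects what the script actually configured.

```python
import json
import os


def threading_record(intra_op, inter_op):
    """Collect thread-related settings into a JSON-serializable dict."""
    env_names = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")
    return {
        "tf_intra_op_threads": intra_op,
        "tf_inter_op_threads": inter_op,
        "logical_cores": os.cpu_count(),
        "env": {name: os.environ.get(name) for name in env_names},
    }


record = threading_record(intra_op=2, inter_op=2)
serialized = json.dumps(record, sort_keys=True, indent=2)
print(serialized)
```

Writing the serialized record next to the metrics file for each run makes it possible to check later that two results were produced under the same thread budget.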

Common Pitfalls

Setting TensorFlow thread caps after runtime initialization.
Using AUTOTUNE while expecting strict core boundaries.
Ignoring OpenMP or BLAS thread pools outside TensorFlow controls.
Treating container CPU quota as equivalent to thread-count control.
Benchmarking models with inconsistent threading configuration.

Summary

Limiting cores in Keras requires multi-layer configuration, not a single switch.
Set TensorFlow thread limits early and tune tf.data concurrency explicitly.
Control external numeric library thread pools for full effect.
Validate with runtime metrics, not only code settings.
Keep thread configuration fixed for fair benchmarks and predictable production behavior.
