Limit number of cores used in Keras

Introduction

Limiting CPU cores for Keras and TensorFlow is useful when running multiple workloads on shared machines, benchmarking fairly, or preventing noisy-neighbor effects in production services. Thread usage is controlled by more than one layer, so setting a single parameter is rarely enough. A stable setup aligns TensorFlow runtime threads, data pipeline parallelism, and backend library thread counts.

Core Sections

Understand where CPU threads come from

In a typical Keras training process, threads can come from:

  • TensorFlow intra-op parallelism
  • TensorFlow inter-op scheduling
  • tf.data map and prefetch workers
  • BLAS or OpenMP backend libraries such as MKL

If you only cap one layer, others may still consume many cores.

Set TensorFlow thread limits early

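Call the TensorFlow threading functions at process startup, before any TensorFlow operation runs. Creating a tensor, dataset, or model initializes the runtime, and after that point the thread pool sizes are fixed.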

python
import tensorflow as tf

# Must run before any TensorFlow op executes; creating a tensor, dataset, or model initializes the runtime
tf.config.threading.set_intra_op_parallelism_threads(2)  # threads used inside a single op, e.g. a matmul
tf.config.threading.set_inter_op_parallelism_threads(2)  # threads for running independent ops concurrently

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])

model.compile(optimizer="adam", loss="mse")

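If the runtime has already been initialized, changing these values raises a RuntimeError instead of applying the new limits, so place the calls at the top of your entry-point script, right after importing TensorFlow.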

Control tf.data parallelism intentionally

With tf.data.AUTOTUNE, tf.data chooses parallelism dynamically, which can exceed your intended CPU budget in constrained environments. Use explicit values when strict limits matter.

python
import tensorflow as tf

x = tf.random.normal((10000, 20))
y = tf.random.normal((10000, 1))

ds = tf.data.Dataset.from_tensor_slices((x, y))
ds = ds.shuffle(1000)
ds = ds.map(lambda a, b: (a, b), num_parallel_calls=2)  # fixed parallelism instead of tf.data.AUTOTUNE
ds = ds.batch(64).prefetch(1)  # prefetch one batch instead of an autotuned buffer

This keeps pipeline concurrency closer to your thread budget.

Coordinate with BLAS or OpenMP settings

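Even with TensorFlow thread limits in place, numeric libraries such as the BLAS backend used by NumPy may create their own thread pools. Set these environment variables before Python starts, or at least before those libraries are imported, because they are typically read when the pools are created.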

bash
export OMP_NUM_THREADS=2
export MKL_NUM_THREADS=2
export OPENBLAS_NUM_THREADS=2
python train.py

This is especially relevant on CPU-heavy linear algebra workloads.
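When worker processes are launched from Python, for example by a sweep driver, the same variables can be passed through the child's environment so they are in place before any library loads. This is a standard-library sketch; capped_env and run_capped are hypothetical helper names, and the variable list simply mirrors the shell example above.

```python
from __future__ import annotations

import os
import subprocess
import sys

# Same variables as the shell example; read by common OpenMP/BLAS runtimes at startup.
THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def capped_env(threads: int, base: dict[str, str] | None = None) -> dict[str, str]:
    """Return a copy of base (default: current environment) with every thread cap set."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    env = dict(os.environ if base is None else base)
    for var in THREAD_VARS:
        env[var] = str(threads)
    return env


def run_capped(script: str, threads: int) -> subprocess.CompletedProcess:
    """Run script in a fresh interpreter, so the caps exist before NumPy/TensorFlow import."""
    return subprocess.run([sys.executable, script], env=capped_env(threads), check=True)
```

Launching a fresh interpreter is the design choice that matters here: setting os.environ inside an already running process does not resize thread pools that were created earlier.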

Verify real behavior, not only config values

Measure process thread count and host CPU usage under load. Config can look correct while runtime still oversubscribes.

python
import psutil
import os

p = psutil.Process(os.getpid())
print("threads:", p.num_threads())

Pair this with host tools such as top or container metrics to confirm effective limits.
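If psutil is not installed, Linux exposes the same number in /proc/self/status. The sketch below assumes Linux, and native_thread_count is a hypothetical helper name. Unlike threading.active_count(), which only sees Python-level threads, this count includes native threads started by TensorFlow, OpenMP, or BLAS.

```python
import threading


def native_thread_count() -> int:
    """Return the number of OS threads in this process (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    raise RuntimeError("Threads: field not found in /proc/self/status")


if __name__ == "__main__":
    # Start three idle threads and check that the OS-level count rises while they are alive.
    release = threading.Event()
    workers = [threading.Thread(target=release.wait) for _ in range(3)]
    before = native_thread_count()
    for worker in workers:
        worker.start()
    print("before:", before, "during:", native_thread_count())
    release.set()
    for worker in workers:
        worker.join()
```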

Reproducible benchmarking setup

When comparing model variants, keep CPU limits fixed. Otherwise, measured improvements may come from different thread scheduling rather than from better model or algorithm quality.

Recommended benchmark checklist:

  • pin thread settings
  • pin batch size
  • warm up model once
  • run multiple repetitions
  • report median and variance

This makes performance claims credible and repeatable.
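The warm-up, repetition, and median/variance items on the checklist can be sketched with the standard library alone. benchmark is a hypothetical helper, and fn stands in for whatever you measure, for example lambda: model.predict(batch, verbose=0) with a fixed, hypothetical batch. Thread settings and batch size are assumed to be pinned before it is called.

```python
from __future__ import annotations

import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], repeats: int = 5) -> dict[str, float]:
    """Time fn after one untimed warm-up call; report median and sample variance in seconds."""
    if repeats < 2:
        raise ValueError("need at least 2 repetitions to estimate variance")
    fn()  # warm-up: absorbs one-time costs such as tracing, lazy init, and cold caches
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(timings),
        "variance_s2": statistics.variance(timings),
    }
```

The median is reported instead of the mean because a single run disturbed by another process would otherwise skew the result.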

Production deployment guidance

For inference services, combine process-level CPU limits and framework-level thread caps. Container CPU quotas alone do not guarantee predictable latency if thread pools are oversized.
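One way to keep thread pools in line with a container's quota is to derive the thread count from it at startup. This sketch assumes Linux with cgroup v2, where /sys/fs/cgroup/cpu.max contains "quota period" in microseconds, or "max period" when no quota is set. parse_cpu_max and cpu_budget are hypothetical helpers; the result could be passed to the TensorFlow thread setters and the BLAS/OpenMP variables.

```python
from __future__ import annotations

import math
import os


def parse_cpu_max(text: str) -> float | None:
    """Convert cgroup v2 cpu.max contents to a number of cores; None means no quota."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def cpu_budget(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> int:
    """CPUs this process may run on, further limited by the cgroup quota if one exists."""
    if hasattr(os, "sched_getaffinity"):
        allowed = len(os.sched_getaffinity(0))  # respects taskset/cpuset pinning (Linux)
    else:
        allowed = os.cpu_count() or 1
    try:
        with open(cpu_max_path) as f:
            quota = parse_cpu_max(f.read())
    except (OSError, ValueError):
        quota = None  # no cgroup v2 file or unexpected format: rely on affinity only
    if quota is None:
        return allowed
    # Round down so the pools never exceed the quota, but keep at least one thread.
    return max(1, min(allowed, math.floor(quota)))
```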

For training clusters, keep TensorFlow's default parallelism on dedicated nodes and apply strict limits only where co-tenancy exists. One policy rarely fits all workloads.

If you run hyperparameter searches, apply the same core cap to every worker process. Otherwise, one trial may starve others and create misleading comparisons between model configurations. Enforce this policy in job templates, not only in local scripts.
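For a search running on a single host, one way to give every trial the same cap is to split the available CPUs into equal, disjoint groups and pin each worker to one group. This sketch assumes Linux (os.sched_setaffinity); partition_cpus and pin_current_process are hypothetical helpers, and each worker would still set its TensorFlow and BLAS thread counts to the same per-worker value.

```python
from __future__ import annotations

import os


def partition_cpus(cpus: list[int], per_worker: int) -> list[set[int]]:
    """Split cpus into disjoint groups of exactly per_worker; leftover CPUs stay unused."""
    if per_worker < 1:
        raise ValueError("per_worker must be >= 1")
    ordered = sorted(cpus)
    n_workers = len(ordered) // per_worker
    return [set(ordered[i * per_worker:(i + 1) * per_worker]) for i in range(n_workers)]


def pin_current_process(cpus: set[int]) -> None:
    """Restrict the calling process (pid 0 means self) to the given CPUs (Linux only)."""
    os.sched_setaffinity(0, cpus)
```

Leftover CPUs are deliberately left idle: giving one trial an extra core would reintroduce the unequal comparison the cap is meant to prevent.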

For reproducible experiments, log every threading variable and TensorFlow thread setting with each run artifact so benchmark comparisons stay defensible.
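A minimal sketch of such a log, assuming each run has an artifact directory you control: it records the thread environment variables and the CPU count as JSON. threading_snapshot, write_snapshot, and the threading.json file name are hypothetical. TensorFlow values are passed in by the caller, for example from tf.config.threading.get_intra_op_parallelism_threads() and tf.config.threading.get_inter_op_parallelism_threads(), where 0 means TensorFlow picks the value itself.

```python
from __future__ import annotations

import json
import os
from pathlib import Path

THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def threading_snapshot(tf_settings: dict[str, int] | None = None) -> dict[str, object]:
    """Collect thread-related settings; unset variables are recorded as None."""
    snapshot: dict[str, object] = {var: os.environ.get(var) for var in THREAD_VARS}
    snapshot["os_cpu_count"] = os.cpu_count()
    snapshot.update(tf_settings or {})
    return snapshot


def write_snapshot(run_dir: str | Path, snapshot: dict[str, object]) -> Path:
    """Write the snapshot next to the run's other artifacts and return its path."""
    path = Path(run_dir) / "threading.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(snapshot, indent=2, sort_keys=True))
    return path
```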

Common Pitfalls

  • Setting TensorFlow thread caps after runtime initialization.
  • Using AUTOTUNE while expecting strict core boundaries.
  • Ignoring OpenMP or BLAS thread pools outside TensorFlow controls.
  • Treating container CPU quota as equivalent to thread-count control.
  • Benchmarking models with inconsistent threading configuration.

Summary

  • Limiting cores in Keras requires multi-layer configuration, not a single switch.
  • Set TensorFlow thread limits early and tune tf.data concurrency explicitly.
  • Control external numeric library thread pools for full effect.
  • Validate with runtime metrics, not only code settings.
  • Keep thread configuration fixed for fair benchmarks and predictable production behavior.
