
Reproducible Results in TensorFlow with tf.set_random_seed


Introduction

Getting reproducible results in TensorFlow means controlling more than one source of randomness. The old TensorFlow 1 API used tf.set_random_seed, while modern TensorFlow uses tf.random.set_seed, and fully repeatable experiments also need Python, NumPy, data-pipeline, and operation-determinism settings aligned.

Understand What the Seed Actually Controls

TensorFlow derives the seed for each random operation from two values: a global seed and an optional operation-level seed. Setting the global seed makes the sequence of random values repeatable for the same code path and TensorFlow version, but it does not remove other sources of variation such as non-deterministic kernels or a changing data order.

In TensorFlow 1 code, you may see tf.set_random_seed(1234). TensorFlow 2 keeps that function under the compat.v1 namespace:

python
import tensorflow as tf

tf.compat.v1.set_random_seed(1234)

In TensorFlow 2, the normal API is:

python
import tensorflow as tf

tf.random.set_seed(1234)

That handles TensorFlow's own random ops, but reproducibility still depends on what else the program is doing.
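To make the interaction between the global seed and an operation-level seed concrete, here is a minimal sketch that assumes TensorFlow 2 running in eager mode. The helper name draw_twice is hypothetical and used only for illustration. It resets the global seed, draws two values, and lets you compare runs.

```python
import tensorflow as tf


def draw_twice(global_seed, op_seed=None):
    """Reset the global seed, then draw two uniform values (hypothetical helper)."""
    tf.random.set_seed(global_seed)
    first = float(tf.random.uniform([1], seed=op_seed).numpy()[0])
    second = float(tf.random.uniform([1], seed=op_seed).numpy()[0])
    return first, second


# Global seed only: resetting the seed replays the same sequence.
run_a = draw_twice(1234)
run_b = draw_twice(1234)

# Global seed plus an operation-level seed: also replayed after a reset.
run_c = draw_twice(1234, op_seed=1)
run_d = draw_twice(1234, op_seed=1)

print(run_a == run_b)
print(run_c == run_d)
```

Within a single run, the two draws are generally different values. What the seed fixes is the sequence as a whole, not each individual call.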

Set Python, NumPy, and TensorFlow Seeds Together

If your program uses Python's random, NumPy, and TensorFlow, set all three. Keras provides a helper for exactly this use case.

python
import keras
import tensorflow as tf

keras.utils.set_random_seed(1234)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1)
])

print(model(tf.ones((1, 4)), training=True))

keras.utils.set_random_seed seeds Python's random module, NumPy, and TensorFlow in one call, which is easier and less error-prone than seeding each one individually. It is a sensible default for Keras-based workflows.
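If you are not using Keras, or want to see what is being seeded, the same idea can be written out by hand. The sketch below is only a rough equivalent, not the Keras implementation. The names seed_everything and sample_all are hypothetical helpers introduced for this example.

```python
import random

import numpy as np
import tensorflow as tf


def seed_everything(seed):
    """Seed the three generators separately (hypothetical helper)."""
    random.seed(seed)         # Python's built-in random module
    np.random.seed(seed)      # NumPy's legacy global generator
    tf.random.set_seed(seed)  # TensorFlow's global seed


def sample_all():
    """Draw one value from each library so runs can be compared."""
    return (
        random.random(),
        float(np.random.rand()),
        float(tf.random.uniform([])),
    )


seed_everything(1234)
first = sample_all()

seed_everything(1234)
second = sample_all()

print(first == second)
```

Note that np.random.seed only affects NumPy's global generator. A generator created with np.random.default_rng needs its own seed.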

Deterministic Ops Matter Too

Even with the same seed, some operations can still behave non-deterministically depending on hardware and kernel choice. TensorFlow exposes an option to request deterministic operation behavior.

python
import keras
import tensorflow as tf

keras.utils.set_random_seed(1234)
tf.config.experimental.enable_op_determinism()

This reduces surprises, especially on GPU, but it may come with a performance cost. It is worth enabling when exact repeatability matters more than raw speed.

Reproducible Training Example

Here is a compact example that trains the same small model twice with controlled randomness.

python
import keras
import numpy as np
import tensorflow as tf

def train_once():
    keras.utils.set_random_seed(7)
    tf.config.experimental.enable_op_determinism()

    x = np.random.randn(100, 4).astype("float32")
    y = (x.sum(axis=1) > 0).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid")
    ])

    model.compile(optimizer="adam", loss="binary_crossentropy")
    history = model.fit(x, y, epochs=3, batch_size=16, shuffle=False, verbose=0)
    return history.history["loss"]


print(train_once())
print(train_once())

Notice the use of shuffle=False. If data order changes across runs, the same model and seed can still drift because the training trajectory changes.
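Comparing printed losses by eye is easy to get wrong, so it can help to check equality in code. The following is a small sketch, assuming exact (bitwise) equality is the goal. The helper check_reproducible and the two toy run functions are hypothetical and use NumPy only, so the check itself does not depend on TensorFlow.

```python
import numpy as np


def check_reproducible(run_fn, runs=2):
    """Call run_fn several times and compare each result to the first, exactly."""
    reference = np.asarray(run_fn())
    for _ in range(runs - 1):
        if not np.array_equal(reference, np.asarray(run_fn())):
            return False
    return True


def seeded_run():
    # Fixed seed: every call produces the same three values.
    return np.random.default_rng(7).normal(size=3)


def unseeded_run():
    # No seed: fresh entropy on every call.
    return np.random.default_rng().normal(size=3)


print(check_reproducible(seeded_run))
print(check_reproducible(unseeded_run))
```

The same helper could be pointed at the training function above, for example check_reproducible(train_once). A tolerance-based comparison such as np.allclose would hide small drifts, which is why this sketch uses exact equality.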

Why Reproducibility Can Still Drift

Exact repeatability is sensitive to more than just seeds. Common causes of drift include:

  • different TensorFlow versions
  • different hardware or driver stacks
  • parallel data loading order
  • non-deterministic GPU kernels
  • distributed training or multi-host communication

The TensorFlow docs explicitly note that some sequences may change across framework versions even when a global seed is set. So reproducibility should always be interpreted in the context of a fixed environment, not just fixed code.
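Because results are only comparable within a fixed environment, it can be useful to record that environment next to each experiment. The sketch below uses only the Python standard library. The function name environment_fingerprint and the default package list are assumptions made for illustration.

```python
import json
import platform
import sys
from importlib import metadata


def environment_fingerprint(packages=("tensorflow", "keras", "numpy")):
    """Collect interpreter, platform, and package versions (hypothetical helper)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


fingerprint = environment_fingerprint()
print(json.dumps(fingerprint, indent=2, sort_keys=True))
```

Saving this dictionary alongside the seed and the results makes it easier to tell whether two runs that disagree were ever expected to match. It does not capture GPU driver or CUDA versions, which would need to be recorded separately.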

Common Pitfalls

One common mistake is setting only the TensorFlow seed and forgetting that NumPy or Python random utilities also influence the experiment. That creates a false sense of control.

Another is expecting a seed alone to make GPU execution perfectly deterministic. If deterministic ops are not enabled, some kernels may still vary.

Developers also often overlook data order. A shuffled dataset, parallel prefetching, or non-deterministic file iteration can change training results even when the model initialization is stable.
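For tf.data input pipelines, data order can be pinned down explicitly. This is a minimal sketch, assuming a recent TensorFlow 2 release where tf.data.AUTOTUNE and the deterministic argument of map are available. The helpers make_dataset and materialize are hypothetical names used only here.

```python
import tensorflow as tf


def make_dataset(seed):
    """Build a small pipeline whose element order depends only on `seed`."""
    ds = tf.data.Dataset.range(20)
    # Fixed shuffle seed; reshuffle_each_iteration=False keeps one order for every epoch.
    ds = ds.shuffle(buffer_size=20, seed=seed, reshuffle_each_iteration=False)
    # Parallel map, with output order forced to follow input order.
    ds = ds.map(
        lambda x: x * 2,
        num_parallel_calls=tf.data.AUTOTUNE,
        deterministic=True,
    )
    return ds.batch(5).prefetch(tf.data.AUTOTUNE)


def materialize(ds):
    """Convert the batched dataset into plain Python lists for comparison."""
    return [batch.numpy().tolist() for batch in ds]


first = materialize(make_dataset(42))
second = materialize(make_dataset(42))

print(first == second)
```

When reading from files, the file list itself also needs a stable order, for example by sorting the paths before building the dataset.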

Finally, remember that the old tf.set_random_seed name belongs to older TensorFlow code. In current TensorFlow 2 code, use tf.random.set_seed or keras.utils.set_random_seed.

Summary

  • tf.set_random_seed is the older TensorFlow 1 style; modern TensorFlow uses tf.random.set_seed.
  • For Keras workflows, keras.utils.set_random_seed is the easiest way to seed Python, NumPy, and TensorFlow together.
  • Deterministic operations may also need tf.config.experimental.enable_op_determinism().
  • Reproducibility depends on data order, environment, and TensorFlow version, not just one seed call.
  • Treat seeds as one part of experiment control, not the entire solution.
