
TensorFlow graph: what is wrong with this program?


Introduction

TensorFlow graph bugs often look mysterious because errors appear far from the real mistake. Most failures come from execution-mode confusion, shape mismatches, or stateful operations used incorrectly inside traced functions. A practical debugging strategy is to shrink the program, validate tensor contracts, and reintroduce complexity step by step.

Start by Confirming Execution Model

TensorFlow 2 executes eagerly by default, but decorating a function with tf.function makes TensorFlow trace it into a graph. Code that works eagerly may fail, or behave differently, once traced.

python
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("Eager enabled:", tf.executing_eagerly())

@tf.function
def add_one(x):
    return x + 1

print(add_one(tf.constant([1, 2, 3], dtype=tf.int32)))

If behavior changes only after adding tf.function, inspect Python-side control flow and shape assumptions first.
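As a rough illustration of Python-side behavior changing under tracing, the sketch below counts how often the Python body of a traced function actually runs. The names trace_count and scale are made up for this example; the point is only that plain Python statements run while tracing, not on every call.

```python
import tensorflow as tf

trace_count = 0  # hypothetical counter, used only for this illustration

@tf.function
def scale(x):
    global trace_count
    trace_count += 1  # Python side effect: runs only while the function is traced
    return x * 2

scale(tf.constant([1.0, 2.0]))   # first call: traces for float32, shape (2,)
scale(tf.constant([3.0, 4.0]))   # same dtype and shape: reuses the existing trace
scale(tf.constant([1, 2]))       # new dtype (int32): triggers another trace

print("calls: 3, Python body executions:", trace_count)
```

If the counter ends up smaller than the number of calls, any logic that depends on Python statements running every step (counters, list appends, Python if checks on tensor values) is a likely suspect.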

Validate Shape and Dtype Boundaries

Many graph errors are contract errors: a tensor arrives with a rank, shape, or dtype the function did not expect. Add explicit checks at function boundaries.

python
import tensorflow as tf

@tf.function
def model_step(x):
    # Input contract: a rank-2 float32 tensor.
    tf.debugging.assert_rank(x, 2)
    tf.debugging.assert_type(x, tf.float32)

    w = tf.ones((tf.shape(x)[1], 1), dtype=tf.float32)
    y = tf.matmul(x, w)
    # Output contract: any batch size, exactly one column.
    tf.debugging.assert_shapes([(y, (None, 1))])
    return y

batch = tf.ones((4, 3), dtype=tf.float32)
print(model_step(batch))

Failing fast with assertions is easier than debugging a downstream optimizer failure.
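Another way to pin the contract, sketched here as an option rather than a requirement, is to pass input_signature to tf.function so that mismatched inputs are rejected before the body runs. The function name project and the fixed width of 3 are assumptions for this example. The exception type for a mismatch has differed between TensorFlow releases, so the sketch catches both TypeError and ValueError.

```python
import tensorflow as tf

# Hypothetical function: accepts any batch size, exactly 3 float32 features.
@tf.function(input_signature=[tf.TensorSpec(shape=[None, 3], dtype=tf.float32)])
def project(x):
    w = tf.ones((3, 1), dtype=tf.float32)
    return tf.matmul(x, w)

ok = project(tf.ones((4, 3), dtype=tf.float32))
print("accepted, output shape:", ok.shape)

try:
    # An int32 tensor does not match the declared float32 signature.
    project(tf.ones((4, 3), dtype=tf.int32))
    rejected = False
except (TypeError, ValueError) as err:
    rejected = True
    print("rejected:", type(err).__name__)
```

A fixed signature also limits retracing, since all calls share one trace.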

Separate Dataset Issues from Model Issues

A frequent mistake is debugging the full training loop when the input pipeline is already broken. Isolate the components:

  1. Run model with synthetic constants.
  2. Run dataset pipeline and inspect one batch.
  3. Combine once both pass.

python
import tensorflow as tf

# pipeline check
raw = tf.random.uniform((10, 3), dtype=tf.float32)
ds = tf.data.Dataset.from_tensor_slices(raw).batch(4)
for b in ds.take(1):
    print("batch shape", b.shape, "dtype", b.dtype)

This quickly reveals whether the issue is preprocessing or model math.
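The other two steps, running the model on synthetic constants and then combining it with the pipeline, might look like the sketch below. tiny_model is a stand-in invented for this example, not a real model from the question.

```python
import tensorflow as tf

# Hypothetical stand-in for the real model: sums the 3 features of each row.
@tf.function
def tiny_model(x):
    w = tf.ones((3, 1), dtype=tf.float32)
    return tf.matmul(x, w)

# Step 1: model alone, on synthetic constants with a known answer.
synthetic = tf.ones((4, 3), dtype=tf.float32)
model_only = tiny_model(synthetic)
print("model on constants:", model_only.numpy().ravel())

# Step 3: combine only after the model and the pipeline each pass on their own.
raw = tf.random.uniform((10, 3), dtype=tf.float32)
ds = tf.data.Dataset.from_tensor_slices(raw).batch(4)
output_shapes = [tuple(tiny_model(b).shape) for b in ds]
print("output shapes per batch:", output_shapes)
```

Note that the last batch is smaller than the others because 10 is not a multiple of 4; partial batches are a common place for shape assumptions to break.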

Use tf.print Inside Traced Functions

Python's built-in print runs only while the function is being traced, not on every execution step. Use tf.print to see runtime values during graph execution.

python
import tensorflow as tf

@tf.function
def debug_step(x):
    # tf.print is a graph op, so it runs on every call.
    tf.print("x shape:", tf.shape(x), "x mean:", tf.reduce_mean(x))
    return x * 2

debug_step(tf.ones((2, 2)))

Because tf.print is part of the graph, it runs on every call and shows actual runtime values, which makes it the more reliable choice for graph debugging.
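To see the difference side by side, the following sketch puts both kinds of print in one function. The function name double is made up for this example. tf.print writes to stderr by default, so output_stream=sys.stdout is used here to keep both outputs in the same stream.

```python
import sys
import tensorflow as tf

@tf.function
def double(x):
    # Python print: runs at trace time and shows a symbolic tensor, not values.
    print("tracing with", x)
    # tf.print: runs on every call and shows the runtime values.
    tf.print("running with", x, output_stream=sys.stdout)
    return x * 2

first = double(tf.constant([1.0, 2.0]))
second = double(tf.constant([3.0, 4.0]))  # same signature, so no new trace
```

The "tracing with" line should appear once for these two calls, while "running with" appears for each call.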

Avoid Hidden State in tf.function

Creating variables repeatedly inside a traced function is a common source of errors. Variables should typically be created once in layer or module constructors.

Bad pattern:

python
import tensorflow as tf

@tf.function
def bad(x):
    # A new variable is requested each time this body runs.
    v = tf.Variable(1.0)
    return x + v

A better pattern is to use tf.Module or Keras layers, where variables are created once and managed predictably.
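A minimal sketch of that pattern with tf.Module follows. The class name AddOffset is hypothetical. The variable is created once in the constructor and only read inside the traced method.

```python
import tensorflow as tf

class AddOffset(tf.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        # Created exactly once, outside any traced function.
        self.v = tf.Variable(1.0)

    @tf.function
    def __call__(self, x):
        # The traced function reads the variable's current value at run time.
        return x + self.v

layer = AddOffset()
before = layer(tf.constant(2.0))
layer.v.assign(5.0)               # state changes without retracing
after = layer(tf.constant(2.0))
print(float(before), float(after))
```

Because the variable lives on the module, it can be updated or checkpointed while the traced function stays the same.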

Build a Minimal Reproducer

When debugging a large program, reduce it to a small script with fixed input and no external dependencies. Keep random seeds fixed for deterministic behavior.

python
import tensorflow as tf

tf.random.set_seed(7)

@tf.function
def f(x):
    return tf.reduce_sum(tf.square(x))

x = tf.constant([1.0, 2.0, 3.0])
print(f(x))

Then add one component at a time until failure returns. This method finds root causes faster than inspecting a full training stack.
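While reintroducing components, it can help to check whether a failure is specific to graph execution. One option, sketched here under the assumption of a TensorFlow version that provides tf.config.run_functions_eagerly, is to run the same reproducer in both modes and compare the results.

```python
import tensorflow as tf

tf.random.set_seed(7)

@tf.function
def f(x):
    return tf.reduce_sum(tf.square(x))

x = tf.constant([1.0, 2.0, 3.0])

# Normal path: f runs as a traced graph.
graph_result = f(x)

# Debug path: tf.function bodies run eagerly, so Python debuggers and print work.
tf.config.run_functions_eagerly(True)
eager_result = f(x)
tf.config.run_functions_eagerly(False)  # restore the default afterwards

print("graph:", float(graph_result), "eager:", float(eager_result))
```

If the two modes disagree, or only the graph path fails, the cause is probably tracing-related rather than the math itself.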

Migration Notes for Legacy TF1 Code

If code still uses session or placeholder patterns, migrate incrementally:

  • Move input tensors to eager-friendly APIs.
  • Replace session runs with callable functions.
  • Add parity checks on known inputs.

Do not mix old and new execution assumptions in the same module without tests.
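A parity check on known inputs could look like the sketch below. Both functions are invented for this example: legacy_scale_and_sum imitates a TF1-style placeholder and session via tf.compat.v1, and migrated_scale_and_sum is its tf.function replacement.

```python
import numpy as np
import tensorflow as tf

def legacy_scale_and_sum(values):
    """Hypothetical TF1-style code: placeholder plus Session.run in its own graph."""
    graph = tf.Graph()
    with graph.as_default():
        x = tf.compat.v1.placeholder(tf.float32, shape=[None, 3])
        y = tf.reduce_sum(x * 2.0, axis=1)
    with tf.compat.v1.Session(graph=graph) as sess:
        return sess.run(y, feed_dict={x: values})

@tf.function
def migrated_scale_and_sum(x):
    """Hypothetical TF2 replacement: a callable function, no session."""
    return tf.reduce_sum(x * 2.0, axis=1)

known_input = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=np.float32)
legacy_out = legacy_scale_and_sum(known_input)
migrated_out = migrated_scale_and_sum(tf.constant(known_input)).numpy()

# Parity check: both implementations must agree on the known input.
np.testing.assert_allclose(legacy_out, migrated_out, rtol=1e-6)
print("parity ok:", migrated_out)
```

Keeping the legacy code in its own tf.Graph keeps the old and new execution assumptions from leaking into each other during the migration.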

Common Pitfalls

  • Assuming eager and graph mode produce identical debug behavior.
  • Ignoring shape and dtype assertions at function boundaries.
  • Debugging full training loops before verifying dataset batches independently.
  • Using Python print for values that execute inside traced graphs.
  • Creating mutable state repeatedly inside tf.function calls.

Summary

  • Confirm execution mode first when graph behavior looks inconsistent.
  • Add shape and dtype assertions to catch contract errors early.
  • Isolate model code from input pipeline code during debugging.
  • Use tf.print and minimal reproducers for reliable diagnostics.
  • Migrate legacy TF1 patterns carefully with parity checks.
