How to add TensorBoard to a TensorFlow Estimator process

Introduction

TensorBoard is TensorFlow's built-in visualization toolkit for inspecting training metrics, model graphs, histograms, and more. When you use the TensorFlow Estimator API, most of the TensorBoard integration comes for free because Estimators automatically write event files to their model_dir. This article shows you how to take advantage of that automatic logging, how to configure it with RunConfig, how to launch TensorBoard, and how to add your own custom summaries inside the model_fn.

Automatic Logging via model_dir

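Every TensorFlow Estimator accepts a model_dir argument, either directly or through RunConfig. If you omit it, the Estimator writes to a temporary directory, so set it explicitly when you plan to use TensorBoard. During training the Estimator writes checkpoints, the graph definition, and scalar summaries (loss at a minimum) into that directory. TensorBoard reads the resulting event files directly.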

python

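import numpy as np
import tensorflow as tf

# Placeholder input pipeline with random data; replace with your own dataset
def train_input_fn():
    features = {"x": np.random.rand(150, 4).astype(np.float32)}
    labels = np.random.randint(0, 3, size=150)
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    return dataset.shuffle(150).repeat().batch(32)

# The Estimator writes logs to ./my_model automatically
estimator = tf.estimator.DNNClassifier(
    feature_columns=[tf.feature_column.numeric_column("x", shape=[4])],
    hidden_units=[128, 64],
    n_classes=3,
    model_dir="./my_model",
)

# Training: event files appear in ./my_model
estimator.train(input_fn=train_input_fn, steps=1000)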

After training starts, the ./my_model directory will contain files with names like events.out.tfevents.*. TensorBoard reads these to display your training loss curve.

Configuring Logging with tf.estimator.RunConfig

By default the Estimator saves summaries every 100 steps and checkpoints every 600 seconds. You can tune these intervals with RunConfig.

python

import tensorflow as tf

run_config = tf.estimator.RunConfig(
    model_dir="./my_model",
    save_summary_steps=50,       # write summaries every 50 steps
    save_checkpoints_steps=500,  # checkpoint every 500 steps
    log_step_count_steps=100,    # log global_step/sec every 100 steps
)

estimator = tf.estimator.DNNClassifier(
    feature_columns=[tf.feature_column.numeric_column("x", shape=[4])],
    hidden_units=[128, 64],
    n_classes=3,
    config=run_config,
)

Lower save_summary_steps values give you finer-grained curves in TensorBoard but increase disk I/O. A value between 50 and 200 is a reasonable starting point for most training jobs.
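
To get a feel for this trade-off, the number of points on each curve can be estimated with simple arithmetic. The helper below is an illustrative sketch and not part of TensorFlow; the name estimated_summary_points is made up, and the count is approximate because the exact first and last write depend on how the training loop triggers the summary hook.

```python
def estimated_summary_points(total_steps: int, save_summary_steps: int) -> int:
    """Approximate number of points per scalar curve for one training run."""
    if save_summary_steps <= 0:
        raise ValueError("save_summary_steps must be positive")
    return total_steps // save_summary_steps


# Compare a few intervals for a 1000-step run.
for interval in (1, 50, 100, 200):
    points = estimated_summary_points(total_steps=1000, save_summary_steps=interval)
    print(f"save_summary_steps={interval:>3} -> about {points} points per curve")
```

For a 1000-step run, an interval of 50 gives roughly 20 points per curve, while an interval of 1 gives roughly 1000 and a correspondingly larger event file.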

Launching TensorBoard

Once your Estimator has begun writing events, start TensorBoard from the command line and point it at the model directory.

bash

# Install if you have not already
pip install tensorboard

# Launch and point to the model directory
tensorboard --logdir=./my_model --port=6006

Open http://localhost:6006 in your browser. The Scalars tab shows loss and any other summaries over time, the Graphs tab displays the model architecture, and the Distributions and Histograms tabs show weight and activation distributions.
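
If you prefer to start TensorBoard from a Python script, for example next to the training code, one option is to build the same command and hand it to subprocess. This is a minimal sketch that assumes the tensorboard executable is on your PATH; the helper names tensorboard_command and launch_tensorboard are hypothetical.

```python
import subprocess
from typing import List


def tensorboard_command(logdir: str, port: int = 6006) -> List[str]:
    """Build the argument list for the tensorboard command-line tool."""
    return ["tensorboard", f"--logdir={logdir}", f"--port={port}"]


def launch_tensorboard(logdir: str, port: int = 6006) -> subprocess.Popen:
    """Start TensorBoard in the background; the caller should terminate() it."""
    return subprocess.Popen(tensorboard_command(logdir, port))


# Print the command that would be run, without starting anything.
print(" ".join(tensorboard_command("./my_model")))
```

Calling launch_tensorboard("./my_model") returns a process handle, so the script can shut TensorBoard down when training finishes.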

You can also compare multiple runs by pointing --logdir at a parent directory that contains several model directories.

bash

tensorboard --logdir=./experiments
# experiments/
#   run_lr_0.01/
#   run_lr_0.001/

TensorBoard will display each run as a separate colored line, making it easy to compare hyperparameter choices.
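
A layout like this is easiest to maintain when each run's model_dir is derived from its hyperparameters. The following sketch shows one possible naming scheme; the run_dir helper and the naming convention are assumptions made for illustration, not something TensorBoard requires.

```python
import os


def run_dir(parent: str, **hparams) -> str:
    """Return a run directory such as experiments/run_lr_0.01."""
    suffix = "_".join(f"{name}_{value}" for name, value in sorted(hparams.items()))
    return os.path.join(parent, f"run_{suffix}")


# One directory per learning rate, all under the same parent.
for lr in (0.01, 0.001):
    print(run_dir("experiments", lr=lr))
```

Pass the returned path as model_dir when you construct each Estimator, then point --logdir at the parent directory.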

Adding Custom Summaries in model_fn

Pre-built Estimators like DNNClassifier log loss automatically, but if you write a custom Estimator you can add any summary you need inside your model_fn.

python

import tensorflow as tf

def model_fn(features, labels, mode):
    # Build the network
    net = tf.keras.layers.Dense(128, activation="relu")(features["x"])
    logits = tf.keras.layers.Dense(3)(net)

    # argmax returns int64 class ids
    predictions = tf.argmax(logits, axis=1)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)

    # Labels are assumed to have shape [batch]; cast them to the dtype of
    # predictions so tf.equal does not fail on int32 labels
    labels = tf.cast(labels, tf.int64)

    loss = tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True
    )
    loss = tf.reduce_mean(loss)

    # --- Custom TensorBoard summaries (written during training) ---
    # tf.compat.v1.summary is the graph-mode API that the Estimator's
    # summary hook collects; it works on TF 1.x and TF 2.x alike
    accuracy = tf.reduce_mean(
        tf.cast(tf.equal(predictions, labels), tf.float32)
    )
    tf.compat.v1.summary.scalar("accuracy", accuracy)
    tf.compat.v1.summary.histogram("logits", logits)

    if mode == tf.estimator.ModeKeys.EVAL:
        eval_metric_ops = {
            "accuracy": tf.compat.v1.metrics.accuracy(labels, predictions),
        }
        return tf.estimator.EstimatorSpec(
            mode, loss=loss, eval_metric_ops=eval_metric_ops
        )

    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
        loss, global_step=tf.compat.v1.train.get_global_step()
    )
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir="./my_model",
)

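Scalar and histogram summary calls inside model_fn are automatically picked up by the Estimator's training loop and written to the event files. Under TensorFlow 2.x, use the tf.compat.v1.summary variants, because the Estimator collects graph-mode summaries and the TF2-style tf.summary calls need their own writer. These summaries are written during training only; evaluation metrics come from eval_metric_ops. In TensorBoard the scalar appears as a new plot on the Scalars tab and the histogram on the Histograms and Distributions tabs.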
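
To make the logged accuracy value concrete, the same argmax, equal, and mean steps can be traced by hand on a tiny batch. This is a plain-Python sketch with made-up logits and labels; the helper names are hypothetical and nothing here touches TensorFlow.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=row.__getitem__)


def batch_accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = [float(p == y) for p, y in zip(predictions, labels)]
    return sum(correct) / len(correct)


# Made-up batch of four examples and three classes.
logits = [
    [2.0, 0.1, -1.0],  # predicts class 0
    [0.3, 1.5, 0.2],   # predicts class 1
    [0.0, 0.2, 3.1],   # predicts class 2
    [1.2, 0.9, 0.1],   # predicts class 0, but the label is 1
]
labels = [0, 1, 2, 1]
print(batch_accuracy(logits, labels))  # 0.75
```

Three of the four predictions match, so this batch would contribute a point at 0.75 to the accuracy curve.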

Using TensorBoard During Evaluation

When you call estimator.evaluate(), TensorFlow writes evaluation summaries into a subdirectory called eval inside model_dir. TensorBoard displays these on a separate run so you can compare training and evaluation metrics side by side.

python

eval_result = estimator.evaluate(input_fn=eval_input_fn, steps=100)
print(eval_result)
# {'accuracy': 0.91, 'loss': 0.34, 'global_step': 1000}

The evaluation results appear in TensorBoard as a separate run named eval. Comparing that run with the training run shows whether your model is overfitting.
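
The same comparison can be made numerically by evaluating once on training data and once on held-out data, then comparing the two result dictionaries. The sketch below works on plain dicts shaped like the output above; the metric_gap helper and the training-set numbers are invented for illustration.

```python
def metric_gap(train_metrics: dict, eval_metrics: dict, key: str = "loss") -> float:
    """Return eval minus train for one metric.

    A large positive gap in loss hints at overfitting.
    """
    return eval_metrics[key] - train_metrics[key]


# Illustrative values only, shaped like the output of estimator.evaluate().
train_metrics = {"accuracy": 0.97, "loss": 0.10, "global_step": 1000}
eval_metrics = {"accuracy": 0.91, "loss": 0.34, "global_step": 1000}

gap = metric_gap(train_metrics, eval_metrics)
print(f"eval loss - train loss = {gap:.2f}")
```

What counts as a worrying gap depends on the task, so treat the number as a prompt to look at the curves rather than as a threshold.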

Common Pitfalls

Pointing TensorBoard at the wrong directory: --logdir must point to the exact directory (or its parent) that contains events.out.tfevents.* files. A wrong path results in an empty dashboard with no errors.
Setting save_summary_steps too low: Writing summaries every step creates large event files and slows training noticeably. Keep it at 50 or above unless you are debugging a specific issue.
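Forgetting to install a compatible TensorBoard version: TensorBoard and TensorFlow versions must be compatible. Mismatched versions can produce import errors or missing UI features. Install the TensorBoard release that matches your TensorFlow major.minor version, for example pip install "tensorboard==2.15.*" alongside TensorFlow 2.15. The Estimator API itself was removed in TensorFlow 2.16, so the examples in this article require TensorFlow 2.15 or earlier.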
Not calling the summary ops inside model_fn: Summary ops defined outside the model_fn are not part of the graph that the Estimator builds, so its training loop never writes them. Always place your scalar and histogram summary calls inside the function.
Ignoring the eval subdirectory: Evaluation metrics are written to model_dir/eval, not the root of model_dir. If you only look at the root run in TensorBoard, you will miss evaluation results.
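
Two of these pitfalls, the wrong --logdir and the overlooked eval subdirectory, can be checked before opening TensorBoard by walking the directory and listing which subdirectories hold event files. This is a diagnostic sketch using only the standard library; discover_runs is a hypothetical helper, and matching on the tfevents substring is an assumption based on the file names shown earlier.

```python
import os
import tempfile


def discover_runs(logdir: str) -> dict:
    """Map each subdirectory that holds event files to the number of such files."""
    runs = {}
    for root, _dirs, files in os.walk(logdir):
        count = sum(1 for name in files if "tfevents" in name)
        if count:
            # "." stands for logdir itself, which holds the training run.
            runs[os.path.relpath(root, logdir)] = count
    return runs


# Demonstrate on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as logdir:
    os.makedirs(os.path.join(logdir, "eval"))
    for path in ("events.out.tfevents.1", os.path.join("eval", "events.out.tfevents.2")):
        open(os.path.join(logdir, path), "w").close()
    found = discover_runs(logdir)
    print(found)
```

An empty result for your real model directory suggests that --logdir points at the wrong place or that training has not written any summaries yet.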

Summary

TensorFlow Estimators automatically log loss and checkpoints to model_dir, making basic TensorBoard integration zero-configuration.
Use tf.estimator.RunConfig to control how often summaries and checkpoints are saved.
Launch TensorBoard with tensorboard --logdir=./my_model and open http://localhost:6006 to view training curves, graphs, and histograms.
Add custom metrics inside your model_fn with tf.summary.scalar and tf.summary.histogram for visibility beyond the default loss.
Compare multiple runs by pointing --logdir at a parent directory that contains several model directories.