How to write to TensorBoard in TensorFlow 2

Introduction

TensorBoard is TensorFlow's built-in visualization toolkit that lets you monitor and debug your machine learning experiments. It provides dashboards for tracking metrics like loss and accuracy, visualizing model architectures, inspecting weight distributions, and displaying images or text during training. In TensorFlow 2, writing data to TensorBoard is straightforward thanks to the Keras callback API and the tf.summary module. This guide shows you how to use both approaches effectively.

Setting Up TensorBoard

TensorBoard is included when you install TensorFlow. To launch it, you point it at a log directory where your training summaries are stored:

bash
pip install tensorboard  # Usually installed with TensorFlow
tensorboard --logdir=./logs

This starts a local web server, typically at http://localhost:6006, where you can view your dashboards in a browser.

Using the Keras TensorBoard Callback

The simplest way to write training metrics to TensorBoard is through the built-in Keras callback. This requires no manual summary writing -- just pass the callback to model.fit:

python
import datetime

import tensorflow as tf

# Assumption for illustration: MNIST digits, flattened to 784 features and scaled to [0, 1]
(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_val = x_val.reshape(-1, 784).astype("float32") / 255.0

# Create a log directory with a timestamp so each run gets its own folder
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir,
    histogram_freq=1,      # Log weight histograms every epoch
    write_graph=True,      # Log the model graph
    write_images=False,    # Do not render weights as images
    update_freq='epoch'    # Write metrics once per epoch
)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.fit(
    x_train, y_train,
    epochs=10,
    validation_data=(x_val, y_val),
    callbacks=[tensorboard_callback]
)

Setting histogram_freq=1 tells the callback to record weight and bias histograms every epoch. These can reveal symptoms of vanishing or exploding gradients, such as weights that barely change between epochs or that grow without bound. The timestamp in the log directory gives each run its own folder, so you can compare multiple training runs side by side.
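To see how the callback lays out its files, here is a minimal sketch that trains a tiny model on random data (a stand-in, not the dataset used above) and lists what appears under the log directory. The data, layer sizes, and temporary directory are assumptions made for illustration. The callback is expected to write training and validation summaries into separate train and validation subdirectories, which TensorBoard displays as two runs.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Random stand-in data: 64 samples, 4 features, 2 classes (illustrative only)
x = np.random.rand(64, 4).astype("float32")
y = np.random.randint(0, 2, size=(64,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Throwaway log directory; in a real project use something like "logs/fit/<timestamp>"
log_dir = tempfile.mkdtemp()
model.fit(x[:48], y[:48],
          epochs=2,
          validation_data=(x[48:], y[48:]),
          callbacks=[tf.keras.callbacks.TensorBoard(log_dir=log_dir)],
          verbose=0)

# List the subdirectories the callback created under log_dir
run_subdirs = sorted(os.listdir(log_dir))
print(run_subdirs)
```

Pointing tensorboard --logdir at the parent directory then shows the training and validation curves on the same chart.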

Writing Custom Scalars with tf.summary

When you need to log custom metrics that are not part of the standard Keras training loop, use the tf.summary API directly. Create a summary writer with tf.summary.create_file_writer, then make it the default writer inside a with writer.as_default(): block:

python
import tensorflow as tf

# Create a summary writer that stores event files under logs/custom/
writer = tf.summary.create_file_writer("logs/custom/")

# train_step() and optimizer are assumed to be defined by your own training code
for step in range(1000):
    loss = train_step()  # Your custom training logic, returning a scalar loss
    # Works when the learning rate is a plain value rather than a schedule object
    learning_rate = optimizer.learning_rate.numpy()

    with writer.as_default():
        tf.summary.scalar("custom/loss", loss, step=step)
        tf.summary.scalar("custom/learning_rate", learning_rate, step=step)

writer.close()

Every call to tf.summary.scalar records a single data point at the given step. TensorBoard then plots these points as line charts, letting you see how your metrics evolve over time.
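For a version that runs end to end, the sketch below fills in the missing pieces with a deliberately tiny problem. The toy data, the train_step body, the hand-written learning-rate decay, and the temporary log directory are all assumptions made for illustration, not part of any particular training setup.

```python
import glob
import os
import tempfile

import tensorflow as tf

# Hypothetical toy problem: learn w so that w * x matches y = 3x
x = tf.constant([1.0, 2.0, 3.0])
y = 3.0 * x
w = tf.Variable(0.0)


def train_step(learning_rate):
    """Apply one gradient-descent update to w and return the loss before the update."""
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(w * x - y))
    grad = tape.gradient(loss, w)
    w.assign_sub(learning_rate * grad)
    return loss


log_dir = tempfile.mkdtemp()  # throwaway directory; e.g. "logs/custom/" in practice
writer = tf.summary.create_file_writer(log_dir)

losses = []
for step in range(50):
    learning_rate = 0.05 * (0.99 ** step)  # simple illustrative decay
    loss = train_step(learning_rate)
    losses.append(float(loss))
    with writer.as_default():
        tf.summary.scalar("custom/loss", loss, step=step)
        tf.summary.scalar("custom/learning_rate", learning_rate, step=step)

writer.close()

# The writer stores its data in files named events.out.tfevents.*
event_files = glob.glob(os.path.join(log_dir, "events.out.tfevents.*"))
print(len(event_files), losses[0], losses[-1])
```

Running tensorboard --logdir on the printed directory's parent should show two line charts under the custom/ prefix.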

Logging Images and Text

TensorBoard can display more than just numbers. You can log images to monitor what your model sees during training, and log text for debugging:

python
import tensorflow as tf

writer = tf.summary.create_file_writer("logs/media/")

with writer.as_default():
    # Log a batch of images; x_train is assumed to hold flattened 28x28 grayscale
    # images, reshaped here to [batch, height, width, channels]
    images = tf.reshape(x_train[:25], (-1, 28, 28, 1))
    tf.summary.image("Training Samples", images, max_outputs=25, step=0)

    # Log text
    tf.summary.text("Experiment Config",
                    "batch_size=64, lr=0.001, dropout=0.2",
                    step=0)

Image logging is particularly valuable for tasks like image generation, segmentation, or data augmentation verification, where you want to visually confirm what your model is producing at each stage.
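Image summaries expect a rank-4 tensor shaped [batch, height, width, channels], and floating-point pixel values are clipped to the range [0, 1]. The sketch below shows one way to prepare such a batch from flattened uint8 data; the random pixels, tag names, and temporary directory are placeholders chosen for illustration.

```python
import glob
import os
import tempfile

import numpy as np
import tensorflow as tf

# Stand-in for flattened grayscale data: 8 random 28x28 "images" as uint8 in [0, 255]
flat_images = np.random.randint(0, 256, size=(8, 784), dtype=np.uint8)

# Scale to floats in [0, 1], then reshape to [batch, height, width, channels]
images = tf.reshape(tf.cast(flat_images, tf.float32) / 255.0, (-1, 28, 28, 1))

log_dir = tempfile.mkdtemp()  # throwaway directory; e.g. "logs/media/" in practice
writer = tf.summary.create_file_writer(log_dir)
with writer.as_default():
    tf.summary.image("samples/random", images, max_outputs=8, step=0)
    tf.summary.text("config", "batch_size=64, lr=0.001, dropout=0.2", step=0)
writer.close()

image_shape = tuple(images.shape)
pixel_min = float(tf.reduce_min(images))
pixel_max = float(tf.reduce_max(images))
event_files = glob.glob(os.path.join(log_dir, "events.out.tfevents.*"))
print(image_shape, pixel_min, pixel_max)
```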

Logging Histograms and Distributions

Histograms help you understand the distribution of weights, biases, and activations across your network. You can log them manually:

python
writer = tf.summary.create_file_writer("logs/histograms/")

with writer.as_default():
    for epoch in range(10):
        # ... train for one epoch here (for example, model.fit(..., epochs=1)) ...

        # After each training epoch, log weight distributions
        for layer in model.layers:
            if layer.weights:  # Skip layers without weights, such as Dropout
                tf.summary.histogram(
                    f"{layer.name}/weights",
                    layer.weights[0],
                    step=epoch
                )
                if len(layer.weights) > 1:
                    tf.summary.histogram(
                        f"{layer.name}/biases",
                        layer.weights[1],
                        step=epoch
                    )

If you see weight distributions collapsing to near-zero or exploding to large values, that signals a problem with your learning rate or initialization strategy.
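To get a feel for what those failure modes look like on the Histograms dashboard, one option is to log a few synthetic tensors with very different spreads. This is only a sketch: the tensors are random stand-ins rather than real layer weights, and the standard deviations are arbitrary values picked for contrast.

```python
import glob
import os
import tempfile

import tensorflow as tf

# Synthetic stand-ins for layer weights with very different spreads
values = {
    "healthy": tf.random.normal([1000], stddev=0.1),
    "collapsed": tf.random.normal([1000], stddev=1e-6),
    "exploded": tf.random.normal([1000], stddev=1e3),
}

log_dir = tempfile.mkdtemp()  # throwaway directory; e.g. "logs/histograms/" in practice
writer = tf.summary.create_file_writer(log_dir)
with writer.as_default():
    for name, data in values.items():
        tf.summary.histogram(f"synthetic/{name}", data, step=0)
writer.close()

# Standard deviation of each tensor, as a numeric counterpart to the histogram width
spreads = {name: float(tf.math.reduce_std(data)) for name, data in values.items()}
event_files = glob.glob(os.path.join(log_dir, "events.out.tfevents.*"))
print(spreads)
```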

Comparing Multiple Runs

A key benefit of TensorBoard is comparing experiments. By writing logs to separate subdirectories, each run appears as a separate line on the dashboard:

python
for lr in [0.001, 0.01, 0.1]:
    # One subdirectory per learning rate, so each run gets its own curve
    log_dir = f"logs/lr_experiment/lr_{lr}"
    callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
    model = build_model()  # Assumed to return a fresh, uncompiled model
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=10,
              validation_data=(x_val, y_val),
              callbacks=[callback])

Then launch TensorBoard with --logdir=logs/lr_experiment to see all three learning rates plotted together.
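When an experiment varies more than one hyperparameter, it can help to build the run directory name in one place. The helper below is a hypothetical convenience function, not part of TensorFlow or TensorBoard; it combines the hyperparameters and a timestamp into a unique path.

```python
import datetime
import os


def make_run_dir(base_dir, experiment, now=None, **hparams):
    """Build a log path such as <base_dir>/<experiment>/<hparams>-<timestamp>."""
    now = now or datetime.datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")
    # Sort hyperparameters by name so the same settings always give the same prefix
    parts = [f"{key}_{value}" for key, value in sorted(hparams.items())]
    run_name = "-".join(parts + [stamp])
    return os.path.join(base_dir, experiment, run_name)


# A fixed timestamp is passed here only to make the example reproducible
fixed_time = datetime.datetime(2024, 1, 2, 3, 4, 5)
example = make_run_dir("logs", "lr_experiment", now=fixed_time, lr=0.01, batch_size=64)
print(example)
```

The returned path can be passed as log_dir to tf.keras.callbacks.TensorBoard or to tf.summary.create_file_writer.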

Common Pitfalls

  • Writing all runs to the same directory: If multiple training runs write to the same log directory without unique subdirectories, their metrics overlap on the same chart and produce confusing, jumbled plots. Always include a timestamp or experiment name in the log path.
  • Forgetting to set the step parameter: Each tf.summary call needs a step. If you omit the argument, TensorFlow falls back to the default step set with tf.summary.experimental.set_step, and raises a ValueError when no default has been set. Pass an explicit step so the x-axis of each chart is unambiguous.
  • Logging too frequently: Writing summaries every training step (rather than every N steps or every epoch) generates massive log directories and slows down both training and the TensorBoard UI. Log scalars every 100 steps and histograms or images every epoch as a reasonable starting point.
  • Not flushing the writer: Summary data is buffered in memory and may not be written to disk immediately. If your training script crashes or you check TensorBoard mid-training, recent data might be missing. Call writer.flush() periodically, or write inside a with writer.as_default(): block, which flushes the writer when the block exits.
  • Mixing TensorFlow 1.x summary API with TensorFlow 2: The old tf.compat.v1.summary API and session-based FileWriter still work in TF2 but should not be mixed with the new tf.summary API. Mixing them leads to duplicate or missing logs. Use the tf.summary API exclusively in TensorFlow 2 projects.
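The logging-frequency and flushing advice above can be combined in a single loop. The sketch below is one possible arrangement: the intervals of 100 and 500 steps, the placeholder loss, and the temporary directory are assumptions to adjust for your own workload.

```python
import glob
import os
import tempfile

import tensorflow as tf

LOG_EVERY = 100    # write a scalar every 100 steps (illustrative value)
FLUSH_EVERY = 500  # force buffered summaries to disk every 500 steps (illustrative value)

log_dir = tempfile.mkdtemp()  # throwaway directory; use a unique run path in practice
writer = tf.summary.create_file_writer(log_dir)

logged_steps = []
with writer.as_default():
    for step in range(1000):
        loss = 1.0 / (step + 1)  # placeholder for a real training step
        if step % LOG_EVERY == 0:
            tf.summary.scalar("loss", loss, step=step)  # always pass an explicit step
            logged_steps.append(step)
        if step % FLUSH_EVERY == 0:
            writer.flush()  # recent points survive a crash after this call

writer.close()
event_files = glob.glob(os.path.join(log_dir, "events.out.tfevents.*"))
print(logged_steps)
```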

Summary

  • Use the tf.keras.callbacks.TensorBoard callback for automatic logging of loss, metrics, weight histograms, and model graphs during model.fit.
  • Use tf.summary.create_file_writer together with tf.summary.scalar, tf.summary.image, tf.summary.histogram, and tf.summary.text for custom logging outside the Keras training loop.
  • Organize logs into separate subdirectories per experiment to enable side-by-side comparison of training runs in the TensorBoard UI.
  • Launch TensorBoard with tensorboard --logdir=./logs and open http://localhost:6006 to view your dashboards.
  • Flush summary writers regularly and avoid logging at every single step to keep log sizes manageable and the TensorBoard interface responsive.
