How to write to TensorBoard in TensorFlow 2

Introduction

TensorBoard is the primary observability tool for TensorFlow 2 training jobs. It helps you inspect loss curves, compare experiments, and diagnose model behavior with visual artifacts like images and histograms. The key is to design logging intentionally so every run is easy to reproduce and compare.

Logging with the Keras TensorBoard Callback

If you use model.fit, the callback path is the fastest way to get useful dashboards. You create a run-specific log directory, attach the callback, and then launch TensorBoard against the parent directory.

python
from datetime import datetime
import tensorflow as tf

# Load MNIST, scale pixels to [0, 1], and add a channel axis.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = (x_train.astype("float32") / 255.0)[..., None]
x_test = (x_test.astype("float32") / 255.0)[..., None]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax")
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)

# One timestamped directory per run keeps experiments separate.
run_id = datetime.now().strftime("%Y%m%d-%H%M%S")
log_dir = f"logs/fit/{run_id}"

tensorboard_cb = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir,
    histogram_freq=1,     # write weight histograms every epoch
    update_freq="epoch",  # write scalar metrics once per epoch
    write_graph=True      # include the model graph
)

model.fit(
    x_train,
    y_train,
    validation_data=(x_test, y_test),
    epochs=3,
    batch_size=128,
    callbacks=[tensorboard_cb]
)

Run TensorBoard in another shell:

bash
tensorboard --logdir logs/fit --port 6006

Using one folder per run avoids mixed charts and makes experiment comparison reliable.

Writing Custom Summaries with tf.summary

For custom loops, reinforcement learning, or research code, write scalars manually with tf.summary.create_file_writer. The important part is consistent step values.

python
from datetime import datetime
import tensorflow as tf

writer = tf.summary.create_file_writer(
    f"logs/custom/{datetime.now().strftime('%Y%m%d-%H%M%S')}"
)

for step in range(1, 101):
    # Example values from a synthetic loop
    loss = 1.0 / step
    learning_rate = 1e-3 * (0.99 ** step)

    with writer.as_default():
        tf.summary.scalar("train/loss", loss, step=step)
        tf.summary.scalar("train/learning_rate", learning_rate, step=step)

writer.flush()

Steps must increase monotonically within a run. If step numbers repeat or go backward, plots look broken and comparisons become misleading.
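
In a multi-epoch loop, the usual way to keep steps monotonic is a single counter that is never reset between epochs. The sketch below illustrates that idea with a synthetic loss; `train_with_global_step` is a hypothetical helper name, and the temporary directory stands in for a real run folder.

```python
import tempfile

import tensorflow as tf


def train_with_global_step(log_dir: str, epochs: int = 3, steps_per_epoch: int = 5) -> int:
    """Log a synthetic loss with one step counter shared by all epochs."""
    writer = tf.summary.create_file_writer(log_dir)
    global_step = 0  # never reset between epochs
    with writer.as_default():
        for _epoch in range(epochs):
            for _batch in range(steps_per_epoch):
                global_step += 1
                loss = 1.0 / global_step  # synthetic value standing in for a real loss
                tf.summary.scalar("train/loss", loss, step=global_step)
    writer.flush()
    return global_step


final_step = train_with_global_step(tempfile.mkdtemp())
print(final_step)  # 15
```

Resetting the counter to zero at the start of each epoch would write steps 1 to 5 three times, which is exactly the repeated-step case described above.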

Logging Images and Histograms for Deeper Debugging

Scalar curves are not enough when you need to debug inputs, feature activations, or parameter drift. TensorBoard supports image and histogram summaries that expose issues quickly.

python
import numpy as np
import tensorflow as tf

writer = tf.summary.create_file_writer("logs/custom/media")

images = np.random.rand(8, 28, 28, 1).astype("float32")
weights = np.random.normal(size=(1000,)).astype("float32")

with writer.as_default():
    tf.summary.image("debug/sample_images", images, step=0, max_outputs=8)
    tf.summary.histogram("debug/weight_distribution", weights, step=0)

writer.flush()

Use image and histogram summaries sparingly. Logging large tensors every step can slow training and generate very large event files.
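
One way to keep media logging cheap is to write it only every N steps and to slice the batch down before it is encoded. The following is a minimal sketch under assumed values: `maybe_log_media`, `MEDIA_EVERY_N_STEPS`, and `MAX_IMAGES` are hypothetical names, and the interval of 50 is arbitrary.

```python
import tempfile

import numpy as np
import tensorflow as tf

MEDIA_EVERY_N_STEPS = 50  # assumed interval; tune it for your job
MAX_IMAGES = 4            # assumed cap on images written per summary


def maybe_log_media(writer, step, images, weights) -> bool:
    """Write image and histogram summaries only on every N-th step.

    Returns True when media was written, False when the step was skipped.
    """
    if step % MEDIA_EVERY_N_STEPS != 0:
        return False
    with writer.as_default():
        # Slice first so only a small sample is encoded into the event file.
        tf.summary.image(
            "debug/sample_images", images[:MAX_IMAGES], step=step, max_outputs=MAX_IMAGES
        )
        tf.summary.histogram("debug/weight_distribution", weights, step=step)
    return True


writer = tf.summary.create_file_writer(tempfile.mkdtemp())
images = np.random.rand(8, 28, 28, 1).astype("float32")
weights = np.random.normal(size=(1000,)).astype("float32")

written = [step for step in range(200) if maybe_log_media(writer, step, images, weights)]
writer.flush()
print(written)  # [0, 50, 100, 150]
```

Scalars can still be written on every step; only the heavier summaries go through the throttle.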

Organizing Multi-Experiment Runs

Treat TensorBoard logs as experiment artifacts. A useful directory convention includes model variant and critical hyperparameters in the folder name.

text
logs/
  fit/
    cnn_lr1e-3_bs128_run01/
    cnn_lr1e-4_bs128_run01/
    resnet_lr3e-4_bs64_run01/

You can then run one TensorBoard process and filter experiments in the UI. This eliminates guesswork when you revisit results weeks later.
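
Folder names like these are easier to keep consistent when a small function builds them. The helper below is a hypothetical sketch, not part of TensorFlow or TensorBoard: `build_run_dir` follows the naming pattern shown above and picks the next unused run index. The learning rate is passed as a string on purpose, so the folder name matches what you typed instead of a reformatted float.

```python
import tempfile
from pathlib import Path


def build_run_dir(root: str, model: str, lr: str, batch_size: int) -> Path:
    """Create and return the next unused run directory for a hyperparameter combination."""
    prefix = f"{model}_lr{lr}_bs{batch_size}_run"
    root_path = Path(root)
    run_index = 1
    # Increase the index until the folder name is free.
    while (root_path / f"{prefix}{run_index:02d}").exists():
        run_index += 1
    run_dir = root_path / f"{prefix}{run_index:02d}"
    run_dir.mkdir(parents=True)
    return run_dir


root = tempfile.mkdtemp()  # stand-in for "logs/fit"
print(build_run_dir(root, "cnn", "1e-3", 128).name)  # cnn_lr1e-3_bs128_run01
print(build_run_dir(root, "cnn", "1e-3", 128).name)  # cnn_lr1e-3_bs128_run02
```

The returned path can be passed as `log_dir` to the Keras callback or to `tf.summary.create_file_writer` after converting it with `str()`.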

A practical addition is to save metadata as text summaries at step zero.

python
import tensorflow as tf

hparams = {
    "model": "cnn",
    "lr": "1e-3",
    "batch_size": "128"
}

writer = tf.summary.create_file_writer("logs/fit/cnn_lr1e-3_bs128_run01")
with writer.as_default():
    for key, value in hparams.items():
        tf.summary.text(f"hparams/{key}", value, step=0)
writer.flush()

This is simple but valuable when you need to audit why one run outperformed another.

Common Pitfalls

The most common issue is reusing the same log_dir across unrelated runs. That merges charts and makes trends hard to trust. Always generate a unique run directory.
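
A cheap safeguard is to refuse to start when the chosen directory already holds event files. The sketch below assumes event file names begin with `events.out.tfevents`, which is the prefix TensorFlow writers use; `ensure_fresh_log_dir` itself is a hypothetical helper, not a library function.

```python
from pathlib import Path


def ensure_fresh_log_dir(log_dir: str) -> Path:
    """Create `log_dir` if needed and raise if it already holds event files."""
    path = Path(log_dir)
    # Search subfolders too, since the Keras callback writes into train/ and validation/.
    existing = sorted(path.rglob("events.out.tfevents.*")) if path.is_dir() else []
    if existing:
        raise FileExistsError(
            f"{path} already contains {len(existing)} event file(s); pick a new run directory"
        )
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling this once before creating the callback or writer turns a silent chart merge into an immediate error.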

Another frequent problem is missing data because a script exits before events are written. Summary writers buffer events before writing them to disk, so call writer.flush() at the end of short jobs, especially in notebooks and one-off scripts.

Some teams log too much media. Writing full-resolution images each batch can dominate IO and distort training throughput. Log a small sample less frequently.

Step mismatches are also easy to miss. If training and validation scalars use inconsistent step semantics, lines will not align. Use a single global step strategy.
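
One possible global step strategy is to count training batches and stamp validation results with the counter's value at the moment validation runs. The sketch below uses synthetic loss values and a temporary directory; `STEPS_PER_EPOCH` and the `train`/`validation` subfolder names are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

import tensorflow as tf

run_dir = Path(tempfile.mkdtemp())  # stand-in for a real run directory
train_writer = tf.summary.create_file_writer(str(run_dir / "train"))
val_writer = tf.summary.create_file_writer(str(run_dir / "validation"))

STEPS_PER_EPOCH = 10  # assumed value for the synthetic loop
global_step = 0
val_steps = []

for _epoch in range(3):
    for _batch in range(STEPS_PER_EPOCH):
        global_step += 1
        with train_writer.as_default():
            tf.summary.scalar("loss", 1.0 / global_step, step=global_step)
    # Validation runs once per epoch but is stamped with the same global step,
    # so both curves share one x-axis in TensorBoard.
    with val_writer.as_default():
        tf.summary.scalar("loss", 1.2 / global_step, step=global_step)
    val_steps.append(global_step)

train_writer.flush()
val_writer.flush()
print(val_steps)  # [10, 20, 30]
```

Because both writers use the same tag and sibling folders, TensorBoard overlays the two curves in one chart. Logging validation with the epoch index instead would place its points at steps 0, 1, and 2, far from the training points they belong to.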

Finally, remember that old logs can consume disk quickly. Clean stale runs after review and archive only the runs that matter.
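
Finding cleanup candidates can be scripted with the standard library. The sketch below only lists run directories whose newest file is older than a cutoff; `find_stale_runs` is a hypothetical helper, and the cutoff in days is whatever your team chooses. Deleting or archiving is deliberately left to the caller.

```python
import time
from pathlib import Path
from typing import List


def find_stale_runs(root: str, max_age_days: float) -> List[Path]:
    """Return run directories under `root` whose newest file is older than the cutoff."""
    cutoff = time.time() - max_age_days * 86400
    stale = []
    for run_dir in sorted(Path(root).iterdir()):
        if not run_dir.is_dir():
            continue
        files = [p for p in run_dir.rglob("*") if p.is_file()]
        # Fall back to the directory's own timestamp when it contains no files.
        newest = max((p.stat().st_mtime for p in files), default=run_dir.stat().st_mtime)
        if newest < cutoff:
            stale.append(run_dir)
    return stale
```

For example, `find_stale_runs("logs/fit", max_age_days=30)` would return the runs under `logs/fit` that have not been written to for a month, which you can then review before removing.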

Summary

  • Use tf.keras.callbacks.TensorBoard for model.fit workflows.
  • Use tf.summary writers for custom loops and research metrics.
  • Keep one unique log directory per run for clean comparisons.
  • Log images and histograms selectively to avoid IO bottlenecks.
  • Enforce monotonic step values and flush writers before exit.
  • Store lightweight run metadata to make experiment tracking reproducible.
