Custom callback after each epoch to log certain information

Introduction

When you need information after every training epoch, a custom callback is the normal extension point. In Keras, subclassing Callback lets you inspect the logs dictionary, print metrics, write to a file, or trigger your own reporting logic at the end of each epoch.

Use on_epoch_end

The most common hook is on_epoch_end, because Keras calls it after each epoch and passes a dictionary of metrics gathered during that epoch.

python
import tensorflow as tf


class EpochLogger(tf.keras.callbacks.Callback):
    """Print selected metrics at the end of every epoch."""

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Only format metrics that were actually computed: logs.get() returns
        # None for a missing key, and None cannot be formatted with :.4f.
        names = ("loss", "accuracy", "val_loss", "val_accuracy")
        parts = [
            f"{name}={logs[name]:.4f}" for name in names if logs.get(name) is not None
        ]
        # Keras passes a zero-based epoch index, so add 1 for display.
        print(f"epoch={epoch + 1} " + " ".join(parts))

Then pass the callback into fit:

python
# Assumes `model` is already compiled and the data arrays already exist.
model.fit(
    x_train,
    y_train,
    validation_data=(x_val, y_val),
    epochs=5,
    callbacks=[EpochLogger()],
)

That is the standard pattern for custom epoch-level logging.

What Lives Inside logs

The logs dictionary contains metrics that were actually computed during training. Typical keys include:

  • 'loss'
  • 'accuracy'
  • 'val_loss'
  • 'val_accuracy'

If you did not ask Keras to compute a metric, it will not appear there. The val_ keys, for example, appear only when validation data is supplied to fit. That means your callback should use logs.get(...) rather than assuming every key always exists.
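As a rough illustration of how the keys are laid out, the sketch below splits a logs-style dictionary into training and validation metrics using the val_ prefix. The helper name split_logs and the sample values are hypothetical, not part of Keras; the function only needs a plain dictionary, so it runs without TensorFlow.

```python
def split_logs(logs=None):
    """Split a Keras-style logs dict into (training, validation) metrics."""
    logs = logs or {}
    prefix = "val_"
    train = {k: v for k, v in logs.items() if not k.startswith(prefix)}
    # Strip the prefix so both dicts share the same metric names.
    val = {k[len(prefix):]: v for k, v in logs.items() if k.startswith(prefix)}
    return train, val


# Hypothetical logs from a run compiled without an accuracy metric.
sample_logs = {"loss": 0.52, "val_loss": 0.61}
train_metrics, val_metrics = split_logs(sample_logs)

# .get() returns None for a metric that was never computed instead of raising.
print(train_metrics.get("accuracy"))
print(val_metrics)
```

Inside a callback, the same function could be called on the logs argument of on_epoch_end before deciding what to report.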

Logging to a File Instead of the Console

Printing is useful during experimentation, but persistent logs are often better for longer runs.

python
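import json

import tensorflow as tf


class JsonEpochLogger(tf.keras.callbacks.Callback):
    """Append one JSON object per epoch to a log file."""

    def __init__(self, path):
        super().__init__()
        self.path = path

    def on_epoch_end(self, epoch, logs=None):
        # Cast to float so NumPy scalars do not break json.dumps
        # (assumes every logged value is a scalar).
        metrics = {key: float(value) for key, value in (logs or {}).items()}
        payload = {"epoch": epoch + 1, **metrics}
        # Append mode keeps earlier epochs; each line is one JSON document.
        with open(self.path, "a", encoding="utf-8") as file:
            file.write(json.dumps(payload) + "\n")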

This writes one JSON line per epoch, which is easy to parse later for dashboards or debugging.
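To show what parsing later might look like, here is a minimal sketch that reads such a file back with the standard library and picks the epoch with the lowest validation loss. The function names load_epoch_log and best_epoch are hypothetical, and the sketch assumes each non-blank line is a JSON object with an "epoch" key.

```python
import json


def load_epoch_log(path):
    """Read a JSON-lines file and return one dict per epoch."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records


def best_epoch(records, key="val_loss"):
    """Return the record with the lowest value for `key`, or None if absent."""
    candidates = [r for r in records if r.get(key) is not None]
    return min(candidates, key=lambda r: r[key]) if candidates else None
```

For a log written to a hypothetical training_log.jsonl, best_epoch(load_epoch_log("training_log.jsonl")) would return the full record for the best epoch, or None if no validation loss was logged.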

You Can Add Custom Computation Too

Sometimes the built-in metrics are not enough. A callback can run extra logic after each epoch, such as evaluating a small reference batch or computing a custom statistic from the model weights.

For example, this callback prints the learning rate:

python
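import tensorflow as tf


class LearningRateLogger(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Keras sets self.model on the callback before training starts.
        optimizer = self.model.optimizer
        # Assumes learning_rate is a plain value or variable, not a schedule.
        lr = tf.keras.backend.get_value(optimizer.learning_rate)
        print(f"epoch {epoch + 1}: learning_rate={lr}")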

That is useful when a scheduler changes the optimizer settings during training.

Keep Callback Work Lightweight

Callbacks run in the training loop, so they should stay relatively cheap. Logging metrics is fine. Running a huge external evaluation after every epoch may slow training dramatically and distort your sense of performance.

A good callback is informative without becoming the bottleneck.
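One way to keep heavier work from running every epoch is to throttle it. The sketch below is only an illustration: every_n_epochs is a hypothetical helper, and the "expensive" step is a stand-in that records when it was called. The loop simulates epoch indices, so nothing here needs TensorFlow.

```python
def every_n_epochs(n, fn):
    """Wrap fn(epoch, logs) so it only runs on every n-th epoch."""
    if n < 1:
        raise ValueError("n must be at least 1")

    def wrapper(epoch, logs=None):
        # Keras passes a zero-based epoch index, so epoch + 1 is the count
        # of completed epochs.
        if (epoch + 1) % n == 0:
            fn(epoch, logs)

    return wrapper


# Stand-in for an expensive evaluation; it just records when it was called.
calls = []
throttled = every_n_epochs(5, lambda epoch, logs: calls.append(epoch + 1))

# Simulate ten epochs without a real training loop.
for epoch in range(10):
    throttled(epoch, {"loss": 0.0})

print(calls)  # [5, 10]
```

The wrapper keeps the (epoch, logs) signature, so it could be passed as on_epoch_end to tf.keras.callbacks.LambdaCallback or called from a Callback subclass; that wiring is not exercised here.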

Common Pitfalls

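The biggest pitfall is assuming logs always contains a specific key. If a metric was not configured, that key may be missing. logs.get(...) then returns None, and formatting None with a numeric format spec such as :.4f raises a TypeError, so check for None before formatting.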

Another common mistake is putting expensive work in on_epoch_end without realizing it runs every epoch. Heavy callbacks can make training appear much slower than the model alone would be.

People also forget that the callback instance already has access to self.model. That object can be inspected directly, so there is no need to pass the model around manually.

Summary

  • Subclass tf.keras.callbacks.Callback and override on_epoch_end to run logic after each epoch.
  • Read training metrics from the logs dictionary.
  • Use logs.get(...) so the callback handles missing metrics safely.
  • Write to the console, a file, or another system depending on how persistent the logging should be.
  • Keep callback work lightweight so it does not become the slowest part of training.
  • Combine custom logging with built-in callbacks when you also need checkpointing or early stopping.
  • The callback can also inspect training state directly when metric values alone are not enough.
