How to use TensorFlow metrics in Keras

Introduction

In Keras, metrics are the values you want to monitor during training and evaluation in addition to the loss. TensorFlow makes this straightforward through built-in metric names, metric classes, and custom metric implementations when the defaults are not enough.

Add Built-In Metrics in `compile`

The most common pattern is to pass metrics when compiling the model. Keras then reports them during `fit` and `evaluate`, and records one value per epoch in the `History` object returned by `fit`.

python

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Random features and binary labels, for illustration only.
x = np.random.randn(200, 4).astype("float32")
y = np.random.randint(0, 2, size=(200, 1)).astype("float32")

model = keras.Sequential([
    layers.Input(shape=(4,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[
        keras.metrics.BinaryAccuracy(name="accuracy"),
        keras.metrics.Precision(name="precision"),
        keras.metrics.Recall(name="recall")
    ]
)

# Each metric name becomes a key in history.history,
# with a "val_" prefix for the validation split.
history = model.fit(x, y, epochs=3, validation_split=0.2, verbose=0)
print(history.history.keys())

This setup is typical for binary classification. The loss drives optimization, while the metrics help you interpret model behavior.
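To see where the metrics surface outside the training log, the following sketch rebuilds a small model like the one above and calls `evaluate` with `return_dict=True`. The random data, the layer sizes, and the single epoch are illustrative assumptions; the point is only that the names given to the metric objects become keys in the results.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative random data: 64 samples, 4 features, binary labels.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4)).astype("float32")
y = rng.integers(0, 2, size=(64, 1)).astype("float32")

model = keras.Sequential([
    layers.Input(shape=(4,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[
        keras.metrics.BinaryAccuracy(name="accuracy"),
        keras.metrics.Recall(name="recall"),
    ],
)

history = model.fit(x, y, epochs=1, validation_split=0.25, verbose=0)

# Validation metrics are stored under the same names with a "val_" prefix.
history_keys = sorted(history.history.keys())

# return_dict=True maps metric names to values instead of returning a list.
results = model.evaluate(x, y, verbose=0, return_dict=True)
result_keys = sorted(results.keys())

print(history_keys)
print(result_keys)
```

Reading results by name avoids relying on the order in which the metrics were passed to `compile`.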

Choose Metrics That Match the Task

Metrics should align with the prediction problem:

classification: accuracy, precision, recall, AUC
regression: MAE, MSE, RMSE
ranking or sequence tasks: more specialized metrics depending on the domain

It is common to start with one metric and then realize it hides important behavior. For example, accuracy can look good on imbalanced data even when recall is poor for the minority class.
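The imbalance effect is easy to reproduce with standalone metric objects. The sketch below assumes a toy dataset of 95 negatives and 5 positives and a degenerate predictor that always outputs the negative class; both are made up for illustration.

```python
import numpy as np
from tensorflow import keras

# Assumed toy labels: 95 negatives followed by 5 positives.
y_true = np.array([0] * 95 + [1] * 5, dtype="float32").reshape(-1, 1)
# A degenerate "model" that always predicts the negative class.
y_pred = np.zeros((100, 1), dtype="float32")

accuracy = keras.metrics.BinaryAccuracy()
recall = keras.metrics.Recall()
accuracy.update_state(y_true, y_pred)
recall.update_state(y_true, y_pred)

accuracy_value = float(accuracy.result())
recall_value = float(recall.result())
print(f"accuracy={accuracy_value:.2f}, recall={recall_value:.2f}")
# prints: accuracy=0.95, recall=0.00
```

Accuracy rewards the 95 correct negatives, while recall shows that none of the 5 positives were found.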

Metric Names Versus Metric Objects

Keras accepts simple string names for many standard metrics.

python

model.compile(
    optimizer="adam",
    loss="mse",
    metrics=["mae"]
)

That is fine for common cases. Metric objects are better when you want configuration or clear naming.

python

model.compile(
    optimizer="adam",
    loss="mse",
    metrics=[keras.metrics.MeanAbsoluteError(name="mae")]
)

The object form is also easier to extend when thresholds or other parameters matter.
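As a sketch of why configuration matters, the following compares `Precision` and `Recall` at two decision thresholds using the `thresholds` argument. The four labels and scores are invented for illustration, and `precision_recall` is a hypothetical helper, not part of Keras.

```python
import numpy as np
from tensorflow import keras

# Made-up labels and predicted probabilities.
y_true = np.array([[0], [1], [1], [0]], dtype="float32")
y_pred = np.array([[0.4], [0.6], [0.35], [0.1]], dtype="float32")


def precision_recall(threshold):
    """Hypothetical helper: score the toy data at one threshold."""
    precision = keras.metrics.Precision(thresholds=threshold)
    recall = keras.metrics.Recall(thresholds=threshold)
    precision.update_state(y_true, y_pred)
    recall.update_state(y_true, y_pred)
    return float(precision.result()), float(recall.result())


# At 0.5 only the 0.6 score counts as positive: precision 1.0, recall 0.5.
default_scores = precision_recall(0.5)
# At 0.3 the 0.4, 0.6 and 0.35 scores count: precision 2/3, recall 1.0.
lenient_scores = precision_recall(0.3)

print(default_scores, lenient_scores)
```

A string name such as `"precision"` cannot carry this kind of setting, which is the practical reason to switch to the object form.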

Example for Regression

python

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic target: a linear function of the first two features.
x = np.random.randn(100, 3).astype("float32")
y = (2 * x[:, :1] - x[:, 1:2]).astype("float32")

regression_model = keras.Sequential([
    layers.Input(shape=(3,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(1)
])

regression_model.compile(
    optimizer="adam",
    loss=keras.losses.MeanSquaredError(),
    metrics=[keras.metrics.MeanAbsoluteError(), keras.metrics.RootMeanSquaredError()]
)

This produces training logs with both MAE and RMSE, which are often easier to interpret than the loss alone.
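MAE and RMSE are both expressed in the units of the target, but they weight errors differently. The sketch below uses four invented predictions, one of them far off, to show how a single large error moves RMSE more than MAE.

```python
import math

import numpy as np
from tensorflow import keras

# Invented example: three small errors of 1 and one large error of 5.
y_true = np.zeros((4, 1), dtype="float32")
y_pred = np.array([[1.0], [1.0], [1.0], [5.0]], dtype="float32")

mae = keras.metrics.MeanAbsoluteError()
rmse = keras.metrics.RootMeanSquaredError()
mae.update_state(y_true, y_pred)
rmse.update_state(y_true, y_pred)

mae_value = float(mae.result())    # (1 + 1 + 1 + 5) / 4 = 2.0
rmse_value = float(rmse.result())  # sqrt((1 + 1 + 1 + 25) / 4) = sqrt(7)

print(f"MAE={mae_value:.3f}, RMSE={rmse_value:.3f}, sqrt(7)={math.sqrt(7):.3f}")
```

Tracking both gives a quick signal: when RMSE is much larger than MAE, a few large errors are likely dominating.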

Writing a Custom Metric

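If the built-in metrics do not match your business requirement, subclass `tf.keras.metrics.Metric` and implement `update_state` and `result`. The example below tracks the fraction of predictions above 0.5.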

python

import tensorflow as tf

class PositivePredictionRate(tf.keras.metrics.Metric):
    def __init__(self, name="positive_rate", **kwargs):
        super().__init__(name=name, **kwargs)
        # Scalar state variables that accumulate across batches.
        self.total = self.add_weight(name="total", initializer="zeros")
        self.count = self.add_weight(name="count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Note: sample_weight is accepted but ignored in this example.
        predicted_positive = tf.cast(y_pred > 0.5, tf.float32)
        self.count.assign_add(tf.reduce_sum(predicted_positive))
        self.total.assign_add(tf.cast(tf.size(y_pred), tf.float32))

    def result(self):
        # divide_no_nan returns 0 when no predictions have been seen yet.
        return tf.math.divide_no_nan(self.count, self.total)

    def reset_state(self):
        # Called at the start of each epoch and each evaluation run.
        self.total.assign(0.0)
        self.count.assign(0.0)

Custom metrics are helpful when you need reporting logic that is meaningful to the application rather than just to the optimizer.
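The `update_state`, `result`, and `reset_state` methods form the same lifecycle that every built-in metric follows. As a minimal sketch of that lifecycle, the following feeds two invented batches to a built-in `BinaryAccuracy` and reads the running value after each step.

```python
import numpy as np
from tensorflow import keras

metric = keras.metrics.BinaryAccuracy()

# Batch 1: both predictions fall on the correct side of 0.5.
metric.update_state(
    np.array([[1.0], [0.0]], dtype="float32"),
    np.array([[0.9], [0.2]], dtype="float32"),
)
after_first_batch = float(metric.result())   # 2 of 2 correct

# Batch 2: only the second prediction is correct.
metric.update_state(
    np.array([[1.0], [1.0]], dtype="float32"),
    np.array([[0.1], [0.8]], dtype="float32"),
)
after_second_batch = float(metric.result())  # 3 of 4 correct so far

# Resetting clears the accumulated state.
metric.reset_state()
after_reset = float(metric.result())

print(after_first_batch, after_second_batch, after_reset)
# prints: 1.0 0.75 0.0
```

The value after the second batch is the accuracy over all four samples, not the average of the two per-batch accuracies, which is why metric state lives in variables rather than in plain Python values.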

Common Pitfalls

Expecting metrics to influence gradient updates is a conceptual mistake. The loss drives optimization; metrics are primarily for monitoring.
Choosing accuracy alone for imbalanced classification can hide poor minority-class performance. Add precision, recall, AUC, or task-specific metrics when appropriate.
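Using a metric that does not match the model output or label format causes confusing logs or errors. Check that the metric expects the same label encoding (for example, integer class indices versus one-hot vectors) and output shape that your model produces.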
Writing a custom metric as a plain Python calculation can break TensorFlow graph execution or state management. Use tf.keras.metrics.Metric for stateful custom metrics.
Comparing models only by one visible metric can lead to bad decisions. Review loss, validation metrics, and domain-specific constraints together.
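To illustrate the label-format pitfall, the sketch below scores the same invented three-class predictions twice: once with integer labels and `SparseCategoricalAccuracy`, and once with one-hot labels and `CategoricalAccuracy`. The data is made up; the pairing of metric and label encoding is the point.

```python
import numpy as np
from tensorflow import keras

# Invented class probabilities for 3 samples and 3 classes.
probs = np.array(
    [[0.8, 0.1, 0.1],
     [0.2, 0.2, 0.6],
     [0.3, 0.4, 0.3]],
    dtype="float32",
)
# Integer class indices; the third sample is misclassified (argmax is 1).
labels = np.array([0, 2, 0])

# Integer labels pair with the "sparse" variant.
sparse_acc = keras.metrics.SparseCategoricalAccuracy()
sparse_acc.update_state(labels, probs)

# One-hot labels pair with the non-sparse variant.
one_hot = np.eye(3, dtype="float32")[labels]
categorical_acc = keras.metrics.CategoricalAccuracy()
categorical_acc.update_state(one_hot, probs)

sparse_value = float(sparse_acc.result())
categorical_value = float(categorical_acc.result())
print(f"{sparse_value:.3f} {categorical_value:.3f}")
```

Both metrics report two of three samples correct here because each one receives labels in the encoding it expects.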

Summary

In Keras, metrics are monitoring signals reported during training and evaluation.
You can pass built-in metrics by string or as metric objects in compile.
Choose metrics that match the prediction task and class balance.
Custom metrics are possible by subclassing tf.keras.metrics.Metric.
Metrics help you understand model behavior, but the loss is what training actually optimizes.
