Difference between keras.metrics.Accuracy and accuracy

Introduction

When compiling a Keras model, you specify metrics to track during training and evaluation. Two common options are the string "accuracy" and the class keras.metrics.Accuracy(). They look like they do the same thing, but they behave differently. The string shorthand triggers Keras's smart metric selection, while the class gives you an explicit, specific accuracy computation. Confusing the two can lead to misleading evaluation numbers.

This article explains exactly what each one does, when Keras uses each variant, and how to choose the right one for your model.

What the String "accuracy" Does

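When you pass the string "accuracy" (or its alias "acc") to model.compile(), Keras does not use one fixed accuracy function. Instead, it resolves the string to a concrete metric based on your setup. Older Keras releases looked at the loss function and the output shape; TensorFlow 2.x and Keras 3 look at the shapes of the targets and the model outputs, which in practice follow from the loss and label format you chose.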

python

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax")
])

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)

In this example, the loss is categorical_crossentropy (which expects one-hot labels) and the output shape is (batch, 10), so Keras selects CategoricalAccuracy as the underlying metric. For the usual pairings of loss and label format, the selection works out as follows:

categorical_crossentropy loss with one-hot labels: Keras uses CategoricalAccuracy, which compares the index of the highest predicted probability against the index of the 1 in the one-hot label.
sparse_categorical_crossentropy loss with integer labels: Keras uses SparseCategoricalAccuracy, which compares the index of the highest predicted probability against the integer label.
binary_crossentropy loss with a single sigmoid output unit: Keras uses BinaryAccuracy, which applies a threshold (default 0.5) to the predictions and compares against binary labels.

This auto-selection is convenient because it handles the label format for you.
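To make the three computations concrete, here is a minimal NumPy sketch of what each variant measures. These helper functions are illustrative re-implementations written for this article, not the Keras source, and the sample arrays are made up.

```python
import numpy as np

def categorical_accuracy(y_true_onehot, y_prob):
    # Compare argmax of the one-hot label with argmax of the probabilities.
    return float(np.mean(np.argmax(y_true_onehot, axis=-1) == np.argmax(y_prob, axis=-1)))

def sparse_categorical_accuracy(y_true_int, y_prob):
    # Compare the integer label with argmax of the probabilities.
    return float(np.mean(np.asarray(y_true_int) == np.argmax(y_prob, axis=-1)))

def binary_accuracy(y_true, y_prob, threshold=0.5):
    # Threshold the probabilities, then compare with the 0/1 labels.
    y_hat = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_hat == np.asarray(y_true)))

# Hypothetical predictions for 4 samples and 3 classes.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.5, 0.4, 0.1]])
labels_int = np.array([1, 0, 2, 1])
labels_onehot = np.eye(3)[labels_int]

cat_acc = categorical_accuracy(labels_onehot, probs)
sparse_acc = sparse_categorical_accuracy(labels_int, probs)
bin_acc = binary_accuracy([1, 0, 1, 0], [0.9, 0.4, 0.3, 0.2])

print(cat_acc, sparse_acc, bin_acc)  # 0.75 0.75 0.75
```

The categorical and sparse variants agree because they describe the same labels in two encodings; only the label format differs.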

What keras.metrics.Accuracy() Does

keras.metrics.Accuracy() is a specific metric class that computes strict equality between predictions and labels. It does not apply argmax to the predictions or check probability thresholds. It literally compares the predicted value to the true value element by element.

python

import tensorflow as tf

m = tf.keras.metrics.Accuracy()
m.update_state(
    y_true=[1, 2, 3, 4],
    y_pred=[1, 2, 0, 4]
)
print(m.result().numpy())  # 0.75 (3 out of 4 match)

This works well when your predictions are already discrete class labels (integers), not probability distributions. If you pass softmax probabilities to keras.metrics.Accuracy(), it will compare the probability vectors directly to one-hot labels, which will almost never match, resulting in a reported accuracy near zero.
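The following sketch illustrates that failure mode on a tiny hand-made batch, and shows that the same metric behaves sensibly once both sides are converted to class indices. The arrays are invented for illustration, and the probabilities deliberately avoid exact 0.0 or 1.0 values.

```python
import numpy as np
import tensorflow as tf

# Hypothetical one-hot labels and softmax-like outputs for 3 samples.
y_true_onehot = np.array([[0, 1, 0],
                          [1, 0, 0],
                          [0, 0, 1]], dtype="float32")
y_prob = np.array([[0.1, 0.7, 0.2],
                   [0.6, 0.3, 0.1],
                   [0.5, 0.3, 0.2]], dtype="float32")

# Strict equality on raw probabilities: no element equals 0 or 1 exactly.
raw = tf.keras.metrics.Accuracy()
raw.update_state(y_true_onehot, y_prob)
raw_acc = float(raw.result())

# Convert both sides to class indices first, then compare.
discrete = tf.keras.metrics.Accuracy()
discrete.update_state(np.argmax(y_true_onehot, axis=-1),
                      np.argmax(y_prob, axis=-1))
discrete_acc = float(discrete.result())

print(f"{raw_acc:.2f}")       # 0.00
print(f"{discrete_acc:.2f}")  # 0.67 (2 of 3 predicted classes are right)
```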

The Difference in Practice

Here is a concrete example showing how the two diverge.

python

import tensorflow as tf
import numpy as np

# Model with softmax output
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(5, activation="softmax")
])

# Using the string "accuracy" - works correctly
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)

# Generate dummy data
x = np.random.random((100, 20))
y = tf.keras.utils.to_categorical(np.random.randint(5, size=100), num_classes=5)

history = model.fit(x, y, epochs=3, verbose=0)
print(f"String accuracy: {history.history['accuracy'][-1]:.4f}")

# Using keras.metrics.Accuracy() - will report near-zero accuracy
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=[tf.keras.metrics.Accuracy()]
)

history2 = model.fit(x, y, epochs=3, verbose=0)
print(f"Class accuracy: {history2.history['accuracy'][-1]:.4f}")

The string version reports a plausible accuracy (around chance level here, roughly 0.2 for five classes, because the labels are random), while the class version reports a number near zero because it compares raw probability vectors to one-hot vectors element by element.

Choosing the Right Metric Class Explicitly

If you want to be explicit rather than relying on string auto-selection, use the specific accuracy class that matches your label format.

python

import tensorflow as tf

# `model` is assumed to be an existing Keras model. Use the one compile()
# call that matches your label format and output layer.

# For one-hot encoded labels with softmax output
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=[tf.keras.metrics.CategoricalAccuracy()]
)

# For integer labels with softmax output
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()]
)

# For binary classification with sigmoid output
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.5)]
)

Using the explicit class is clearer than the string shorthand and eliminates any ambiguity about which computation is being performed.
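One side effect worth knowing when you switch to explicit classes: each class has its own default name, and that name is the key used in the training history and logs. The short sketch below prints the defaults and shows the name argument, which lets you keep the key "accuracy" if downstream code expects it.

```python
import tensorflow as tf

metric_classes = [
    tf.keras.metrics.Accuracy,
    tf.keras.metrics.CategoricalAccuracy,
    tf.keras.metrics.SparseCategoricalAccuracy,
    tf.keras.metrics.BinaryAccuracy,
]

# Map each class to the name it reports under by default.
default_names = {cls.__name__: cls().name for cls in metric_classes}
for class_name, metric_name in default_names.items():
    print(f"{class_name}: {metric_name}")

# Override the name to keep an "accuracy" key in history/logs.
renamed = tf.keras.metrics.CategoricalAccuracy(name="accuracy")
print(renamed.name)  # accuracy
```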

Custom Training Loops

In custom training loops (without model.compile), you must use metric objects directly and manage their state yourself.

python

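import numpy as np
import tensorflow as tf

# Toy setup so the loop runs end to end (sizes and data are illustrative).
model = tf.keras.Sequential([tf.keras.layers.Dense(5, activation="softmax")])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()
x = np.random.random((100, 20)).astype("float32")
y = np.random.randint(5, size=100)
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)
num_epochs = 3

accuracy_metric = tf.keras.metrics.SparseCategoricalAccuracy()

for epoch in range(num_epochs):
    accuracy_metric.reset_state()  # start each epoch with fresh counters
    for x_batch, y_batch in dataset:
        with tf.GradientTape() as tape:
            predictions = model(x_batch, training=True)
            loss = loss_fn(y_batch, predictions)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        accuracy_metric.update_state(y_batch, predictions)

    print(f"Epoch {epoch}: accuracy = {accuracy_metric.result().numpy():.4f}")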

In this context, the string shorthand does not apply. You are responsible for picking the correct metric class.

Common Pitfalls

Using keras.metrics.Accuracy() with softmax outputs. This is the most common mistake. The metric expects discrete predictions, not probability distributions. Use CategoricalAccuracy or SparseCategoricalAccuracy instead.
Assuming "accuracy" always means the same thing. The string resolves to a different metric depending on your setup (the loss function in older Keras releases, the target and output shapes in newer ones). If you switch from categorical_crossentropy with one-hot labels to sparse_categorical_crossentropy with integer labels but keep "accuracy", the underlying metric changes silently. This can be confusing during debugging.
Forgetting to reset state in custom loops. Metric objects accumulate state across batches. If you forget reset_state() at the start of each epoch, your accuracy numbers will average across all epochs rather than reflecting the current one.
Mixing label formats. If your labels are one-hot but you use SparseCategoricalAccuracy, or your labels are integers but you use CategoricalAccuracy, the accuracy numbers will be incorrect. Always match the metric to the label encoding.
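The state-accumulation pitfall is easy to reproduce in isolation. In this small sketch (with made-up predictions and labels), the first update stands in for one epoch and the second for the next; without reset_state() the second result blends both.

```python
import numpy as np
import tensorflow as tf

metric = tf.keras.metrics.SparseCategoricalAccuracy()

# Hypothetical predictions whose argmax is [0, 1].
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8]], dtype="float32")

# "Epoch 1": both labels match the predicted classes.
metric.update_state(np.array([0, 1]), probs)
epoch1 = float(metric.result())

# "Epoch 2" without a reset: both labels are wrong, but the old state remains.
metric.update_state(np.array([1, 0]), probs)
stale = float(metric.result())

# Same "epoch 2" data after a reset: the result reflects only this update.
metric.reset_state()
metric.update_state(np.array([1, 0]), probs)
fresh = float(metric.result())

print(epoch1, stale, fresh)  # 1.0 0.5 0.0
```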

Summary

The string "accuracy" in model.compile() is a shorthand that auto-selects the appropriate accuracy variant for your setup (based on the loss function and output shape in older releases, and on the target and output shapes in newer ones). keras.metrics.Accuracy() is a specific class that does strict equality comparison and is only appropriate when your predictions are already discrete values. For most classification tasks, either use the string shorthand or be explicit with CategoricalAccuracy, SparseCategoricalAccuracy, or BinaryAccuracy to avoid silent mismatches between your metric and your data format.