Is it possible to get prediction accuracy after calling model.predict_classes()?

Introduction

Yes, but not from model.predict_classes() alone. Accuracy is not a property of predictions by themselves; it is a comparison between predicted labels and known true labels. So you always need the target values for the evaluation set, and in current Keras you should usually use model.predict() or model.evaluate() instead of the old predict_classes() helper.

Accuracy Needs Ground Truth

Suppose your model predicts classes for ten samples. Those predictions are only half of the information required. To compute accuracy, you must compare them against the real labels:

python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

print(accuracy_score(y_true, y_pred))

Without y_true, there is no accuracy value to compute. You can inspect predictions, but you cannot score them.
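To make the comparison explicit, the same score can be computed by hand. The following is a minimal sketch using only NumPy, with the same illustrative arrays as above: accuracy is simply the fraction of positions where the predicted label equals the true label.

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

# Element-wise comparison yields a boolean array; its mean is the share of matches.
matches = y_true == y_pred
accuracy = float(matches.mean())

print(accuracy)  # 0.8, because 4 of the 5 labels match
```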

Why predict_classes() Is the Wrong Focus Now

Older Keras versions exposed model.predict_classes() as a convenience method on Sequential models. It was deprecated in TensorFlow 2.5 and removed in TensorFlow 2.6. Modern Keras has standardized around model.predict(), model.evaluate(), and explicit post-processing of prediction outputs.

That is a better design because different models produce different kinds of outputs:

  • Binary classifiers often produce probabilities from a sigmoid unit
  • Multiclass classifiers often produce a softmax vector
  • Regression models produce numeric values, not class labels

predict_classes() hid those differences behind a convenience API. In real evaluation code, you usually want to control how predictions are converted to final labels.
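As a rough sketch of what that explicit conversion can look like, the helper below branches on the output shape. The name outputs_to_labels and its threshold parameter are hypothetical, introduced here for illustration only; it is not a Keras API, and it assumes the outputs are either single-column sigmoid probabilities or per-class scores.

```python
import numpy as np

def outputs_to_labels(outputs, threshold=0.5):
    """Hypothetical helper: convert raw classifier outputs to integer labels."""
    outputs = np.asarray(outputs)
    # A 1-D array or a single column is treated as sigmoid probabilities.
    if outputs.ndim == 1 or outputs.shape[-1] == 1:
        return (outputs.reshape(-1) >= threshold).astype("int32")
    # Otherwise each row is treated as per-class scores; pick the largest.
    return np.argmax(outputs, axis=-1).astype("int32")

sigmoid_out = np.array([[0.2], [0.9], [0.5]])               # shape (3, 1)
softmax_out = np.array([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]])  # shape (2, 3)

binary_labels = outputs_to_labels(sigmoid_out)
multiclass_labels = outputs_to_labels(softmax_out)

print(binary_labels)      # [0 1 1]
print(multiclass_labels)  # [1 0]
```

Writing the branch yourself keeps the threshold visible and makes it obvious that regression outputs should never pass through this step.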

The Simplest Path: model.evaluate()

If your model was compiled with an accuracy metric, model.evaluate() is usually the easiest way to get loss and accuracy on a labeled dataset.

python
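import numpy as np
import tensorflow as tf

# Small binary classifier: 4 input features, one sigmoid output unit.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Accuracy is only reported by evaluate() if it is listed in metrics.
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Synthetic data: the label is 1 when the feature sum exceeds 2.0.
x = np.random.rand(100, 4).astype("float32")
y = (x.sum(axis=1) > 2.0).astype("float32")

model.fit(x, y, epochs=3, verbose=0)

# Evaluated on the training data only to keep the example short;
# in practice, pass held-out x_test and y_test here.
loss, accuracy = model.evaluate(x, y, verbose=0)

print("accuracy:", accuracy)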

This is the direct answer when you already have x_test and y_test.

Manual Accuracy from model.predict()

If you want more control, use model.predict() and convert the output into labels yourself.

For binary classification:

python
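import numpy as np
from sklearn.metrics import accuracy_score

# Reuses model, x, and y from the model.evaluate() example.
# Flatten the (n, 1) sigmoid output to shape (n,) before thresholding.
probs = model.predict(x, verbose=0).reshape(-1)
preds = (probs >= 0.5).astype("int32")

print(accuracy_score(y.astype("int32"), preds))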

This is conceptually what many older predict_classes() examples were doing behind the scenes.

Multiclass Example

For multiclass classification, the conversion step is usually argmax.

python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([2, 0, 1])
y_prob = np.array([
    [0.1, 0.2, 0.7],
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
])

y_pred = np.argmax(y_prob, axis=1)
print(accuracy_score(y_true, y_pred))

The key point is that the prediction format depends on the model output layer, but the final accuracy calculation is still "predicted labels compared with true labels."
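One variation worth sketching: if the true labels are stored one-hot (an assumption here, typical when training with a categorical cross-entropy loss), both arrays need the same argmax conversion before they can be compared. The arrays below are illustrative values, not output from a real model.

```python
import numpy as np

# Assumed: labels are one-hot encoded rather than integer class indices.
y_true_one_hot = np.array([
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
])
y_prob = np.array([
    [0.1, 0.2, 0.7],
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
])

# Convert both sides to integer class indices, then compare.
y_true = np.argmax(y_true_one_hot, axis=1)
y_pred = np.argmax(y_prob, axis=1)
accuracy = float(np.mean(y_true == y_pred))

print(y_true)    # [2 0 1]
print(accuracy)  # 1.0
```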

Why Threshold Choice Matters

For binary classifiers, turning probabilities into class labels requires a threshold. Many examples use 0.5, but that is not always the right choice.

python
threshold = 0.7
preds = (probs >= threshold).astype("int32")

Fraud detection, anomaly detection, medical screening, and other cost-sensitive problems often need a threshold chosen from validation data rather than a hard-coded default.

That is another reason predict_classes() was limited: it hid a modeling decision that is often important.
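A minimal sketch of choosing a threshold from validation data follows. The helper best_threshold, the candidate list, and the toy validation arrays are all hypothetical, and F1 is only one possible selection criterion; a cost-sensitive application might instead minimize an expected cost.

```python
import numpy as np

def best_threshold(val_probs, val_labels, candidates):
    """Hypothetical helper: return the candidate threshold with the highest F1."""
    best_t, best_f1 = None, -1.0
    for t in candidates:
        preds = (val_probs >= t).astype("int32")
        tp = int(np.sum((preds == 1) & (val_labels == 1)))
        fp = int(np.sum((preds == 1) & (val_labels == 0)))
        fn = int(np.sum((preds == 0) & (val_labels == 1)))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Strict comparison keeps the first candidate on ties.
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t, best_f1

# Toy validation set (assumed values, not real model output).
val_probs = np.array([0.05, 0.20, 0.35, 0.40, 0.60, 0.65, 0.80, 0.95])
val_labels = np.array([0, 0, 0, 1, 0, 1, 1, 1])

threshold, f1 = best_threshold(val_probs, val_labels, [0.3, 0.5, 0.7])
print(threshold, f1)  # 0.3 0.8
```

The selected threshold should then be frozen and applied to the test set, so the test score is not tuned on the data it reports on.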

Accuracy Is Not Always Enough

Even when you compute accuracy correctly, it may still be misleading. If one class dominates the dataset, a weak model can achieve high accuracy by predicting the majority class most of the time.

For a more realistic evaluation, consider additional metrics:

python
from sklearn.metrics import classification_report

print(classification_report(y.astype("int32"), preds))

Precision, recall, F1, confusion matrices, and ROC-related metrics often tell a more useful story than accuracy alone.
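To make the imbalance problem concrete, here is a small sketch with assumed toy data: 95 negatives, 5 positives, and a "model" that always predicts the majority class. Accuracy looks strong while recall on the positive class is zero.

```python
import numpy as np

# Assumed toy data: 95 negative samples and 5 positive samples.
y_true = np.array([0] * 95 + [1] * 5)
# A degenerate "model" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

accuracy = float(np.mean(y_true == y_pred))

# Recall = true positives / all actual positives.
tp = int(np.sum((y_pred == 1) & (y_true == 1)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(accuracy)  # 0.95
print(recall)    # 0.0
```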

Common Pitfalls

One common mistake is expecting predict_classes() to return an accuracy number. It never had enough information to do that without true labels.

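Another issue is using argmax on binary sigmoid output. When the probabilities are shaped as a single column, argmax along that axis always returns index 0, so every sample is labeled class 0 regardless of its probability. Thresholding is the correct operation there, not argmax.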
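A short sketch of this pitfall, using made-up probabilities in the single-column shape a sigmoid output layer produces:

```python
import numpy as np

probs = np.array([[0.1], [0.9], [0.6]])  # shape (3, 1), like sigmoid output

# argmax over an axis of length 1 can only ever return index 0.
wrong = np.argmax(probs, axis=1)

# Thresholding the flattened probabilities gives the intended labels.
right = (probs.reshape(-1) >= 0.5).astype("int32")

print(wrong)  # [0 0 0]
print(right)  # [0 1 1]
```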

Developers also sometimes compare prediction arrays and label arrays with mismatched shapes, such as (n, 1) versus (n,). Flatten or reshape them consistently before scoring.
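The sketch below shows why the shape mismatch matters when comparing arrays directly with NumPy. The arrays are illustrative. Comparing an (n, 1) array with an (n,) array broadcasts to an (n, n) matrix, so the mean of the result is not the accuracy.

```python
import numpy as np

y_true = np.array([0, 1, 1, 0])            # shape (4,)
preds_2d = np.array([[0], [1], [0], [0]])  # shape (4, 1)

# Broadcasting compares every prediction with every label: shape (4, 4).
broadcast = preds_2d == y_true
print(broadcast.shape)          # (4, 4)
print(float(broadcast.mean()))  # 0.5, which is not the real accuracy

# Flattening first gives an element-wise comparison of shape (4,).
correct = preds_2d.reshape(-1) == y_true
print(float(correct.mean()))    # 0.75
```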

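Finally, do not assume that the metric called "accuracy" in model.evaluate() is always the one you want. Keras typically resolves the string "accuracy" to a binary, categorical, or sparse categorical variant based on the loss and output shape chosen at compile time, and the binary variant uses a fixed 0.5 threshold by default. The result may not reflect business priorities.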

Summary

  • Accuracy requires both predictions and true labels.
  • model.evaluate() is the easiest way to get accuracy if the model was compiled with that metric.
  • In modern Keras, use model.predict() and convert outputs to labels yourself instead of relying on predict_classes().
  • Use thresholding for binary outputs and argmax for multiclass outputs.
  • Consider metrics beyond accuracy when class imbalance or cost asymmetry matters.
