Returning probabilities in a classification prediction in Keras?

Introduction

In Keras, model.predict() does not magically decide whether to return probabilities. The output depends on the model’s final layer: if the model ends with sigmoid or softmax, the predictions are already probability-like values; if it outputs logits, you must convert them yourself before interpreting them as probabilities.

Binary Classification with sigmoid

For binary classification, the most common setup is one output unit with sigmoid.

python
import numpy as np
import tensorflow as tf

# One sigmoid unit: the output is a single value in (0, 1) per sample.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy")

x = np.array([[0.1, 0.2, 0.3, 0.4]], dtype=np.float32)
probs = model.predict(x, verbose=0)

# The model is untrained here, so the value is arbitrary; the shape is (1, 1).
print(probs)

That output is the predicted probability of the positive class. If it returns 0.82, you usually interpret that as:

  • positive class probability about 0.82
  • negative class probability about 0.18
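
The complement relationship above can be written out explicitly. The following is a minimal sketch, not part of the Keras API: the hard-coded list stands in for real predict() output (the values are illustrative, not from a trained model), and to_two_columns is a hypothetical helper that expands the single sigmoid column into a [negative, positive] layout.

```python
# Stand-in for model.predict(x) from a one-unit sigmoid model:
# one row per sample, one positive-class probability per row.
probs = [[0.82], [0.35]]


def to_two_columns(sigmoid_output):
    """Return [P(negative), P(positive)] for each row (hypothetical helper)."""
    return [[1.0 - row[0], row[0]] for row in sigmoid_output]


two_col = to_two_columns(probs)
for neg, pos in two_col:
    print(f"negative={neg:.2f} positive={pos:.2f}")
```

This is only a change of layout; the model itself still produces one number per sample.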

To turn the probability into a hard label, apply a threshold:

python
pred_class = (probs >= 0.5).astype(int)
print(pred_class)

Multi-Class Classification with softmax

For multi-class classification, the usual setup is one output per class with softmax.

python
import numpy as np
import tensorflow as tf

# Three softmax units: one probability per class for each sample.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.array([[0.1, 0.2, 0.3, 0.4]], dtype=np.float32)
probs = model.predict(x, verbose=0)

print(probs)              # shape (1, 3)
print(probs.sum(axis=1))  # each row sums to about 1.0

Each row contains one probability per class, and the row total is about 1.0.

To pick the class:

python
pred_class = probs.argmax(axis=1)
print(pred_class)
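
The argmax index is only meaningful once it is mapped back to a label. A small sketch of that step, in plain Python: the probability rows are illustrative stand-ins for softmax output, and class_names is a hypothetical label list that is assumed to follow the integer encoding used during training.

```python
# Stand-in for softmax output: one row per sample, one column per class.
probs = [
    [0.10, 0.70, 0.20],
    [0.55, 0.25, 0.20],
]
# Hypothetical labels; the order must match the integer class indices
# used when the model was trained.
class_names = ["cat", "dog", "bird"]


def top_prediction(row, names):
    """Return (label, probability) for the highest-scoring class in a row."""
    best = max(range(len(row)), key=row.__getitem__)
    return names[best], row[best]


results = [top_prediction(row, class_names) for row in probs]
print(results)
```

Keeping the probability next to the label makes it easier to spot low-confidence predictions later.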

When the Model Outputs Logits

Some models omit the final activation and are trained with a loss configured with from_logits=True. In that case, predict() returns raw scores, not probabilities.

Binary logits example:

python
import numpy as np
import tensorflow as tf

# No activation on the last layer: the model outputs raw logits.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
)

x = np.array([[0.1, 0.2, 0.3, 0.4]], dtype=np.float32)
logits = model.predict(x, verbose=0)
# Convert the logits to probabilities explicitly.
probs = tf.nn.sigmoid(logits).numpy()

print(logits)
print(probs)
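
One consequence worth seeing in numbers: because the sigmoid is monotonic and maps 0 to 0.5, thresholding the probability at 0.5 gives the same labels as thresholding the raw logit at 0. The sketch below checks this in plain Python with illustrative logits; the sigmoid helper is written here for the example and is not a Keras function.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function for a single float."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


logits = [-2.0, -0.1, 0.0, 0.3, 4.0]  # illustrative raw scores

by_prob = [int(sigmoid(z) >= 0.5) for z in logits]   # threshold probabilities
by_logit = [int(z >= 0.0) for z in logits]           # threshold logits directly
print(by_prob, by_logit)
```

For a probability threshold t other than 0.5, the equivalent logit threshold is log(t / (1 - t)).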

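For multi-class logits, apply tf.nn.softmax across the class axis instead of tf.nn.sigmoid, and train with a loss such as SparseCategoricalCrossentropy(from_logits=True).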
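
To show what that conversion does, here is a minimal plain-Python sketch of softmax over one row of logits. It is an illustration of the arithmetic, not the TensorFlow implementation; the logits are made-up values, and subtracting the row maximum before exponentiating is a standard trick to avoid overflow.

```python
import math


def softmax(logits):
    """Softmax over one row of logits, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # illustrative raw scores for three classes
probs = softmax(logits)
print(probs)
print(sum(probs))
```

The largest logit always becomes the largest probability, so argmax gives the same class before and after the conversion.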

Why Keras Has No predict_proba

Developers coming from scikit-learn often look for predict_proba. In Keras, predict() already serves that purpose when the last activation produces probabilities.

A good rule is:

  • final sigmoid: predict() gives binary probabilities
  • final softmax: predict() gives class probability vectors
  • logits output: convert with sigmoid or softmax yourself
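
The three cases above can be folded into one small dispatch function. This is a sketch under stated assumptions: to_probabilities and the output_kind tag are hypothetical names, not part of Keras, and you would set the tag yourself after reading the model definition.

```python
import math


def to_probabilities(raw_row, output_kind):
    """Convert one row of predict() output to probabilities.

    output_kind is a hypothetical tag chosen by the caller:
    "sigmoid", "softmax", "binary_logits", or "multiclass_logits".
    """
    if output_kind in ("sigmoid", "softmax"):
        return list(raw_row)  # already probabilities, pass through
    if output_kind == "binary_logits":
        # sigmoid(z) == 0.5 * (1 + tanh(z / 2)); this form does not overflow
        return [0.5 * (1.0 + math.tanh(z / 2.0)) for z in raw_row]
    if output_kind == "multiclass_logits":
        m = max(raw_row)
        exps = [math.exp(z - m) for z in raw_row]
        total = sum(exps)
        return [e / total for e in exps]
    raise ValueError(f"unknown output_kind: {output_kind!r}")


print(to_probabilities([0.0], "binary_logits"))
print(to_probabilities([1.0, 1.0], "multiclass_logits"))
```

Raising on an unknown tag is deliberate: silently passing logits through as if they were probabilities is the mistake this rule is meant to prevent.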

That is the main rule for interpreting Keras classification outputs, in both training and inference code. Checking it early also helps you choose the correct evaluation code and thresholding logic later.

Probabilities and Decision Thresholds Are Different

Returning probabilities is not the same as choosing a class. A model may output 0.62, but your business rule may choose a threshold other than 0.5.

python
pred_class = (probs >= 0.7).astype(int)

Raising the threshold usually improves precision at the cost of recall, and lowering it does the opposite; which trade-off is right depends on the application. So keep the probability output separate from the thresholding rule.
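
The effect of moving the threshold is easiest to see by computing precision and recall at two cut-offs. The sketch below uses made-up scores and labels purely for illustration; precision_recall is a hypothetical helper written for this example.

```python
# Made-up scores and labels, purely to illustrate the trade-off.
probs = [0.95, 0.80, 0.72, 0.65, 0.55, 0.40, 0.30, 0.10]
y_true = [1, 1, 0, 1, 0, 1, 0, 0]


def precision_recall(scores, labels, threshold):
    """Precision and recall of the positive class at a given threshold."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


for t in (0.5, 0.7):
    prec, rec = precision_recall(probs, y_true, t)
    print(f"threshold={t}: precision={prec:.2f} recall={rec:.2f}")
```

On this toy data the higher threshold gives better precision and worse recall; real data will show its own curve, so the threshold is best chosen on a validation set.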

Common Pitfalls

One common mistake is assuming predict() always returns probabilities even when the model outputs logits.

Another issue is expecting a two-column matrix for binary classification when the model actually uses one sigmoid unit.

A third pitfall is treating scores as perfectly calibrated probabilities just because they are in the 0 to 1 range. Calibration is a separate model-quality question.
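
A rough way to check calibration is to group predictions into bins and compare the mean predicted probability in each bin with the fraction of samples that were actually positive. The following is a minimal sketch with made-up data; reliability_table is a hypothetical helper, and libraries such as scikit-learn offer a fuller version of this idea in calibration_curve.

```python
def reliability_table(scores, labels, n_bins=2):
    """Per non-empty bin: (mean predicted probability, observed positive rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((s, y))
    table = []
    for items in bins:
        if not items:
            continue
        mean_p = sum(s for s, _ in items) / len(items)
        pos_rate = sum(y for _, y in items) / len(items)
        table.append((mean_p, pos_rate, len(items)))
    return table


# Made-up scores and labels, for illustration only.
probs = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
y_true = [1, 1, 0, 0, 0, 0, 0, 1]

table = reliability_table(probs, y_true)
for mean_p, pos_rate, n in table:
    print(f"mean predicted={mean_p:.2f} observed={pos_rate:.2f} n={n}")
```

In this toy case the model says 0.9 for samples that are positive only half the time, which is what poor calibration looks like even though every score is a valid number between 0 and 1.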

Summary

  • In Keras, predict() returns probabilities only when the final layer is probability-producing.
  • Use sigmoid for binary probability output and softmax for multi-class probability output.
  • If the model returns logits, convert them explicitly.
  • Probability scores and hard class decisions are separate steps.
  • Check the final layer and loss configuration before interpreting prediction output.