Returning probabilities in a classification prediction in Keras?
Introduction
In Keras, model.predict() does not magically decide whether to return probabilities. The output depends on the model’s final layer: if the model ends with sigmoid or softmax, the predictions are already probability-like values; if it outputs logits, you must convert them yourself before interpreting them as probabilities.
Binary Classification with sigmoid
For binary classification, the most common setup is one output unit with sigmoid.
That output is the predicted probability of the positive class. If it returns 0.82, you usually interpret that as:
- positive class probability about 0.82
- negative class probability about 0.18
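As a minimal sketch of that interpretation, the snippet below uses a hand-written NumPy array as a stand-in for what model.predict(x) would return from a single sigmoid unit. The values and the variable names (probs, p_positive, p_negative, two_column) are illustrative assumptions, not output from a real model.

```python
import numpy as np

# Stand-in for model.predict(x): one sigmoid unit yields shape (n_samples, 1).
# These numbers are made up for illustration.
probs = np.array([[0.82], [0.10], [0.55]])

p_positive = probs[:, 0]        # probability of the positive class
p_negative = 1.0 - p_positive   # complement: probability of the negative class

# Optional two-column layout [P(negative), P(positive)], for code that
# expects one column per class.
two_column = np.column_stack([p_negative, p_positive])
print(two_column)
```

The two-column form is only a convenience. The model itself still emits a single value per sample.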
To turn the probability into a hard label, apply a threshold:
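A thresholding step might look like the following sketch. The probs array again stands in for predict() output, and the 0.5 cutoff with an inclusive comparison (>=) is an assumed convention; some code uses a strict > instead.

```python
import numpy as np

# Hypothetical predict() output from a single sigmoid unit.
probs = np.array([[0.82], [0.10], [0.50]])
threshold = 0.5  # assumed default; adjust to the application

# Compare each probability to the threshold and convert booleans to 0/1 labels.
labels = (probs[:, 0] >= threshold).astype(int)
print(labels)  # [1 0 1]
```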
Multi-Class Classification with softmax
For multi-class classification, the usual setup is one output per class with softmax.
Each row contains one probability per class, and the row total is about 1.0.
To pick the class:
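One way to do this, sketched on a made-up array standing in for predict() output from a three-class softmax layer, is to take the argmax along the class axis. The values and variable names are assumptions for illustration.

```python
import numpy as np

# Stand-in for model.predict(x) with a 3-class softmax output:
# one row per sample, one column per class.
probs = np.array([
    [0.10, 0.70, 0.20],
    [0.05, 0.15, 0.80],
])

row_sums = probs.sum(axis=1)                # each should be about 1.0
predicted_class = np.argmax(probs, axis=1)  # index of the most probable class
confidence = probs.max(axis=1)              # probability of that class

print(row_sums, predicted_class, confidence)
```

The resulting indices refer to column positions, so mapping them back to label names depends on how the labels were encoded during training.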
When the Model Outputs Logits
Some models omit the final activation and are trained with a loss configured with from_logits=True. In that case, predict() returns raw scores (logits), not probabilities.
Binary logits example:
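The sketch below applies the sigmoid function to raw scores in NumPy. The logits array is a hypothetical stand-in for predict() output from a model whose last layer has no activation, and the sigmoid helper is defined here for illustration. In TensorFlow code, tf.sigmoid performs the same conversion on tensors.

```python
import numpy as np

def sigmoid(z):
    """Map raw scores (logits) to values in (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores from a single-unit output with no activation.
logits = np.array([[2.0], [-1.0], [0.0]])

probs = sigmoid(logits)  # a logit of 0.0 maps to a probability of 0.5
print(probs)
```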
For multi-class logits, use tf.nn.softmax.
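To show what that conversion does, here is a NumPy sketch of a numerically stable softmax applied row by row. The softmax helper and the logits values are illustrative assumptions; on tensors, tf.nn.softmax(logits, axis=-1) computes the same quantity.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Convert raw scores to probabilities that sum to 1 along `axis`."""
    logits = np.asarray(logits, dtype=float)
    # Subtracting the row maximum avoids overflow and leaves the result unchanged.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Hypothetical raw scores for two samples and three classes.
logits = np.array([
    [2.0, 1.0, 0.1],
    [0.0, 0.0, 0.0],
])

probs = softmax(logits)
print(probs)
print(probs.sum(axis=1))  # each row sums to about 1.0
```

Softmax preserves the ordering of the scores, so the argmax of the probabilities equals the argmax of the logits. The conversion matters when you need the probability values themselves.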
Why Keras Has No predict_proba
Developers coming from scikit-learn often look for predict_proba. In Keras, predict() already serves that purpose when the last activation produces probabilities.
A good rule is:
- final sigmoid: predict() gives binary probabilities
- final softmax: predict() gives class probability vectors
- logits output: convert with sigmoid or softmax yourself
That is the main interpretation rule to remember for Keras classification outputs. Checking it early also helps you choose the correct evaluation code and thresholding logic later.
Probabilities and Decision Thresholds Are Different
Returning probabilities is not the same as choosing a class. A model may output 0.62, but your business rule may choose a threshold other than 0.5.
That can improve precision or recall depending on the application. So keep the probability output separate from the thresholding rule.
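The effect of moving the threshold can be sketched on a small made-up dataset. The labels, the probabilities, the precision_recall helper, and the 0.7 alternative threshold are all assumptions chosen for illustration; a real evaluation would use held-out data.

```python
import numpy as np

def precision_recall(y_true, p_positive, threshold):
    """Compute precision and recall for labels obtained at `threshold`."""
    y_pred = (p_positive >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up ground truth and positive-class probabilities.
y_true = np.array([1, 0, 1, 1, 0, 0])
p_positive = np.array([0.90, 0.62, 0.55, 0.40, 0.30, 0.10])

for threshold in (0.5, 0.7):
    precision, recall = precision_recall(y_true, p_positive, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy data, raising the threshold from 0.5 to 0.7 drops the false positive scored at 0.62, which raises precision but lowers recall. The probabilities themselves never change; only the decision rule does.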
Common Pitfalls
One common mistake is assuming predict() always returns probabilities even when the model outputs logits.
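A rough sanity check can catch this mistake early. The helper below, looks_like_probabilities, is a hypothetical function written for illustration and is not part of Keras. It assumes single-label classification, so multi-column outputs are expected to sum to 1 per row. Passing the check does not prove the values are probabilities, because logits can fall in the 0 to 1 range by chance; inspecting the final layer remains the reliable test.

```python
import numpy as np

def looks_like_probabilities(outputs, atol=1e-4):
    """Heuristic: True if values lie in [0, 1] and multi-class rows sum to ~1."""
    outputs = np.asarray(outputs, dtype=float)
    in_unit_range = bool(np.all((outputs >= 0.0) & (outputs <= 1.0)))
    if outputs.ndim == 2 and outputs.shape[1] > 1:
        # Assumes single-label softmax output: each row should sum to about 1.
        rows_sum_to_one = bool(np.allclose(outputs.sum(axis=1), 1.0, atol=atol))
        return in_unit_range and rows_sum_to_one
    return in_unit_range

print(looks_like_probabilities([[0.1, 0.9], [0.6, 0.4]]))  # True
print(looks_like_probabilities([[2.0, -1.0], [0.3, 4.1]]))  # False
```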
Another issue is expecting a two-column matrix for binary classification when the model actually uses one sigmoid unit.
A third pitfall is treating scores as perfectly calibrated probabilities just because they are in the 0 to 1 range. Calibration is a separate model-quality question.
Summary
- In Keras, predict() returns probabilities only when the final layer is probability-producing.
- Use sigmoid for binary probability output and softmax for multi-class probability output.
- If the model returns logits, convert them explicitly.
- Probability scores and hard class decisions are separate steps.
- Check the final layer and loss configuration before interpreting prediction output.

