Keras: What Is the Difference Between model.predict and model.predict_proba?

Introduction

In native Keras, model.predict() and model.predict_proba() returned the same values; there was no practical difference. predict_proba() existed only on Sequential models as a scikit-learn-style alias that called predict() internally. It was deprecated in earlier TensorFlow 2.x releases and removed in TensorFlow 2.6. For classification models with a softmax or sigmoid output layer, model.predict() already returns probabilities. To get class labels, use np.argmax(model.predict(x), axis=1) (multi-class) or (model.predict(x) > 0.5).astype(int) (binary).

model.predict() Returns Probabilities

python

import numpy as np
from tensorflow import keras

# Binary classification model
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    keras.layers.Dense(1, activation='sigmoid')  # Outputs a probability in [0, 1]
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# predict() returns probabilities
X_test = np.random.randn(5, 10)
probabilities = model.predict(X_test)
print(probabilities)
# Example output (the model is untrained, so actual values will differ):
# [[0.73],
#  [0.12],
#  [0.89],
#  [0.45],
#  [0.67]]

With a sigmoid output layer, model.predict() returns values between 0 and 1 — these are already class probabilities.
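Code written against scikit-learn often expects binary probabilities as two columns, one per class, rather than the single column a sigmoid layer produces. The sketch below shows one way to convert between the two layouts. The helper name to_two_column is hypothetical, and the hard-coded array stands in for the result of model.predict(X_test).

```python
import numpy as np

def to_two_column(p_positive):
    """Turn an (N, 1) or (N,) array of P(class 1) into an (N, 2) array
    laid out as [P(class 0), P(class 1)]."""
    p = np.asarray(p_positive, dtype=float).reshape(-1, 1)
    return np.hstack([1.0 - p, p])

# Stand-in for model.predict(X_test) from a sigmoid output layer
sigmoid_output = np.array([[0.73], [0.12], [0.89]])
two_col = to_two_column(sigmoid_output)
print(two_col)
```

Each row of the result sums to 1, and the second column is the original sigmoid output.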

Multi-Class Classification

python

# Multi-class model (3 classes); reuses X_test from the previous example
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    keras.layers.Dense(3, activation='softmax')  # 3 class probabilities
])
model.compile(optimizer='adam', loss='categorical_crossentropy')

probabilities = model.predict(X_test)
print(probabilities)
# Example output (actual values will differ):
# [[0.15, 0.70, 0.15],
#  [0.80, 0.10, 0.10],
#  [0.05, 0.05, 0.90],
#  ...]

# Each row sums to 1.0 (up to floating-point rounding)
print(probabilities.sum(axis=1))
# [1.0, 1.0, 1.0, ...]

With a softmax output layer, each row contains probabilities for all classes that sum to 1.
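If you are unsure whether a model's last layer applies softmax, a quick sanity check on the output can help. The following is a minimal sketch, assuming a 2-D multi-class output. The function name looks_like_probabilities and the tolerance are illustrative choices, and the arrays stand in for model.predict() results.

```python
import numpy as np

def looks_like_probabilities(outputs, atol=1e-5):
    """Return True if every value is in [0, 1] and every row sums to about 1."""
    outputs = np.asarray(outputs, dtype=float)
    in_range = np.all((outputs >= 0.0) & (outputs <= 1.0))
    rows_sum_to_one = np.allclose(outputs.sum(axis=1), 1.0, atol=atol)
    return bool(in_range and rows_sum_to_one)

# Stand-ins for model.predict() output
softmax_like = np.array([[0.15, 0.70, 0.15], [0.80, 0.10, 0.10]])
logit_like = np.array([[2.0, -1.0, 0.5], [0.3, 0.1, -4.0]])

print(looks_like_probabilities(softmax_like))  # True
print(looks_like_probabilities(logit_like))    # False
```

This check is a heuristic only: it does not apply to single-column sigmoid outputs, whose rows do not sum to 1.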

Converting Probabilities to Class Labels

python

import numpy as np

# Multi-class (model with a softmax output): argmax gives the class index
probabilities = model.predict(X_test)
class_labels = np.argmax(probabilities, axis=1)
print(class_labels)  # e.g. [1, 0, 2, ...]

# Binary (model with a single sigmoid output): threshold at 0.5
probabilities = model.predict(X_test)  # Shape: (N, 1)
class_labels = (probabilities > 0.5).astype(int).flatten()
print(class_labels)  # e.g. [1, 0, 1, 0, 1]
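The two conversions above can be folded into one helper that picks the right rule from the output shape. This is a sketch rather than a Keras API: probs_to_labels and its threshold parameter are hypothetical names, and the arrays stand in for model.predict() output.

```python
import numpy as np

def probs_to_labels(probabilities, threshold=0.5):
    """Convert predicted probabilities to integer class labels.

    Single-column (or 1-D) input is treated as binary and thresholded;
    multi-column input is treated as multi-class and reduced with argmax.
    """
    probs = np.asarray(probabilities)
    if probs.ndim == 1 or probs.shape[1] == 1:
        return (probs.reshape(-1) > threshold).astype(int)
    return np.argmax(probs, axis=1)

binary = np.array([[0.73], [0.12], [0.89], [0.45], [0.67]])
multi = np.array([[0.15, 0.70, 0.15], [0.80, 0.10, 0.10], [0.05, 0.05, 0.90]])

print(probs_to_labels(binary))       # [1 0 1 0 1]
print(probs_to_labels(multi))        # [1 0 2]
print(probs_to_labels(binary, 0.7))  # [1 0 1 0 0]
```

Exposing the threshold is useful when false positives and false negatives have different costs, since 0.5 is a convention and not a requirement.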

What predict_proba Was

python

# REMOVED: do not use
# This existed for scikit-learn API compatibility

# In old Keras/TF versions (Sequential models only):
model.predict_proba(X_test)  # Same values as model.predict(X_test)

# In TF 2.x releases shortly before 2.6:
# a deprecation warning telling you to use model.predict() instead
# In TF 2.6+:
# AttributeError: 'Sequential' object has no attribute 'predict_proba'

If you see code using predict_proba(), replace it with predict() — the output is identical.
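When the same code has to accept both native Keras models and scikit-learn-style estimators, a small dispatch helper avoids hard-coding either method name. The sketch below is illustrative: predict_probabilities, ModernModel, and LegacyModel are hypothetical names, and the two classes are stand-ins so the example runs without TensorFlow.

```python
import numpy as np

def predict_probabilities(model, x, **kwargs):
    """Call predict_proba() if the object has it, otherwise fall back to predict()."""
    legacy = getattr(model, "predict_proba", None)
    if callable(legacy):
        return legacy(x, **kwargs)
    return model.predict(x, **kwargs)

class ModernModel:
    """Stand-in for a current Keras model, which only has predict()."""
    def predict(self, x):
        return np.full((len(x), 1), 0.5)

class LegacyModel(ModernModel):
    """Stand-in for an old model or sklearn-style estimator with predict_proba()."""
    def predict_proba(self, x):
        return np.full((len(x), 1), 0.5)

x = np.zeros((4, 10))
a = predict_probabilities(ModernModel(), x)
b = predict_probabilities(LegacyModel(), x)
print(a.shape, np.array_equal(a, b))  # (4, 1) True
```

For code that only ever sees native Keras models, calling predict() directly remains the simpler choice.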

Regression Models

python

# Regression: predict() returns continuous values (not probabilities)
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    keras.layers.Dense(1)  # No activation: raw output
])
model.compile(optimizer='adam', loss='mse')

predictions = model.predict(X_test)
print(predictions)
# Example output (actual values will differ):
# [[23.5],
#  [17.2],
#  [42.8],
#  ...]
# These are NOT probabilities; they are predicted target values

For regression, there is no concept of "predict_proba" — the output is the predicted target value.

Scikit-Learn Wrapper Compatibility

python

# Legacy import: this wrapper module was deprecated and is not available in
# recent TensorFlow releases; SciKeras is the maintained replacement.
from tensorflow import keras
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# The scikit-learn wrapper provides predict_proba for compatibility
def create_model():
    model = keras.Sequential([
        keras.layers.Dense(64, activation='relu', input_shape=(10,)),
        keras.layers.Dense(3, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    return model

# KerasClassifier wraps the model for scikit-learn pipelines
# X_train and y_train are your training features and labels (not defined here)
clf = KerasClassifier(build_fn=create_model, epochs=10, batch_size=32)
clf.fit(X_train, y_train)

# predict() returns class labels
labels = clf.predict(X_test)
# e.g. [1, 0, 2, ...]

# predict_proba() returns probabilities
probs = clf.predict_proba(X_test)
# e.g. [[0.15, 0.70, 0.15], ...]

In scikit-learn's API, predict() returns class labels and predict_proba() returns probabilities. The Keras wrapper follows this convention, which is the origin of the confusion.
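To make the difference in semantics concrete, here is a minimal sketch of an adapter that gives a Keras-style object the scikit-learn meaning of the two methods. It is not the KerasClassifier implementation. SklearnStyleClassifier and FakeSoftmaxModel are hypothetical names, and the sketch assumes a multi-class model whose predict() returns one probability column per class.

```python
import numpy as np

class SklearnStyleClassifier:
    """Wrap an object with a Keras-style predict() (returns probabilities)
    and expose scikit-learn-style predict() and predict_proba()."""

    def __init__(self, keras_like_model):
        self.model = keras_like_model

    def predict_proba(self, x):
        # Keras-style predict() already returns class probabilities
        return np.asarray(self.model.predict(x))

    def predict(self, x):
        # scikit-learn-style predict() returns class labels
        return np.argmax(self.predict_proba(x), axis=1)

class FakeSoftmaxModel:
    """Stand-in for a trained Keras model with a 3-class softmax output."""
    def predict(self, x):
        return np.tile([0.15, 0.70, 0.15], (len(x), 1))

x = np.zeros((2, 10))
clf = SklearnStyleClassifier(FakeSoftmaxModel())
print(clf.predict_proba(x).shape)  # (2, 3)
print(clf.predict(x))              # [1 1]
```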

Comparison Table

Method	Keras (native)	Scikit-Learn	KerasClassifier wrapper
`predict()`	Raw output (probabilities for classification)	Class labels	Class labels
`predict_proba()`	Removed in TF 2.6 (returned the same values as predict)	Class probabilities	Class probabilities
`predict_classes()`	Removed in TF 2.6	N/A	N/A

Batch Prediction and Performance

python

# predict() processes data in batches for efficiency
predictions = model.predict(X_test, batch_size=64)

# For a single sample, predict() has overhead; calling the model directly is lighter
single_prediction = model(X_test[:1], training=False).numpy()

# Or keep predict() and silence the progress bar with verbose=0
single_prediction = model.predict(X_test[:1], verbose=0)
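Slicing with X_test[:1] keeps the batch dimension, but a sample taken with X_test[0] does not. The helper below is a small sketch for models that take flat feature vectors, such as the input_shape=(10,) models in this article. The name as_batch is hypothetical.

```python
import numpy as np

def as_batch(sample):
    """Add a leading batch dimension to a 1-D feature vector; leave 2-D input unchanged."""
    sample = np.asarray(sample)
    return np.expand_dims(sample, 0) if sample.ndim == 1 else sample

one_sample = np.zeros(10)        # shape (10,), e.g. X_test[0]
many_samples = np.zeros((5, 10)) # shape (5, 10), already a batch

print(as_batch(one_sample).shape)    # (1, 10)
print(as_batch(many_samples).shape)  # (5, 10)
```

The result can be passed to model.predict() or to the model call directly.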

Common Pitfalls

Using predict_proba() on modern TF: predict_proba() was removed in TensorFlow 2.6, so calling it raises an AttributeError. Replace all predict_proba() calls with predict(). The output is identical for Keras models.
Expecting class labels from predict(): Native Keras predict() returns raw model output (probabilities for classification). Use np.argmax() or thresholding to convert to class labels. Do not confuse with scikit-learn's predict() which returns labels.
Forgetting the activation function: If the last layer has no activation (Dense(3) without softmax), predict() returns raw logits, not probabilities. Apply softmax manually: tf.nn.softmax(model.predict(X)).
Using predict_classes(): model.predict_classes() was also removed in TF 2.6. Use np.argmax(model.predict(x), axis=1) instead.
Predicting on a single sample: model.predict() expects a batch dimension. Reshape with np.expand_dims(sample, 0) or sample.reshape(1, -1) before calling predict().
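For the logits pitfall above, the conversion can also be done in NumPy once predict() has returned. The following is a sketch of a numerically stable softmax. It is written from the standard definition and should agree with tf.nn.softmax up to floating-point error, though that equivalence is an assumption and not something taken from the Keras source. The logits array stands in for model.predict(X) from a model without a final activation.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax: subtract the row maximum before exponentiating."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Stand-in for raw logits; the second row would overflow a naive np.exp()
logits = np.array([[2.0, 1.0, 0.1], [1000.0, 1000.0, 1000.0]])
probs = softmax(logits)
print(probs)
print(probs.sum(axis=1))
```

Subtracting the maximum does not change the result mathematically, but it keeps np.exp() from overflowing on large logits.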

Summary

model.predict() and model.predict_proba() return the same output in Keras
predict_proba() was deprecated and removed — always use predict()
For classification with softmax/sigmoid, predict() already returns probabilities
Convert to class labels with np.argmax() (multi-class) or thresholding (binary)
The scikit-learn KerasClassifier wrapper has different semantics where predict() returns labels and predict_proba() returns probabilities
For regression models, predict() returns continuous values, not probabilities