How to calculate prediction uncertainty using Keras?

Introduction

A normal Keras prediction gives you an output value, not a reliable statement about how uncertain the model is. If you need uncertainty, you must choose a method that estimates it explicitly rather than assuming the prediction itself is enough.

In practice, the most common tools are Monte Carlo dropout, deep ensembles, and models that predict both a mean and a variance. Each one answers a slightly different question, so the first step is deciding what kind of uncertainty you care about.

Separate Epistemic and Aleatoric Uncertainty

Two uncertainty categories matter in most applied work:

  • epistemic uncertainty, which comes from limited model knowledge
  • aleatoric uncertainty, which comes from noise in the data itself

Monte Carlo dropout and deep ensembles are practical ways to approximate epistemic uncertainty. A mean-plus-variance model is more commonly used when you want the network to learn input-dependent noise.

This distinction matters because there is no single "uncertainty score" that captures everything equally well.
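One common way to make the distinction concrete is a decomposition based on the law of total variance. The sketch below is not something Keras provides. It assumes you already have several stochastic predictions per input (ensemble members or dropout passes), each giving a predicted mean and a predicted variance. The function name `decompose_uncertainty` and the example numbers are hypothetical.

```python
import numpy as np


def decompose_uncertainty(means, variances):
    """Split predictive variance for M stochastic predictions of N inputs.

    means, variances: arrays of shape (M, N), one row per ensemble member
    or dropout pass, holding that member's predicted mean and variance.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    aleatoric = variances.mean(axis=0)  # average predicted data noise
    epistemic = means.var(axis=0)       # disagreement between predictions
    return aleatoric, epistemic, aleatoric + epistemic


# Hypothetical numbers: 3 members, 2 inputs.
means = [[1.0, 4.0], [1.0, 5.0], [1.0, 6.0]]
variances = [[0.5, 0.1], [0.5, 0.1], [0.5, 0.1]]
aleatoric, epistemic, total = decompose_uncertainty(means, variances)
```

In this toy case the members agree exactly on the first input, so all of its uncertainty is aleatoric, while the second input has low predicted noise but visible disagreement between members.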

Monte Carlo Dropout in Keras

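Monte Carlo dropout keeps dropout active during inference and runs the same input through the network multiple times. Because each pass drops a different random subset of units, the outputs differ, and the spread of those predictions becomes an uncertainty proxy.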

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


def build_model(input_dim):
    """Small regression MLP with dropout after each hidden layer."""
    inputs = keras.Input(shape=(input_dim,))
    x = keras.layers.Dense(64, activation="relu")(inputs)
    x = keras.layers.Dropout(0.3)(x)
    x = keras.layers.Dense(64, activation="relu")(x)
    x = keras.layers.Dropout(0.3)(x)
    outputs = keras.layers.Dense(1)(x)  # single regression output
    return keras.Model(inputs, outputs)


model = build_model(10)
model.compile(optimizer="adam", loss="mse")
```

The critical detail is that inference must run with training=True so dropout stays active:

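```python
def mc_predict(model, x, n_samples=50):
    """Run n_samples stochastic forward passes and summarize them."""
    samples = []
    for _ in range(n_samples):
        # training=True keeps dropout active, so each pass differs.
        y = model(x, training=True)
        samples.append(y.numpy())

    # Shape: (n_samples, batch, 1); statistics are taken across passes.
    samples = np.stack(samples, axis=0)
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    return mean, std
```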

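If the standard deviation is high compared with what you see on typical inputs, the stochastic forward passes disagree with each other, which suggests the model is less certain about that input.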
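One way to make the spread easier to interpret is an empirical percentile interval over the passes. The sketch below assumes you keep the stacked samples array of shape (n_samples, batch, 1) instead of returning only its mean and standard deviation. The helper name `mc_interval` is hypothetical, and the data is a synthetic stand-in for real model output.

```python
import numpy as np


def mc_interval(samples, lower=2.5, upper=97.5):
    """Summarize stochastic forward passes stacked on axis 0.

    Returns the per-input mean and an empirical percentile interval.
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    lo, hi = np.percentile(samples, [lower, upper], axis=0)
    return mean, lo, hi


# Synthetic stand-in for 50 dropout passes over 3 inputs (not real model output).
rng = np.random.default_rng(0)
fake_samples = rng.normal(
    loc=[[1.0], [5.0], [9.0]],
    scale=[[0.1], [0.5], [2.0]],
    size=(50, 3, 1),
)

mean, lo, hi = mc_interval(fake_samples)
width = (hi - lo).squeeze(-1)  # wider interval means less stable predictions
```

Note that this interval describes how much the sampled predictions move around. It is not a full predictive interval, because it does not include noise in the targets themselves.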

Deep Ensembles

A strong alternative is to train several independently initialized models and compare their predictions. This is usually more expensive than MC dropout, but it often produces more stable uncertainty estimates.

```python
def train_ensemble(x_train, y_train, n_models=5):
    """Train n_models copies that differ only in their random seed."""
    models = []
    for seed in range(n_models):
        # A different seed gives different initial weights and batch order.
        tf.keras.utils.set_random_seed(seed)
        model = build_model(x_train.shape[1])
        model.compile(optimizer="adam", loss="mse")
        model.fit(x_train, y_train, epochs=10, batch_size=32, verbose=0)
        models.append(model)
    return models


def ensemble_predict(models, x):
    """Return the mean and standard deviation across ensemble members."""
    preds = np.stack([m.predict(x, verbose=0) for m in models], axis=0)
    return preds.mean(axis=0), preds.std(axis=0)
```

The mean becomes the aggregate prediction, and the standard deviation across models becomes the uncertainty signal.

Predict Mean and Variance Directly

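For aleatoric uncertainty, a common pattern is to make the network output both a mean and a log variance, then train with a Gaussian negative log-likelihood loss. Predicting the log variance rather than the variance keeps the implied variance positive without constraining the output layer: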

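```python
def nll_loss(y_true, y_pred):
    """Gaussian negative log-likelihood, dropping the constant term."""
    # Force targets to shape (batch, 1) so the subtraction below cannot
    # broadcast to (batch, batch) when targets arrive as a 1-D tensor.
    y_true = tf.reshape(tf.cast(y_true, y_pred.dtype), (-1, 1))
    mean = y_pred[:, :1]
    log_var = y_pred[:, 1:2]
    precision = tf.exp(-log_var)  # 1 / variance
    return tf.reduce_mean(0.5 * (log_var + precision * tf.square(y_true - mean)))


def build_heteroscedastic_model(input_dim):
    """Regression model whose two outputs are [mean, log variance]."""
    inputs = keras.Input(shape=(input_dim,))
    x = keras.layers.Dense(64, activation="relu")(inputs)
    outputs = keras.layers.Dense(2)(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss=nll_loss)
    return model
```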

This is useful when some inputs are inherently noisier than others and you want the model to learn that directly.
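At prediction time the second output column is a log variance, so it has to be converted before it can be read as an error bar. The following is a minimal sketch of that step. The helper name `split_mean_and_std` and the example outputs are hypothetical, and the interval assumes Gaussian noise, as the loss does.

```python
import numpy as np


def split_mean_and_std(y_pred):
    """Convert (batch, 2) outputs [mean, log_var] into mean and std arrays."""
    y_pred = np.asarray(y_pred, dtype=float)
    mean = y_pred[:, 0]
    std = np.exp(0.5 * y_pred[:, 1])  # std = sqrt(exp(log_var))
    return mean, std


# Hypothetical network outputs for two inputs: [mean, log variance].
y_pred = np.array([[2.0, 0.0], [3.0, np.log(4.0)]])
mean, std = split_mean_and_std(y_pred)

# Approximate 95% interval under the Gaussian assumption.
lower, upper = mean - 1.96 * std, mean + 1.96 * std
```

With a trained model, `y_pred` would come from something like `model.predict(x, verbose=0)` instead of the hand-written array.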

Calibrate Before Trusting the Numbers

Uncertainty estimates are only useful if they correlate with actual error. Good validation questions include:

  • Do higher uncertainty values match larger prediction errors?
  • Do confidence intervals cover the true values at the expected rate?
  • Does uncertainty rise on unusual or out-of-distribution inputs?

Without those checks, an uncertainty pipeline can look sophisticated while still being systematically overconfident.
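The first two questions can be checked with a few lines of NumPy on a held-out set. The sketch below is one possible approach rather than a standard API: the function names are hypothetical, the interval check assumes roughly Gaussian errors, and the data is synthetic so that the example runs on its own.

```python
import numpy as np


def _flat(*arrays):
    """Return each input as a flat float array."""
    return [np.asarray(a, dtype=float).ravel() for a in arrays]


def interval_coverage(y_true, mean, std, z=1.96):
    """Fraction of targets that fall inside mean +/- z * std."""
    y_true, mean, std = _flat(y_true, mean, std)
    return float(np.mean(np.abs(y_true - mean) <= z * std))


def error_uncertainty_correlation(y_true, mean, std):
    """Pearson correlation between absolute error and predicted std."""
    y_true, mean, std = _flat(y_true, mean, std)
    return float(np.corrcoef(np.abs(y_true - mean), std)[0, 1])


# Synthetic, well-calibrated stand-in for validation data.
rng = np.random.default_rng(0)
std = rng.uniform(0.5, 2.0, size=2000)
mean = np.zeros(2000)
y_true = rng.normal(loc=mean, scale=std)

coverage = interval_coverage(y_true, mean, std)
corr = error_uncertainty_correlation(y_true, mean, std)
```

For a calibrated model, coverage at z=1.96 should land near 0.95 and the correlation should be clearly positive. Coverage well below the nominal rate is a sign of overconfidence.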

Common Pitfalls

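The most common mistake in Monte Carlo dropout is forgetting to call the model with training=True. Without that flag, and also when you use model.predict(), which runs in inference mode, dropout is disabled and repeated predictions collapse to the same value, so the computed standard deviation is zero.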

Another pitfall is assuming one technique covers every kind of uncertainty. It does not. Ensembles, MC dropout, and variance-predicting heads answer related but different questions.

People also calculate an uncertainty score and never define how the system should use it. If low-confidence predictions do not trigger review, abstention, or fallback behavior, most of the practical value is lost.
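A simple way to give the score a job is a threshold that routes uncertain predictions to review. This is a minimal sketch under assumptions: the function name `route_predictions`, the labels, and the threshold value are all hypothetical, and in practice the threshold would be tuned on validation data.

```python
import numpy as np


def route_predictions(mean, std, max_std):
    """Label each prediction 'accept' or 'review' based on its uncertainty."""
    mean = np.asarray(mean, dtype=float).ravel()
    std = np.asarray(std, dtype=float).ravel()
    decisions = np.where(std <= max_std, "accept", "review")
    return list(zip(mean.tolist(), decisions.tolist()))


# Hypothetical predictions and uncertainties for three inputs.
routed = route_predictions(
    mean=[10.2, 7.5, 3.1],
    std=[0.2, 1.4, 0.5],
    max_std=1.0,
)
# routed == [(10.2, 'accept'), (7.5, 'review'), (3.1, 'accept')]
```

The same pattern extends to abstention or a fallback model: only the action attached to the "review" branch changes.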

Finally, uncertainty should be evaluated, not admired. If you do not measure calibration, you do not know whether the score is helpful or decorative.

Summary

  • Keras does not provide meaningful prediction uncertainty automatically for ordinary models.
  • Monte Carlo dropout is a practical approximation for epistemic uncertainty.
  • Deep ensembles are often stronger but cost more to train and serve.
  • Mean-plus-variance outputs are useful for modeling aleatoric uncertainty.
  • Validate calibration before trusting uncertainty scores in production.
