Keras custom decision threshold for precision and recall

Introduction

Keras classification models typically end in a sigmoid or softmax layer, so they output probabilities, not final yes-or-no decisions. Precision and recall depend on the threshold used to convert those probabilities into class labels, so the default threshold of 0.5 is only one possible operating point. If your application weighs false positives and false negatives differently, a custom threshold is often the right way to evaluate and deploy the model.

Threshold Changes the Confusion Matrix

For a binary classifier, the rule is typically:

predict positive if probability is at least the threshold
predict negative otherwise
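Here is a minimal NumPy sketch of that rule. The helper name `apply_threshold` and the probability values are made up for illustration.

```python
import numpy as np

def apply_threshold(probs, threshold=0.5):
    """Turn positive-class probabilities into 0/1 labels (hypothetical helper)."""
    probs = np.asarray(probs, dtype=float)
    # Positive when probability >= threshold, negative otherwise.
    return (probs >= threshold).astype(int)

# Made-up sigmoid outputs for four examples.
probs = np.array([0.10, 0.35, 0.50, 0.80])
print(apply_threshold(probs, 0.5))  # [0 0 1 1]
print(apply_threshold(probs, 0.3))  # [0 1 1 1]
```

Lowering the threshold from 0.5 to 0.3 flips the second example to positive. No model output changed, only the decision.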

Lowering the threshold can only keep recall the same or raise it, and it usually lowers precision. Raising the threshold does the opposite: recall can only stay the same or fall, while precision usually rises.

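That is why threshold tuning is not a minor cosmetic change. Moving the threshold moves examples between the predicted-positive and predicted-negative columns, so it changes the confusion matrix itself and every metric derived from it.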
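To make this concrete, the sketch below computes the confusion-matrix counts for one set of made-up labels and scores at three thresholds. The function name `confusion_counts` and the data are illustrative assumptions, not part of Keras.

```python
import numpy as np

def confusion_counts(y_true, probs, threshold):
    """Return (tp, fp, fn, tn) for a binary classifier at one threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(probs, dtype=float) >= threshold
    tp = int(np.sum(y_pred & y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    return tp, fp, fn, tn

# Made-up labels and scores for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.6, 0.35, 0.55, 0.3, 0.1]

for t in (0.3, 0.5, 0.7):
    tp, fp, fn, tn = confusion_counts(y_true, probs, t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    print(t, (tp, fp, fn, tn), round(precision, 2), round(recall, 2))
```

On this toy data, moving the threshold from 0.3 to 0.7 raises precision from 0.6 to 1.0 while recall falls from 1.0 to about 0.33. The model's scores are identical in every row.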

Use Built-In Keras Precision and Recall with a Threshold

Keras already lets you set thresholds on these metrics.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[
        tf.keras.metrics.Precision(thresholds=0.3, name="precision_at_03"),
        tf.keras.metrics.Recall(thresholds=0.3, name="recall_at_03"),
    ],
)
```

For binary classification with a single sigmoid output, this is usually the simplest correct option, and it needs no custom metric code.
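To see what those two metrics report, here is a plain NumPy sketch that mirrors them on a small batch of made-up predictions. It assumes Keras's convention that a prediction counts as positive only when it is strictly greater than the threshold. `precision_recall_at` is a hypothetical helper, not part of Keras.

```python
import numpy as np

def precision_recall_at(y_true, y_prob, threshold):
    """NumPy mirror of Precision/Recall(thresholds=threshold) for binary labels.

    Assumes the Keras convention: positive only if y_prob > threshold.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob, dtype=float) > threshold
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    # Report 0.0 instead of dividing by zero when a count is empty.
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return float(precision), float(recall)

y_true = [1, 0, 1, 1, 0]
y_prob = [0.8, 0.4, 0.3, 0.2, 0.1]
print(precision_recall_at(y_true, y_prob, 0.3))
```

The score of exactly 0.3 counts as negative here, so precision is 0.5 and recall is 1/3. Under a `>=` rule it would count as positive. This edge case only matters when scores land exactly on the threshold, but it can explain small mismatches between Keras metrics and hand-written NumPy code.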

Evaluate Several Thresholds at Once

You can track multiple operating points during training or evaluation.

```python
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[
        tf.keras.metrics.Precision(thresholds=0.3, name="precision_at_03"),
        tf.keras.metrics.Recall(thresholds=0.3, name="recall_at_03"),
        tf.keras.metrics.Precision(thresholds=0.7, name="precision_at_07"),
        tf.keras.metrics.Recall(thresholds=0.7, name="recall_at_07"),
    ],
)
```

That makes it easier to see how the model behaves at different decision points without retraining.
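The same idea also works offline on saved predictions. Below is a vectorized NumPy sketch that assumes you already have labels and predicted probabilities as arrays. `pr_at_thresholds` is a hypothetical name. scikit-learn's `precision_recall_curve` is a library alternative that evaluates every distinct score as a threshold.

```python
import numpy as np

def pr_at_thresholds(y_true, probs, thresholds):
    """Precision and recall at every threshold in one vectorized pass."""
    y_true = np.asarray(y_true).astype(bool)
    probs = np.asarray(probs, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    # Row i holds the predictions at thresholds[i]: shape (n_thresholds, n_samples).
    y_pred = probs[None, :] >= thresholds[:, None]
    tp = np.sum(y_pred & y_true, axis=1)
    fp = np.sum(y_pred & ~y_true, axis=1)
    fn = np.sum(~y_pred & y_true, axis=1)
    # Report 0.0 instead of dividing by zero when a count is empty.
    precision = np.divide(tp, tp + fp, out=np.zeros(len(thresholds)), where=(tp + fp) > 0)
    recall = np.divide(tp, tp + fn, out=np.zeros(len(thresholds)), where=(tp + fn) > 0)
    return precision, recall

# Made-up labels and probabilities for illustration.
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.6, 0.35, 0.55, 0.3, 0.1]
precision, recall = pr_at_thresholds(y_true, probs, [0.3, 0.5, 0.7])
print(precision.round(2), recall.round(2))
```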

Pick the Threshold After Looking at Validation Data

The threshold should usually be selected from validation results, not guessed in advance. A common workflow is:

train the model normally
get validation-set probabilities
compute precision and recall for candidate thresholds
pick the threshold that matches business requirements

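```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Assumes a trained binary `model` and a held-out validation split (x_val, y_val).
probs = model.predict(x_val, verbose=0).ravel()

for threshold in [0.2, 0.4, 0.6, 0.8]:
    preds = (probs >= threshold).astype(int)
    # zero_division=0 avoids a warning when no sample is predicted positive.
    p = precision_score(y_val, preds, zero_division=0)
    r = recall_score(y_val, preds, zero_division=0)
    print(f"threshold={threshold:.1f} precision={p:.3f} recall={r:.3f}")
```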

This often gives better operational insight than watching only one threshold during training.
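For the final step, "pick the threshold that matches business requirements," here is one possible rule: the highest recall among candidates whose precision meets a floor. The rule, the helper name `pick_threshold`, and the data are illustrative assumptions. Substitute whatever criterion your product actually needs.

```python
import numpy as np

def pick_threshold(y_true, probs, candidates, min_precision):
    """Return the candidate with the highest recall whose precision is at
    least min_precision, or None if no candidate qualifies."""
    y_true = np.asarray(y_true).astype(bool)
    probs = np.asarray(probs, dtype=float)
    best_t, best_recall = None, -1.0
    for t in candidates:
        y_pred = probs >= t
        tp = np.sum(y_pred & y_true)
        fp = np.sum(y_pred & ~y_true)
        fn = np.sum(~y_pred & y_true)
        if tp + fp == 0:
            continue  # precision undefined: nothing predicted positive
        precision = tp / (tp + fp)
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision >= min_precision and recall > best_recall:
            best_t, best_recall = float(t), float(recall)
    return best_t

# Made-up validation labels and probabilities.
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.6, 0.35, 0.55, 0.3, 0.1]
print(pick_threshold(y_true, probs, [0.2, 0.4, 0.6, 0.8], min_precision=0.75))
```

On this data, 0.6 is the lowest candidate with perfect precision, so it also keeps the most recall (2/3) among the qualifying thresholds.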

Multiclass Is Different

The thresholding approach above applies mainly to binary or one-vs-rest setups. In multiclass softmax classification, the usual decision rule is argmax, not a single global threshold.

If you need threshold behavior in multiclass classification, you are usually solving a one-vs-rest or abstention problem rather than standard multiclass prediction.
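As an example of the abstention case, the sketch below keeps the argmax label but abstains when the top softmax probability falls below a confidence floor. The helper name `predict_with_abstention`, the use of `-1` as the "abstain" marker, and the probabilities are all assumptions made for illustration.

```python
import numpy as np

def predict_with_abstention(class_probs, min_confidence):
    """Argmax prediction that abstains (returns -1) when the top
    probability is below min_confidence."""
    class_probs = np.asarray(class_probs, dtype=float)
    labels = np.argmax(class_probs, axis=1)
    top = np.max(class_probs, axis=1)
    return np.where(top >= min_confidence, labels, -1)

# Made-up softmax outputs: each row sums to 1 over three classes.
probs = np.array([
    [0.70, 0.20, 0.10],
    [0.40, 0.35, 0.25],
    [0.05, 0.15, 0.80],
])
print(np.argmax(probs, axis=1))             # plain argmax labels every row
print(predict_with_abstention(probs, 0.5))  # the second row abstains
```

Plain argmax assigns class 0 to the second row even though the model is only 40% confident. The confidence floor is what turns that row into an abstention.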

Because the threshold is a decision policy applied on top of the model's scores, teams often separate model training from threshold selection. The network learns scores first, and product or risk requirements then determine which threshold turns those scores into actions.

That separation also makes experimentation easier. You can keep the same trained probability model and evaluate many candidate thresholds quickly without retraining the network each time.
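One way to keep that separation explicit in code is to store the threshold in a small policy object that sits beside the model, as in the sketch below. `ThresholdPolicy` is a hypothetical class, and the scores stand in for something like `model.predict(x).ravel()`.

```python
from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True)
class ThresholdPolicy:
    """Hypothetical wrapper: the model produces scores, the policy decides."""
    threshold: float

    def decide(self, probs):
        # Same rule as before: positive when probability >= threshold.
        return (np.asarray(probs, dtype=float) >= self.threshold).astype(int)

# Stand-in for cached model outputs, computed once.
scores = np.array([0.15, 0.45, 0.62, 0.91])

# Same scores, different policies, no retraining.
for policy in (ThresholdPolicy(0.3), ThresholdPolicy(0.6)):
    print(policy.threshold, policy.decide(scores))
```

Because the scores are computed once and reused, trying a new threshold costs only a comparison.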

Common Pitfalls

The biggest mistake is thinking precision and recall are fixed model properties. They are threshold-dependent metrics for probabilistic classifiers.

Another issue is tuning the threshold on the test set. That leaks evaluation information and makes the final performance estimate too optimistic.
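To avoid that leak, choose the threshold using only validation data and then touch the test set once, with the threshold already fixed. In the sketch below, F1 is used only as a stand-in selection criterion. The helper names and toy data are hypothetical.

```python
import numpy as np

def prf(y_true, probs, t):
    """Precision, recall and F1 at threshold t (0.0 where undefined)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(probs, dtype=float) >= t
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return float(p), float(r), float(f1)

def select_on_val_report_on_test(val, test, candidates):
    """val and test are (labels, probs) pairs; only val influences the choice."""
    best_t = max(candidates, key=lambda t: prf(*val, t)[2])
    # The test set is evaluated once, after the threshold is frozen.
    return best_t, prf(*test, best_t)

val = ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
test = ([1, 0, 1, 0], [0.8, 0.35, 0.2, 0.05])
print(select_on_val_report_on_test(val, test, [0.3, 0.5, 0.7]))
```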

People also often apply binary threshold thinking directly to multiclass softmax outputs without defining what the threshold is supposed to mean for multiple classes.
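One concrete definition, assuming independent sigmoid outputs per class (a one-vs-rest or multi-label setup), is a separate threshold for each class. NumPy broadcasting compares each column against its own threshold. The threshold values and probabilities below are made up. Keras's `Precision` and `Recall` metrics also accept a `class_id` argument for evaluating a single output column.

```python
import numpy as np

# Hypothetical per-class thresholds, one per output column.
class_thresholds = np.array([0.5, 0.3, 0.7])

# Made-up per-class sigmoid probabilities (rows need not sum to 1).
probs = np.array([
    [0.60, 0.20, 0.75],
    [0.40, 0.35, 0.10],
])

# Broadcasting: column j is compared with class_thresholds[j].
decisions = (probs >= class_thresholds).astype(int)
print(decisions)
```

Each row can end up with zero, one, or several positive classes. This is exactly why it is a different problem from argmax classification.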

Finally, do not confuse the training loss with the decision threshold. Loss optimization usually happens on probabilities, while thresholding is part of evaluation and deployment policy.
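A quick way to see the difference: binary cross-entropy is computed directly from probabilities and contains no threshold, while recall changes as soon as the threshold moves. The sketch below is a NumPy version of the standard formula with made-up data. It is not Keras's own implementation.

```python
import numpy as np

def binary_cross_entropy(y_true, probs, eps=1e-7):
    """Mean binary cross-entropy on probabilities; no threshold is involved."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

y_true = np.array([1, 0, 1, 0])
probs = np.array([0.8, 0.3, 0.45, 0.1])

loss = binary_cross_entropy(y_true, probs)
recall_by_threshold = {}
for t in (0.4, 0.5):
    preds = (probs >= t).astype(int)
    # Recall: fraction of actual positives the decision rule catches.
    recall_by_threshold[t] = float(preds[y_true == 1].mean())
print(round(loss, 4), recall_by_threshold)
```

The loss is the same single number regardless of threshold, while recall drops from 1.0 to 0.5 when the threshold moves from 0.4 to 0.5.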

Summary

Precision and recall depend on the decision threshold used on predicted probabilities.
In Keras binary classification, you can set thresholds directly on built-in Precision and Recall metrics.
It is often useful to monitor more than one threshold.
Choose the deployment threshold from validation data, not the test set.
For multiclass models, thresholding is a different problem from ordinary argmax classification.