Keras custom decision threshold for precision and recall
Introduction
Keras models usually output probabilities, not final yes-or-no decisions. Precision and recall depend on the threshold used to convert those probabilities into class labels, so a default threshold of 0.5 is only one possible operating point. If your application values false positives and false negatives differently, a custom threshold is often the right way to evaluate and deploy the model.
Threshold Changes the Confusion Matrix
For a binary classifier, the rule is typically:
- predict positive if probability is at least the threshold
- predict negative otherwise
A lower threshold usually increases recall and lowers precision. A higher threshold usually increases precision and lowers recall.
That is why threshold tuning is not a minor cosmetic change. It changes the confusion matrix itself.
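To make this concrete, here is a small NumPy sketch (not Keras-specific) that applies the "at least the threshold" rule to six made-up probabilities and counts the confusion-matrix cells at three thresholds. The helper name confusion_counts and the data are illustrative only.

```python
import numpy as np

def confusion_counts(y_true, y_prob, threshold):
    """Return (TP, FP, FN, TN) for the rule: positive if prob >= threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob) >= threshold
    tp = int(np.sum(y_pred & y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    return tp, fp, fn, tn

# Toy labels and predicted probabilities (made-up illustration data).
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.35, 0.55, 0.3, 0.1]

for t in (0.3, 0.5, 0.7):
    tp, fp, fn, tn = confusion_counts(y_true, y_prob, t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    print(f"threshold={t}: TP={tp} FP={fp} FN={fn} TN={tn} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, precision rises from 0.60 to 1.00 while recall falls from 1.00 to 0.33 as the threshold moves from 0.3 to 0.7, even though the probabilities themselves never change.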
Use Built-In Keras Precision and Recall with a Threshold
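Keras already supports this: the built-in keras.metrics.Precision and keras.metrics.Recall metrics accept a thresholds argument, and a prediction above that value is counted as positive (the default is 0.5).
For binary classification with a single sigmoid output, this is often the simplest and most reliable option.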
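A minimal sketch, assuming Keras 3 (or a recent tf.keras importable as keras). The threshold value 0.25, the metric names, and the tiny model are illustrative assumptions; a fresh pair of metrics is also checked standalone on made-up data.

```python
import numpy as np
import keras

# Metrics that binarize sigmoid outputs at a custom threshold.
# 0.25 is an arbitrary example value, not a recommendation.
precision_25 = keras.metrics.Precision(thresholds=0.25, name="precision_at_0_25")
recall_25 = keras.metrics.Recall(thresholds=0.25, name="recall_at_0_25")

# Tiny binary classifier with one sigmoid output (illustrative only).
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[precision_25, recall_25])

# Standalone check on made-up labels and probabilities.
y_true = np.array([1, 1, 1, 0, 0, 0], dtype="float32")
y_prob = np.array([0.9, 0.6, 0.35, 0.55, 0.3, 0.1], dtype="float32")
precision_check = keras.metrics.Precision(thresholds=0.25)
recall_check = keras.metrics.Recall(thresholds=0.25)
precision_check.update_state(y_true, y_prob)
recall_check.update_state(y_true, y_prob)
print(float(precision_check.result()), float(recall_check.result()))
```

One detail worth knowing: Keras counts a prediction as positive only when it is strictly greater than the threshold, so a probability exactly equal to the threshold is treated as negative there, unlike the "at least" rule described earlier. With continuous probabilities this rarely matters, but it can explain tiny discrepancies against hand-written evaluation code.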
Evaluate Several Thresholds at Once
You can track multiple operating points during training or evaluation.
That makes it easier to see how the model behaves at different decision points without retraining.
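One way to do this, sketched under the same Keras assumptions: create one named Precision/Recall pair per threshold so each appears as its own entry in the training logs, or pass a list of thresholds to a single metric for standalone evaluation, in which case result() returns one value per threshold. The thresholds and metric names are arbitrary examples.

```python
import numpy as np
import keras

thresholds = [0.25, 0.5, 0.75]  # example operating points, chosen arbitrarily

# One named metric pair per threshold, so each gets its own log entry.
metrics = []
for t in thresholds:
    tag = str(t).replace(".", "_")
    metrics.append(keras.metrics.Precision(thresholds=t, name=f"precision_{tag}"))
    metrics.append(keras.metrics.Recall(thresholds=t, name=f"recall_{tag}"))

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=metrics)

# Alternatively, a single metric can take a list of thresholds and return
# one value per threshold (handy for standalone evaluation).
y_true = np.array([1, 1, 1, 0, 0, 0], dtype="float32")
y_prob = np.array([0.9, 0.6, 0.35, 0.55, 0.3, 0.1], dtype="float32")
sweep = keras.metrics.Precision(thresholds=thresholds)
sweep.update_state(y_true, y_prob)
print(np.asarray(sweep.result()))  # one precision value per threshold
```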
Pick the Threshold After Looking at Validation Data
The threshold should usually be selected from validation results, not guessed in advance. A common workflow is:
- train the model normally
- get validation-set probabilities
- compute precision and recall for candidate thresholds
- pick the threshold that matches business requirements
This often gives better operational insight than watching only one threshold during training.
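The workflow above can be sketched in plain NumPy. The selection rule here (highest precision subject to a minimum recall) is just one hypothetical business requirement, and sweep_thresholds and pick_threshold are illustrative helpers, not Keras APIs. In practice the probabilities would come from something like model.predict(x_val).ravel() on a held-out validation set.

```python
import numpy as np

def sweep_thresholds(y_true, y_prob, candidates):
    """Return (threshold, precision, recall) rows, predicting positive if prob >= t."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    rows = []
    for t in candidates:
        pred = y_prob >= t
        tp = np.sum(pred & y_true)
        fp = np.sum(pred & ~y_true)
        fn = np.sum(~pred & y_true)
        # Convention: report 0.0 when a denominator is empty.
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        rows.append((float(t), float(precision), float(recall)))
    return rows

def pick_threshold(rows, min_recall):
    """Highest-precision row with recall >= min_recall; ties go to the higher threshold."""
    eligible = [row for row in rows if row[2] >= min_recall]
    if not eligible:
        return None
    return max(eligible, key=lambda row: (row[1], row[0]))

# Made-up validation data; in practice val_prob = model.predict(x_val).ravel().
y_val = [1, 1, 1, 1, 0, 0, 0, 0]
val_prob = [0.95, 0.85, 0.65, 0.42, 0.72, 0.45, 0.22, 0.05]
candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

rows = sweep_thresholds(y_val, val_prob, candidates)
for t, p, r in rows:
    print(f"t={t:.1f}  precision={p:.3f}  recall={r:.3f}")
print(pick_threshold(rows, min_recall=0.75))
```

On this toy validation set, requiring recall of at least 0.75 selects 0.6 (precision 0.75), while relaxing the requirement to 0.5 selects 0.8 (precision 1.00). Breaking precision ties toward the higher threshold is an arbitrary choice; a different requirement would simply change the key function.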
Multiclass Is Different
The threshold story above is mainly for binary or one-vs-rest setups. In multiclass softmax classification, the usual decision rule is argmax, not a single global threshold.
If you need threshold behavior in multiclass classification, you are usually solving a one-vs-rest or abstention problem rather than standard multiclass prediction.
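The two multiclass alternatives mentioned above can be sketched with NumPy on made-up softmax outputs: abstaining when the top-class probability falls below a confidence cutoff, and a one-vs-rest threshold on a single class's probability. The function names, the cutoff values, and the use of -1 as an "abstain" label are all assumptions for illustration.

```python
import numpy as np

def predict_with_abstention(probs, min_confidence, abstain_label=-1):
    """Argmax prediction, but abstain when the top class probability is too low."""
    probs = np.asarray(probs, dtype=float)
    top_class = probs.argmax(axis=1)
    top_prob = probs.max(axis=1)
    return np.where(top_prob >= min_confidence, top_class, abstain_label)

def one_vs_rest_flags(probs, class_index, threshold):
    """Binary decision for one class: is its probability at least the threshold?"""
    probs = np.asarray(probs, dtype=float)
    return probs[:, class_index] >= threshold

# Made-up softmax outputs for three samples and three classes.
softmax_out = np.array([
    [0.70, 0.20, 0.10],
    [0.40, 0.35, 0.25],
    [0.05, 0.15, 0.80],
])
print(softmax_out.argmax(axis=1))                 # plain argmax, no threshold
print(predict_with_abstention(softmax_out, 0.6))  # low-confidence rows abstain
print(one_vs_rest_flags(softmax_out, 1, 0.3))     # class 1 vs the rest
```

Note that the second sample is assigned class 0 by plain argmax but is withheld by the abstention rule, which is exactly the kind of behavior a single global binary threshold cannot express.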
Because the threshold is a policy choice rather than a learned parameter, teams often separate model training from threshold selection. The network learns scores first; product or risk requirements then determine which threshold turns those scores into actions.
That separation also makes experimentation easier. You can keep the same trained probability model and evaluate many candidate thresholds quickly without retraining the network each time.
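A minimal sketch of that separation, assuming the chosen threshold is stored as a small JSON record next to the saved model. DecisionPolicy and its fields are hypothetical names, not part of Keras; the point is that swapping policies never touches the trained network.

```python
import json
from dataclasses import dataclass, asdict

import numpy as np

@dataclass(frozen=True)
class DecisionPolicy:
    """Hypothetical container for a threshold chosen outside the model."""
    threshold: float
    positive_label: str = "positive"
    negative_label: str = "negative"

    def apply(self, probs):
        # Positive if probability is at least the threshold.
        probs = np.asarray(probs, dtype=float).ravel()
        return np.where(probs >= self.threshold, self.positive_label, self.negative_label)

    def to_json(self):
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))

# The same scores can be turned into actions by different policies
# without retraining; in practice scores = model.predict(x).ravel().
scores = np.array([0.15, 0.42, 0.58, 0.91])  # made-up scores
strict = DecisionPolicy(threshold=0.8)
lenient = DecisionPolicy(threshold=0.4)
print(strict.apply(scores))
print(lenient.apply(scores))

# Round-trip the policy through JSON, as it might be stored beside the model.
restored = DecisionPolicy.from_json(strict.to_json())
print(restored)
```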
Common Pitfalls
The biggest mistake is thinking precision and recall are fixed model properties. They are threshold-dependent metrics for probabilistic classifiers.
Another issue is tuning the threshold on the test set. That leaks evaluation information and makes the final performance estimate too optimistic.
People also often apply binary threshold thinking directly to multiclass softmax outputs without defining what the threshold is supposed to mean for multiple classes.
Finally, do not confuse the training loss with the decision threshold. Loss optimization usually happens on probabilities, while thresholding is part of evaluation and deployment policy.
Summary
- Precision and recall depend on the decision threshold used on predicted probabilities.
- In Keras binary classification, you can set thresholds directly on the built-in Precision and Recall metrics.
- It is often useful to monitor more than one threshold.
- Choose the deployment threshold from validation data, not the test set.
- For multiclass models, thresholding is a different problem from ordinary argmax classification.

