scikit-learn .predict default threshold
Introduction
In scikit-learn, .predict() does not expose a tunable threshold parameter directly, but binary classifiers still use a built-in decision rule to turn scores into class labels. The important detail is that the default rule depends on what kind of score the estimator provides: probability estimates are typically thresholded at 0.5, while decision scores are typically thresholded at 0.
The Binary Classification Default
For binary classification, scikit-learn's current user guide describes hard-coded cut-off rules for converting model scores into labels:
- If the estimator uses predict_proba, the positive class is predicted when the positive-class probability is greater than 0.5.
- If the estimator uses decision_function, the positive class is predicted when the decision score is greater than 0.
This means the intuition "scikit-learn uses 0.5" is only partly true. It is accurate for probability-based prediction, but not for every classifier or scoring method.
Example with predict_proba
Logistic regression is a common example because it exposes class probabilities.
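If a sample's positive-class probability (the column for clf.classes_[1]) is greater than 0.5, .predict() classifies it as the positive class. Otherwise, .predict() returns the negative class. For logistic regression the probability rule and the decision-score rule agree, because a probability of 0.5 corresponds to a decision score of exactly 0.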
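The probability rule can be checked directly. This is a minimal sketch on synthetic data from make_classification (an illustrative assumption, not a real dataset). It thresholds the positive-class column of predict_proba at 0.5 by hand and compares the result with .predict().

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data; the labels are 0 and 1
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

clf = LogisticRegression().fit(X, y)

# Column 1 of predict_proba corresponds to clf.classes_[1], the positive class
proba_pos = clf.predict_proba(X)[:, 1]

# Reproduce .predict() by hand: positive class only when probability > 0.5
manual = clf.classes_[(proba_pos > 0.5).astype(int)]

print(np.array_equal(manual, clf.predict(X)))
```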
If you want a different threshold, you must apply that logic yourself or wrap the estimator with a threshold-tuning approach.
Example with decision_function
Some estimators, such as LinearSVC, expose decision scores through decision_function rather than probabilities. In those cases, the default decision boundary is 0.
A score greater than 0 maps to classes_[1], and any other score maps to classes_[0]. So the default label threshold is not 0.5 here; it is the sign boundary at 0.
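A minimal sketch of the sign rule, again on synthetic make_classification data chosen only for illustration. LinearSVC has no predict_proba method, so its labels come from thresholding decision_function at 0.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# LinearSVC exposes decision_function but not predict_proba
svm = LinearSVC(dual=False).fit(X, y)

scores = svm.decision_function(X)  # signed distances, shape (n_samples,)

# Score > 0 -> classes_[1]; everything else -> classes_[0]
manual = svm.classes_[(scores > 0).astype(int)]

print(np.array_equal(manual, svm.predict(X)))
print(hasattr(svm, "predict_proba"))  # False: there is no probability to threshold
```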
Multiclass Prediction Is Different
For multiclass classifiers, .predict() usually chooses the class with the highest score or probability. There is no single binary-style threshold such as 0.5 because the decision is based on which class wins relative to the others.
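The predicted class is typically the one with the largest probability estimate or decision score among all classes, not the one that crosses a universal cutoff. In a three-class problem, for example, a class can win with a probability of only 0.4 if the other two classes each get 0.3.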
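As a sketch of the argmax rule, the three-class iris dataset is used here purely as a convenient example. Taking the class with the highest predicted probability reproduces .predict() for logistic regression.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # three classes: 0, 1, 2
clf = LogisticRegression(max_iter=1000).fit(X, y)

proba = clf.predict_proba(X)  # shape (n_samples, 3), one column per class

# No 0.5 cutoff: the column with the largest probability wins
manual = clf.classes_[np.argmax(proba, axis=1)]

print(np.array_equal(manual, clf.predict(X)))
```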
How to Use a Custom Threshold
If your business problem values recall, precision, or false-positive control differently from the default, use the raw scores and apply your own threshold.
This is common for imbalanced datasets or high-cost errors, where the default threshold is rarely optimal.
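Applying a custom threshold by hand can look like the following minimal sketch. The imbalanced synthetic dataset and the threshold of 0.3 are hypothetical choices for illustration. Because every sample with probability above 0.5 also clears 0.3, the lower threshold can only keep or raise recall, usually at some cost in precision.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: roughly 10% positives (labels 0 and 1)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = LogisticRegression().fit(X_train, y_train)
proba_pos = clf.predict_proba(X_test)[:, 1]

# Hypothetical business threshold: flag anything at or above 30% probability.
# Labels are 0/1, so the boolean mask cast to int is already a label array.
THRESHOLD = 0.3
y_pred_custom = (proba_pos >= THRESHOLD).astype(int)
y_pred_default = clf.predict(X_test)  # built-in rule: probability > 0.5

for name, pred in [("default", y_pred_default), ("custom", y_pred_custom)]:
    print(name, precision_score(y_test, pred), recall_score(y_test, pred))
```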
Since version 1.5, scikit-learn also provides two meta-estimators in sklearn.model_selection for threshold control: TunedThresholdClassifierCV, which selects the threshold by cross-validation, and FixedThresholdClassifier, which applies a threshold you choose. They are helpful when you want threshold selection to be explicit and reproducible rather than scattered through ad hoc prediction code.
Why the Default May Be Wrong for Your Problem
The built-in threshold is a generic API default, not a statement that 0.5 or 0 is ideal for your use case. Fraud detection, medical screening, moderation pipelines, and anomaly triage often care much more about one error type than the other. In those domains, the best operating point may be far from the library default.
That is why evaluating ROC curves, precision-recall behavior, and domain cost tradeoffs matters more than memorizing one threshold number.
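One way to turn a precision-recall tradeoff into a concrete operating point is sketched below. The 90% recall target and the synthetic data are hypothetical. The sketch uses precision_recall_curve on a held-out split and picks the most precise threshold that still meets the recall target.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = LogisticRegression().fit(X_train, y_train)
proba_pos = clf.predict_proba(X_val)[:, 1]

# precision[i] and recall[i] describe predictions with score >= thresholds[i];
# both arrays have one extra trailing point that has no threshold
precision, recall, thresholds = precision_recall_curve(y_val, proba_pos)

# Hypothetical requirement: catch at least 90% of positives
TARGET_RECALL = 0.9
meets_target = recall[:-1] >= TARGET_RECALL

# Among qualifying thresholds, keep the one with the highest precision
best = np.argmax(np.where(meets_target, precision[:-1], -1.0))
chosen_threshold = thresholds[best]
print(chosen_threshold, precision[best], recall[best])
```

Choosing the threshold on a validation split rather than the training data keeps the estimate of the operating point honest.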
Common Pitfalls
The most common mistake is assuming every classifier uses a 0.5 probability cutoff internally. That is not true for estimators that rely on decision_function. Another is treating .predict() as if it were the only valid output interface, when in practice threshold-aware workflows should often use predict_proba or decision_function directly. Teams also forget that multiclass prediction is based on the highest score, not a binary cutoff rule.
Summary
- In binary classification, .predict() uses a built-in decision rule to convert scores into labels.
- For predict_proba, the default cutoff is typically 0.5.
- For decision_function, the default cutoff is typically 0.
- Multiclass prediction usually chooses the class with the highest score or probability.
- If the default decision rule is not right for the problem, use raw scores and apply or tune a custom threshold.

