scikit-learn .predict default threshold

Introduction

In scikit-learn, .predict() does not expose a tunable threshold parameter directly, but binary classifiers still use a built-in decision rule to turn scores into class labels. The important detail is that the default rule depends on what kind of score the estimator provides: probability estimates are typically thresholded at 0.5, while decision scores are typically thresholded at 0.

The Binary Classification Default

For binary classification, scikit-learn's current user guide describes hard-coded cut-off rules for converting model scores into labels:

  • if the estimator uses predict_proba, the positive class is predicted when the positive-class probability is greater than 0.5
  • if the estimator uses decision_function, the positive class is predicted when the decision score is greater than 0

This means the intuition "scikit-learn uses 0.5" is only partly true. It is accurate for probability-based prediction, but not for every classifier or scoring method.

Example with predict_proba

Logistic regression is a common example because it exposes class probabilities.

python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary dataset with labels 0 and 1
X, y = make_classification(random_state=0)
model = LogisticRegression().fit(X, y)

# Column 1 holds the probability of the positive class (model.classes_[1])
proba = model.predict_proba(X[:5])[:, 1]
pred = model.predict(X[:5])

print(proba)
print(pred)

If a sample's positive-class probability is greater than 0.5, .predict() classifies it as the positive class. Otherwise, .predict() returns the negative class.
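The rule can be checked directly by rebuilding the labels from the probabilities. The following is a minimal sketch, assuming a binary LogisticRegression on synthetic data; the variable names (proba_pos, manual_pred, matches) are illustrative and not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
model = LogisticRegression().fit(X, y)

# Probability of the positive class, i.e. model.classes_[1]
proba_pos = model.predict_proba(X)[:, 1]

# Rebuild the labels by hand: positive class when the probability exceeds 0.5
manual_pred = model.classes_[(proba_pos > 0.5).astype(int)]

# Compare the hand-built labels with what .predict() returns
matches = np.array_equal(manual_pred, model.predict(X))
print(matches)
```

Barring a floating-point tie exactly at the cutoff, the hand-built labels should agree with .predict() on every sample.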

If you want a different threshold, you must apply that logic yourself or wrap the estimator with a threshold-tuning approach.

Example with decision_function

Some estimators expose decision scores rather than direct probabilities. In those cases, the default decision boundary is at 0.

python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(random_state=0)
model = LinearSVC(random_state=0).fit(X, y)

# Signed scores: the sign determines the predicted class
scores = model.decision_function(X[:5])
pred = model.predict(X[:5])

print(scores)
print(pred)

Scores greater than 0 map to the second entry of model.classes_ (the positive class), and all other scores map to the first. So the default label threshold is not 0.5 here; it is the sign boundary at 0.
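The sign rule can be confirmed the same way. This is a small sketch under the same synthetic-data assumption; manual_pred and sign_rule_matches are illustrative names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(random_state=0)
model = LinearSVC(random_state=0).fit(X, y)

scores = model.decision_function(X)

# Scores above 0 map to model.classes_[1]; all others map to model.classes_[0]
manual_pred = model.classes_[(scores > 0).astype(int)]

# Compare the hand-built labels with what .predict() returns
sign_rule_matches = np.array_equal(manual_pred, model.predict(X))
print(sign_rule_matches)
```

Note that LinearSVC has no predict_proba method, so there is no 0.5 cutoff to speak of for this estimator.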

Multiclass Prediction Is Different

For multiclass classifiers, .predict() usually chooses the class with the highest score or probability. There is no single binary-style threshold such as 0.5 because the decision is based on which class wins relative to the others.

python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Iris has three classes, so predict_proba returns three columns
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

print(model.predict_proba(X[:3]))
print(model.predict(X[:3]))

The predicted class is the one with the largest probability estimate among all classes, not the one that crosses a universal cutoff.
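With three classes, the winner only needs to beat the other two, so its probability can in principle be anywhere above one third. The sketch below, which assumes the same iris model, rebuilds the labels with an argmax over the probability columns; argmax_pred and argmax_matches are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# One probability column per class, in the order given by model.classes_
proba = model.predict_proba(X)

# Pick the class whose column holds the largest probability in each row
argmax_pred = model.classes_[np.argmax(proba, axis=1)]

argmax_matches = np.array_equal(argmax_pred, model.predict(X))
print(argmax_matches)

# Smallest winning probability in the data; nothing forces it to exceed 0.5
print(proba.max(axis=1).min())
```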

How to Use a Custom Threshold

If your business problem values recall, precision, or false-positive control differently from the default, use the raw scores and apply your own threshold.

python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
model = LogisticRegression().fit(X, y)

# Positive-class probabilities, thresholded at 0.7 instead of the default 0.5
proba = model.predict_proba(X)[:, 1]
# astype(int) yields 0/1, which matches the labels here because classes_ is [0, 1]
custom_pred = (proba >= 0.7).astype(int)

print(custom_pred[:10])

This is common for imbalanced datasets or high-cost errors, where the default threshold is often a poor fit.

scikit-learn 1.5 and later also provide TunedThresholdClassifierCV and FixedThresholdClassifier (both in sklearn.model_selection) for post-training threshold control. They are helpful when you want threshold selection to be explicit and reproducible rather than scattered through ad hoc prediction code.

Why the Default May Be Wrong for Your Problem

The built-in threshold is a generic API default, not a statement that 0.5 or 0 is ideal for your use case. Fraud detection, medical screening, moderation pipelines, and anomaly triage often care much more about one error type than the other. In those domains, the best operating point may be far from the library default.

That is why evaluating ROC curves, precision-recall behavior, and domain cost tradeoffs matters more than memorizing one threshold number.
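One way to look at precision-recall behavior is to compute the curve on held-out data and pick the threshold that best fits a chosen criterion. The sketch below is only an illustration: the imbalanced synthetic dataset, the train/validation split, and the use of F1 as the selection criterion are all assumptions, and a real project would substitute its own cost model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: roughly 10% positives (an assumption for illustration)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, stratify=y, random_state=0
)

model = LogisticRegression().fit(X_train, y_train)
proba_val = model.predict_proba(X_val)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_val, proba_val)

# precision and recall have one more entry than thresholds; drop the final
# point so the three arrays line up, and guard against division by zero
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)

# Illustrative criterion: the threshold with the highest F1 on validation data
chosen_threshold = thresholds[np.argmax(f1)]
print(chosen_threshold)
```

Choosing the threshold on validation data rather than training data keeps the selected operating point from being tuned to samples the model has already seen.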

Common Pitfalls

The most common mistake is assuming every classifier uses a 0.5 probability cutoff internally. That is not true for estimators that rely on decision_function. Another is treating .predict() as if it were the only valid output interface, when in practice threshold-aware workflows should often use predict_proba or decision_function directly. Teams also forget that multiclass prediction is based on the highest score, not a binary cutoff rule.
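To avoid the first pitfall, threshold-aware code can check which score method an estimator exposes before assuming a cutoff. The helper below, binary_scores_and_cutoff, is hypothetical and not part of scikit-learn; it assumes a fitted binary classifier whose .predict() is consistent with the score method it exposes, which the sketch does not verify.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC


def binary_scores_and_cutoff(model, X):
    """Hypothetical helper: return positive-class scores and the matching default cutoff."""
    if hasattr(model, "predict_proba"):
        # Probability of model.classes_[1], paired with the 0.5 cutoff
        return model.predict_proba(X)[:, 1], 0.5
    # Signed decision scores, paired with the 0 cutoff
    return model.decision_function(X), 0.0


X, y = make_classification(random_state=0)

cutoffs = {}
for estimator in (LogisticRegression(), LinearSVC(random_state=0)):
    model = estimator.fit(X, y)
    scores, cutoff = binary_scores_and_cutoff(model, X)
    cutoffs[type(model).__name__] = cutoff

print(cutoffs)
```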

Summary

  • In binary classification, .predict() uses a built-in decision rule to convert scores into labels.
  • For predict_proba, the default cutoff is typically 0.5.
  • For decision_function, the default cutoff is typically 0.
  • Multiclass prediction usually chooses the class with the highest score or probability.
  • If the default decision rule is not right for the problem, use raw scores and apply or tune a custom threshold.