Converting LinearSVC's Decision Function to Probabilities in Scikit-learn (Python)

Introduction

LinearSVC gives fast linear classification with strong performance on high-dimensional data, but it does not expose predict_proba directly. Many real systems still need probabilities for ranking, threshold tuning, and risk communication. The right solution is calibration, where a mapping from decision scores to probability estimates is fitted on held-out folds.

Why decision_function Is Not a Probability

decision_function returns a signed score, w·x + b, that is proportional to the distance from the separating hyperplane. Larger positive values mean stronger evidence for the positive class, but the scale is unbounded and is not a calibrated probability.

A score of 2.0 in one dataset does not imply the same confidence as 2.0 in another dataset. Class balance, margin distribution, and feature scaling all affect interpretation.

python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic binary classification data for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
model = LinearSVC(random_state=42)
model.fit(X, y)

# Raw margin scores: signed and unbounded, not probabilities
scores = model.decision_function(X[:5])
print(scores)

These values are useful for ranking, but not yet for probability-based business rules.

Calibrate Scores With CalibratedClassifierCV

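Scikit-learn provides CalibratedClassifierCV to fit a mapping from scores to probabilities. Common choices for its method parameter are "sigmoid" (Platt scaling, a fitted logistic curve) and "isotonic" (a non-parametric, monotonic step function).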

python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# X and y come from the make_classification call in the previous example
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# cv=5: each fold fits a LinearSVC on four parts and calibrates on the fifth
base = LinearSVC(random_state=42)
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)

# Column 1 holds the probability of the positive class
proba = calibrated.predict_proba(X_test)[:, 1]
print(proba[:10])
print("Brier:", brier_score_loss(y_test, proba))
print("Log loss:", log_loss(y_test, calibrated.predict_proba(X_test)))

After calibration, the outputs can be treated as probability estimates for threshold optimization and downstream decisions.

Choosing Between sigmoid and isotonic

sigmoid is usually stable on moderate datasets and often works well as a default. isotonic is more flexible, but can overfit when calibration data is limited.

Practical guidance:

  • Start with sigmoid for small to medium datasets.
  • Try isotonic when you have enough calibration data and see systematic miscalibration.
  • Compare calibration metrics, not only accuracy.
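The comparison suggested above can be sketched as follows. This is a minimal example on synthetic data; the dataset, sample size, and the `results` dictionary are assumptions made for illustration, and which method scores better will depend on your own data.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic data, assumed here only for illustration
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

results = {}
for method in ("sigmoid", "isotonic"):
    # Same base model and folds; only the calibration mapping changes
    clf = CalibratedClassifierCV(LinearSVC(random_state=0), method=method, cv=5)
    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]
    results[method] = {
        "brier": brier_score_loss(y_test, proba),
        "log_loss": log_loss(y_test, proba),
    }

for method, scores in results.items():
    print(f"{method}: Brier={scores['brier']:.4f}, log loss={scores['log_loss']:.4f}")
```

Lower is better for both metrics. A small gap between the two methods is usually not a strong reason to abandon the simpler sigmoid mapping.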

Build a Full Pipeline

Probability quality depends on consistent preprocessing. Wrap scaling and calibration in one pipeline so training and inference apply identical transforms.

python
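from sklearn.calibration import CalibratedClassifierCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Reuses X_train, X_test, and y_train from the previous example
pipeline = make_pipeline(
    StandardScaler(),
    CalibratedClassifierCV(LinearSVC(random_state=42), method="sigmoid", cv=5),
)

pipeline.fit(X_train, y_train)
# Probability of the positive class for the first test sample
probability = pipeline.predict_proba(X_test[:1])[0, 1]
print(probability)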

This reduces training-serving skew and makes model artifacts easier to deploy.

Multiclass Notes

For multiclass tasks, calibration still works, but evaluate per-class behavior and overall calibration quality. Probability vectors should be assessed with multiclass log loss and calibration plots where possible.
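As a rough illustration of the multiclass case, the sketch below calibrates a three-class LinearSVC and reports both the overall multiclass log loss and a one-vs-rest Brier score per class. The synthetic dataset and the `per_class_brier` dictionary are assumptions made for this example.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic three-class data, assumed here only for illustration
X, y = make_classification(
    n_samples=1500, n_features=20, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = CalibratedClassifierCV(LinearSVC(random_state=0), method="sigmoid", cv=5)
clf.fit(X_train, y_train)

# One probability column per class; each row sums to 1
proba = clf.predict_proba(X_test)
print("Multiclass log loss:", log_loss(y_test, proba, labels=clf.classes_))

# Per-class view: treat each class as "this class vs. the rest"
per_class_brier = {}
for idx, label in enumerate(clf.classes_):
    is_class = (y_test == label).astype(int)
    per_class_brier[int(label)] = brier_score_loss(is_class, proba[:, idx])
print("Per-class Brier:", per_class_brier)
```

A class whose Brier score is noticeably worse than the others is a candidate for closer inspection with a per-class calibration plot.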

When classes are highly imbalanced, consider class weights in LinearSVC and inspect calibration quality for minority classes separately.

What Not to Do

A common shortcut is applying a manual sigmoid directly to raw scores. Without fitted calibration parameters and validation folds, that mapping is usually miscalibrated.
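To see the difference, the sketch below compares a hand-written sigmoid with fitted sigmoid calibration. The fitted version learns a slope and an intercept from held-out folds, while the manual version effectively fixes them at 1 and 0. The synthetic data and variable names are assumptions for illustration, and the size of the gap will vary by dataset.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic data, assumed here only for illustration
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Shortcut: squash raw margins with an unfitted sigmoid
svc = LinearSVC(random_state=0).fit(X_train, y_train)
naive_proba = 1.0 / (1.0 + np.exp(-svc.decision_function(X_test)))

# Fitted calibration: sigmoid parameters are learned on held-out folds
calibrated = CalibratedClassifierCV(
    LinearSVC(random_state=0), method="sigmoid", cv=5
).fit(X_train, y_train)
calibrated_proba = calibrated.predict_proba(X_test)[:, 1]

print("Brier (manual sigmoid):", brier_score_loss(y_test, naive_proba))
print("Brier (fitted sigmoid):", brier_score_loss(y_test, calibrated_proba))
```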

Another mistake is calibrating on the same data used to fit the base classifier without cross-validation separation. That inflates quality estimates and yields overconfident probabilities in production.

Choosing Decision Thresholds After Calibration

Once probabilities are calibrated, select decision thresholds based on business cost rather than defaulting to 0.5. For example, fraud screening may prefer higher recall, while automated approval may require higher precision.

Compute precision and recall across candidate thresholds on validation data, then lock the chosen threshold with a documented rationale.
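One way to sketch this threshold sweep is shown below. The two cost constants are hypothetical values chosen only for illustration, as are the synthetic dataset and the grid of candidate thresholds; real costs should come from the business problem.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Hypothetical costs: a missed positive is assumed to cost five times
# as much as a false alarm
COST_FALSE_POSITIVE = 1.0
COST_FALSE_NEGATIVE = 5.0

# Synthetic data, assumed here only for illustration
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = CalibratedClassifierCV(LinearSVC(random_state=0), method="sigmoid", cv=5)
clf.fit(X_train, y_train)
val_proba = clf.predict_proba(X_val)[:, 1]

rows = []
for threshold in np.linspace(0.05, 0.95, 19):
    pred = (val_proba >= threshold).astype(int)
    false_pos = int(np.sum((pred == 1) & (y_val == 0)))
    false_neg = int(np.sum((pred == 0) & (y_val == 1)))
    rows.append({
        "threshold": float(threshold),
        "precision": precision_score(y_val, pred, zero_division=0),
        "recall": recall_score(y_val, pred, zero_division=0),
        "cost": COST_FALSE_POSITIVE * false_pos + COST_FALSE_NEGATIVE * false_neg,
    })

# Pick the validation threshold with the lowest total cost
best = min(rows, key=lambda row: row["cost"])
print(best)
```

Because the threshold is chosen on validation data, its final performance should be confirmed on a separate test split before it is locked in.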

Quick Reliability Check With Calibration Curves

Add a calibration curve check in experiments to catch overconfident probability outputs early. Even a simple visual check can reveal whether predicted probabilities align with observed frequencies across bins.
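A minimal text-only version of this check, using scikit-learn's calibration_curve, might look like the following. The synthetic data and the choice of 10 uniform bins are assumptions for illustration; in practice the same arrays are usually plotted against the diagonal.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic data, assumed here only for illustration
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = CalibratedClassifierCV(LinearSVC(random_state=0), method="sigmoid", cv=5)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]

# For each non-empty bin: observed positive rate vs. mean predicted probability
frac_positive, mean_predicted = calibration_curve(
    y_test, proba, n_bins=10, strategy="uniform"
)

print("mean predicted -> observed frequency")
for predicted, observed in zip(mean_predicted, frac_positive):
    print(f"{predicted:.2f} -> {observed:.2f}")
```

For a well-calibrated model the two columns stay close to each other. Bins where the predicted value is far above the observed frequency indicate overconfidence.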

Common Pitfalls

  • Expecting LinearSVC to expose predict_proba without calibration.
  • Applying a hand-written sigmoid to decision scores.
  • Evaluating only accuracy while ignoring calibration metrics.
  • Using isotonic calibration with too little data.
  • Omitting preprocessing from the deployed inference pipeline.

Summary

  • LinearSVC decision scores are margins, not calibrated probabilities.
  • Use CalibratedClassifierCV to map scores to probabilities.
  • Prefer sigmoid as a strong default, then validate alternatives.
  • Track calibration quality with metrics such as Brier score and log loss.
  • Package preprocessing and calibration together for reliable production inference.
