Converting LinearSVC's decision function to probabilities (scikit-learn, Python)
Introduction
LinearSVC gives fast linear classification with strong performance on high-dimensional data, but it does not expose predict_proba directly. Many real systems still need probabilities for ranking, threshold tuning, and risk communication. The right solution is calibration, where decision scores are mapped to probability estimates on held-out folds.
Why decision_function Is Not a Probability
decision_function returns a signed score, w·x + b, which is proportional to the distance from the separating boundary. Larger positive values mean stronger evidence for the positive class, but the scale is unbounded and is not a calibrated probability.
A score of 2.0 in one dataset does not imply the same confidence as 2.0 in another dataset. Class balance, margin distribution, and feature scaling all affect interpretation.
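As a minimal sketch of what these scores look like, the example below fits LinearSVC on synthetic data and prints a few raw decision scores. The dataset, the split, and the hyperparameters are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary data, used only for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = LinearSVC(C=1.0, max_iter=10000, random_state=0)
svm.fit(X_train, y_train)

# One unbounded margin score per sample: the sign gives the predicted side,
# the magnitude gives the relative strength of the evidence.
scores = svm.decision_function(X_test)
print(scores[:5])
print("min:", scores.min(), "max:", scores.max())

# LinearSVC itself has no predict_proba method.
print(hasattr(svm, "predict_proba"))
```

The scores fall well outside the interval from 0 to 1 and take both signs, which is why they cannot be read as probabilities directly.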
Decision scores are useful for ranking, but not yet for probability-based business rules.
Calibrate Scores With CalibratedClassifierCV
Scikit-learn provides CalibratedClassifierCV to fit a mapping from scores to probabilities. Common methods are sigmoid (Platt scaling, a fitted logistic curve) and isotonic (a non-parametric monotone mapping).
After calibration, probabilities become suitable for threshold optimization and downstream decisioning.
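A minimal sketch of this approach might look like the following. The synthetic data, the 5-fold setting, and the variable names are assumptions for illustration, not recommendations for any particular dataset.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary data, used only for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

base = LinearSVC(max_iter=10000, random_state=0)

# With cv=5, each fold fits a copy of the SVM on four fifths of the training
# data and fits the sigmoid calibrator on the held-out fifth.
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)

proba = calibrated.predict_proba(X_test)  # shape (n_samples, 2)
print(proba[:5])
```

Each row of proba sums to 1, and column 1 holds the estimated probability of the positive class.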
Choosing Between sigmoid and isotonic
sigmoid is usually stable on moderate datasets and often works well as a default. isotonic is more flexible, but can overfit when calibration data is limited.
Practical guidance:
- Start with sigmoid for small to medium datasets.
- Try isotonic when you have enough calibration data and see systematic miscalibration.
- Compare calibration metrics, not only accuracy.
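One way to follow this guidance is to fit both methods and compare them on held-out data. The sketch below assumes synthetic data and uses Brier score and log loss as the calibration metrics; which method scores better depends on the dataset, so no particular outcome should be expected.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary data, used only for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

results = {}
for method in ("sigmoid", "isotonic"):
    clf = CalibratedClassifierCV(
        LinearSVC(max_iter=10000, random_state=0), method=method, cv=5
    )
    clf.fit(X_train, y_train)
    p = clf.predict_proba(X_test)[:, 1]  # positive-class probability
    results[method] = {
        "brier": brier_score_loss(y_test, p),  # lower is better
        "log_loss": log_loss(y_test, p),       # lower is better
    }

for method, metrics in results.items():
    print(method, metrics)
```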
Build a Full Pipeline
Probability quality depends on consistent preprocessing. Wrap scaling and calibration in one pipeline so training and inference apply identical transforms.
This reduces training-serving skew and makes model artifacts easier to deploy.
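A possible shape for such a pipeline is sketched below, assuming StandardScaler for preprocessing and synthetic data; the choice of scaler and the variable names are illustrative.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic binary data, used only for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Scaling and the calibrated classifier live in a single estimator, so the
# same transform is applied at fit time and at predict time.
model = make_pipeline(
    StandardScaler(),
    CalibratedClassifierCV(
        LinearSVC(max_iter=10000, random_state=0), method="sigmoid", cv=5
    ),
)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)
print(proba[:5])
```

In this arrangement the scaler is fit once on the full training set before the internal calibration folds are formed. An alternative is to pass a scaler-plus-LinearSVC pipeline as the estimator inside CalibratedClassifierCV, so that scaling is refit within each fold.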
Multiclass Notes
For multiclass tasks, calibration still works, but evaluate per-class behavior and overall calibration quality. Probability vectors should be assessed with multiclass log loss and calibration plots where possible.
When classes are highly imbalanced, consider class weights in LinearSVC and inspect calibration quality for minority classes separately.
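The sketch below illustrates these checks on an assumed three-class, imbalanced synthetic dataset. It reports multiclass log loss and a one-vs-rest Brier score per class; the class weights and dataset parameters are illustrative choices.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Imbalanced three-class synthetic data, used only for illustration.
X, y = make_classification(
    n_samples=1500, n_features=20, n_informative=6, n_classes=3,
    weights=[0.7, 0.2, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = CalibratedClassifierCV(
    LinearSVC(class_weight="balanced", max_iter=10000, random_state=0),
    method="sigmoid",
    cv=5,
)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # one column per class, rows sum to 1

overall = log_loss(y_test, proba, labels=clf.classes_)
print("multiclass log loss:", overall)

# One-vs-rest Brier score per class, to inspect minority classes separately.
per_class_brier = {}
for k, cls in enumerate(clf.classes_):
    y_bin = (y_test == cls).astype(int)
    per_class_brier[int(cls)] = brier_score_loss(y_bin, proba[:, k])
print(per_class_brier)
```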
What Not to Do
A common shortcut is applying a manual sigmoid directly to raw scores. Without fitted calibration parameters and validation folds, that mapping is usually miscalibrated.
Another mistake is calibrating on the same data used to fit the base classifier without cross-validation separation. That inflates quality estimates and yields overconfident probabilities in production.
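To see how the manual shortcut compares with fitted calibration on a given dataset, the two can be scored side by side. The sketch below assumes synthetic data and uses Brier score as the yardstick; the size of the gap, and occasionally its direction, depends on the data and on feature scaling.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary data, used only for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Shortcut: squash raw margins with a fixed logistic function (no fitting).
svm = LinearSVC(max_iter=10000, random_state=0).fit(X_train, y_train)
naive = 1.0 / (1.0 + np.exp(-svm.decision_function(X_test)))

# Fitted alternative: sigmoid parameters learned on held-out folds.
calibrated = CalibratedClassifierCV(
    LinearSVC(max_iter=10000, random_state=0), method="sigmoid", cv=5
).fit(X_train, y_train)
fitted = calibrated.predict_proba(X_test)[:, 1]

naive_brier = brier_score_loss(y_test, naive)
fitted_brier = brier_score_loss(y_test, fitted)
print("hand-written sigmoid Brier:", naive_brier)
print("CalibratedClassifierCV Brier:", fitted_brier)
```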
Choosing Decision Thresholds After Calibration
Once probabilities are calibrated, select decision thresholds based on business cost rather than defaulting to 0.5. For example, fraud screening may prefer higher recall, while automated approval may require higher precision.
Compute precision and recall across candidate thresholds on validation data, then lock the chosen threshold with a documented rationale.
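A threshold sweep along these lines can be sketched with precision_recall_curve. The 0.90 precision target below is a hypothetical business requirement, and the data and split are assumptions for illustration.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary data, used only for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = CalibratedClassifierCV(
    LinearSVC(max_iter=10000, random_state=0), method="sigmoid", cv=5
).fit(X_train, y_train)
p_val = clf.predict_proba(X_val)[:, 1]

# precision and recall have one more entry than thresholds; the final
# (precision=1, recall=0) point has no threshold, so it is dropped below.
precision, recall, thresholds = precision_recall_curve(y_val, p_val)

min_precision = 0.90  # hypothetical business requirement
meets_target = precision[:-1] >= min_precision

if meets_target.any():
    # Among thresholds that meet the precision target, keep the highest recall.
    idx = int(np.argmax(np.where(meets_target, recall[:-1], -1.0)))
    chosen_threshold = float(thresholds[idx])
else:
    chosen_threshold = 0.5  # fallback when the target is unreachable

print("chosen threshold:", chosen_threshold)
decisions = (p_val >= chosen_threshold).astype(int)
```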
Quick Reliability Check With Calibration Curves
Add a calibration curve check in experiments to catch overconfident probability outputs early. Even a simple visual check can reveal whether predicted probabilities align with observed frequencies across bins.
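A text-only version of this check can be sketched with calibration_curve, which bins predictions and compares the mean predicted probability with the observed positive rate in each bin. The data and the choice of 10 bins are assumptions for illustration.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary data, used only for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = CalibratedClassifierCV(
    LinearSVC(max_iter=10000, random_state=0), method="sigmoid", cv=5
).fit(X_train, y_train)
p = clf.predict_proba(X_test)[:, 1]

# Returns the observed fraction of positives and the mean predicted
# probability per bin; empty bins are omitted from the output.
frac_pos, mean_pred = calibration_curve(y_test, p, n_bins=10)

for predicted, observed in zip(mean_pred, frac_pos):
    print(f"predicted {predicted:.2f} -> observed {observed:.2f}")
```

For a well-calibrated model the two columns stay close to each other. The same arrays can be passed to a plotting library to draw the reliability diagram.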
Common Pitfalls
- Expecting LinearSVC to expose predict_proba without calibration.
- Applying a hand-written sigmoid to decision scores.
- Evaluating only accuracy while ignoring calibration metrics.
- Using isotonic calibration with too little data.
- Omitting preprocessing from the deployed inference pipeline.
Summary
- LinearSVC decision scores are margins, not calibrated probabilities.
- Use CalibratedClassifierCV to map scores to probabilities.
- Prefer sigmoid as a strong default, then validate alternatives.
- Track calibration quality with metrics such as Brier score and log loss.
- Package preprocessing and calibration together for reliable production inference.

