ROC Curve with sklearn in Python
Introduction
A ROC curve shows how a binary classifier trades off true positive rate against false positive rate as the decision threshold changes. In scikit-learn, the important implementation detail is that you should pass probability scores or decision scores to roc_curve, not hard class predictions.
What a ROC Curve Measures
For each threshold, the model produces:
- true positive rate, also called recall or sensitivity
- false positive rate, which measures how often negative examples are incorrectly flagged as positive
A classifier with strong separation pushes the curve toward the top-left corner. A random model tends to follow the diagonal.
The area under that curve, usually called AUC, summarizes ranking quality in one number.
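To make the two rates concrete, here is a small sketch that computes them by hand at a few thresholds. The labels, the scores, and the helper name tpr_fpr are made up for illustration and are not part of scikit-learn.

```python
import numpy as np

# Hypothetical labels and scores, chosen only for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.6, 0.9])


def tpr_fpr(y_true, y_score, threshold):
    """Return (TPR, FPR) when every score >= threshold is flagged positive."""
    y_pred = y_score >= threshold
    tp = np.sum(y_pred & (y_true == 1))  # positives correctly flagged
    fp = np.sum(y_pred & (y_true == 0))  # negatives incorrectly flagged
    tpr = tp / np.sum(y_true == 1)
    fpr = fp / np.sum(y_true == 0)
    return tpr, fpr


# Sweeping the threshold traces out points on the ROC curve.
for t in (0.3, 0.5, 0.7):
    tpr, fpr = tpr_fpr(y_true, y_score, t)
    print(f"threshold={t}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Lowering the threshold flags more examples as positive, so both rates can only stay the same or rise. That is why the curve runs from the bottom-left to the top-right.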
Train a Simple Binary Classifier
Here is a runnable example using logistic regression.
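The original listing is not reproduced here, so the following is a minimal sketch of such a setup. It assumes a synthetic dataset from make_classification; the sample size, feature counts, split ratio, and random seed are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary dataset; sizes and seed are arbitrary illustration choices.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=42
)

# Hold out a stratified test set so ROC is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Column 1 of predict_proba is the probability of the positive class (label 1).
y_score = model.predict_proba(X_test)[:, 1]
print(y_score[:5])
```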
The fitted model can now produce scores for the positive class.
Use Scores, Not Predicted Labels
This is the most important part. roc_curve expects continuous scores.
Why not use model.predict(X_test)? Because hard labels collapse the model output to one threshold only. A ROC curve needs the full score ranking so it can evaluate many thresholds.
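The difference is easy to see by calling roc_curve both ways. This is a sketch on an assumed synthetic dataset; the variable names and the seed are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Assumed synthetic data, for illustration only.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Continuous scores: roc_curve can sweep many candidate thresholds.
y_score = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_score)

# Hard labels: only two distinct values, so the "curve" has at most three points.
y_pred = model.predict(X_test)
fpr_hard, tpr_hard, thresholds_hard = roc_curve(y_test, y_pred)

print("points from scores:", len(thresholds))
print("points from labels:", len(thresholds_hard))
```

The call with hard labels does not raise an error, which is why this mistake tends to go unnoticed.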
For models without predict_proba, use decision_function if available.
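As a sketch of that case, LinearSVC exposes decision_function but not predict_proba. The dataset and the hyperparameters below are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Assumed synthetic data, for illustration only.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# dual=False is a reasonable choice when n_samples > n_features.
svm = LinearSVC(dual=False).fit(X_train, y_train)

# Signed distance to the decision boundary; larger means "more positive".
y_score = svm.decision_function(X_test)

fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)
print(f"AUC from decision_function scores: {auc:.3f}")
```

The scores are not probabilities and are not bounded to [0, 1], but ROC only depends on their ordering, so that does not matter here.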
Plot the ROC Curve
Use Matplotlib for a quick visualization.
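A minimal plotting sketch follows, again on an assumed synthetic dataset. The output filename roc_curve.png and the figure styling are illustrative choices.

```python
import matplotlib

matplotlib.use("Agg")  # headless-safe backend; drop this line for interactive use
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Assumed synthetic data, for illustration only.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_score = model.predict_proba(X_test)[:, 1]

fpr, tpr, _ = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)

fig, ax = plt.subplots(figsize=(6, 6))
ax.plot(fpr, tpr, label=f"Logistic regression (AUC = {auc:.3f})")
# Diagonal reference line: expected curve of a model that ranks at random.
ax.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Random baseline")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("ROC curve")
ax.legend(loc="lower right")
fig.savefig("roc_curve.png", dpi=150)
```

Recent scikit-learn versions also provide RocCurveDisplay.from_predictions, which wraps the same steps if you prefer a one-call plot.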
The dashed diagonal is the random baseline. A better model stays above that line.
Interpret AUC Carefully
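AUC answers a ranking question: how well does the model place positive examples above negative ones? Equivalently, it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. It does not tell you whether the chosen threshold is good for your business objective.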
That means you can have:
- a decent AUC but an unsuitable production threshold
- an excellent AUC on a balanced test set that tells you much less once the positive class is rare in production, where even a low false positive rate can produce many false alarms
ROC is a useful diagnostic, not a complete evaluation strategy.
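The ranking interpretation can be checked directly. The sketch below, using made-up labels and scores, counts how often a positive example outscores a negative one and compares the result with roc_auc_score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels and scores, chosen only for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.6, 0.9])

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]

# Compare every positive score with every negative score; ties count as half.
diff = pos[:, None] - neg[None, :]
pairwise_auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

sklearn_auc = roc_auc_score(y_true, y_score)
print(pairwise_auc, sklearn_auc)

# A strictly increasing transform keeps the ordering, so AUC is unchanged.
# This is why AUC alone cannot tell you where the threshold should sit.
transformed_auc = roc_auc_score(y_true, y_score ** 3)
print(transformed_auc)
```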
Compare Multiple Models
A ROC plot is especially useful for comparing models on the same test set.
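One way to sketch such a comparison is shown below. The two models (logistic regression and a random forest), the synthetic dataset, and the output filename are assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # headless-safe backend; drop this line for interactive use
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Assumed synthetic data, for illustration only.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=1
)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=1),
}

fig, ax = plt.subplots(figsize=(6, 6))
aucs = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    # Every model is scored on the same held-out test set.
    y_score = clf.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, y_score)
    aucs[name] = roc_auc_score(y_test, y_score)
    ax.plot(fpr, tpr, label=f"{name} (AUC = {aucs[name]:.3f})")

ax.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Random baseline")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.legend(loc="lower right")
fig.savefig("roc_comparison.png", dpi=150)
```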
This helps you compare ranking behavior rather than just a single threshold-dependent accuracy number.
Multiclass ROC Needs a Different Setup
ROC is naturally binary. For multiclass classification in scikit-learn, you usually evaluate one class versus the rest and compute ROC separately for each class.
That means if your problem has more than two classes, you need a one-vs-rest or similar strategy before applying the usual ROC workflow.
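A one-vs-rest evaluation can be sketched as follows. The Iris dataset and logistic regression are stand-ins chosen for illustration; the same pattern applies to any classifier that returns one score column per class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_score = model.predict_proba(X_test)  # shape (n_samples, n_classes)

# One indicator column per class: 1 for "this class", 0 for "the rest".
y_test_bin = label_binarize(y_test, classes=model.classes_)

per_class_auc = {}
for i, cls in enumerate(model.classes_):
    # Each class gets its own binary ROC curve against all other classes.
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    per_class_auc[cls] = roc_auc_score(y_test_bin[:, i], y_score[:, i])
    print(f"class {cls}: AUC = {per_class_auc[cls]:.3f}")

# roc_auc_score can also compute the macro-averaged one-vs-rest AUC directly.
macro_auc = roc_auc_score(y_test, y_score, multi_class="ovr", average="macro")
print(f"macro OvR AUC = {macro_auc:.3f}")
```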
When ROC Is Not the Best Metric
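If the positive class is rare and you care a lot about precision, a precision-recall curve may be more informative than ROC. Because false positive rate is normalized by the large number of negatives, ROC can still look strong even when false positives outnumber true positives and are operationally expensive.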
So the right sequence is often:
- use ROC and AUC to understand score ranking
- use precision-recall and threshold metrics for operational tradeoffs
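That sequence can be sketched on an imbalanced problem as follows. The roughly 5% positive rate and the other dataset settings are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    average_precision_score,
    precision_recall_curve,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Assumed synthetic data with roughly 5% positives, for illustration only.
X, y = make_classification(
    n_samples=5000,
    n_features=20,
    n_informative=5,
    weights=[0.95, 0.05],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_score = model.predict_proba(X_test)[:, 1]

# Step 1: ranking quality.
roc_auc = roc_auc_score(y_test, y_score)

# Step 2: precision/recall tradeoff across thresholds.
precision, recall, thresholds = precision_recall_curve(y_test, y_score)
avg_precision = average_precision_score(y_test, y_score)

print(f"ROC AUC:           {roc_auc:.3f}")
print(f"Average precision: {avg_precision:.3f}")
# A random ranker's average precision is about the positive rate, not 0.5.
print(f"Positive rate:     {y_test.mean():.3f}")
```

Reading the two numbers side by side is a quick check on whether a comfortable-looking ROC AUC survives once precision on the rare class is taken into account.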
Common Pitfalls
- Passing model.predict(...) into roc_curve instead of probability or decision scores.
- Computing ROC on training data and calling the result performance. Use a held-out set.
- Treating a good AUC as proof that the production threshold is correct.
- Applying the binary recipe directly to a multiclass problem without a one-vs-rest setup.
- Ignoring class imbalance and threshold costs when choosing the final model.
Summary
- In scikit-learn, build ROC curves from continuous model scores, not hard labels.
- Use predict_proba(...)[:, 1] or decision_function for the positive-class score.
- roc_curve gives you the threshold sweep, and roc_auc_score summarizes ranking quality.
- Plot ROC to compare models, but choose thresholds with additional metrics.
- For multiclass problems, use a one-vs-rest style evaluation rather than the plain binary recipe.

