ROC curve for binary classification in Python

Introduction

A ROC curve shows how a binary classifier behaves as you move the decision threshold. Instead of asking only whether the current threshold is good, it lets you see the trade-off between true positive rate and false positive rate across all thresholds.

What the ROC curve plots

For each threshold, a binary classifier produces some number of:

true positives
false positives
true negatives
false negatives

From those counts you compute:

true positive rate (TPR), also called recall or sensitivity: TP / (TP + FN)
false positive rate (FPR): FP / (FP + TN)

The ROC curve plots false positive rate on the x-axis and true positive rate on the y-axis. The curve of a better model usually bends closer to the top-left corner because it reaches high recall while keeping the false positive rate low.
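To make the two rates concrete, here is a minimal sketch that counts the four outcomes at a single threshold and derives TPR and FPR from them. The helper name rates_at_threshold is hypothetical, and the toy labels and scores are illustrative values, not real model output.

```python
import numpy as np

# Toy labels and positive-class scores (illustrative values only).
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.05, 0.40, 0.35, 0.80, 0.90, 0.30, 0.60, 0.20])


def rates_at_threshold(y_true, y_score, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted positive."""
    y_pred = y_score >= threshold
    is_pos = y_true == 1

    tp = np.sum(y_pred & is_pos)     # predicted positive, actually positive
    fp = np.sum(y_pred & ~is_pos)    # predicted positive, actually negative
    fn = np.sum(~y_pred & is_pos)    # predicted negative, actually positive
    tn = np.sum(~y_pred & ~is_pos)   # predicted negative, actually negative

    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return float(tpr), float(fpr)


# Each threshold gives one (FPR, TPR) point; the ROC curve sweeps all of them.
for threshold in (0.25, 0.50, 0.75):
    tpr, fpr = rates_at_threshold(y_true, y_score, threshold)
    print(f"threshold={threshold:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Lowering the threshold can only add predicted positives, so both rates move up or stay flat as the threshold drops.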

One important detail is that the ROC curve should be built from model scores or probabilities, not from already-thresholded class labels. If you pass only predicted labels, the curve collapses to a single operating point joined by straight lines to the corners (0, 0) and (1, 1).
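The collapse is easy to see by running roc_curve twice on the same toy data, once with scores and once with labels thresholded at 0.5. This is a small sketch; the 0.5 cutoff and the data are arbitrary choices for illustration.

```python
from sklearn.metrics import roc_curve

# Toy labels and positive-class scores (illustrative values only).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.05, 0.40, 0.35, 0.80, 0.90, 0.30, 0.60, 0.20]

# Hard labels obtained by thresholding the scores at 0.5 (arbitrary cutoff).
y_label = [1 if s >= 0.5 else 0 for s in y_score]

fpr_s, tpr_s, _ = roc_curve(y_true, y_score)
fpr_l, tpr_l, _ = roc_curve(y_true, y_label)

print("points from scores:", len(fpr_s))
print("points from labels:", len(fpr_l))

# With hard labels there is only one real operating point between the corners.
print("label-based curve:", list(zip(fpr_l, tpr_l)))
```

The label-based curve still plots and still yields an AUC, which is why this mistake often goes unnoticed.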

A practical Python example

Scikit-learn makes ROC evaluation straightforward.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

# True labels and model scores for the positive class
# Use probabilities or decision scores, not hard labels.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.05, 0.40, 0.35, 0.80, 0.90, 0.30, 0.60, 0.20]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)

print("AUC:", roc_auc)

plt.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="random baseline")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.show()
```

This code computes the ROC points, calculates the area under the curve, and plots both the model curve and a random baseline.
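In practice the scores come from a fitted model rather than a hand-written list. The sketch below assumes a scikit-learn classifier that exposes predict_proba; the synthetic dataset from make_classification and the choice of LogisticRegression are illustrative assumptions, not part of the example above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Column 1 of predict_proba holds the probability of the positive class (label 1).
y_score = model.predict_proba(X_test)[:, 1]

# Evaluate on held-out data, not on the data the model was trained on.
fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)

print("AUC from roc_curve + auc:", roc_auc)
print("AUC from roc_auc_score:  ", roc_auc_score(y_test, y_score))
```

For models without predict_proba, the output of decision_function can be passed as the score in the same way, since only the ordering of the scores matters for the curve.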

How to interpret AUC

The area under the ROC curve, often called ROC AUC, summarizes ranking quality. A value near 1.0 means the model tends to score positives above negatives consistently. A value near 0.5 means the model is roughly no better than random ranking.
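One way to read "ranking quality" is that ROC AUC equals the fraction of positive-negative pairs in which the positive example gets the higher score, with ties counted as half. The sketch below checks that reading against roc_auc_score on toy data; the brute-force pair loop is for illustration only and would be slow on large datasets.

```python
from itertools import product

from sklearn.metrics import roc_auc_score

# Toy labels and positive-class scores (illustrative values only).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.05, 0.40, 0.35, 0.80, 0.90, 0.30, 0.60, 0.20]

pos_scores = [s for y, s in zip(y_true, y_score) if y == 1]
neg_scores = [s for y, s in zip(y_true, y_score) if y == 0]

# Count pairs where the positive outranks the negative; ties count as half.
wins = 0.0
for p, n in product(pos_scores, neg_scores):
    if p > n:
        wins += 1.0
    elif p == n:
        wins += 0.5

pairwise_auc = wins / (len(pos_scores) * len(neg_scores))

print("pairwise ranking estimate:", pairwise_auc)
print("roc_auc_score:            ", roc_auc_score(y_true, y_score))
```

Here 15 of the 16 pairs are ordered correctly, so both numbers come out to 0.9375. The only misordered pair is the positive scored 0.35 against the negative scored 0.40.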

That does not mean AUC alone chooses your production threshold. It tells you how well the model separates classes overall, not whether a specific threshold matches the business cost of false alarms versus missed positives.

Threshold choice is still a business decision

A medical screening model, a fraud detector, and a spam classifier may all want different operating points even if they have the same ROC AUC. The ROC curve helps you see what thresholds are possible, but your application decides which balance is acceptable.

That is why ROC analysis is usually followed by threshold selection, not replaced by it.
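As one possible way to turn the curve into a threshold, the sketch below scores every candidate threshold returned by roc_curve with a simple cost model and picks the cheapest. The costs cost_fp and cost_fn are hypothetical placeholders; real values have to come from the application.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy labels and positive-class scores (illustrative values only).
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.05, 0.40, 0.35, 0.80, 0.90, 0.30, 0.60, 0.20])

# Hypothetical costs: a missed positive is assumed 5x worse than a false alarm.
cost_fp = 1.0
cost_fn = 5.0

fpr, tpr, thresholds = roc_curve(y_true, y_score)

n_pos = int(np.sum(y_true == 1))
n_neg = int(np.sum(y_true == 0))

# Expected counts of each error type at every candidate threshold.
false_positives = fpr * n_neg
false_negatives = (1.0 - tpr) * n_pos
expected_cost = cost_fp * false_positives + cost_fn * false_negatives

best = int(np.argmin(expected_cost))
best_threshold = thresholds[best]

print("chosen threshold:", best_threshold)
print("TPR at threshold:", tpr[best])
print("FPR at threshold:", fpr[best])
print("total cost:      ", expected_cost[best])
```

Swapping the two costs moves the chosen threshold, while the ROC curve and its AUC stay exactly the same. In a real project the threshold should be chosen on validation data, not on the final test set.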

When ROC can mislead you

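ROC curves can look impressive on highly imbalanced datasets because the false positive rate is normalized by the number of negative cases. When negatives vastly outnumber positives, even a small false positive rate can mean more false alarms than true detections. If the positive class is rare and precision matters a lot, a precision-recall curve may be more informative.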

This does not make ROC wrong. It means evaluation should match the operational question.
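The gap between the two views can be illustrated on synthetic data with roughly 1% positives. In this sketch the scores are drawn from two overlapping normal distributions, which is purely an assumption for demonstration; average_precision_score is used as a single-number summary of the precision-recall curve.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)

# Synthetic, heavily imbalanced data: 10,000 negatives and 100 positives.
n_neg, n_pos = 10_000, 100
y_true = np.concatenate([np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)])

# Assumed score distributions: positives are shifted up but still overlap negatives.
y_score = np.concatenate([
    rng.normal(loc=0.0, scale=1.0, size=n_neg),
    rng.normal(loc=2.0, scale=1.0, size=n_pos),
])

roc_auc = roc_auc_score(y_true, y_score)
avg_precision = average_precision_score(y_true, y_score)
prevalence = y_true.mean()

print(f"ROC AUC:           {roc_auc:.3f}")
print(f"Average precision: {avg_precision:.3f}")
print(f"Positive rate:     {prevalence:.4f}  (average precision of a random ranker)")
```

The ROC AUC looks strong while the average precision is much lower, because the tail of 10,000 negatives produces many high-scoring false alarms relative to only 100 positives.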

Common Pitfalls

Using predicted class labels instead of probabilities or decision scores.
Treating AUC as a replacement for choosing a real decision threshold.
Ignoring class imbalance when precision matters more than ranking quality.
Comparing ROC curves without checking whether the score outputs are comparable.
Forgetting that the random baseline is the diagonal line from (0, 0) to (1, 1).

Summary

A ROC curve shows true positive rate versus false positive rate across thresholds.
Build it from probabilities or decision scores, not hard class predictions.
sklearn.metrics.roc_curve and auc make ROC analysis easy in Python.
ROC AUC measures ranking quality, not the best production threshold by itself.
On imbalanced problems, also consider precision-recall analysis.