How to Plot PR-Curve Over 10 folds of Cross Validation in Scikit-Learn

Introduction

A precision-recall curve is often more informative than ROC when the positive class is rare. The tricky part with cross-validation is deciding what exactly to plot across folds without producing a misleading average.

A Better Cross-Validation Pattern for PR Curves

For ROC curves, people often interpolate fold curves onto a common grid and then average them. Precision-recall curves do not behave as cleanly under that kind of averaging because precision can jump sharply as thresholds move.

A practical approach in scikit-learn is:

  • train on each fold
  • collect out-of-fold prediction scores for the held-out data
  • plot each fold lightly if you want variability
  • build one pooled precision-recall curve from all held-out predictions combined

That pooled out-of-fold curve answers the most useful question: how does the model behave on unseen examples across the full dataset?
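If you only need the pooled curve and not the per-fold lines, the steps above can be sketched more compactly with `cross_val_predict`. This is a minimal illustration, not the only way to do it; the dataset, seeds, and the `oof_scores` name are assumptions made for the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Illustrative imbalanced data (an assumption for this sketch).
X, y = make_classification(n_samples=600, weights=[0.85, 0.15], random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)

# Each row's score comes from the fold model that did not train on that row.
oof_scores = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

# One pooled curve and one pooled average precision over all held-out scores.
precision, recall, thresholds = precision_recall_curve(y, oof_scores)
pooled_ap = average_precision_score(y, oof_scores)
print(f"Pooled out-of-fold AP: {pooled_ap:.3f}")
```

The explicit loop is still the better fit when you want to draw each fold separately, since `cross_val_predict` returns only the combined predictions.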

Example with 10-Fold Stratified Cross-Validation

The example below uses an imbalanced synthetic dataset, a scaling-plus-logistic-regression pipeline, and StratifiedKFold so each fold keeps a similar class ratio.

python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import PrecisionRecallDisplay, average_precision_score, precision_recall_curve
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced synthetic data: roughly 15% positives.
X, y = make_classification(
    n_samples=1200,
    n_features=20,
    n_informative=6,
    n_redundant=2,
    weights=[0.85, 0.15],
    random_state=42,
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

all_true = []
all_scores = []

# One shared Axes: without ax=ax, from_predictions opens a new figure per fold.
fig, ax = plt.subplots(figsize=(8, 6))

for fold, (train_idx, test_idx) in enumerate(cv.split(X, y), start=1):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Refit on the training part and score only the held-out part.
    model.fit(X_train, y_train)
    y_score = model.predict_proba(X_test)[:, 1]

    all_true.append(y_test)
    all_scores.append(y_score)

    # Thin, faint line for this fold.
    PrecisionRecallDisplay.from_predictions(
        y_test,
        y_score,
        name=f"Fold {fold}",
        ax=ax,
        alpha=0.2,
        lw=1,
    )

# Pool every held-out prediction into one curve and one AP value.
all_true = np.concatenate(all_true)
all_scores = np.concatenate(all_scores)
precision, recall, _ = precision_recall_curve(all_true, all_scores)
ap = average_precision_score(all_true, all_scores)

ax.plot(recall, precision, color="black", lw=2.5, label=f"Out-of-fold AP = {ap:.3f}")
ax.set_xlabel("Recall")
ax.set_ylabel("Precision")
ax.set_title("10-fold cross-validated precision-recall curve")
ax.legend(loc="lower left")
ax.grid(True)
fig.tight_layout()
plt.show()

This produces thin fold-level curves plus one bold pooled curve. The bold curve is usually the one you want to discuss in a report because it is based entirely on held-out predictions.

Why Out-of-Fold Predictions Are Useful

Each prediction in the pooled curve comes from a model that did not train on that sample. That makes the final PR curve a realistic summary of cross-validated generalization.

It also avoids a common mistake: plotting a PR curve from predictions made on the full training set after one final fit. That curve is optimistic because the model has already seen those examples.
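The gap between in-sample and held-out scores is easy to check directly. The sketch below swaps in a random forest purely because flexible models make the effect obvious; the dataset, the model choice, and the variable names are assumptions for illustration, not part of the example above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Illustrative noisy, imbalanced data (an assumption for this sketch).
X, y = make_classification(
    n_samples=600,
    n_features=20,
    n_informative=6,
    weights=[0.85, 0.15],
    flip_y=0.05,
    random_state=0,
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# In-sample: fit on everything, then score the same rows.
in_sample_scores = model.fit(X, y).predict_proba(X)[:, 1]

# Out-of-fold: every row is scored by a model that never saw it.
oof_scores = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

in_sample_ap = average_precision_score(y, in_sample_scores)
oof_ap = average_precision_score(y, oof_scores)
print(f"In-sample AP:   {in_sample_ap:.3f}")
print(f"Out-of-fold AP: {oof_ap:.3f}")
```

The in-sample figure should come out noticeably higher, which is exactly the optimism the out-of-fold curve avoids.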

If you also want a scalar summary, report the cross-validated average precision. You can compute it once from the pooled out-of-fold scores, or compute one value per fold and report the mean and standard deviation. The two numbers generally differ slightly, so state which one you used.
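Both ways of summarizing average precision can be sketched as follows. The data and seeds are illustrative assumptions; `cross_val_score` with `scoring="average_precision"` gives the per-fold values, and the pooled value comes from out-of-fold scores on the same splits.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative imbalanced data (an assumption for this sketch).
X, y = make_classification(n_samples=1200, weights=[0.85, 0.15], random_state=42)

# A fixed random_state makes both calls below use the same splits.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Option 1: one AP per fold, summarized by mean and standard deviation.
fold_ap = cross_val_score(model, X, y, cv=cv, scoring="average_precision")
print(f"Per-fold AP: {fold_ap.mean():.3f} +/- {fold_ap.std():.3f}")

# Option 2: a single AP computed on all pooled out-of-fold scores.
oof_scores = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
pooled_ap = average_precision_score(y, oof_scores)
print(f"Pooled AP:   {pooled_ap:.3f}")
```

The per-fold spread is a rough indication of how much the result depends on the split, which the pooled number alone does not show.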

Common Pitfalls

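  • Averaging precision-recall points across folds as if PR curves behave like ROC curves. Each fold has its own thresholds and its own handful of positives, so an averaged point need not correspond to any single operating threshold.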
  • Using plain KFold on an imbalanced dataset instead of StratifiedKFold.
  • Plotting curves from in-sample predictions rather than held-out fold predictions.
  • Forgetting that some estimators expose decision_function instead of predict_proba.
  • Comparing models only by one PR curve without also checking class imbalance, threshold behavior, and average precision.
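Two of these pitfalls are easy to check in code. The sketch below compares how many positives land in each test fold under `KFold` and `StratifiedKFold`, and uses a small helper to get scores from an estimator that has no `predict_proba`. The helper name `positive_scores`, the dataset, and the use of `LinearSVC` are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import average_precision_score
from sklearn.model_selection import KFold, StratifiedKFold
from sklearn.svm import LinearSVC

# Illustrative imbalanced data (an assumption for this sketch).
X, y = make_classification(n_samples=600, weights=[0.85, 0.15], random_state=0)


def positive_scores(estimator, X):
    """Hypothetical helper: continuous scores for the positive class."""
    if hasattr(estimator, "predict_proba"):
        return estimator.predict_proba(X)[:, 1]
    # Margins are not probabilities, but PR curves only need a ranking.
    return estimator.decision_function(X)


plain = KFold(n_splits=10, shuffle=True, random_state=0)
strat = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Number of positive samples in each test fold.
plain_counts = [int(y[test].sum()) for _, test in plain.split(X)]
strat_counts = [int(y[test].sum()) for _, test in strat.split(X, y)]
print("KFold positives per fold:          ", plain_counts)
print("StratifiedKFold positives per fold:", strat_counts)

# LinearSVC exposes decision_function only, so the helper falls back to it.
train_idx, test_idx = next(strat.split(X, y))
svm = LinearSVC(dual=False).fit(X[train_idx], y[train_idx])
scores = positive_scores(svm, X[test_idx])
print(f"Held-out AP for one fold: {average_precision_score(y[test_idx], scores):.3f}")
```

With plain `KFold` the positive counts can vary from fold to fold, which makes fold-level PR curves noisier than they need to be.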

Summary

  • For PR curves under cross-validation, pooled out-of-fold predictions are usually more informative than a naive pointwise fold average.
  • StratifiedKFold helps preserve the class ratio across the 10 folds.
  • Plot fold curves lightly if you want variability, and highlight one pooled held-out curve for the main result.
  • Use average_precision_score as a compact summary alongside the curve.
  • Always compute PR metrics on held-out predictions, not on data used for fitting.
