How to Plot a PR Curve Over 10 Folds of Cross-Validation in Scikit-Learn
Introduction
A precision-recall curve is often more informative than ROC when the positive class is rare. The tricky part with cross-validation is deciding what exactly to plot across folds without producing a misleading average.
A Better Cross-Validation Pattern for PR Curves
For ROC curves, people often interpolate fold curves onto a common grid and then average them. Precision-recall curves do not behave as cleanly under that kind of averaging because precision can jump sharply as thresholds move.
A practical approach in scikit-learn is:
- train on each fold
- collect out-of-fold prediction scores for the held-out data
- plot each fold lightly if you want variability
- build one pooled precision-recall curve from all held-out predictions combined
That pooled out-of-fold curve answers the most useful question: how does the model behave on unseen examples across the full dataset?
Example with 10-Fold Stratified Cross-Validation
The example below uses an imbalanced synthetic dataset, a scaling-plus-logistic-regression pipeline, and StratifiedKFold so each fold keeps a similar class ratio.
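A minimal sketch of that setup might look like the following. The dataset size, class weights, random seeds, and plot styling are illustrative assumptions, not tuned values.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data: roughly 10% positives (assumed values).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5,
    weights=[0.9, 0.1], random_state=42,
)

# Scaling lives inside the pipeline so it is re-fit on each training fold.
base_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

oof_scores = np.zeros(len(y))  # one held-out score per sample
fold_aps = []

fig, ax = plt.subplots(figsize=(7, 5))
for train_idx, test_idx in cv.split(X, y):
    model = clone(base_model).fit(X[train_idx], y[train_idx])
    scores = model.predict_proba(X[test_idx])[:, 1]
    oof_scores[test_idx] = scores

    # Thin, semi-transparent curve for this fold.
    precision, recall, _ = precision_recall_curve(y[test_idx], scores)
    fold_aps.append(average_precision_score(y[test_idx], scores))
    ax.plot(recall, precision, drawstyle="steps-post", alpha=0.3, linewidth=1)

# Bold pooled curve built from all out-of-fold scores together.
precision, recall, _ = precision_recall_curve(y, oof_scores)
pooled_ap = average_precision_score(y, oof_scores)
ax.plot(recall, precision, drawstyle="steps-post", color="black", linewidth=2,
        label=f"Pooled out-of-fold (AP = {pooled_ap:.3f})")

# Positive-class prevalence: the precision of a no-skill classifier.
ax.axhline(y.mean(), linestyle="--", color="gray",
           label=f"Prevalence = {y.mean():.3f}")

ax.set_xlabel("Recall")
ax.set_ylabel("Precision")
ax.set_title("Precision-recall curves over 10 stratified folds")
ax.legend(loc="upper right")
fig.tight_layout()
plt.show()
```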
This produces thin fold-level curves plus one bold pooled curve. The bold curve is usually the one you want to discuss in a report because it is based entirely on held-out predictions.
Why Out-of-Fold Predictions Are Useful
Each prediction in the pooled curve comes from a model that did not train on that sample. That makes the final PR curve a realistic summary of cross-validated generalization.
It also avoids a common mistake: plotting a PR curve from predictions made on the full training set after one final fit. That curve is optimistic because the model has already seen those examples.
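The gap between the two approaches can be checked directly. The sketch below is an illustration under assumptions: a random forest stands in for any model flexible enough to memorize its training data, and the noisy synthetic dataset and seeds are arbitrary. It uses `cross_val_predict` to get out-of-fold scores.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Noisy, imbalanced synthetic data (assumed values).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5,
    weights=[0.9, 0.1], flip_y=0.05, random_state=0,
)

# Assumption: a random forest is used here only because it overfits visibly.
model = RandomForestClassifier(n_estimators=100, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Optimistic: score the same samples the model was fitted on.
in_sample_scores = model.fit(X, y).predict_proba(X)[:, 1]
in_sample_ap = average_precision_score(y, in_sample_scores)

# Realistic: every score comes from a model that never saw that sample.
oof_scores = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
oof_ap = average_precision_score(y, oof_scores)

print(f"In-sample AP:   {in_sample_ap:.3f}")
print(f"Out-of-fold AP: {oof_ap:.3f}")
```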
If you only want a scalar summary, report the cross-validated average precision as well. You can compute it from the pooled out-of-fold scores, or compute one value per fold and summarize the mean and standard deviation separately.
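Both summaries can be sketched in a few lines. The dataset and seeds below are assumptions for illustration. Passing the same seeded `StratifiedKFold` object to both helpers keeps the fold splits identical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import (
    StratifiedKFold, cross_val_predict, cross_val_score,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced synthetic data (assumed values).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Option 1: one average precision from the pooled out-of-fold scores.
oof_scores = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
pooled_ap = average_precision_score(y, oof_scores)

# Option 2: one average precision per fold, then mean and standard deviation.
fold_aps = cross_val_score(model, X, y, cv=cv, scoring="average_precision")

print(f"Pooled out-of-fold AP: {pooled_ap:.3f}")
print(f"Per-fold AP: {fold_aps.mean():.3f} +/- {fold_aps.std():.3f}")
```

The two numbers will generally differ slightly, because the pooled value ranks all held-out samples on one shared score scale while the per-fold values rank each fold separately.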
Common Pitfalls
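- Averaging precision-recall points across folds as if PR curves behave like ROC curves. Precision is not monotonic in recall, so a pointwise mean depends heavily on how each fold's curve is interpolated and can be hard to interpret.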
- Using plain KFold on an imbalanced dataset instead of StratifiedKFold.
- Plotting curves from in-sample predictions rather than held-out fold predictions.
- Forgetting that some estimators expose decision_function instead of predict_proba.
- Comparing models only by one PR curve without also checking class imbalance, threshold behavior, and average precision.
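For the decision_function versus predict_proba pitfall, one possible workaround is a small helper that returns whichever continuous score an estimator offers. The name `continuous_scores` is hypothetical, not part of scikit-learn, and the sketch assumes a binary problem. `precision_recall_curve` only needs scores that rank samples, so uncalibrated margins work as well as probabilities.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def continuous_scores(estimator, X):
    """Hypothetical helper: positive-class scores for a fitted binary classifier."""
    if hasattr(estimator, "predict_proba"):
        return estimator.predict_proba(X)[:, 1]
    if hasattr(estimator, "decision_function"):
        return estimator.decision_function(X)
    raise AttributeError("estimator exposes neither predict_proba nor decision_function")


# Imbalanced synthetic data and a single stratified split (assumed values).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

results = {}
# LinearSVC has no predict_proba, so the helper falls back to decision_function.
for clf in (LogisticRegression(max_iter=1000), LinearSVC()):
    model = make_pipeline(StandardScaler(), clf).fit(X_train, y_train)
    scores = continuous_scores(model, X_test)
    results[type(clf).__name__] = average_precision_score(y_test, scores)

for name, ap in results.items():
    print(f"{name}: AP = {ap:.3f}")
```

The same helper can replace a direct `predict_proba` call inside a cross-validation loop when the estimator type is not known in advance.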
Summary
- For PR curves under cross-validation, pooled out-of-fold predictions are usually more informative than a naive pointwise fold average.
- StratifiedKFold helps preserve class balance across the 10 folds.
- Plot fold curves lightly if you want variability, and highlight one pooled held-out curve for the main result.
- Use average_precision_score as a compact summary alongside the curve.
- Always compute PR metrics on held-out predictions, not on data used for fitting.

