How to plot ROC curves for every cross-validation fold using caret
Introduction
When training classification models with caret, one aggregate ROC value is often not enough. Plotting ROC curves for every cross-validation fold helps you understand variation across splits and detect unstable model behavior. The key is saving fold-level predictions during resampling.
Configure Caret to Keep Resample Predictions
To draw fold ROC curves, enable class probabilities and save out-of-fold predictions. In caret this is done through trainControl.
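A minimal sketch of that setup follows. The simulated data from twoClassSim, the logistic regression model, and the 5-fold design are illustrative assumptions, not requirements; substitute your own data frame and method.

```r
library(caret)

# Illustrative data (assumption): caret's two-class simulator.
# The outcome column is `Class` with levels "Class1" and "Class2".
set.seed(42)
dat <- twoClassSim(300)

ctrl <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE,                 # compute class probabilities
  savePredictions = "final",         # keep out-of-fold predictions for the best tuning values
  summaryFunction = twoClassSummary  # report ROC, Sens, Spec per resample
)

fit <- train(
  Class ~ .,
  data = dat,
  method = "glm",
  metric = "ROC",
  trControl = ctrl
)

# Inspect the saved out-of-fold predictions
head(fit$pred)
```

Using savePredictions = "final" keeps only the rows for the selected tuning values, which avoids having to filter fit$pred by hand later.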
The fit$pred data frame now contains fold predictions and probabilities you can use to compute ROC curves per fold.
Compute ROC Curves for Each Fold
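Each row in fit$pred includes the observed class (obs), the predicted class (pred), the fold identifier (Resample), and one probability column per class level. Build one ROC object per fold from the probability column of the class you treat as positive.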
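One way to do this is with the pROC package, sketched below. So the example runs on its own, it builds a small simulated stand-in for fit$pred. The class names Class1 and Class2, and the choice of Class1 as the positive class, are assumptions. In practice you would set preds <- fit$pred.

```r
library(pROC)

# Stand-in for fit$pred (assumption); in practice use: preds <- fit$pred
set.seed(1)
n <- 250
obs <- factor(sample(c("Class1", "Class2"), n, replace = TRUE),
              levels = c("Class1", "Class2"))
p1 <- plogis(rnorm(n, mean = ifelse(obs == "Class1", 1, -1)))
preds <- data.frame(
  obs = obs,
  Class1 = p1,
  Class2 = 1 - p1,
  Resample = rep(paste0("Fold", 1:5), each = n / 5)
)

# One ROC object per fold. `levels` is c(control, case), so Class1 is the
# positive class and higher Class1 probability should indicate a case.
roc_by_fold <- lapply(split(preds, preds$Resample), function(d) {
  roc(response = d$obs, predictor = d$Class1,
      levels = c("Class2", "Class1"), direction = "<", quiet = TRUE)
})

# Fold-specific AUC values
fold_auc <- sapply(roc_by_fold, function(r) as.numeric(auc(r)))
print(fold_auc)
```

Setting levels and direction explicitly keeps pROC from choosing them per fold, which could otherwise flip the curve in a weak fold.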
This gives you true positive rate and false positive rate for each fold, plus fold-specific AUC.
Plot Fold ROC Curves and the Mean Pattern
You can visualize all fold curves together and optionally overlay their mean trajectory.
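The sketch below draws each fold curve in grey and overlays a mean curve obtained by vertical averaging: every fold's true positive rate is interpolated onto a shared false positive rate grid and then averaged. This averaging scheme is one reasonable choice, not the only one. The simulated preds data frame is a stand-in for fit$pred, and the class names are assumptions.

```r
library(pROC)

# Stand-in for fit$pred (assumption); in practice use: preds <- fit$pred
set.seed(1)
n <- 250
obs <- factor(sample(c("Class1", "Class2"), n, replace = TRUE),
              levels = c("Class1", "Class2"))
p1 <- plogis(rnorm(n, mean = ifelse(obs == "Class1", 1, -1)))
preds <- data.frame(obs = obs, Class1 = p1,
                    Resample = rep(paste0("Fold", 1:5), each = n / 5))

roc_by_fold <- lapply(split(preds, preds$Resample), function(d) {
  roc(d$obs, d$Class1, levels = c("Class2", "Class1"),
      direction = "<", quiet = TRUE)
})

# Interpolate each fold's TPR onto a common FPR grid, then average.
# ties = max takes the top of each vertical step in the ROC curve.
fpr_grid <- seq(0, 1, by = 0.01)
tpr_mat <- sapply(roc_by_fold, function(r) {
  approx(1 - r$specificities, r$sensitivities,
         xout = fpr_grid, ties = max)$y
})
mean_tpr <- rowMeans(tpr_mat)

plot(NA, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "False positive rate", ylab = "True positive rate",
     main = "ROC curves by CV fold")
for (r in roc_by_fold) {
  lines(1 - r$specificities, r$sensitivities, col = "grey60")
}
lines(fpr_grid, mean_tpr, lwd = 2, col = "blue")  # mean curve
abline(0, 1, lty = 3)                             # chance line
legend("bottomright", legend = c("Fold", "Mean"),
       col = c("grey60", "blue"), lwd = c(1, 2))
```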
If fold curves vary widely, the model may be sensitive to data partitioning. That can be a signal to increase sample size, simplify the model, or improve feature engineering.
Practical Interpretation Tips
Use fold ROC plots alongside fold AUC summaries. A high mean AUC with one or two weak folds can still be risky in production if your data distribution shifts. Also review confusion matrices and calibration for thresholds used by downstream systems.
For fair comparison across model types, use the same cross-validation setup and seed strategy. Consistent resampling design makes fold ROC differences meaningful.
Aggregate AUC by Fold for Quick Diagnostics
Besides plotting the curves, summarize fold AUC values in a compact table. This makes instability easy to spot during model comparison.
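A compact table of this kind might be built as follows. The simulated preds data frame again stands in for fit$pred, and the class names are assumptions.

```r
library(pROC)

# Stand-in for fit$pred (assumption); in practice use: preds <- fit$pred
set.seed(1)
n <- 250
obs <- factor(sample(c("Class1", "Class2"), n, replace = TRUE),
              levels = c("Class1", "Class2"))
p1 <- plogis(rnorm(n, mean = ifelse(obs == "Class1", 1, -1)))
preds <- data.frame(obs = obs, Class1 = p1,
                    Resample = rep(paste0("Fold", 1:5), each = n / 5))

# One row per fold: fold name, number of held-out rows, AUC
auc_tbl <- do.call(rbind, lapply(split(preds, preds$Resample), function(d) {
  r <- roc(d$obs, d$Class1, levels = c("Class2", "Class1"),
           direction = "<", quiet = TRUE)
  data.frame(fold = d$Resample[1], n = nrow(d), auc = as.numeric(auc(r)))
}))
rownames(auc_tbl) <- NULL
print(auc_tbl)

# Spread of AUC across folds
auc_summary <- c(mean = mean(auc_tbl$auc), sd = sd(auc_tbl$auc),
                 min = min(auc_tbl$auc), max = max(auc_tbl$auc))
print(round(auc_summary, 3))
```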
A high standard deviation means model quality depends heavily on how data is split. In that case, evaluate whether class balance, leakage, or small sample size is driving variability.
Compare Models with the Same Fold Visualization
The same process can be repeated for multiple algorithms. Use consistent resampling control and then combine curve data with a model label column for side-by-side inspection. Visual differences across folds are often easier to interpret than a single leaderboard value.
In production model selection, this helps prevent choosing a model that looks best on average but behaves inconsistently across subsets of the data.
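As a sketch of that workflow, the example below trains two models on identical folds by passing shared training indices to trainControl, then stacks the per-fold curve points with a model label. The simulated data and the choice of glm and rpart are illustrative assumptions.

```r
library(caret)
library(pROC)
library(ggplot2)

# Illustrative data (assumption): caret's two-class simulator
set.seed(42)
dat <- twoClassSim(300)

# Shared training indices so every model sees exactly the same folds
folds <- createFolds(dat$Class, k = 5, returnTrain = TRUE)
ctrl <- trainControl(method = "cv", index = folds, classProbs = TRUE,
                     savePredictions = "final",
                     summaryFunction = twoClassSummary)

fits <- list(
  glm   = train(Class ~ ., data = dat, method = "glm",
                metric = "ROC", trControl = ctrl),
  rpart = train(Class ~ ., data = dat, method = "rpart",
                metric = "ROC", trControl = ctrl)
)

# Stack ROC points for every model and fold, labelled by model
curve_df <- do.call(rbind, lapply(names(fits), function(m) {
  p <- fits[[m]]$pred
  do.call(rbind, lapply(split(p, p$Resample), function(d) {
    r <- roc(d$obs, d$Class1, levels = c("Class2", "Class1"),
             direction = "<", quiet = TRUE)
    data.frame(model = m, fold = d$Resample[1],
               fpr = 1 - r$specificities, tpr = r$sensitivities)
  }))
}))

# One panel per model, one line per fold
p <- ggplot(curve_df, aes(x = fpr, y = tpr, group = fold)) +
  geom_path(colour = "grey40") +
  geom_abline(linetype = "dotted") +
  facet_wrap(~ model) +
  labs(x = "False positive rate", y = "True positive rate")
print(p)
```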
Common Pitfalls
- Forgetting savePredictions, which leaves no fold-level probabilities to plot.
- Using a multiclass target directly with binary ROC code.
- Mixing tuning rows when extracting predictions from fit$pred.
- Interpreting one fold curve as representative of all model behavior.
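The tuning-row pitfall arises when savePredictions = "all" is used with a model that has tuning parameters: fit$pred then holds one set of predictions per candidate value. A small sketch of keeping only the selected rows, assuming an rpart model on simulated data, is:

```r
library(caret)

# Illustrative data and model (assumptions)
set.seed(42)
dat <- twoClassSim(300)

ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE,
                     savePredictions = "all",   # keeps every tuning candidate
                     summaryFunction = twoClassSummary)

fit <- train(Class ~ ., data = dat, method = "rpart", tuneLength = 3,
             metric = "ROC", trControl = ctrl)

# fit$pred has one row per observation per candidate cp value
nrow(fit$pred)

# Keep only rows for the selected tuning values by joining on the
# tuning-parameter columns shared with fit$bestTune (here: cp)
best_pred <- merge(fit$pred, fit$bestTune)
nrow(best_pred)
```

After the join, each observation appears exactly once, so per-fold ROC curves are computed from a single set of out-of-fold predictions.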
Summary
- Enable probability output and saved predictions in caret control settings.
- Compute one ROC curve per resample using fold-specific predictions.
- Plot all fold curves to assess variability, not only average performance.
- Combine fold ROC analysis with AUC and threshold-based metrics.

