How to Plot ROC Curves for Every Cross-Validation Fold Using caret

Introduction

When training classification models with caret, a single aggregate ROC value (the AUC averaged over resamples) is often not enough. Plotting an ROC curve for every cross-validation fold shows how much performance varies across splits and helps detect unstable model behavior. The key is saving fold-level predictions during resampling.

Configure Caret to Keep Resample Predictions

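To draw per-fold ROC curves, enable class probabilities and save the held-out predictions. In caret both are set through trainControl: classProbs = TRUE and savePredictions = "final", which keeps the held-out predictions for the best tuning setting only.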

r
library(caret)
library(pROC)
library(ggplot2)

set.seed(42)

data(iris)
# Build a binary problem for ROC: versicolor ("yes") vs. virginica ("no").
iris_bin <- subset(iris, Species != "setosa")
iris_bin$Species <- factor(ifelse(iris_bin$Species == "versicolor", "yes", "no"))
# Put "yes" first: twoClassSummary treats the first factor level as the event.
iris_bin$Species <- relevel(iris_bin$Species, ref = "yes")

ctrl <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE,                  # return class probabilities
  summaryFunction = twoClassSummary,  # report ROC (AUC), Sens, and Spec
  savePredictions = "final"           # keep held-out predictions for the best tune
)

fit <- train(
  Species ~ .,
  data = iris_bin,
  method = "glm",
  family = binomial(),
  trControl = ctrl,
  metric = "ROC"
)

print(fit)

The fit$pred data frame now contains fold predictions and probabilities you can use to compute ROC curves per fold.
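Before computing anything, it can help to look at what was actually saved. The following is a minimal, self-contained sketch that refits the same kind of model and inspects fit$pred; the checks at the end are illustrative additions, not part of caret's workflow.

```r
library(caret)

set.seed(42)

# Same binary problem as above: versicolor ("yes") vs. virginica ("no").
iris_bin <- subset(iris, Species != "setosa")
iris_bin$Species <- factor(
  ifelse(iris_bin$Species == "versicolor", "yes", "no"),
  levels = c("yes", "no")
)

ctrl <- trainControl(
  method = "cv", number = 5,
  classProbs = TRUE,
  summaryFunction = twoClassSummary,
  savePredictions = "final"
)

fit <- train(Species ~ ., data = iris_bin, method = "glm",
             trControl = ctrl, metric = "ROC")

# Column overview: obs, pred, one probability column per class,
# rowIndex, the tuning parameter column(s), and Resample.
str(fit$pred)

# Number of held-out rows per fold.
table(fit$pred$Resample)

# With plain k-fold CV every training row should be held out exactly once.
stopifnot(!anyDuplicated(fit$pred$rowIndex))
```

If the probability columns (here yes and no) are missing, classProbs was most likely not enabled in trainControl.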

Compute ROC Curves for Each Fold

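Each row in fit$pred holds the observed class (obs), the predicted class (pred), the fold identifier (Resample), the row's index in the training data (rowIndex), and one probability column per class (here yes and no). Build one ROC object per fold.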

r
pred <- fit$pred
# Keep only rows for the best tuning setting. With savePredictions = "final"
# this is already the case; the filter also covers savePredictions = "all".
if (!is.null(fit$bestTune) && ncol(fit$bestTune) > 0) {
  for (nm in names(fit$bestTune)) {
    pred <- pred[pred[[nm]] == fit$bestTune[[nm]], ]
  }
}

fold_ids <- sort(unique(pred$Resample))
roc_points <- data.frame()

for (fold in fold_ids) {
  p <- pred[pred$Resample == fold, ]
  # "no" is the control level and "yes" the case level; direction = "<"
  # states that higher values of p$yes point towards "yes".
  roc_obj <- roc(response = p$obs, predictor = p$yes, levels = c("no", "yes"), direction = "<")

  fold_df <- data.frame(
    fpr = 1 - roc_obj$specificities,
    tpr = roc_obj$sensitivities,
    fold = fold,
    auc = as.numeric(auc(roc_obj))  # repeated on every row of this fold
  )

  roc_points <- rbind(roc_points, fold_df)
}

head(roc_points)

This gives you true positive rate and false positive rate for each fold, plus fold-specific AUC.
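One way to sanity-check the per-fold AUC values is to compare them with the ROC column that caret stores in fit$resample when twoClassSummary is used. The sketch below is self-contained and refits the model; the two sets of numbers are expected to agree closely, since both are computed from the same held-out predictions. The names manual_auc and check are illustrative.

```r
library(caret)
library(pROC)

set.seed(42)

iris_bin <- subset(iris, Species != "setosa")
iris_bin$Species <- factor(
  ifelse(iris_bin$Species == "versicolor", "yes", "no"),
  levels = c("yes", "no")
)

ctrl <- trainControl(
  method = "cv", number = 5,
  classProbs = TRUE,
  summaryFunction = twoClassSummary,
  savePredictions = "final"
)

fit <- train(Species ~ ., data = iris_bin, method = "glm",
             trControl = ctrl, metric = "ROC")

# AUC recomputed by hand from the saved held-out predictions of each fold.
manual_auc <- sapply(split(fit$pred, fit$pred$Resample), function(p) {
  r <- roc(response = p$obs, predictor = p$yes,
           levels = c("no", "yes"), direction = "<")
  as.numeric(auc(r))
})

# Put the hand-computed values next to caret's per-fold ROC values.
check <- merge(
  data.frame(Resample = names(manual_auc), manual = unname(manual_auc)),
  fit$resample[, c("Resample", "ROC")]
)

print(check)
print(all.equal(check$manual, check$ROC))
```

A mismatch here usually points at the wrong probability column, a reversed direction, or rows from several tuning settings being mixed.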

Plot Fold ROC Curves and the Mean Pattern

You can visualize all fold curves together and optionally overlay their mean trajectory.

r
# geom_path() connects points in row order. geom_line() would re-sort them by
# x, which distorts the vertical steps of an ROC curve when FPR values repeat.
ggplot(roc_points, aes(x = fpr, y = tpr, group = fold, color = fold)) +
  geom_path(alpha = 0.7) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  coord_equal() +
  labs(
    title = "ROC Curve per Cross-Validation Fold",
    x = "False Positive Rate",
    y = "True Positive Rate"
  ) +
  theme_minimal()

If fold curves vary widely, the model may be sensitive to data partitioning. That can be a signal to increase sample size, simplify the model, or improve feature engineering.
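For the mean pattern, one common approach is vertical averaging: interpolate each fold's TPR on a shared FPR grid and average the TPR at every grid point. The sketch below assumes a data frame shaped like fit$pred (columns obs, yes, Resample). The function name mean_roc is hypothetical, and the demo data is simulated purely so the block runs on its own; it is not output from the model above.

```r
library(pROC)

# Vertical averaging of fold ROC curves on a shared FPR grid.
mean_roc <- function(pred, grid = seq(0, 1, by = 0.01)) {
  folds <- sort(unique(pred$Resample))
  tpr_by_fold <- sapply(folds, function(fold) {
    p <- pred[pred$Resample == fold, ]
    r <- roc(response = p$obs, predictor = p$yes,
             levels = c("no", "yes"), direction = "<")
    # ties = max keeps the highest TPR where several points share one FPR.
    approx(x = 1 - r$specificities, y = r$sensitivities,
           xout = grid, ties = max)$y
  })
  data.frame(fpr = grid, tpr = rowMeans(tpr_by_fold))
}

# Simulated stand-in for fit$pred (assumption: same column layout).
set.seed(1)
n <- 200
obs <- factor(sample(c("yes", "no"), n, replace = TRUE), levels = c("yes", "no"))
demo_pred <- data.frame(
  obs = obs,
  yes = plogis(rnorm(n, mean = ifelse(obs == "yes", 1, -1))),
  Resample = sample(rep(paste0("Fold", 1:5), length.out = n))
)

mean_curve <- mean_roc(demo_pred)
head(mean_curve)
```

With real results, mean_roc(fit$pred) would give the averaged curve, which could be drawn over the fold curves with an extra layer such as geom_line(data = mean_curve, aes(x = fpr, y = tpr), inherit.aes = FALSE). Because ties are resolved with max, the averaged curve starts at the highest TPR reached with zero false positives rather than at the origin.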

Practical Interpretation Tips

Use fold ROC plots alongside fold AUC summaries. A high mean AUC with one or two weak folds can still be risky in production if your data distribution shifts. Also review confusion matrices and calibration for thresholds used by downstream systems.
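To look at threshold behavior fold by fold, a small base R helper can compute sensitivity and specificity at a fixed cutoff. This is a sketch under assumptions: the helper name threshold_by_fold is hypothetical, the 0.5 cutoff is only a placeholder for whatever threshold the downstream system uses, and the tiny data frame is made up to mimic the layout of fit$pred.

```r
# Per-fold sensitivity and specificity at a fixed probability threshold.
threshold_by_fold <- function(pred, threshold = 0.5) {
  do.call(rbind, lapply(split(pred, pred$Resample), function(p) {
    called_yes <- p$yes >= threshold
    is_yes <- p$obs == "yes"
    data.frame(
      fold = p$Resample[1],
      sensitivity = sum(called_yes & is_yes) / sum(is_yes),
      specificity = sum(!called_yes & !is_yes) / sum(!is_yes)
    )
  }))
}

# Made-up illustration data with the same columns as fit$pred.
demo_pred <- data.frame(
  obs = factor(c("yes", "yes", "no", "no", "yes", "no", "no", "yes"),
               levels = c("yes", "no")),
  yes = c(0.9, 0.4, 0.2, 0.6, 0.8, 0.1, 0.3, 0.7),
  Resample = rep(c("Fold1", "Fold2"), each = 4)
)

by_fold <- threshold_by_fold(demo_pred, threshold = 0.5)
print(by_fold)
```

In this toy input Fold1 gets one of two positives and one of two negatives right at the 0.5 cutoff, while Fold2 classifies all four rows correctly, which is the kind of fold-to-fold gap an AUC summary alone can hide.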

For fair comparison across model types, use the same cross-validation setup and seed strategy. Consistent resampling design makes fold ROC differences meaningful.
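One way to guarantee identical folds across models, rather than relying on seeds alone, is to create the fold indices once and pass them to trainControl through its index argument. The sketch below shows only the setup; the names folds, ctrl_shared, and held_out are illustrative.

```r
library(caret)

set.seed(42)

iris_bin <- subset(iris, Species != "setosa")
iris_bin$Species <- factor(
  ifelse(iris_bin$Species == "versicolor", "yes", "no"),
  levels = c("yes", "no")
)

# Training-row indices for each of 5 stratified folds.
folds <- createFolds(iris_bin$Species, k = 5, returnTrain = TRUE)

# Reuse this control object for every model that should see the same folds.
ctrl_shared <- trainControl(
  method = "cv",
  index = folds,
  classProbs = TRUE,
  summaryFunction = twoClassSummary,
  savePredictions = "final"
)

# Rows held out in each fold are the complement of the training indices.
held_out <- lapply(folds, function(idx) setdiff(seq_len(nrow(iris_bin)), idx))
sapply(held_out, length)
```

Passing ctrl_shared as trControl to each train() call means fold "Fold1" refers to the same held-out rows for every model, so per-fold ROC curves can be compared like for like.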

Aggregate AUC by Fold for Quick Diagnostics

Besides plotting the curves, summarize fold AUC values in a compact table. This makes instability easy to spot during model comparison.

r
library(dplyr)

fold_auc <- roc_points %>%
  group_by(fold) %>%
  summarise(auc = first(auc)) %>%  # auc is constant within a fold
  arrange(desc(auc))

print(fold_auc)
print(mean(fold_auc$auc))
print(sd(fold_auc$auc))

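A high standard deviation suggests that model quality depends heavily on how the data is split. In that case, evaluate whether class balance, leakage, or small sample size is driving the variability. Small folds alone can produce noisy AUC values, so read the spread together with the fold sizes.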

Compare Models with the Same Fold Visualization

The same process can be repeated for multiple algorithms. Use consistent resampling control and then combine curve data with a model label column for side-by-side inspection. Visual differences across folds are often easier to interpret than a single leaderboard value.

In production model selection, this helps prevent choosing a model that looks best on average but behaves inconsistently across subsets of the data.
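The comparison described above could be sketched as follows. This is an illustration under assumptions: logistic regression and LDA are arbitrary example models (method = "lda" needs the MASS package), and roc_by_fold is a hypothetical helper, not a caret function.

```r
library(caret)
library(pROC)
library(ggplot2)

set.seed(42)

iris_bin <- subset(iris, Species != "setosa")
iris_bin$Species <- factor(
  ifelse(iris_bin$Species == "versicolor", "yes", "no"),
  levels = c("yes", "no")
)

# Shared folds so both models are evaluated on the same held-out rows.
folds <- createFolds(iris_bin$Species, k = 5, returnTrain = TRUE)
ctrl <- trainControl(method = "cv", index = folds, classProbs = TRUE,
                     summaryFunction = twoClassSummary,
                     savePredictions = "final")

fits <- list(
  glm = train(Species ~ ., data = iris_bin, method = "glm",
              trControl = ctrl, metric = "ROC"),
  lda = train(Species ~ ., data = iris_bin, method = "lda",
              trControl = ctrl, metric = "ROC")
)

# One ROC curve per fold, tagged with a model label.
roc_by_fold <- function(fit, model_label) {
  pieces <- lapply(split(fit$pred, fit$pred$Resample), function(p) {
    r <- roc(response = p$obs, predictor = p$yes,
             levels = c("no", "yes"), direction = "<")
    data.frame(fpr = 1 - r$specificities, tpr = r$sensitivities,
               fold = p$Resample[1], model = model_label)
  })
  do.call(rbind, unname(pieces))
}

curves <- do.call(rbind, unname(Map(roc_by_fold, fits, names(fits))))

p_compare <- ggplot(curves, aes(x = fpr, y = tpr,
                                group = interaction(model, fold),
                                color = fold)) +
  geom_path(alpha = 0.7) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  facet_wrap(~ model) +
  coord_equal() +
  labs(x = "False Positive Rate", y = "True Positive Rate") +
  theme_minimal()

print(p_compare)
```

Faceting by model keeps each panel readable, and coloring by fold makes it easy to see whether the same fold is difficult for every model or only for one.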

Common Pitfalls

  • Forgetting savePredictions (or classProbs = TRUE), which leaves no fold-level probabilities to plot.
  • Using a multiclass target directly with binary ROC code.
  • Mixing tuning rows when extracting predictions from fit$pred.
  • Interpreting one fold curve as representative of all model behavior.

Summary

  • Enable probability output and saved predictions in caret control settings.
  • Compute one ROC curve per resample using fold-specific predictions.
  • Plot all fold curves to assess variability, not only average performance.
  • Combine fold ROC analysis with AUC and threshold-based metrics.
