ROC curves for multiclass classification in R

Introduction

ROC curves are easiest to understand in binary classification, where a model predicts one class versus the other. In multiclass problems, the usual approach in R is to evaluate each class as one-vs-rest, then optionally compute a combined multiclass AUC for a higher-level summary.

What Changes in the Multiclass Case

A standard ROC curve needs a binary outcome and a numeric score. With three or more classes, you no longer have a single positive class. Instead, you usually convert the task into several binary comparisons.

For a three-class model with classes setosa, versicolor, and virginica, you build:

  • setosa versus all other rows
  • versicolor versus all other rows
  • virginica versus all other rows

Each comparison has its own ROC curve and AUC. That gives you a practical picture of how well the model separates each class from the rest of the dataset.
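As a minimal sketch of that conversion, the base R snippet below builds one logical indicator column per class from the built-in iris labels. The name one_vs_rest is only illustrative and is not used elsewhere in this article.

```r
# Class labels from the built-in iris data (a factor with three levels)
truth <- iris$Species

# One logical column per class: TRUE where the row belongs to that class
one_vs_rest <- sapply(levels(truth), function(cls) truth == cls)

# Each row is TRUE in exactly one column
head(one_vs_rest, 3)
colSums(one_vs_rest)
```

Each column of the matrix is the binary response for one of the comparisons listed above.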

Build Per-Class ROC Curves with pROC

The example below trains a simple multinomial model on the iris dataset, predicts class probabilities, and computes one ROC curve per class.

r
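set.seed(42)

library(nnet)  # multinom()
library(pROC)  # roc(), auc()

# Hold out 50 of the 150 iris rows for evaluation
index <- sample(seq_len(nrow(iris)), size = 100)
train <- iris[index, ]
test <- iris[-index, ]

# Fit a multinomial model and predict one probability column per class
model <- multinom(Species ~ ., data = train, trace = FALSE)
probs <- predict(model, newdata = test, type = "probs")
truth <- test$Species

classes <- colnames(probs)

# One-vs-rest ROC per class. levels and direction are set explicitly so
# that pROC does not auto-detect them silently under quiet = TRUE.
roc_list <- lapply(classes, function(cls) {
  roc(response = truth == cls, predictor = probs[, cls],
      levels = c(FALSE, TRUE), direction = "<", quiet = TRUE)
})

names(roc_list) <- classes

auc_values <- sapply(roc_list, auc)
print(auc_values)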

The key detail is truth == cls. That converts the multiclass response into a binary vector for one class at a time. The predictor is the model probability for that same class.

You can plot one of the curves directly:

r
plot(roc_list[["setosa"]], col = "steelblue", main = "Setosa vs Rest")

Or draw all of them on one chart:

r
plot(roc_list[[1]], col = "steelblue", main = "One-vs-Rest ROC Curves")
plot(roc_list[[2]], col = "darkgreen", add = TRUE)
plot(roc_list[[3]], col = "firebrick", add = TRUE)

legend(
  "bottomright",
  legend = paste(names(auc_values), "AUC =", round(auc_values, 3)),
  col = c("steelblue", "darkgreen", "firebrick"),
  lwd = 2
)

That view is often more informative than a single aggregate score, because one class may be much harder than the others.

Compute a Multiclass AUC Summary

If you want a single summary statistic, pROC::multiclass.roc() computes a multiclass AUC based on pairwise comparisons.

r
multiclass_result <- multiclass.roc(response = truth, predictor = probs)
print(multiclass_result$auc)

This number is useful for reporting, but it does not replace the per-class curves. A strong aggregate value can hide the fact that one minority class performs poorly.

In practice, a good evaluation workflow is:

  • inspect the one-vs-rest ROC curve for each class
  • compare the individual AUC values
  • use multiclass AUC only as a compact summary
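The three steps above can be sketched in one self-contained script. It repeats the iris setup from the earlier example so that it runs on its own; the names per_class_auc and overall_auc are illustrative, and the explicit levels and direction arguments are a defensive choice rather than a requirement.

```r
library(nnet)  # multinom()
library(pROC)  # roc(), auc(), multiclass.roc()

set.seed(42)
index <- sample(seq_len(nrow(iris)), size = 100)
train <- iris[index, ]
test  <- iris[-index, ]

model <- multinom(Species ~ ., data = train, trace = FALSE)
probs <- predict(model, newdata = test, type = "probs")
truth <- test$Species

# Steps 1 and 2: one-vs-rest AUC for every class
per_class_auc <- sapply(colnames(probs), function(cls) {
  r <- roc(response = truth == cls, predictor = probs[, cls],
           levels = c(FALSE, TRUE), direction = "<", quiet = TRUE)
  as.numeric(auc(r))
})

# Step 3: pairwise-based multiclass AUC as a compact summary
overall_auc <- as.numeric(
  multiclass.roc(response = truth, predictor = probs)$auc
)

# Sort ascending so the weakest class appears first
print(round(sort(per_class_auc), 3))
print(round(overall_auc, 3))
```

Reading the sorted per-class values before the summary makes it harder for a strong aggregate to hide a weak class.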

Interpreting the Curves

An ROC curve that rises quickly toward the top-left corner indicates good class separation. If a curve stays close to the diagonal, the model is not ranking that class much better than chance.
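One way to make the ranking interpretation concrete: the AUC equals the share of (positive, negative) row pairs in which the positive row receives the higher score, with ties counted as half. The base R sketch below uses a hypothetical helper, pairwise_auc(), and made-up scores; it illustrates the definition and is not how pROC computes the value internally.

```r
# Hypothetical helper: AUC as the share of (positive, negative) pairs
# in which the positive row gets the higher score; ties count as half
pairwise_auc <- function(is_positive, score) {
  pos <- score[is_positive]
  neg <- score[!is_positive]
  diffs <- outer(pos, neg, "-")
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

is_positive <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Every positive outranks every negative: perfect separation
pairwise_auc(is_positive, c(0.9, 0.8, 0.7, 0.3, 0.2, 0.1))  # 1

# Identical scores carry no ranking information: the diagonal
pairwise_auc(is_positive, rep(0.5, 6))                      # 0.5
```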

Keep in mind that ROC curves measure ranking quality across thresholds. They do not tell you whether the default threshold is ideal for your business rule. If the cost of one type of mistake is much higher than another, you still need to choose a decision threshold deliberately.
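As a hedged illustration of picking a threshold from a curve, the sketch below uses pROC::coords() with the Youden criterion on the versicolor-versus-rest comparison. Youden's J weights both error types equally, so treat it as a starting point rather than a cost-aware rule; the setup repeats the iris example so the block runs on its own.

```r
library(nnet)  # multinom()
library(pROC)  # roc(), coords()

set.seed(42)
index <- sample(seq_len(nrow(iris)), size = 100)
model <- multinom(Species ~ ., data = iris[index, ], trace = FALSE)
test  <- iris[-index, ]
probs <- predict(model, newdata = test, type = "probs")

# One-vs-rest curve for a single class
roc_versicolor <- roc(response = test$Species == "versicolor",
                      predictor = probs[, "versicolor"],
                      levels = c(FALSE, TRUE), direction = "<", quiet = TRUE)

# Threshold that maximizes sensitivity + specificity - 1 (Youden's J).
# This is one possible criterion, not a business rule.
best <- coords(roc_versicolor, x = "best", best.method = "youden",
               ret = c("threshold", "sensitivity", "specificity"),
               transpose = FALSE)
print(best)
```

If mistakes have unequal costs, the best.weights argument of coords() is worth reading about in the pROC documentation before settling on a threshold.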

For imbalanced datasets, ROC is useful but incomplete. Precision-recall curves are often more revealing when the positive class is rare, because precision ignores true negatives, so a large negative class cannot mask a high number of false positives.
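To show what a precision-recall view is built from, here is a small base R sketch. The helper pr_points() and the toy scores are made up for illustration; the helper simply evaluates precision and recall at every distinct score threshold.

```r
# Hypothetical helper: precision and recall at each distinct score threshold
pr_points <- function(is_positive, score) {
  thresholds <- sort(unique(score), decreasing = TRUE)
  rows <- lapply(thresholds, function(t) {
    predicted <- score >= t
    tp <- sum(predicted & is_positive)   # true positives
    fp <- sum(predicted & !is_positive)  # false positives
    data.frame(threshold = t,
               precision = tp / (tp + fp),
               recall    = tp / sum(is_positive))
  })
  do.call(rbind, rows)
}

# Toy data: 2 positives among 10 rows
is_positive <- c(TRUE, FALSE, TRUE, rep(FALSE, 7))
score       <- c(0.95, 0.90, 0.60, 0.55, 0.40, 0.35, 0.30, 0.20, 0.10, 0.05)

pr <- pr_points(is_positive, score)
print(pr)
```

Neither column uses the count of true negatives, which is why adding more easy negative rows leaves these numbers unchanged while it can make an ROC curve look better.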

Common Pitfalls

The most common mistake is passing predicted class labels instead of class probabilities. ROC analysis needs a continuous score, not the final winner for each row.
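The difference is easy to check on the objects returned by predict(). The sketch below assumes a nnet::multinom() model fitted on the full iris data purely for inspection; type = "class" returns the winning label per row, while type = "probs" returns the numeric matrix that ROC analysis needs.

```r
library(nnet)  # multinom() and its predict() method

# Fitted on all of iris only to inspect the two prediction types
model <- multinom(Species ~ ., data = iris, trace = FALSE)

labels <- predict(model, newdata = iris, type = "class")  # factor of winners
probs  <- predict(model, newdata = iris, type = "probs")  # numeric matrix

str(labels)
str(probs)

# A hard label takes only two values in a one-vs-rest comparison,
# while the probability column offers many candidate thresholds
length(unique(labels == "versicolor"))
length(unique(probs[, "versicolor"]))
```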

Another issue is mismatched column names in the probability matrix. The columns should align with the factor levels of the response. If they do not, the per-class AUC values can be meaningless.
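A cheap guard is to compare the column names with the factor levels and reorder by name before computing anything. The base R sketch below uses a made-up probability matrix whose columns are deliberately out of order.

```r
# Toy probability matrix, made up for illustration, with columns in a
# different order than the factor levels of the response
truth <- factor(c("a", "b", "c", "a"), levels = c("a", "b", "c"))
probs <- matrix(
  c(0.1, 0.2, 0.7,
    0.6, 0.3, 0.1,
    0.2, 0.7, 0.1,
    0.2, 0.1, 0.7),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("b", "c", "a"))
)

# Fail early if a class is missing or unexpected
stopifnot(setequal(colnames(probs), levels(truth)))

# Reorder by name so column k always corresponds to level k
probs <- probs[, levels(truth), drop = FALSE]
stopifnot(identical(colnames(probs), levels(truth)))
```

Indexing by name, as in probs[, cls], avoids relying on column position at all.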

Some developers also expect multiclass.roc() to return a single curve that can be plotted like a binary roc() object. It computes a summary AUC, but it does not replace the explicit one-vs-rest curves you usually need for visualization.

Finally, do not over-interpret tiny AUC differences on a small test set. Multiclass evaluation can be noisy when you have limited data, especially for underrepresented classes.
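One way to see how noisy an estimate is: attach a confidence interval to each AUC. The sketch below uses pROC::ci.auc() on the versicolor-versus-rest curve, again repeating the iris setup so it runs standalone. With only 50 test rows the interval can be wide, and pROC may warn if the AUC is exactly 1.

```r
library(nnet)  # multinom()
library(pROC)  # roc(), ci.auc()

set.seed(42)
index <- sample(seq_len(nrow(iris)), size = 100)
model <- multinom(Species ~ ., data = iris[index, ], trace = FALSE)
test  <- iris[-index, ]
probs <- predict(model, newdata = test, type = "probs")

roc_versicolor <- roc(response = test$Species == "versicolor",
                      predictor = probs[, "versicolor"],
                      levels = c(FALSE, TRUE), direction = "<", quiet = TRUE)

# 95% confidence interval for the AUC: lower bound, estimate, upper bound
interval <- ci.auc(roc_versicolor, conf.level = 0.95)
print(interval)
```

If the intervals of two classes or two models overlap heavily, a difference in the third decimal place is unlikely to mean much.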

Summary

  • Multiclass ROC analysis is usually done as one-vs-rest for each class.
  • pROC::roc() takes a binary response and a numeric score, here the predicted probability for one class.
  • pROC::multiclass.roc() gives a useful overall AUC summary.
  • Plot per-class curves to see where the model is strong or weak.
  • Use probabilities, aligned class names, and a realistic test set for reliable results.
