Custom Performance Function in caret Package using predicted Probability

Introduction

In caret, a custom performance function lets you tune a model against the metric that actually matters for your problem. If that metric depends on predicted probabilities rather than hard class labels, the key is to enable class probabilities and write a summaryFunction that reads the probability columns passed into resampling.

Why Probability-Based Metrics Matter

Accuracy depends on a thresholded class prediction, but many real problems care about ranking quality or calibration instead. Metrics such as log loss, AUC, lift, or a custom business score all need predicted probabilities.

Probability-based metrics are especially useful when:

  • the classes are imbalanced
  • false positives and false negatives have different costs
  • you care about calibration, not just the final label
  • a downstream workflow chooses its own threshold later

In those cases, a probability-based summary is usually more informative than plain accuracy.
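To make the point concrete, here is a minimal base-R sketch with made-up numbers. The two "models" and their probabilities are hypothetical; they are chosen so that both produce the same hard labels at a 0.5 threshold while differing in how confident they are.

```r
# Made-up example: two hypothetical models scored on the same six observations
actual <- c(1, 1, 1, 0, 0, 0)  # 1 = positive class

p_sharp <- c(0.95, 0.90, 0.40, 0.05, 0.10, 0.60)  # confident when correct
p_vague <- c(0.55, 0.60, 0.40, 0.45, 0.40, 0.60)  # always close to 0.5

# Accuracy only sees which side of the threshold each probability falls on
accuracy <- function(p, y, threshold = 0.5) mean((p >= threshold) == (y == 1))

# Binary log loss uses the probability itself
log_loss <- function(p, y) -mean(y * log(p) + (1 - y) * log(1 - p))

acc <- c(sharp = accuracy(p_sharp, actual), vague = accuracy(p_vague, actual))
ll  <- c(sharp = log_loss(p_sharp, actual), vague = log_loss(p_vague, actual))

print(acc)  # identical for both models
print(ll)   # lower (better) for the sharper model
```

Accuracy cannot separate the two models, while log loss rewards the one whose probabilities are closer to the observed outcomes.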

What caret Passes Into summaryFunction

When train() evaluates a model, caret builds a data frame for the summary function. For classification, it typically includes:

  • obs, the observed class
  • pred, the predicted class
  • one column per class containing predicted probabilities

To make those probability columns appear, trainControl() must set classProbs = TRUE.
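The shape of that data frame can be mocked up in base R, which is handy for testing a summary function before running a full resampling job. The values below are invented for illustration, and caret may pass additional columns (for example a row index) that a summary function can simply ignore.

```r
# Mock of the data frame a summary function receives when classProbs = TRUE.
# All values are made up; only the column layout matters here.
lev <- c("versicolor", "virginica")

mock_resample <- data.frame(
  obs  = factor(c("versicolor", "virginica", "virginica", "versicolor"),
                levels = lev),
  pred = factor(c("versicolor", "virginica", "versicolor", "versicolor"),
                levels = lev),
  versicolor = c(0.9, 0.2, 0.6, 0.7),  # probability of class "versicolor"
  virginica  = c(0.1, 0.8, 0.4, 0.3)   # probability of class "virginica"
)

# The probability columns are named after the class levels
prob_cols <- intersect(lev, names(mock_resample))

# Each row of class probabilities should sum to 1
row_sums <- rowSums(mock_resample[, prob_cols])

str(mock_resample)
```

Because the probability columns take their names from the factor levels, caret requires those levels to be valid R variable names when classProbs = TRUE.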

Example: Custom Log Loss

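The example below trains a binary classifier on a filtered version of iris and uses log loss as the resampling metric. Because glm has no tuning parameters, caret simply reports the metric here; with a tunable method, the same setup would drive parameter selection.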

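```r
library(caret)

set.seed(42)  # make the cross-validation folds reproducible

# Keep two species so the problem is binary, and fix the level order:
# the second level ("virginica") is treated as the positive class below.
binary_iris <- subset(iris, Species != "setosa")
binary_iris$Species <- factor(
  binary_iris$Species,
  levels = c("versicolor", "virginica")
)

# Summary function: returns a named numeric vector for each resample
logLossSummary <- function(data, lev = NULL, model = NULL) {
  # Probability of the second level, clipped away from exactly 0 and 1
  probs <- pmin(pmax(data[[lev[2]]], 1e-15), 1 - 1e-15)
  # 1 where the observed class is the second level, 0 otherwise
  actual <- ifelse(data$obs == lev[2], 1, 0)

  log_loss <- -mean(actual * log(probs) + (1 - actual) * log(1 - probs))
  c(LogLoss = log_loss)
}

ctrl <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE,               # adds the per-class probability columns
  summaryFunction = logLossSummary
)

fit <- train(
  Species ~ .,
  data = binary_iris,
  method = "glm",
  family = binomial(),
  trControl = ctrl,
  metric = "LogLoss",              # must match the name returned above
  maximize = FALSE                 # lower log loss is better
)

print(fit)
```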

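Two details matter here. First, metric must match a name returned by the summary function; if it does not, caret warns and falls back to the first metric the function returns. Second, log loss should be minimized, so set maximize = FALSE explicitly, because train() maximizes a custom metric by default.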

Understanding the Probability Column

Inside the summary function, data[[lev[2]]] refers to the predicted probability for the second factor level. That is why the factor level order matters. If you reorder the levels, you change which class probability is being evaluated.

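This catches many people the first time they write a custom summary. The code may run, but the metric may describe the wrong class. Binary log loss happens to be symmetric in the two classes, so its value does not change, but asymmetric metrics such as precision, recall, or lift at a cutoff will silently score the wrong class.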
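One way to reduce that risk is to look up the probability column by class name instead of by position. The sketch below assumes a hypothetical helper, makeLogLossSummary(), which is not part of caret; it builds a summary function bound to a named positive class and stops early if the column is missing.

```r
# Hypothetical factory (not a caret function): returns a summary function
# that reads the probability column by class name rather than by lev[2].
makeLogLossSummary <- function(positive, eps = 1e-15) {
  function(data, lev = NULL, model = NULL) {
    if (!positive %in% names(data)) {
      stop("No probability column named '", positive,
           "'. Was classProbs = TRUE set?")
    }
    probs  <- pmin(pmax(data[[positive]], eps), 1 - eps)
    actual <- as.numeric(data$obs == positive)
    c(LogLoss = -mean(actual * log(probs) + (1 - actual) * log(1 - probs)))
  }
}

virginicaLogLoss <- makeLogLossSummary("virginica")

# Made-up rows where the factor levels are in the "wrong" order on purpose:
# lev[2] would be "versicolor", but the function still reads "virginica".
rows <- data.frame(
  obs = factor(c("virginica", "versicolor"),
               levels = c("virginica", "versicolor")),
  versicolor = c(0.2, 0.7),
  virginica  = c(0.8, 0.3)
)

result <- virginicaLogLoss(rows, lev = levels(rows$obs))
print(result)
```

The returned function has the same three-argument signature as any other summary function, so it could be passed as summaryFunction = makeLogLossSummary("virginica") in trainControl().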

Adding More Than One Metric

A custom summary function can return several named metrics at once. caret will record all of them, and you can still choose one as the optimization target.

```r
multiMetricSummary <- function(data, lev = NULL, model = NULL) {
  # Probability of the second level, clipped away from exactly 0 and 1
  probs <- pmin(pmax(data[[lev[2]]], 1e-15), 1 - 1e-15)
  actual <- ifelse(data$obs == lev[2], 1, 0)

  log_loss <- -mean(actual * log(probs) + (1 - actual) * log(1 - probs))
  brier <- mean((probs - actual)^2)  # mean squared error of the probabilities

  # Either name can be passed as metric in train()
  c(LogLoss = log_loss, Brier = brier)
}
```

This is a good pattern when you want one primary tuning target but still want other diagnostic metrics from the same resampling run.
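With several metrics in play, it is easy to mistype the one being tuned. A small pre-flight check can catch that before a long resampling run. The sketch below is base R only; checkMetricName() and twoMetricSummary() are hypothetical names introduced for illustration, the mock probabilities are made up, and the check as written assumes a two-class problem.

```r
# Hypothetical helper (not part of caret): run a summary function on a tiny
# made-up data frame and confirm the tuning metric is among the names returned.
checkMetricName <- function(summary_fn, metric, lev) {
  mock <- data.frame(
    obs  = factor(lev, levels = lev),
    pred = factor(lev, levels = lev)
  )
  mock[[lev[1]]] <- c(0.8, 0.3)  # made-up probabilities for the first class
  mock[[lev[2]]] <- c(0.2, 0.7)  # made-up probabilities for the second class

  returned <- names(summary_fn(mock, lev = lev))
  if (!metric %in% returned) {
    stop("Metric '", metric, "' not returned; available: ",
         paste(returned, collapse = ", "))
  }
  invisible(returned)
}

# Small stand-in summary function returning two named metrics
# (MeanProb is an invented diagnostic, used only to have a second name)
twoMetricSummary <- function(data, lev = NULL, model = NULL) {
  actual <- as.numeric(data$obs == lev[2])
  probs  <- data[[lev[2]]]
  c(Brier = mean((probs - actual)^2), MeanProb = mean(probs))
}

lev <- c("versicolor", "virginica")

# Correct name passes and returns the available metric names
available <- checkMetricName(twoMetricSummary, "Brier", lev)

# A case mismatch ("brier") is caught before any model is trained
typo_caught <- inherits(
  try(checkMetricName(twoMetricSummary, "brier", lev), silent = TRUE),
  "try-error"
)
```

Metric names are case-sensitive, so a check like this fails fast on "brier" versus "Brier" instead of leaving the mismatch to surface during training.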

Common Pitfalls

Forgetting classProbs = TRUE means the probability columns will not exist, so a lookup such as data[[lev[2]]] returns NULL and a probability-based summary function will error or produce missing values.

Returning a metric name that does not match the metric argument in train() makes caret warn and fall back to the first metric the summary function returns, which may not be the one you meant to optimize.

Ignoring factor level order can make the summary function read the wrong class probability.

Using probabilities exactly equal to 0 or 1 in log loss can create numerical issues, so clipping with a small epsilon is a practical safeguard.

Assuming accuracy-oriented summaries and probability-oriented summaries are interchangeable leads to misleading model comparisons.
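The numerical issue mentioned above is easy to reproduce in base R. In this minimal sketch with made-up values, both predictions are perfect, yet the unclipped formula still returns NaN because it evaluates 0 * log(0). The epsilon of 1e-15 matches the value used in the examples above.

```r
# Made-up example: two perfectly confident, perfectly correct predictions
actual <- c(1, 0)
probs  <- c(1, 0)

binary_log_loss <- function(p, y) -mean(y * log(p) + (1 - y) * log(1 - p))

# Unclipped: 0 * log(0) is 0 * -Inf, which is NaN in floating-point arithmetic
raw <- binary_log_loss(probs, actual)

# Clipped: keep probabilities strictly inside (0, 1) before taking logs
eps <- 1e-15
clipped_probs <- pmin(pmax(probs, eps), 1 - eps)
clipped <- binary_log_loss(clipped_probs, actual)

print(raw)      # NaN
print(clipped)  # a very small finite number
```

A NaN in any resample propagates into the aggregated metric, so the clipping step guards the whole tuning run and not just a single fold.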

Summary

  • In caret, probability-based custom metrics belong in a summaryFunction.
  • Set classProbs = TRUE so the resampling data includes per-class probabilities.
  • Return a named numeric vector, and make metric match the chosen name.
  • Be careful about factor level order when reading the positive-class probability.
  • Use probability-based metrics when ranking quality or calibration matters more than raw accuracy.