Additional metrics in caret - PPV, sensitivity, specificity

Introduction

When training classifiers with caret, the default classification metrics (Accuracy and Kappa) can hide important tradeoffs on imbalanced classes. Sensitivity, specificity, and positive predictive value (PPV) are often more useful than accuracy when business cost is asymmetric. A robust evaluation setup should compute the metrics that match real decision risk, not only what is easiest to display.

Core Sections

1. Why these metrics matter

For binary classification:

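Sensitivity is the true positive rate: TP / (TP + FN), the share of actual positives the model catches.
Specificity is the true negative rate: TN / (TN + FP), the share of actual negatives the model correctly clears.
PPV is precision: TP / (TP + FP), the share of predicted positives that are actually positive.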

If false negatives are expensive, optimize sensitivity. If false positives are expensive, optimize PPV or specificity depending on workflow.
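
As a worked illustration of the three definitions, the sketch below computes them from raw confusion-matrix counts in base R. The counts are made up for the example and do not come from any real model.

```r
# Hypothetical confusion-matrix counts (illustrative only).
tp <- 40   # predicted positive, actually positive
fn <- 10   # predicted negative, actually positive
fp <- 20   # predicted positive, actually negative
tn <- 930  # predicted negative, actually negative

sensitivity <- tp / (tp + fn)  # share of actual positives that were caught
specificity <- tn / (tn + fp)  # share of actual negatives that were cleared
ppv         <- tp / (tp + fp)  # share of positive predictions that were right
accuracy    <- (tp + tn) / (tp + fn + fp + tn)

metrics <- round(
  c(Sensitivity = sensitivity, Specificity = specificity,
    PPV = ppv, Accuracy = accuracy),
  3
)
print(metrics)
```

With these counts accuracy is 0.97 while PPV is about 0.667, so roughly one in three flagged cases is a false alarm even though the headline number looks strong.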

2. Custom summary function in caret

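Define a summary function and pass it to trainControl. caret calls it on each resample with a data frame that contains obs and pred columns, and it must return a named numeric vector. Those names are what the metric argument of train refers to when it selects tuning parameters.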

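library(caret)

# Summary function: caret passes the held-out predictions for one resample.
# `lev` holds the outcome levels; the first level is used as the positive class.
customSummary <- function(data, lev = NULL, model = NULL) {
  cm <- confusionMatrix(data$pred, data$obs, positive = lev[1])
  c(
    Sensitivity = unname(cm$byClass["Sensitivity"]),
    Specificity = unname(cm$byClass["Specificity"]),
    PPV = unname(cm$byClass["Pos Pred Value"])
  )
}

# 5-fold cross-validation; classProbs = TRUE keeps class probabilities
# available for later threshold analysis.
ctrl <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE,
  summaryFunction = customSummary
)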
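
Before handing a summary function to trainControl, it can help to call it directly on a small hand-built data frame and compare the result with counts worked out by hand. The sketch below does that with a variant built on caret's standalone sensitivity(), specificity(), and posPredValue() helpers. The names leanSummary and toy are hypothetical, and the toy predictions are invented for the check.

```r
library(caret)

# Hypothetical summary function built on caret's standalone metric helpers.
# The first factor level is treated as the positive class.
leanSummary <- function(data, lev = NULL, model = NULL) {
  c(
    Sensitivity = sensitivity(data$pred, data$obs, positive = lev[1]),
    Specificity = specificity(data$pred, data$obs, negative = lev[2]),
    PPV         = posPredValue(data$pred, data$obs, positive = lev[1])
  )
}

# Toy resample: 3 actual positives and 5 actual negatives.
# By hand: TP = 2, FN = 1, FP = 1, TN = 4.
lev <- c("yes", "no")
toy <- data.frame(
  obs  = factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no"), levels = lev),
  pred = factor(c("yes", "yes", "no", "yes", "no", "no", "no", "no"), levels = lev)
)

toy_metrics <- leanSummary(toy, lev = lev)
print(toy_metrics)
```

The hand counts give sensitivity 2/3, specificity 4/5, and PPV 2/3, so any other output would point to a swapped positive class or swapped arguments.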

3. Train with an explicit optimization target

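Choose the metric based on the business objective. For example, if catching positives is the priority, optimize sensitivity. The metric string must match one of the names returned by the summary function, and method = "rf" requires the randomForest package to be installed.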

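# training_df is assumed to hold a factor outcome named Class whose levels
# are valid R names (required when classProbs = TRUE).
set.seed(42)
fit <- train(
  Class ~ .,
  data = training_df,
  method = "rf",
  trControl = ctrl,
  metric = "Sensitivity"  # must match a name returned by customSummary
)
print(fit)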
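
For readers without a dataset at hand, the following is a minimal end-to-end sketch on simulated data. It makes several assumptions that are not part of the original example: the data come from caret's twoClassSim(), method = "rpart" stands in for "rf" to keep dependencies light, and the names sim_df, sim_ctrl, and sim_fit are hypothetical. The summary function is repeated so the block runs on its own.

```r
library(caret)

# Same shape as the summary function shown earlier, repeated here
# so that this sketch is self-contained.
customSummary <- function(data, lev = NULL, model = NULL) {
  cm <- confusionMatrix(data$pred, data$obs, positive = lev[1])
  c(
    Sensitivity = unname(cm$byClass["Sensitivity"]),
    Specificity = unname(cm$byClass["Specificity"]),
    PPV = unname(cm$byClass["Pos Pred Value"])
  )
}

set.seed(42)
sim_df <- twoClassSim(300)  # simulated data; the outcome column is `Class`

sim_ctrl <- trainControl(
  method = "cv",
  number = 5,
  summaryFunction = customSummary
)

sim_fit <- train(
  Class ~ .,
  data = sim_df,
  method = "rpart",       # assumption: a light stand-in for "rf"
  tuneLength = 3,         # try three candidate cp values
  trControl = sim_ctrl,
  metric = "Sensitivity"
)

print(sim_fit$results)   # one row per candidate cp value, with all three metrics
print(sim_fit$bestTune)  # the cp value selected on mean Sensitivity
```

The results table is the place to check what the chosen tuning value costs in specificity and PPV, since optimizing one metric says nothing about the others.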

For production deployment, also evaluate threshold effects on a holdout set, since PPV and sensitivity can shift materially when the classification threshold changes.

4. Reporting and governance

Avoid reporting a single metric in isolation. Present sensitivity, specificity, PPV, prevalence, and confusion matrix together so stakeholders can see tradeoffs clearly. This is especially important in fraud, medical, and safety contexts where class imbalance is large.

Tracking metric drift over time is also important. A model can keep similar accuracy while sensitivity degrades due to changing class balance.
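
One way to watch for this kind of drift is to recompute the metrics per time period from a log of scored cases once their true labels arrive. The base R sketch below assumes a hypothetical log_df with month, obs, and pred columns. The rows are invented to show accuracy holding steady while sensitivity falls.

```r
# Hypothetical prediction log: one row per case, labelled after the fact.
log_df <- data.frame(
  month = rep(c("2024-01", "2024-02"), each = 6),
  obs   = c("yes", "yes", "no", "no", "no", "no",
            "yes", "yes", "yes", "no", "no", "no"),
  pred  = c("yes", "yes", "yes", "no", "no", "no",
            "yes", "yes", "no", "no", "no", "no")
)

# Metrics for one period, treating "yes" as the positive class.
period_metrics <- function(d) {
  tp <- sum(d$pred == "yes" & d$obs == "yes")
  fn <- sum(d$pred == "no"  & d$obs == "yes")
  c(
    Accuracy    = mean(d$pred == d$obs),
    Sensitivity = tp / (tp + fn),
    Prevalence  = mean(d$obs == "yes")
  )
}

# One row per month, one column per metric.
drift <- t(sapply(split(log_df, log_df$month), period_metrics))
print(round(drift, 3))
```

In this toy log accuracy is 5/6 in both months, while sensitivity drops from 1 to 2/3 as prevalence rises from 1/3 to 1/2.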

5. Threshold tuning on a holdout set

caret's train selects hyperparameters, but operational precision (PPV) and recall (sensitivity) also depend on the probability threshold used to turn scores into class labels. Evaluate multiple thresholds on a holdout set and choose one aligned with business cost.

# Assumes the outcome levels are "yes" and "no" and that the model was
# trained with classProbs = TRUE so class probabilities are available.
probs <- predict(fit, newdata = holdout_df, type = "prob")[, "yes"]
obs <- holdout_df$Class

# Score the holdout set at a single probability threshold.
score_at <- function(threshold) {
  pred <- factor(ifelse(probs >= threshold, "yes", "no"), levels = levels(obs))
  cm <- confusionMatrix(pred, obs, positive = "yes")
  c(
    threshold = threshold,
    sensitivity = unname(cm$byClass["Sensitivity"]),
    specificity = unname(cm$byClass["Specificity"]),
    ppv = unname(cm$byClass["Pos Pred Value"])
  )
}

# One row per threshold.
results <- rbind(score_at(0.30), score_at(0.50), score_at(0.70))
print(results)

This simple table makes tradeoffs visible and helps teams avoid deploying a threshold that looks good on accuracy but fails real operating targets.
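
The same idea extends to a finer grid with an explicit operating target. The base R sketch below runs on simulated scores, not on output from a fitted model. The PPV target of 0.60 and the names sim_obs, sim_probs, and sweep_threshold are all hypothetical. It keeps the thresholds that meet the PPV target and, among those, picks the one with the highest sensitivity.

```r
set.seed(1)

# Simulated holdout scores (illustrative): positives tend to score higher.
n_pos <- 100
n_neg <- 900
sim_obs   <- c(rep("yes", n_pos), rep("no", n_neg))
sim_probs <- c(rbeta(n_pos, 4, 2), rbeta(n_neg, 2, 5))

# Metrics at one threshold, computed from raw counts.
sweep_threshold <- function(threshold) {
  pred <- ifelse(sim_probs >= threshold, "yes", "no")
  tp <- sum(pred == "yes" & sim_obs == "yes")
  fp <- sum(pred == "yes" & sim_obs == "no")
  fn <- sum(pred == "no"  & sim_obs == "yes")
  tn <- sum(pred == "no"  & sim_obs == "no")
  c(
    threshold   = threshold,
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = if (tp + fp > 0) tp / (tp + fp) else NA_real_
  )
}

grid <- as.data.frame(t(sapply(seq(0.05, 0.95, by = 0.05), sweep_threshold)))

# Hypothetical operating target: PPV of at least 0.60.
ppv_target <- 0.60
eligible <- grid[!is.na(grid$ppv) & grid$ppv >= ppv_target, ]

# Among eligible thresholds, keep the one that catches the most positives.
chosen <- eligible[which.max(eligible$sensitivity), ]
print(chosen)
```

Framing the choice as a constraint plus an objective keeps the decision tied to a stated cost instead of to whichever threshold happens to look best on a single metric.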

6. Use prevalence-aware communication

PPV can drop sharply when prevalence changes, even if model ranking quality is stable. Include expected prevalence scenarios in reports so consumers understand how field performance may shift after launch. This is useful for quarterly reviews where stakeholder teams compare model behavior across regions or seasons.
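
The dependence of PPV on prevalence follows from Bayes' rule: PPV = (sensitivity × prevalence) / (sensitivity × prevalence + (1 − specificity) × (1 − prevalence)). The sketch below tabulates it for a few scenarios in base R. The operating point (sensitivity 0.90, specificity 0.95) and the prevalence values are hypothetical, and ppv_at is an illustrative helper name.

```r
# PPV from sensitivity, specificity, and prevalence (Bayes' rule).
ppv_at <- function(sens, spec, prevalence) {
  (sens * prevalence) /
    (sens * prevalence + (1 - spec) * (1 - prevalence))
}

# Hypothetical operating point and prevalence scenarios.
sens <- 0.90
spec <- 0.95
prevalence <- c(0.50, 0.10, 0.01)

scenarios <- data.frame(
  prevalence = prevalence,
  ppv = round(ppv_at(sens, spec, prevalence), 3)
)
print(scenarios)
```

With sensitivity and specificity held fixed, PPV falls from about 0.947 at 50% prevalence to 0.667 at 10% and about 0.154 at 1%, which is the kind of shift a report should flag before launch.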

Common Pitfalls

Optimizing accuracy on imbalanced data and missing critical class-level failures.
Not setting the positive class explicitly in confusion matrix calculations.
Comparing PPV across datasets with very different class prevalence without context.
Using cross-validation metrics only and skipping holdout threshold analysis.
Reporting metrics without business cost interpretation.

Summary

Sensitivity, specificity, and PPV provide clearer insight than accuracy alone in many classification tasks.
Caret supports these metrics through custom summary functions in trainControl.
Optimize the metric that reflects business risk, not the default metric.
Evaluate model thresholds and holdout behavior before deployment.
Report metric tradeoffs together so decisions stay transparent and defensible.