Difference between cross_val_score and cross_val_predict
Introduction
cross_val_score and cross_val_predict both use cross-validation, but they answer different questions. cross_val_score tells you how well an estimator performs across folds according to a metric, while cross_val_predict gives you out-of-fold predictions for each sample so you can inspect prediction behavior on the full dataset.
What cross_val_score Returns
cross_val_score trains the estimator repeatedly and returns one score per fold.
This is the right tool when you want to compare models, compare feature sets, or get a cross-validated estimate of performance.
The output is metrics, not predictions.
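As a minimal sketch of this behavior, the example below scores a classifier with 5-fold cross-validation. The dataset (iris), the estimator (LogisticRegression), and the choice of accuracy as the metric are assumptions made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative data and model; any estimator with fit/predict would do.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# One accuracy value per fold, returned as a NumPy array.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print(scores.shape)  # (5,)
print("mean accuracy:", scores.mean())
print("spread across folds:", scores.std())
```

The mean summarizes performance, and the standard deviation gives a rough sense of how stable the model is across folds.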
What cross_val_predict Returns
cross_val_predict also splits the data into folds, but instead of returning the score from each fold, it returns the prediction generated for each sample when that sample was in the validation fold.
Every returned prediction is out-of-fold, which means it was generated by a model that did not train on that sample.
That makes cross_val_predict useful for:
- confusion matrices
- residual analysis
- calibration plots
- stacked-model features based on out-of-fold predictions
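The sketch below shows two of these uses, assuming the iris dataset and a LogisticRegression model as stand-ins. It builds a confusion matrix from out-of-fold class predictions, then requests out-of-fold probabilities through the method parameter, which is the form calibration plots and stacking usually need.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# Illustrative data and model.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# One predicted label per sample, each made by a model that
# did not see that sample during training.
oof_pred = cross_val_predict(model, X, y, cv=5)
cm = confusion_matrix(y, oof_pred)
print(cm)

# Out-of-fold class probabilities instead of labels.
oof_proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")
print(oof_proba.shape)  # (150, 3)
```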
The Two Outputs Are Not Interchangeable
A common mistake is to treat cross_val_predict as if it were just another way to estimate a global performance score. It is not exactly the same thing.
With cross_val_score, each fold gets its own metric, and those metrics are then aggregated.
With cross_val_predict, you first assemble one out-of-fold prediction per sample and then may compute a metric on the combined predictions.
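For metrics that are plain averages over samples, such as accuracy or mean squared error, the combined result is a fold-size-weighted average of the per-fold values, so it stays close to the mean from cross_val_score. For metrics that do not decompose over samples, such as R², ROC AUC, or F1, the value computed on the pooled predictions can differ from the mean of the per-fold scores.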
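To make the difference concrete, here is a small sketch using R², a metric that depends on the mean of the targets in whatever set it is computed on. The diabetes dataset, the Ridge model, and the fixed random_state are illustrative assumptions. The same splitter object is passed to both functions so that they see identical folds.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

X, y = load_diabetes(return_X_y=True)
model = Ridge(alpha=1.0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Fold-wise: one R^2 per fold, then averaged.
fold_r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")

# Pooled: one R^2 computed over all out-of-fold predictions at once.
oof_pred = cross_val_predict(model, X, y, cv=cv)
pooled_r2 = r2_score(y, oof_pred)

# Scoring the out-of-fold predictions fold by fold recovers the
# cross_val_score values, so the gap comes from the aggregation step.
per_fold = np.array(
    [r2_score(y[test], oof_pred[test]) for _, test in cv.split(X)]
)

print("mean of fold R^2:", fold_r2.mean())
print("pooled R^2:      ", pooled_r2)
```

The two printed numbers are usually close but not identical, which is why a pooled metric should not be reported as if it were the cross-validated mean.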
Why the Predictions Come From Different Models
Another subtle point is that cross_val_predict does not give you predictions from one final fitted model. It gives you predictions from several different models, one per training fold.
That is fine for diagnostics, but it matters conceptually.
If you later fit one final model on the full dataset, its predictions may differ from the out-of-fold predictions you inspected during validation. That is expected.
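The following sketch reproduces the out-of-fold predictions by hand to show where they come from. It is not scikit-learn's internal implementation, only an equivalent loop under assumed choices (diabetes data, a Ridge model, a seeded KFold splitter). It then fits one final model on all the data for comparison.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict

X, y = load_diabetes(return_X_y=True)
model = Ridge(alpha=1.0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

oof_pred = cross_val_predict(model, X, y, cv=cv)

# Manual equivalent: a fresh clone is fitted on each training split
# and predicts only the samples held out from that split.
manual = np.empty_like(y, dtype=float)
fold_models = []
for train_idx, test_idx in cv.split(X):
    fold_model = clone(model).fit(X[train_idx], y[train_idx])
    manual[test_idx] = fold_model.predict(X[test_idx])
    fold_models.append(fold_model)

print(len(fold_models))               # 5
print(np.allclose(manual, oof_pred))  # True

# A single model fitted on everything gives in-sample predictions,
# which are not the same as the out-of-fold ones.
final_model = clone(model).fit(X, y)
in_sample_pred = final_model.predict(X)
print(np.abs(in_sample_pred - oof_pred).max())
```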
A Practical Comparison
Use cross_val_score when the question is:
- which model is better
- what is the average accuracy, F1 score, or mean squared error
- how stable is the model across folds
Use cross_val_predict when the question is:
- which samples are misclassified
- what do residuals look like
- can I build a confusion matrix from out-of-fold predictions
- do I need unbiased meta-features for stacking
These are related but different workflows.
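For the stacking case, a rough sketch of the idea follows. The dataset (breast cancer), the two base models, and the logistic-regression meta-model are all assumptions chosen for illustration. scikit-learn also provides StackingClassifier, which handles this kind of workflow for you, so treat the code as an explanation of the mechanism rather than a recommended implementation.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Hypothetical base models for the first layer of the stack.
base_models = [
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    RandomForestClassifier(n_estimators=50, random_state=0),
]

# One column per base model: the out-of-fold probability of the
# positive class, so no row was predicted by a model trained on it.
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=cv, method="predict_proba")[:, 1]
    for m in base_models
])

# The meta-model learns how to combine the base-model outputs.
meta_model = LogisticRegression().fit(meta_features, y)
print(meta_features.shape)  # (569, 2)
```

Using in-sample predictions here instead would leak the training labels into the meta-features, which is the reason out-of-fold predictions are preferred for this step.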
Common Pitfalls
The most common mistake is using cross_val_predict and assuming the resulting predictions come from one single trained estimator.
Another mistake is comparing a metric computed from cross_val_predict directly to the mean of cross_val_score without thinking about whether that metric decomposes sample by sample.
A third issue is forgetting that both functions refit the estimator multiple times, which can be expensive for large models.
Finally, do not use cross_val_predict when you only need one number. It is more output than necessary for pure model comparison.
Summary
- cross_val_score returns performance scores, one per fold.
- cross_val_predict returns out-of-fold predictions, one per sample.
- Use cross_val_score for model evaluation and comparison.
- Use cross_val_predict for diagnostics, residuals, confusion matrices, and stacking.
- The predictions from cross_val_predict come from multiple fold-specific models, not one final model.
- Metrics derived from combined out-of-fold predictions do not always mean the same thing as fold-wise cross-validation scores.

