Difference between cross_val_score and cross_val_predict

Introduction

cross_val_score and cross_val_predict both use cross-validation, but they answer different questions. cross_val_score tells you how well an estimator performs across folds according to a metric, while cross_val_predict gives you out-of-fold predictions for each sample so you can inspect prediction behavior on the full dataset.

What `cross_val_score` Returns

cross_val_score trains the estimator repeatedly and returns one score per fold.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# One accuracy value per fold (5 folds, so 5 values)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores)
print(scores.mean())
```

This is the right tool when you want to compare models, compare feature sets, or get a cross-validated estimate of performance.

The output is an array of metric values, not predictions.
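As a minimal sketch of the model-comparison use case, the snippet below scores two estimators on the same data with the same fold count. The choice of `DecisionTreeClassifier` as the second candidate, and the `candidates` and `results` names, are assumptions made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Two illustrative candidates; any scikit-learn estimators could be used here.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=200),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, estimator in candidates.items():
    # Same data, same cv setting, same metric for every candidate
    fold_scores = cross_val_score(estimator, X, y, cv=5, scoring="accuracy")
    results[name] = fold_scores
    print(f"{name}: mean={fold_scores.mean():.3f} std={fold_scores.std():.3f}")
```

The mean summarizes expected performance, and the standard deviation gives a rough sense of how much the score moves from fold to fold.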

What `cross_val_predict` Returns

cross_val_predict also splits the data into folds, but instead of returning the score from each fold, it returns the prediction generated for each sample when that sample was in the validation fold.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# One predicted label per sample, each made while that sample was held out
predictions = cross_val_predict(model, X, y, cv=5)
print(predictions[:10])
```

Every returned prediction is out-of-fold, which means it was generated by a model that did not train on that sample.
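To make "out-of-fold" concrete, here is a rough sketch that rebuilds the same predictions by hand. It assumes a `StratifiedKFold` splitter with 5 splits (passed explicitly to both code paths so they share identical folds); the variable names `manual` and `library` are illustrative.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)
cv = StratifiedKFold(n_splits=5)

# Manual loop: fit a fresh copy on each training split,
# then predict only the samples held out in that split.
manual = np.empty_like(y)
for train_idx, test_idx in cv.split(X, y):
    fold_model = clone(model).fit(X[train_idx], y[train_idx])
    manual[test_idx] = fold_model.predict(X[test_idx])

# Library call using the same splitter
library = cross_val_predict(model, X, y, cv=cv)
print(np.array_equal(manual, library))
```

Each position in the result is filled exactly once, by the one model that never saw that sample during training.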

That makes cross_val_predict useful for:

confusion matrices
residual analysis
calibration plots
stacked-model features based on out-of-fold predictions
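The uses listed above could look roughly like this. The sketch builds a confusion matrix from out-of-fold labels, then requests out-of-fold class probabilities through the `method="predict_proba"` argument, which is the kind of array a calibration plot or a stacking meta-model would consume. Variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# Out-of-fold class labels -> confusion matrix over the whole dataset
oof_labels = cross_val_predict(model, X, y, cv=5)
cm = confusion_matrix(y, oof_labels)
print(cm)

# Out-of-fold class probabilities, one row per sample, one column per class
oof_proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")
print(oof_proba.shape)  # (150, 3)
```

Because every row was predicted by a model that did not train on that sample, these arrays avoid the optimism of in-sample predictions.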

The Two Outputs Are Not Interchangeable

A common mistake is to treat cross_val_predict as if it were just another way to estimate a global performance score. It is not: the two functions group the held-out results differently before a metric is applied.

With cross_val_score, each fold gets its own metric, and those metrics are then aggregated.

With cross_val_predict, you first assemble one out-of-fold prediction per sample and then may compute a metric on the combined predictions.

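For metrics that are plain per-sample averages, such as accuracy or mean squared error, the combined result equals the fold mean when all folds are the same size and is usually close otherwise. For metrics that do not decompose over samples, such as F1, ROC AUC, or R², it may not match the mean score you would get from cross_val_score.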
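A small sketch of that difference, assuming the iris data and 5 stratified folds of equal size (30 samples each): accuracy computed on the pooled out-of-fold predictions agrees with the fold mean, while macro F1 is not guaranteed to. The variable names are illustrative, and no particular gap is implied; the printed values simply show whether the two computations agree on this dataset.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_predict, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# One out-of-fold prediction per sample
oof = cross_val_predict(model, X, y, cv=5)

# Accuracy: mean of per-fold scores vs. score on pooled predictions
mean_fold_acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
pooled_acc = accuracy_score(y, oof)

# Macro F1: same comparison, but the metric does not decompose over samples
mean_fold_f1 = cross_val_score(model, X, y, cv=5, scoring="f1_macro").mean()
pooled_f1 = f1_score(y, oof, average="macro")

print(f"accuracy: fold mean={mean_fold_acc:.4f} pooled={pooled_acc:.4f}")
print(f"macro F1: fold mean={mean_fold_f1:.4f} pooled={pooled_f1:.4f}")
```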

Why the Predictions Come From Different Models

Another subtle point is that cross_val_predict does not give you predictions from one final fitted model. It gives you predictions from several different models, one per train/validation split, each trained on a different subset of the data.

That is fine for diagnostics, but it matters conceptually.

If you later fit one final model on the full dataset, its predictions may differ from the out-of-fold predictions you inspected during validation. That is expected.
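As an illustration, the sketch below fits one final model on all of the data and counts how many of its predictions disagree with the out-of-fold ones. The names `final_model` and `n_different` are illustrative, and the count depends on the dataset and estimator, so no specific value is claimed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)

# Predictions from five fold-specific models, each blind to its own test samples
oof = cross_val_predict(LogisticRegression(max_iter=200), X, y, cv=5)

# Predictions from a single model trained on every sample (in-sample, so optimistic)
final_model = LogisticRegression(max_iter=200).fit(X, y)
in_sample = final_model.predict(X)

n_different = int(np.sum(oof != in_sample))
print(f"{n_different} of {len(y)} predictions differ")
```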

A Practical Comparison

Use cross_val_score when the question is:

which model is better
what is the average accuracy, F1 score, or mean squared error
how stable is the model across folds

Use cross_val_predict when the question is:

which samples are misclassified
what do residuals look like
can I build a confusion matrix from out-of-fold predictions
how do I get out-of-fold meta-features for stacking

These are related but different workflows.

Common Pitfalls

The most common mistake is using cross_val_predict and assuming the resulting predictions come from one single trained estimator.

Another mistake is comparing a metric computed from cross_val_predict directly to the mean of cross_val_score without thinking about whether that metric decomposes sample by sample.

A third issue is forgetting that both functions refit the estimator multiple times, which can be expensive for large models.
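One way to see the refitting cost is to count calls to `fit`. The sketch below uses a hypothetical `CountingLogisticRegression` subclass written only for this illustration (it is not part of scikit-learn). It assumes the default `n_jobs`, so all fits run in the current process; with parallel workers a class-level counter like this would not be reliable.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_val_score


class CountingLogisticRegression(LogisticRegression):
    """Hypothetical helper: LogisticRegression that counts fit() calls."""

    fit_calls = 0  # shared across all clones of this class

    def fit(self, X, y, sample_weight=None):
        CountingLogisticRegression.fit_calls += 1
        return super().fit(X, y, sample_weight=sample_weight)


X, y = load_iris(return_X_y=True)
model = CountingLogisticRegression(max_iter=200)

CountingLogisticRegression.fit_calls = 0
cross_val_score(model, X, y, cv=5)
fits_for_score = CountingLogisticRegression.fit_calls

CountingLogisticRegression.fit_calls = 0
cross_val_predict(model, X, y, cv=5)
fits_for_predict = CountingLogisticRegression.fit_calls

print(fits_for_score, fits_for_predict)
```

Running both functions on the same model therefore pays the training cost twice over, once per function per fold.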

Finally, do not use cross_val_predict when you only need one number. It is more output than necessary for pure model comparison.

Summary

cross_val_score returns performance scores, one per fold.
cross_val_predict returns out-of-fold predictions, one per sample.
Use cross_val_score for model evaluation and comparison.
Use cross_val_predict for diagnostics, residuals, confusion matrices, and stacking.
The predictions from cross_val_predict come from multiple fold-specific models, not one final model.
Metrics derived from combined out-of-fold predictions do not always mean the same thing as fold-wise cross-validation scores.
