What is the difference between RepeatedStratifiedKFold and StratifiedKFold in sklearn?

Introduction

StratifiedKFold and RepeatedStratifiedKFold both solve the same core problem: evaluating a classifier while preserving class balance in every fold. The difference is that one produces a single stratified split plan, while the other repeats that process multiple times with different shuffles so your estimate is less dependent on one lucky or unlucky partition.

What StratifiedKFold does

Use StratifiedKFold when you want standard k-fold cross-validation for a classification task. It keeps the ratio of labels in each fold close to the ratio in the full dataset. That matters when the target is imbalanced, because plain KFold can accidentally create folds with very few positive examples.
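The effect of stratification can be seen by counting positives per validation fold. The following is a minimal sketch on a hypothetical toy label vector (90 negatives, 10 positives); the array sizes and seeds are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy labels: 90 negatives followed by 10 positives.
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # feature values do not influence the split

splitters = {
    "KFold": KFold(n_splits=5, shuffle=True, random_state=0),
    "StratifiedKFold": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
}

positives_per_fold = {}
for name, splitter in splitters.items():
    # Count how many positive samples land in each validation fold.
    counts = [int(y[test_idx].sum()) for _, test_idx in splitter.split(X, y)]
    positives_per_fold[name] = counts
    print(f"{name}: positives per validation fold = {counts}")
```

With StratifiedKFold every fold receives two of the ten positives. With plain KFold the counts depend on the shuffle and may be uneven.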

If you set n_splits=5, the estimator is trained and evaluated five times, and each sample appears in a validation fold exactly once. With shuffle=False (the default), the split is deterministic and follows input order. With shuffle=True, the samples are shuffled before folds are formed, which is usually safer unless the row order is already random. The random_state parameter only has an effect when shuffle=True.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=42,
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(scores)
print(scores.mean())
```

This gives you one set of five validation scores. It is efficient and easy to reason about, so it is often the default choice for model selection and baseline experiments.
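Two of the properties described above can be checked directly: each sample is validated exactly once, and an unshuffled splitter returns the same folds on every call. This is a small sketch on a hypothetical toy label vector, not part of the original example.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy labels: 40 negatives and 10 positives.
y = np.array([0] * 40 + [1] * 10)
X = np.zeros((len(y), 1))

# Count how often each sample is used for validation.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
validation_counts = np.zeros(len(y), dtype=int)
for _, test_idx in cv.split(X, y):
    validation_counts[test_idx] += 1
print(validation_counts.min(), validation_counts.max())  # 1 1

# Without shuffling, repeated calls to split() yield identical folds.
unshuffled = StratifiedKFold(n_splits=5)
first = [test_idx.tolist() for _, test_idx in unshuffled.split(X, y)]
second = [test_idx.tolist() for _, test_idx in unshuffled.split(X, y)]
same_folds = first == second
print(same_folds)  # True
```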

What RepeatedStratifiedKFold adds

RepeatedStratifiedKFold runs stratified k-fold more than once. Each repetition reshuffles the dataset and creates a new set of folds, so the model is evaluated across more train and validation combinations.

That matters because a single cross-validation run can still be noisy. If your dataset is small, borderline imbalanced, or sensitive to sampling, the average from one five-fold split can move around more than you would like. Repeating the process reduces the chance that your conclusion depends on one particular split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=42,
)

cv = RepeatedStratifiedKFold(
    n_splits=5,
    n_repeats=3,
    random_state=42,
)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(len(scores))  # 15 scores
print(scores.mean())
print(scores.std())
```

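With five folds repeated three times, you get fifteen evaluation scores instead of five. The mean is usually more stable, and the standard deviation gives a fuller picture of score variability. Keep in mind that the scores are correlated because the training sets overlap, so the standard deviation should not be read as a standard error.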
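One way to see how much a single five-fold run can move is to group the scores by repetition. The sketch below assumes the scores come back in split order, repetition by repetition, which is how the splitter yields them as far as I can tell; the smaller dataset and the seeds are hypothetical and only keep the example fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Hypothetical small, imbalanced dataset.
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=4,
    weights=[0.9, 0.1],
    random_state=0,
)

n_splits, n_repeats = 5, 3
cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=0)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="roc_auc"
)

# Assumption: scores are ordered repetition by repetition,
# so each row below holds one complete five-fold run.
per_repeat = scores.reshape(n_repeats, n_splits)
repeat_means = per_repeat.mean(axis=1)

print("mean per repetition:", repeat_means)
print("spread between repetitions:", repeat_means.std())
print("overall mean:", scores.mean())
```

Each entry of repeat_means is what a single StratifiedKFold run with that shuffle would have reported. The spread between them is the split-dependent noise that repetition averages out.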

When to use each one

Choose StratifiedKFold when:

  • the dataset is large enough that one cross-validation run is already stable
  • training is expensive and you want to keep evaluation time under control
  • you need a simple, reproducible benchmark

Choose RepeatedStratifiedKFold when:

  • the dataset is small or moderately noisy
  • class imbalance makes fold composition more sensitive
  • you want a more reliable comparison between similar models

The trade-off is straightforward: repeated evaluation gives a more stable estimate, but it multiplies training cost by n_repeats. A model that takes five minutes to evaluate with five folds will take roughly fifteen minutes with three repeats.
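The cost arithmetic can be read off the splitters themselves with get_n_splits(). In this sketch the per-fit time is a hypothetical figure derived from the five-minute example above.

```python
from sklearn.model_selection import RepeatedStratifiedKFold, StratifiedKFold

single = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
repeated = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)

# Number of model fits each strategy requires.
single_fits = single.get_n_splits()
repeated_fits = repeated.get_n_splits()
print(single_fits, repeated_fits)  # 5 15

# Hypothetical cost: five minutes for five folds means one minute per fit.
minutes_per_fit = 5 / single_fits
estimated_minutes = minutes_per_fit * repeated_fits
print(estimated_minutes)  # 15.0
```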

Interpreting the results

A common mistake is to assume repeated cross-validation produces a fundamentally different metric. It does not. You are still measuring the same thing, but with more resampling. Think of it as spending more compute to reduce sampling noise.

For example, if two models differ by only a tiny amount, a single StratifiedKFold run may not be enough to trust the ranking. Repeating the folds gives you more evidence that the observed difference is consistent.

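```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Same imbalanced toy dataset as in the earlier examples.
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=42,
)

# An integer random_state means every model is scored on the same splits.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=7)

models = {
    "small_forest": RandomForestClassifier(n_estimators=50, random_state=7),
    "large_forest": RandomForestClassifier(n_estimators=200, random_state=7),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```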

This pattern is useful when you want to compare several classifiers under the same repeated split strategy.
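Because both models are scored on identical splits, their scores can also be compared fold by fold rather than only through the means. The following is a rough sketch, not a formal significance test; the two models, the dataset, and the seeds are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical dataset for the comparison.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=7
)

# The integer seed makes both cross_val_score calls use the same splits.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=7)

scores_a = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy"
)
scores_b = cross_val_score(
    DecisionTreeClassifier(max_depth=3, random_state=7), X, y, cv=cv, scoring="accuracy"
)

# Paired difference on each of the 15 splits.
diff = scores_a - scores_b
wins = int((diff > 0).sum())
print(f"mean difference: {diff.mean():.3f}")
print(f"model A scored higher on {wins} of {len(diff)} splits")
```

A difference that keeps the same sign across most splits is more convincing than a small gap between two means.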

Common pitfalls

  • Using these splitters for regression tasks. Stratification is designed for classification labels, not continuous targets.
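  • Forgetting shuffle=True with StratifiedKFold when input rows follow some systematic order, such as collection batch or data source. Stratification balances the classes, but unshuffled folds can still be unrepresentative in other ways. If the order is temporal and matters for prediction, shuffling leaks future information, so a time-aware splitter such as TimeSeriesSplit is the better fit.
  • Comparing models with different random seeds or different splitters. Use the same cross-validation object with a fixed integer random_state so every model sees identical splits.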
  • Treating repeated cross-validation as free. It improves stability, but it can become expensive for large datasets or heavy models.
  • Relying only on mean score. The spread of scores matters, especially when model performance is close.
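The first pitfall is easy to reproduce: passing a continuous target to a stratified splitter raises a ValueError. This is a minimal sketch with hypothetical random data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical regression-style data with a continuous target.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y_continuous = rng.normal(size=50)

cv = StratifiedKFold(n_splits=5)
try:
    # split() is lazy, so request the first fold to trigger validation.
    next(cv.split(X, y_continuous))
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")
```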

Summary

  • StratifiedKFold creates one stratified set of k folds for classification.
  • RepeatedStratifiedKFold repeats stratified k-fold multiple times with different shuffles.
  • Repetition usually gives a more stable estimate of model performance.
  • The benefit of repetition comes with a proportional increase in compute cost.
  • For fast baseline work, StratifiedKFold is often enough.
  • For noisy or small datasets, RepeatedStratifiedKFold is often the safer choice.
