Compare ways to tune hyperparameters in scikit-learn

Introduction

Hyperparameter tuning is the process of searching for model settings that improve validation performance. In scikit-learn, different search strategies trade accuracy, compute cost, and reproducibility in different ways. Choosing the right strategy depends on parameter space size, model training cost, and deployment constraints.

Grid Search for Small, Structured Spaces

GridSearchCV evaluates every combination in a parameter grid using cross-validation. With fixed cross-validation splits and a deterministic estimator it is reproducible and easy to explain, which makes it a strong baseline when the search space is small.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Scaling lives inside the pipeline so it is re-fit on each training fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC())
])

# Keys use the "<step name>__<parameter>" convention to reach pipeline steps.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": ["scale", 0.01, 0.001],
    "svc__kernel": ["rbf"]
}

# A fixed seed keeps the fold assignment identical across runs.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

search = GridSearchCV(
    estimator=pipe,
    param_grid=param_grid,
    scoring="roc_auc",
    cv=cv,
    n_jobs=-1  # use all available CPU cores
)

search.fit(X, y)
print("Best score:", round(search.best_score_, 4))
print("Best params:", search.best_params_)
```

Grid search is exhaustive, so the number of model fits grows multiplicatively with every parameter and every value you add. Use it when you have strong priors and narrow candidate values.
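
To make the cost concrete, the following minimal sketch counts the fits implied by the grid from the example above before anything is trained. It relies on scikit-learn's ParameterGrid helper; the variable names n_candidates and n_fits are illustrative only.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the GridSearchCV example above.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": ["scale", 0.01, 0.001],
    "svc__kernel": ["rbf"],
}

n_splits = 5  # matches the StratifiedKFold setting used above
n_candidates = len(ParameterGrid(param_grid))  # 4 * 3 * 1 combinations
n_fits = n_candidates * n_splits  # cross-validation fits, excluding the final refit

print(f"{n_candidates} candidates x {n_splits} folds = {n_fits} fits")
# prints: 12 candidates x 5 folds = 60 fits
```

Adding a second kernel or a fifth value of C multiplies this count, which is a quick way to check whether a grid is still affordable.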

Randomized Search for Larger Spaces

RandomizedSearchCV samples a fixed number of parameter combinations from distributions. It usually finds strong configurations faster when the space is large or continuous.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint

X, y = load_breast_cancer(return_X_y=True)

# Tree ensembles do not need scaled inputs; the scaler is kept only to show
# the pipeline pattern.
pipe = Pipeline([
    ("scaler", StandardScaler(with_mean=False)),
    ("rf", RandomForestClassifier(random_state=42))
])

# randint(low, high) samples integers from low up to high - 1.
param_dist = {
    "rf__n_estimators": randint(100, 700),
    "rf__max_depth": randint(2, 20),
    "rf__min_samples_split": randint(2, 20),
    "rf__min_samples_leaf": randint(1, 10)
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

search = RandomizedSearchCV(
    estimator=pipe,
    param_distributions=param_dist,
    n_iter=40,  # number of sampled candidates, i.e. the search budget
    scoring="roc_auc",
    cv=cv,
    n_jobs=-1,
    random_state=42  # makes the sampled candidates reproducible
)

search.fit(X, y)
print("Best score:", round(search.best_score_, 4))
print("Best params:", search.best_params_)
```

If model training is expensive, random search often gives better score per unit time than exhaustive grids.
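
The example above samples only integer parameters. For continuous parameters that span several orders of magnitude, a log-uniform distribution is a common choice. The sketch below shows one way to do this with scipy.stats.loguniform on an SVC pipeline; the bounds and the n_iter value are assumptions chosen for illustration, not recommended defaults.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="rbf")),
])

# Log-uniform sampling spreads draws evenly across orders of magnitude.
# The bounds below are illustrative assumptions.
param_dist = {
    "svc__C": loguniform(1e-2, 1e3),
    "svc__gamma": loguniform(1e-5, 1e-1),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

search = RandomizedSearchCV(
    estimator=pipe,
    param_distributions=param_dist,
    n_iter=20,  # 20 sampled candidates x 5 folds = 100 fits
    scoring="roc_auc",
    cv=cv,
    random_state=42,
)

search.fit(X, y)
print("Best score:", round(search.best_score_, 4))
print("Best params:", search.best_params_)
```

A grid over the same two ranges would need many values per axis to reach similar resolution, whereas here the budget stays fixed at n_iter regardless of how wide the ranges are.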

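Successive Halving for Constrained Budgets

scikit-learn also provides successive-halving searches, HalvingGridSearchCV and HalvingRandomSearchCV. They evaluate many candidates on a small resource budget, keep only the top performers, and repeat with more resources. This can cut total compute substantially on larger experiments. Both estimators are experimental and must be enabled with an explicit import.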

```python
# The halving estimators are experimental and need this enabling import.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [2, 3, 4]
}

# The default resource is the number of training samples: early rounds
# train on small subsets, later rounds on more data.
search = HalvingGridSearchCV(
    estimator=GradientBoostingClassifier(random_state=42),
    param_grid=param_grid,
    scoring="roc_auc",
    factor=3,  # keep roughly the best third of candidates each round
    cv=5,
    random_state=42  # makes the subsampling reproducible
)

search.fit(X, y)
print("Best score:", round(search.best_score_, 4))
print("Best params:", search.best_params_)
```

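Halving methods are effective when you can define a meaningful notion of increasing resource, such as number of estimators or subset size. By default the resource is the number of training samples (resource="n_samples"), which is what the example above uses.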
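
As an alternative to subsampling rows, the resource can be a model parameter. The sketch below uses the number of trees in a random forest as the resource. The grid values and the min_resources and max_resources settings are illustrative assumptions. Two constraints apply: the resource parameter must not also appear in the grid, and max_resources has to be set explicitly when the resource is not n_samples.

```python
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import HalvingGridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# n_estimators is the resource, so it is deliberately left out of the grid.
param_grid = {
    "max_depth": [2, 4, 8, None],
    "min_samples_leaf": [1, 5, 10],
}

search = HalvingGridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    resource="n_estimators",  # grow forests instead of growing the sample
    min_resources=10,         # trees per candidate in the first round
    max_resources=100,        # upper bound on trees in any round
    factor=3,
    scoring="roc_auc",
    cv=5,
    random_state=42,
)

search.fit(X, y)
print("Candidates per round:", search.n_candidates_)
print("Trees per round:", search.n_resources_)
print("Best params:", search.best_params_)
```

With these settings the 12 candidates should be narrowed over three rounds, so most of the cheap 10-tree fits go to candidates that are discarded early.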

Practical Selection Strategy

Use this decision rule:

  • Start with random search when parameter ranges are broad or mostly continuous.
  • Use grid search for final local refinement around a promising region.
  • Use halving when you need strong results under a strict compute budget.
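
The first two rules can be combined into a coarse-to-fine workflow. The following is a minimal sketch under stated assumptions: a logistic regression pipeline stands in for your model, and the search ranges, the factor of 3 around the coarse optimum, and the names coarse, fine, and best_c are all illustrative.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=5000)),
])
# Reusing the same splitter keeps both stages comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Stage 1: broad random search over seven orders of magnitude.
coarse = RandomizedSearchCV(
    estimator=pipe,
    param_distributions={"clf__C": loguniform(1e-4, 1e3)},
    n_iter=15,
    scoring="roc_auc",
    cv=cv,
    random_state=42,
)
coarse.fit(X, y)
best_c = coarse.best_params_["clf__C"]

# Stage 2: small geometric grid centered on the coarse optimum.
fine_grid = {"clf__C": list(np.geomspace(best_c / 3, best_c * 3, num=7))}
fine = GridSearchCV(estimator=pipe, param_grid=fine_grid, scoring="roc_auc", cv=cv)
fine.fit(X, y)

print("Coarse best:", round(coarse.best_score_, 4), coarse.best_params_)
print("Fine best:  ", round(fine.best_score_, 4), fine.best_params_)
```

Because the fine grid includes the coarse optimum as its midpoint, the second stage should not score meaningfully worse than the first on the same folds.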

Always separate tuning data from final test data. If the test set influences search decisions, the reported performance is optimistically biased and will not reliably predict behavior in production.
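
One way to enforce this separation is to split off the test set before any search runs and to score it exactly once at the end. The sketch below assumes a 20% hold-out and a small SVC grid; both are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Hold out the test set before tuning; the search never sees it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC()),
])

search = GridSearchCV(
    estimator=pipe,
    param_grid={"svc__C": [0.1, 1, 10]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)  # cross-validation happens inside the training data

# Evaluate the refit best model once on the untouched test set.
test_auc = search.score(X_test, y_test)
print("CV score:  ", round(search.best_score_, 4))
print("Test score:", round(test_auc, 4))
```

If the test score is then used to pick between further tuning runs, it has become part of the search and a fresh hold-out set is needed.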

Track metadata for every run: random seed, scoring metric, fold strategy, model version, and feature preprocessing version. Without this, comparisons become ambiguous.
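
A lightweight way to capture this metadata is to build a small record next to each search and serialize it. The sketch below is one possible layout, not a standard format: the run_record field names are hypothetical, and the scikit-learn version and pipeline step names are used as stand-ins for model and preprocessing versions.

```python
import json

import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42
X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=5000)),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)

search = GridSearchCV(
    estimator=pipe,
    param_grid={"clf__C": [0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=cv,
)
search.fit(X, y)

# Hypothetical record layout; adapt the fields to your own tracking setup.
run_record = {
    "seed": SEED,
    "scoring": search.scoring,
    "fold_strategy": repr(cv),
    "sklearn_version": sklearn.__version__,
    "pipeline_steps": [f"{name}:{type(step).__name__}" for name, step in pipe.steps],
    "best_params": search.best_params_,
    "best_score": float(search.best_score_),
}
print(json.dumps(run_record, indent=2))
```

The same dictionary can be written to a file or passed to an experiment tracker, so that two runs are compared only when their seed, metric, and fold strategy match.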

Common Pitfalls

  • Tuning on test data, which leaks information and inflates scores.
  • Searching too many irrelevant parameters at once, which wastes compute.
  • Ignoring preprocessing in the search pipeline, causing leakage or mismatch.
  • Comparing runs with different cross-validation splits without fixed seeds.
  • Selecting by one metric only when deployment needs balanced precision and recall.
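
For the last pitfall, GridSearchCV accepts several metrics at once and a refit argument that names the one used for selection. The sketch below records precision, recall, and F1 and selects on F1; the choice of F1 as the selection metric and the small grid are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=5000)),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

search = GridSearchCV(
    estimator=pipe,
    param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},
    scoring={"precision": "precision", "recall": "recall", "f1": "f1"},
    refit="f1",  # metric that decides best_params_ and the refit model
    cv=cv,
)
search.fit(X, y)

# Report every tracked metric for the selected candidate.
best = search.best_index_
for metric in ("precision", "recall", "f1"):
    mean_score = search.cv_results_[f"mean_test_{metric}"][best]
    print(f"{metric}: {mean_score:.4f}")
print("Best params:", search.best_params_)
```

Inspecting all three columns of cv_results_ shows whether the selected candidate buys its score by sacrificing precision or recall.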

Summary

  • GridSearchCV is exhaustive and best for compact, well-defined spaces.
  • RandomizedSearchCV is usually more efficient for large or continuous ranges.
  • Halving search methods improve score-per-compute under budget constraints.
  • Keep preprocessing inside pipelines to avoid leakage during cross-validation.
  • Make tuning reproducible with fixed seeds, logged metadata, and strict test isolation.
