Right function for normalizing input of sklearn SVM

Introduction

For scikit-learn SVMs, use StandardScaler (z-score normalization) as the default preprocessing choice. It rescales each feature to mean 0 and standard deviation 1, which matches the RBF kernel's assumption that features are centered and on comparable scales. MinMaxScaler (scaling to [0, 1]) is an alternative when features are already bounded or when you need a specific range. Always fit the scaler on the training data only, then apply that fitted scaler to both the training and test sets. Wrapping the scaler and the SVM in a Pipeline does this for you and prevents data leakage.

Why SVMs Need Normalization

Kernel SVMs such as the RBF SVM depend on distances (or inner products) between data points. Features with larger numeric ranges dominate those calculations, so the SVM effectively ignores smaller-scale features:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# n_redundant=0 is required here: with only 2 features, the default
# n_redundant=2 makes make_classification raise a ValueError
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

# Both features start on a similar scale (a few units around zero).
# Inflate feature 2 so it is 10,000 times larger than before.
X[:, 1] = X[:, 1] * 10000

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Without normalization: feature 2 dominates the kernel distances
svm_raw = SVC(kernel='rbf')
svm_raw.fit(X_train, y_train)
print(f"Without normalization: {accuracy_score(y_test, svm_raw.predict(X_test)):.3f}")
# Typically lower accuracy, because feature 1 is effectively ignored

# With normalization: both features contribute on an equal footing
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

svm_scaled = SVC(kernel='rbf')
svm_scaled.fit(X_train_scaled, y_train)
print(f"With normalization: {accuracy_score(y_test, svm_scaled.predict(X_test_scaled)):.3f}")
# Typically higher accuracy
```
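
To see why the larger-scale feature dominates, it can help to split a squared Euclidean distance into per-feature contributions. The sketch below uses two made-up samples (the values are illustrative assumptions, not taken from the dataset above):

```python
import numpy as np

# Two hypothetical samples. They differ a lot in feature 1 relative to its
# 0-1 scale, and only slightly in feature 2 relative to its 0-10000 scale.
a = np.array([0.1, 5000.0])
b = np.array([0.9, 5010.0])

diff_sq = (a - b) ** 2            # per-feature squared differences
share = diff_sq / diff_sq.sum()   # fraction of the squared distance per feature

print(f"Squared differences: {diff_sq}")
print(f"Share of squared distance: {share}")
```

Feature 2 accounts for more than 99% of the squared distance here, even though the two samples are almost identical on that feature in relative terms. The RBF kernel is a function of this squared distance, so it inherits the same imbalance.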

StandardScaler (Z-Score Normalization)

Transforms each feature to have mean=0 and standard deviation=1:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# Fit on training data only
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # Use same mean/std from training

print(f"Train mean: {X_train_scaled.mean(axis=0)}")   # [~0, ~0]
print(f"Train std:  {X_train_scaled.std(axis=0)}")     # [~1, ~1]
```

Formula: z = (x - mean) / std

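Best for: RBF kernel SVMs, features with different units, and data with mild outliers (the mean and standard deviation are still pulled by extreme values, so prefer RobustScaler when outliers are severe).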
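
As a quick sanity check of the formula, the z-score can be computed by hand and compared with what StandardScaler produces. This is a minimal sketch on a small made-up matrix (X_demo is a hypothetical name used only for this illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small made-up training matrix: 4 samples, 2 features on different scales
X_demo = np.array([[1.0, 100.0],
                   [2.0, 300.0],
                   [3.0, 500.0],
                   [4.0, 700.0]])

scaler = StandardScaler().fit(X_demo)

# Manual z-score using the population standard deviation (ddof=0),
# which is the convention StandardScaler follows
z_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(scaler.mean_)    # per-feature means learned during fit
print(scaler.scale_)   # per-feature standard deviations learned during fit
print(np.allclose(scaler.transform(X_demo), z_manual))
```

The learned statistics are stored in mean_ and scale_, and transform reuses them on any later data, which is why the test set must go through transform rather than fit_transform.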

MinMaxScaler (Min-Max Scaling)

Scales each feature to a fixed range, typically [0, 1]:

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Train min: {X_train_scaled.min(axis=0)}")  # [0, 0]
print(f"Train max: {X_train_scaled.max(axis=0)}")  # [1, 1]
```

Formula: x_scaled = (x - min) / (max - min)

Best for: Features already bounded (pixel values 0-255, probabilities 0-1), linear kernel SVM.
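
One consequence of the formula is worth seeing in practice: the min and max come from the data the scaler was fitted on, so unseen values outside that range map outside [0, 1]. The following sketch uses made-up numbers (X_fit and X_new are hypothetical names) and the default settings of MinMaxScaler:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_fit = np.array([[10.0], [20.0], [30.0]])   # made-up training values
X_new = np.array([[5.0], [25.0], [40.0]])    # made-up unseen values

mm = MinMaxScaler().fit(X_fit)

# Apply the formula by hand with the training min and max
manual = (X_new - X_fit.min(axis=0)) / (X_fit.max(axis=0) - X_fit.min(axis=0))

print(mm.transform(X_new).ravel())
print(manual.ravel())
```

Both lines should agree, with the first and last values falling below 0 and above 1 respectively. This is expected behavior rather than a bug, and an SVM handles it without trouble as long as the same fitted scaler is used throughout.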

Using a Pipeline

Wrap scaling and SVM in a Pipeline to prevent data leakage and simplify code:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(kernel='rbf', C=1.0, gamma='scale'))
])

# Cross-validation automatically fits scaler on each fold's training set
scores = cross_val_score(pipe, X, y, cv=5)
print(f"CV Accuracy: {scores.mean():.3f} ± {scores.std():.3f}")

# Fit and predict
pipe.fit(X_train, y_train)
predictions = pipe.predict(X_test)
```
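
To make the Pipeline's behavior concrete, the fitted scaler can be inspected through named_steps. The sketch below is self-contained and uses a synthetic dataset with hypothetical variable names (X_demo, pipe_demo and so on); it checks that the scaler learned its statistics from the training split only and that predict applies transform, not fit, before the SVM:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data used only for this illustration
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

pipe_demo = Pipeline([('scaler', StandardScaler()), ('svm', SVC(kernel='rbf'))])
pipe_demo.fit(X_tr, y_tr)

# The scaler inside the pipeline holds the training-set means
fitted_scaler = pipe_demo.named_steps['scaler']
print(np.allclose(fitted_scaler.mean_, X_tr.mean(axis=0)))

# Reproduce predict() by hand: transform with the fitted scaler, then classify
manual_pred = pipe_demo.named_steps['svm'].predict(fitted_scaler.transform(X_te))
print(np.array_equal(manual_pred, pipe_demo.predict(X_te)))
```

Both checks should print True, which is the property that keeps test-set statistics out of training.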

Pipeline with GridSearchCV

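```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC())
])

# Step parameters are addressed as <step name>__<parameter>.
# Separate grids per kernel: gamma is ignored by the linear kernel,
# so searching over it there would only repeat identical fits.
param_grid = [
    {'svm__kernel': ['linear'], 'svm__C': [0.1, 1, 10, 100]},
    {'svm__kernel': ['rbf'], 'svm__C': [0.1, 1, 10, 100],
     'svm__gamma': ['scale', 'auto', 0.01, 0.1]},
]

grid = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print(f"Best params: {grid.best_params_}")
print(f"Best score: {grid.best_score_:.3f}")
```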
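
After the search, the usual next step is to evaluate the selected model on held-out data. The following is a minimal, self-contained sketch with a deliberately small grid and synthetic data (the names search, best_pipe, X_demo and the grid values are assumptions for illustration only):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data used only for this illustration
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

search = GridSearchCV(
    Pipeline([('scaler', StandardScaler()), ('svm', SVC())]),
    {'svm__C': [0.1, 1, 10]},   # small illustrative grid
    cv=5,
)
search.fit(X_tr, y_tr)

# With the default refit=True, best_estimator_ is the whole pipeline
# (scaler plus SVM) refitted on all of X_tr using the best parameters
best_pipe = search.best_estimator_
print(type(best_pipe).__name__)
print(f"Held-out accuracy: {search.score(X_te, y_te):.3f}")
```

Because the refitted object is the full pipeline, raw (unscaled) test data can be passed straight to score or predict.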

Other Scalers

```python
from sklearn.preprocessing import (
    RobustScaler,
    MaxAbsScaler,
    Normalizer
)

# RobustScaler: uses median and IQR, resistant to outliers
robust = RobustScaler()
X_robust = robust.fit_transform(X_train)

# MaxAbsScaler: scales to [-1, 1] by dividing by max absolute value
# Good for sparse data (does not shift center)
maxabs = MaxAbsScaler()
X_maxabs = maxabs.fit_transform(X_train)

# Normalizer: scales each SAMPLE (row) to unit norm
# Different from others which scale each FEATURE (column)
normalizer = Normalizer(norm='l2')
X_normalized = normalizer.fit_transform(X_train)
```

| Scaler | Formula | Best For |
| --- | --- | --- |
| StandardScaler | (x - mean) / std | RBF kernel, general use |
| MinMaxScaler | (x - min) / (max - min) | Bounded features, linear kernel |
| RobustScaler | (x - median) / IQR | Data with outliers |
| MaxAbsScaler | x / max(abs(x)) | Sparse data |
| Normalizer | x / norm(x) | Text classification, cosine similarity |
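
The row-wise versus column-wise distinction is easy to miss, so here is a small sketch contrasting Normalizer with StandardScaler on a made-up matrix (X_demo is a hypothetical name, and the values are chosen so the row norms are easy to check by hand):

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Rows 0 and 1 point in the same direction but have different magnitudes
X_demo = np.array([[3.0, 4.0],
                   [6.0, 8.0],
                   [1.0, 0.0]])

row_scaled = Normalizer(norm='l2').fit_transform(X_demo)   # per sample (row)
col_scaled = StandardScaler().fit_transform(X_demo)        # per feature (column)

print(np.linalg.norm(row_scaled, axis=1))   # every row has unit length
print(col_scaled.mean(axis=0))              # every column has mean ~0
print(row_scaled[0], row_scaled[1])         # rows 0 and 1 become identical
```

Normalizer discards each sample's magnitude and leaves the per-feature scale imbalance untouched, which is why it is not a substitute for column-wise scaling before an SVM.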

Common Pitfalls

  • Fitting the scaler on the entire dataset before splitting: Calling scaler.fit_transform(X) before train_test_split leaks test set statistics into the training process. Always fit on training data only and transform the test set separately. Using a Pipeline prevents this automatically.
  • Choosing MinMaxScaler for data with outliers: Outliers compress the majority of values into a narrow range because MinMaxScaler uses the actual min and max. Use RobustScaler when outliers are severe; StandardScaler is less distorted than MinMaxScaler but is still affected, since outliers shift the mean and inflate the standard deviation.
  • Not scaling test data with the same scaler instance: Calling scaler.fit_transform(X_test) recomputes the mean and std from the test set, so the test features no longer sit on the scale the model was trained on. Use scaler.transform(X_test) (without fit) to apply the training set's parameters.
  • Using Normalizer instead of StandardScaler: Normalizer scales each row (sample) to unit length, not each column (feature). For SVM feature normalization, you want column-wise scaling (StandardScaler or MinMaxScaler), not row-wise normalization.
  • Assuming the linear kernel does not need scaling: While the linear kernel is less sensitive to scale than RBF, unscaled features still affect regularization (the C parameter). Scale your features regardless of kernel choice for consistent, reproducible results.
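
The outlier pitfall above can be illustrated with a single feature containing one extreme value. This is a sketch on made-up numbers (values, mm_scaled and rb_scaled are hypothetical names), comparing MinMaxScaler with RobustScaler:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Five ordinary values and one made-up outlier
values = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])

mm_scaled = MinMaxScaler().fit_transform(values).ravel()
rb_scaled = RobustScaler().fit_transform(values).ravel()

# Spread of the five ordinary values after each scaling
print(f"MinMaxScaler spread of inliers: {np.ptp(mm_scaled[:5]):.4f}")
print(f"RobustScaler spread of inliers: {np.ptp(rb_scaled[:5]):.4f}")
```

With MinMaxScaler the five ordinary values are squeezed into well under 1% of the [0, 1] range, while RobustScaler keeps them spread over more than one unit because the median and IQR barely react to the outlier.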

Summary

  • Use StandardScaler as the default for SVM preprocessing — it centers features to mean=0, std=1
  • Use MinMaxScaler when features are naturally bounded and there are no outliers
  • Always wrap scaling and SVM in a Pipeline to prevent data leakage
  • Fit the scaler on training data only; use transform (not fit_transform) on test data
  • The RBF kernel's gamma parameter is sensitive to feature scale — scaling is essential for good performance
