How to Normalize Train and Test Data Using MinMaxScaler in sklearn
Introduction
When you use MinMaxScaler, the important rule is simple: fit on the training data only, then apply that same fitted scaler to both train and test data. If you fit separately on the test set, you leak information and distort the evaluation.
What MinMaxScaler Does
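MinMaxScaler rescales each feature into a chosen range, 0 to 1 by default. It does this by learning the minimum and maximum value of each feature from the data it is fitted on, which should be the training set. With the default range, each value x is mapped to (x - min) / (max - min).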
A basic example:
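A minimal sketch, assuming a small made-up training array with two features on different scales (the data and variable names are illustrative only):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up training data: two features on very different scales.
X_train = np.array([[1.0, 100.0],
                    [2.0, 300.0],
                    [3.0, 500.0]])

scaler = MinMaxScaler()  # default feature_range is (0, 1)
X_train_scaled = scaler.fit_transform(X_train)

print(X_train_scaled)      # each column now spans 0 to 1
print(scaler.data_min_)    # per-feature minimum learned from X_train
print(scaler.data_max_)    # per-feature maximum learned from X_train
```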
After fitting, the scaler stores the training-set statistics and uses them later during transform.
Correct Train and Test Workflow
The standard pattern is:
- split the data
- fit the scaler on X_train
- transform both X_train and X_test
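These steps can be sketched as follows. The array contents, the 70/30 split, and the random_state value are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Made-up data: 10 samples, 2 features.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# 1. Split first, before any scaling.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 2. Fit the scaler on the training split only (and transform it).
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)

# 3. Reuse the already-fitted scaler on the test split.
X_test_scaled = scaler.transform(X_test)
```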
Notice that fit_transform is used only on X_train. The test data gets only transform.
Why Fitting on Test Data Is Wrong
If you call fit_transform on the test set as well, the test set gets scaled using its own minimum and maximum values. That leaks information from the test distribution into preprocessing and makes the evaluation less realistic.
The model should see test data processed the same way new unseen production data would be processed: using training-time statistics only.
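To make the difference concrete, here is a small sketch with made-up single-feature data. It compares a scaler fitted on the test set with the training-fitted scaler applied to the same test values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: training values span 0 to 100, test values span 40 to 80.
X_train = np.array([[0.0], [50.0], [100.0]])
X_test = np.array([[40.0], [60.0], [80.0]])

# Wrong: a second scaler fitted on the test set uses the test min and max.
wrong = MinMaxScaler().fit_transform(X_test)

# Right: the scaler fitted on the training set is reused for the test set.
scaler = MinMaxScaler().fit(X_train)
right = scaler.transform(X_test)

print(wrong.ravel())  # roughly 0, 0.5, 1
print(right.ravel())  # roughly 0.4, 0.6, 0.8
```

The raw value 40 becomes 0 under the test-fitted scaler but about 0.4 under the training-fitted one, so a model trained on training-scaled features would misread the test-fitted version.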
Use a Pipeline When Possible
The safest way to avoid mistakes is to use a pipeline:
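One possible sketch, assuming synthetic data from make_classification and a LogisticRegression model. Both are stand-ins for illustration, not part of any particular workflow:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

pipe = Pipeline([
    ("scaler", MinMaxScaler()),
    ("model", LogisticRegression()),
])

# During cross-validation the whole pipeline, scaler included,
# is refitted on the training part of each fold.
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)

# fit() fits the scaler on X_train only; score() only transforms X_test.
pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)
```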
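When the whole pipeline is passed to a cross-validation helper such as cross_val_score, the scaler is fitted only on the training folds of each split. In a normal fit, it is fitted only on the data passed to fit, which should be the training split.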
That matters a lot once your workflow grows beyond one simple split, because leakage bugs become much harder to spot by inspection alone.
Inverse Transform Can Restore Original Scale
If you need to convert scaled values back to their original units, use inverse_transform:
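A minimal sketch of the round trip, assuming a small made-up training array:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up training data in original units.
X_train = np.array([[10.0, 200.0],
                    [20.0, 400.0],
                    [30.0, 800.0]])

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)

# Map the scaled values back to the original units.
X_restored = scaler.inverse_transform(X_scaled)

print(np.allclose(X_restored, X_train))  # True, up to floating-point rounding
```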
This is useful for debugging and for interpreting predictions in the original feature space.
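It is also helpful when your model predicts scaled numeric targets and you need to present results back in human-readable units. In that case, call inverse_transform on the scaler that was fitted on the target, not on the one fitted on the features.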
Common Pitfalls
One common mistake is scaling the full dataset before the train-test split. That leaks information from the future test set into training.
Another issue is fitting separate scalers to train and test data. That makes the feature spaces inconsistent: the same raw value maps to different scaled values in the two sets.
It is also easy to forget that values in the test set can fall outside the training-set range. In that case, the transformed test values can be less than 0 or greater than 1, and that is expected behavior.
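The out-of-range behavior can be seen in a short sketch with made-up single-feature data, where the test set contains values below and above the training range:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Training values span 10 to 30; the test set goes beyond both ends.
X_train = np.array([[10.0], [20.0], [30.0]])
X_test = np.array([[5.0], [25.0], [40.0]])

scaler = MinMaxScaler().fit(X_train)
X_test_scaled = scaler.transform(X_test)

# 5 maps to about -0.25, 25 to about 0.75, and 40 to about 1.5.
print(X_test_scaled.ravel())
```

If a downstream step cannot handle values outside the range, recent scikit-learn versions also accept MinMaxScaler(clip=True), which clips transformed values to the feature range. Whether clipping is appropriate depends on the model.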
Summary
- Fit MinMaxScaler on the training data only.
- Use the fitted scaler to transform both training and test features.
- Do not call fit on the test set or on the full dataset before splitting.
- Pipelines are the safest way to prevent preprocessing leakage.
- Test values outside the training range can scale outside 0 to 1, and that is normal.

