Can someone explain to me how MinMaxScaler works?

Introduction

MinMaxScaler rescales each feature so its values fall into a chosen range, usually 0 to 1. It is simple, fast, and often useful for models that are sensitive to feature magnitude, but it does not make data normal or robust to outliers, so it is important to understand exactly what it is doing.

The Core Formula

For each feature column, MinMaxScaler computes the minimum and maximum on the training data. Then it maps every value with this formula:

scaled = (x - min) / (max - min)

If you use the default range, the smallest training value becomes 0 and the largest becomes 1.

Example with a single column:

python
values = [10, 20, 30]

The minimum is 10 and the maximum is 30.

  • 10 becomes (10 - 10) / (30 - 10) = 0.0
  • 20 becomes (20 - 10) / (30 - 10) = 0.5
  • 30 becomes (30 - 10) / (30 - 10) = 1.0

That is the whole idea. The scaler stretches or compresses values linearly between the observed minimum and maximum.
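The arithmetic above can be reproduced in a few lines of NumPy. This is a minimal sketch rather than scikit-learn's own implementation; the helper name `min_max_scale` is hypothetical, and returning zeros for a constant column is an assumption made here to avoid dividing by zero.

```python
import numpy as np

def min_max_scale(values):
    """Rescale a 1-D sequence to [0, 1] using its own minimum and maximum."""
    arr = np.asarray(values, dtype=float)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # Assumption for this sketch: a constant column maps to all zeros,
        # because (max - min) would otherwise be a division by zero.
        return np.zeros_like(arr)
    return (arr - lo) / (hi - lo)

scaled = min_max_scale([10, 20, 30])
print(scaled)  # [0.  0.5 1. ]
```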

A Working scikit-learn Example

python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([
    [10.0, 100.0],
    [20.0, 200.0],
    [30.0, 300.0],
])

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled)
print("mins:", scaler.data_min_)
print("maxs:", scaler.data_max_)

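The scaled array prints approximately as shown below. The two remaining print calls report the per-column minimums (10 and 100) and maximums (30 and 300).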

python
[[0.  0. ]
 [0.5 0.5]
 [1.  1. ]]

Each feature is scaled independently. The first column is not compared with the second column; each gets its own minimum and maximum.

Why Scaling Helps Some Models

Distance-based and gradient-based models can behave better when inputs share a similar numeric scale. For example:

  • k-nearest neighbors uses distances directly
  • neural networks train more smoothly with comparable feature ranges
  • gradient descent methods often converge more predictably

Without scaling, a feature measured in thousands can dominate another feature measured in fractions, even if both should matter equally.
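A small illustration of that effect, using made-up numbers: the income and ratio columns below are hypothetical data chosen only to make the contrast visible, and `dist` is a helper defined for this sketch.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: column 0 is an income in dollars, column 1 is a ratio in [0, 1].
X_demo = np.array([
    [30000.0, 0.1],
    [30500.0, 0.9],   # close in income, very different ratio
    [60000.0, 0.1],   # very different income, identical ratio
])

def dist(a, b):
    """Euclidean distance between two rows."""
    return float(np.linalg.norm(a - b))

# On raw values the income column dominates the distance.
raw_d01 = dist(X_demo[0], X_demo[1])
raw_d02 = dist(X_demo[0], X_demo[2])

# After scaling, both columns contribute on the same 0-to-1 scale.
X_demo_scaled = MinMaxScaler().fit_transform(X_demo)
scaled_d01 = dist(X_demo_scaled[0], X_demo_scaled[1])
scaled_d02 = dist(X_demo_scaled[0], X_demo_scaled[2])

print("raw:   ", raw_d01, raw_d02)
print("scaled:", scaled_d01, scaled_d02)
```

On the raw data, row 1 looks far closer to row 0 than row 2 does, purely because of the income units. After scaling, the two distances are nearly equal, so the ratio column carries as much weight as income.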

fit, transform, and fit_transform

A common source of confusion is the difference between these three methods.

  • fit learns the minimum and maximum from the training data.
  • transform applies the learned scaling to data.
  • fit_transform does both in one step.

Typical usage:

python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# X is the feature array from the earlier example.
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn min/max from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training min/max

The key rule is that you fit only on the training set. That prevents information from the test set from leaking into preprocessing.
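One way to enforce that rule during cross-validation is to put the scaler inside a scikit-learn pipeline, so it is refit on the training portion of every fold. The sketch below is only an illustration: the iris dataset and the k-nearest neighbors classifier are arbitrary choices, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Illustrative dataset and model; any estimator could take the classifier's place.
X_iris, y_iris = load_iris(return_X_y=True)

# The pipeline fits MinMaxScaler on each fold's training split only,
# then applies that same scaling to the fold's held-out split.
model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))
scores = cross_val_score(model, X_iris, y_iris, cv=5)

print("mean accuracy:", scores.mean())
```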

Choosing a Different Output Range

You are not limited to 0 through 1. You can choose another interval such as -1 through 1.

python
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)

That can be useful when a model or activation function works better with data centered around zero.
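For a custom range, the mapping can be thought of as two steps: scale to 0 through 1, then stretch and shift to the target interval. The sketch below checks that reading against scikit-learn on a single column; the names `col`, `lo`, and `hi` are introduced here for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

col = np.array([[10.0], [20.0], [30.0]])
lo, hi = -1.0, 1.0  # target output range

# Step 1: scale to [0, 1]. Step 2: stretch and shift to [lo, hi].
unit = (col - col.min(axis=0)) / (col.max(axis=0) - col.min(axis=0))
manual = unit * (hi - lo) + lo

from_sklearn = MinMaxScaler(feature_range=(lo, hi)).fit_transform(col)

print(manual.ravel())                     # [-1.  0.  1.]
print(np.allclose(manual, from_sklearn))  # True
```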

What Happens With New Values

If a future data point falls outside the original training minimum or maximum, its scaled value can fall outside the requested range too. For example, if training data ranged from 10 to 30, a new value of 40 scales above 1.

That is normal. MinMaxScaler does not clamp values by default. It applies the same linear mapping learned during fit.
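The example below shows this behavior, along with the `clip` option for cases where values must stay inside the range. It assumes a scikit-learn version that provides the `clip` parameter (0.24 or later, to the best of my knowledge); the arrays `train` and `new` are made up for the demonstration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[10.0], [20.0], [30.0]])
new = np.array([[40.0], [5.0]])  # one value above and one below the training range

# Default behavior: the same linear mapping is applied, with no clamping.
plain = MinMaxScaler().fit(train)
plain_out = plain.transform(new).ravel()
print(plain_out)    # roughly 1.5 and -0.25

# With clip=True, transformed values are limited to the feature range.
clipped = MinMaxScaler(clip=True).fit(train)
clipped_out = clipped.transform(new).ravel()
print(clipped_out)  # roughly 1.0 and 0.0
```

Clipping discards how far outside the range a value was, so whether it is appropriate depends on the model.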

You can also reverse the scaling later:

python
original = scaler.inverse_transform(X_scaled)
print(original)
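A self-contained round-trip check, as a sketch: scaling and then inverting should recover the original values up to floating-point error. The names `X_orig` and `rt_scaler` are used only in this example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_orig = np.array([
    [10.0, 100.0],
    [20.0, 200.0],
    [30.0, 300.0],
])

rt_scaler = MinMaxScaler()
X_round_trip = rt_scaler.inverse_transform(rt_scaler.fit_transform(X_orig))

# Compare with a tolerance rather than exact equality.
print(np.allclose(X_round_trip, X_orig))  # True
```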

Common Pitfalls

The most common mistake is fitting the scaler on the full dataset before the train-test split. That lets the test set's minimum and maximum leak into preprocessing and makes evaluation less trustworthy.

Another problem is assuming MinMaxScaler handles outliers well. It does not. A single extreme value can stretch the range so much that most other observations get squeezed into a tiny interval.
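A quick demonstration with made-up numbers: four ordinary values and a single extreme one.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical column: four typical values plus one outlier.
with_outlier = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

squeezed = MinMaxScaler().fit_transform(with_outlier).ravel()
print(squeezed.round(4))
```

The outlier maps to 1, while the four typical values all land between 0 and roughly 0.003, so the differences among them almost vanish.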

Developers also sometimes believe scaling to 0 through 1 makes the data normally distributed. It does not. The shape of the distribution remains the same; only the numeric range changes.
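One way to check this is to compare skewness before and after scaling. The sketch below uses a hand-written `skewness` helper (the standardized third moment) and synthetic exponential data; both are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def skewness(a):
    """Sample skewness: the standardized third central moment."""
    a = np.asarray(a, dtype=float).ravel()
    centered = a - a.mean()
    return float((centered ** 3).mean() / (centered ** 2).mean() ** 1.5)

# Synthetic right-skewed data, clearly not normal.
rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=(1000, 1))

rescaled = MinMaxScaler().fit_transform(skewed)

# The range changes to [0, 1], but the skewness stays the same.
print(skewness(skewed), skewness(rescaled))
```

Because min-max scaling is a linear map with a positive slope, the two skewness values agree up to floating-point error.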

Summary

  • MinMaxScaler rescales each feature independently using the training minimum and maximum.
  • The default formula maps the smallest training value to 0 and the largest to 1.
  • Use fit on training data only, then transform both training and test data.
  • Scaling helps many distance-based and gradient-based models but does not remove outliers.
  • The method changes numeric range, not the underlying distribution shape.
