Cannot understand sklearn's PolynomialFeatures

Introduction

PolynomialFeatures in scikit-learn does not train a nonlinear model by itself. It expands your input columns into extra derived columns, and then a linear model can fit curved relationships because the feature space is richer.

What PolynomialFeatures actually does

Suppose your input has two features, x1 and x2. With degree 2, PolynomialFeatures can generate columns such as:

  • '1'
  • 'x1'
  • 'x2'
  • 'x1^2'
  • 'x1 * x2'
  • 'x2^2'

That is the core idea. The transformer creates basis terms. A later estimator such as linear regression learns weights on those terms.

This is why polynomial regression is still often implemented with a linear model. The expansion step on its own looks like this:

python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two samples, two features (x1, x2)
X = np.array([
    [2.0, 3.0],
    [4.0, 5.0],
])

# Expand to all terms up to degree 2, including the constant column
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

print(X_poly)
print(poly.get_feature_names_out(["x1", "x2"]))

A typical output looks like:

text
[[ 1.  2.  3.  4.  6.  9.]
 [ 1.  4.  5. 16. 20. 25.]]
['1' 'x1' 'x2' 'x1^2' 'x1 x2' 'x2^2']

Once you see the transformed matrix, the class becomes much less mysterious.
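
As a sanity check, the same columns can be built by hand and compared with the transformer's output. This is a minimal sketch that reuses the two-row example above; the names manual and matches are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([
    [2.0, 3.0],
    [4.0, 5.0],
])
x1, x2 = X[:, 0], X[:, 1]

# Build the degree-2 terms by hand, in the order 1, x1, x2, x1^2, x1*x2, x2^2
manual = np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

# Let the transformer build the same terms
X_poly = PolynomialFeatures(degree=2).fit_transform(X)

matches = np.allclose(manual, X_poly)
print(matches)
```

If the comparison holds, the transformer is doing nothing more than this column arithmetic.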

Why this helps linear regression fit curves

A linear regression model predicts:

y = w0 + w1*f1 + w2*f2 + ...

If one of the features is x^2, the model is still linear in the weights, but nonlinear in the original variable x. That is the trick.

Example with one input feature:

python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# One input feature; y is roughly x^2
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.0, 4.1, 8.8, 16.2, 24.9])

# Expand to [x, x^2], then fit an ordinary linear regression on those columns
model = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("reg", LinearRegression()),
])

model.fit(X, y)
pred = model.predict(np.array([[6.0]]))

print(pred)

Here the pipeline creates x and x^2, then the regression model fits coefficients for both.

Important parameters

PolynomialFeatures has a few parameters that change output significantly.

degree

Controls the highest polynomial power included.

python
PolynomialFeatures(degree=3)

Higher degree means more flexibility, but also many more columns.

include_bias

If True, the transformer adds a constant column of ones. That is useful in some math derivations, but many estimators already handle the intercept themselves.

python
PolynomialFeatures(degree=2, include_bias=False)

For scikit-learn linear models such as LinearRegression, which fit an intercept by default, include_bias=False avoids a redundant constant column and is often clearer.
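
The difference is easy to see on a tiny input. This is a small sketch with a made-up single-feature array; the variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0], [3.0]])  # two samples, one feature

with_bias = PolynomialFeatures(degree=2, include_bias=True).fit_transform(X)
without_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

print(with_bias)     # columns: 1, x, x^2
print(without_bias)  # columns: x, x^2
```

The only change is the leading column of ones; the remaining columns are identical.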

interaction_only

If True, the transformer keeps the original features and the products of distinct features, and drops powers of a single feature such as x1^2.

python
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)

For two columns, that can produce x1, x2, and x1*x2, but not x1^2 or x2^2.

Use it inside a pipeline

The safest pattern is to keep the transformation inside a pipeline. That prevents training and inference from drifting apart.

python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

# Expand, then standardize the expanded columns, then fit a regularized model
model = Pipeline([
    ("poly", PolynomialFeatures(degree=3, include_bias=False)),
    ("scale", StandardScaler()),
    ("ridge", Ridge(alpha=1.0)),
])

This is especially useful because polynomial expansion can create columns with very different numeric scales. Standardization often helps regularized models behave better.
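
A pipeline like this is fitted and used as a single estimator. The following is only a sketch: the cubic training data is synthetic and invented for illustration, and X_new stands in for whatever data arrives at prediction time.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (an assumption for this sketch): a noisy cubic in one feature
rng = np.random.default_rng(0)
X_train = rng.uniform(-2, 2, size=(100, 1))
y_train = X_train[:, 0] ** 3 - X_train[:, 0] + rng.normal(scale=0.1, size=100)

model = Pipeline([
    ("poly", PolynomialFeatures(degree=3, include_bias=False)),
    ("scale", StandardScaler()),
    ("ridge", Ridge(alpha=1.0)),
])

# fit() learns the expansion layout, the scaling statistics, and the weights together
model.fit(X_train, y_train)

# predict() replays the same expansion and scaling on new rows automatically
X_new = np.array([[0.5], [1.5]])
pred = model.predict(X_new)
print(pred)
```

Because the raw X_new goes straight into predict, there is no separate transform step to forget or to apply with different settings.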

Why feature count can explode

One common surprise is how quickly the output dimension grows. With many source features and a moderate degree, the transformed matrix becomes huge.

For example:

  • 5 original features with degree 2 produce 21 columns (bias column included), which is manageable
  • 50 original features with degree 3 produce 23,426 columns

This is why PolynomialFeatures is often best for small or moderate feature sets where you have a reason to believe nonlinear terms matter.
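
The column count can be checked before transforming anything. With the bias column and without interaction_only, n features at degree d yield C(n + d, d) columns. The helper n_poly_features below is a hypothetical name for this sketch, not a scikit-learn function; the result is compared against the transformer's own n_output_features_.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_features(n_features, degree, include_bias=True):
    """Number of polynomial terms up to `degree` (hypothetical helper)."""
    total = comb(n_features + degree, degree)
    return total if include_bias else total - 1


small = n_poly_features(5, 2)
large = n_poly_features(50, 3)
print(small, large)

# fit() only needs the number of input columns, so a single row of zeros is enough
sk_small = PolynomialFeatures(degree=2).fit(np.zeros((1, 5))).n_output_features_
sk_large = PolynomialFeatures(degree=3).fit(np.zeros((1, 50))).n_output_features_
print(sk_small, sk_large)
```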

Common Pitfalls

The most common mistake is thinking PolynomialFeatures is itself a model. It is only a transformer. Another frequent problem is choosing a high degree without checking how many columns are created, which can cause overfitting and slow training. Developers also forget that polynomial expansion changes feature scale and then wonder why optimization becomes unstable. Using it outside a pipeline is another source of bugs, because training data may be transformed differently from prediction data. Finally, people often inspect only coefficients from the final linear model without first mapping them back to the generated feature names, which makes interpretation unnecessarily confusing.

Summary

  • PolynomialFeatures expands input columns into polynomial and interaction terms.
  • It does not fit a model by itself.
  • A linear model becomes nonlinear in the original variables after this expansion.
  • degree, include_bias, and interaction_only are the main parameters that control its behavior.
  • Use it in a pipeline so transformation and prediction stay consistent.
  • Be careful with feature explosion and overfitting as degree and input dimension grow.
