Cannot understand sklearn's PolynomialFeatures
Introduction
PolynomialFeatures in scikit-learn does not train a nonlinear model by itself. It expands your input columns into extra derived columns, and then a linear model can fit curved relationships because the feature space is richer.
What PolynomialFeatures actually does
Suppose your input has two features, x1 and x2. With degree 2, PolynomialFeatures can generate columns such as:
- 1
- x1
- x2
- x1^2
- x1 * x2
- x2^2
That is the core idea. The transformer creates basis terms. A later estimator such as linear regression learns weights on those terms.
This is why polynomial regression is still often implemented as PolynomialFeatures followed by an ordinary linear model such as LinearRegression.
Once you look at the transformed matrix, the class becomes much less mysterious.
Why this helps linear regression fit curves
A linear regression model predicts:
y = w0 + w1*f1 + w2*f2 + ...
If one of the features is x^2, the model is still linear in the weights, but nonlinear in the original variable x. That is the trick.
For example, with one input feature x and degree=2, a pipeline of PolynomialFeatures followed by LinearRegression creates the columns x and x^2, and the regression model then fits a coefficient for each.
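A minimal sketch of that one-feature pipeline, assuming noise-free synthetic data generated from y = 1 + 2x - 3x^2 (the data is invented for illustration). Because the target is exactly quadratic, the fitted weights should recover the generating coefficients up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: y = 1 + 2x - 3x^2 with no noise.
x = np.linspace(-2, 2, 21).reshape(-1, 1)  # one input feature, shape (21, 1)
y = 1 + 2 * x.ravel() - 3 * x.ravel() ** 2

# include_bias=False because LinearRegression fits its own intercept.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(x, y)

# make_pipeline names each step after its lowercased class name.
reg = model.named_steps["linearregression"]
print(reg.intercept_, reg.coef_)  # approximately 1.0 and [2.0, -3.0]
print(model.predict([[0.5]]))     # approximately [1.25]
```

The regression only ever sees two columns, `x` and `x^2`; the curve comes entirely from the second column.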
Important parameters
PolynomialFeatures has a few parameters that change output significantly.
degree
Controls the highest polynomial power included.
Higher degree means more flexibility, but also many more columns.
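To see how fast the column count grows with degree alone, this sketch fits PolynomialFeatures on a hypothetical two-feature input for degrees 1 through 4 and records `n_output_features_`. It uses `include_bias=False`, so the constant column is not counted.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1.0, 2.0]])  # one sample, two features; only the column count matters here

n_columns = {}
for degree in range(1, 5):
    poly = PolynomialFeatures(degree=degree, include_bias=False).fit(X)
    n_columns[degree] = poly.n_output_features_
    print(degree, poly.n_output_features_)
# 1 2
# 2 5
# 3 9
# 4 14
```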
include_bias
If True, the transformer adds a constant column of ones. That is useful in some math derivations, but many estimators already handle the intercept themselves.
For scikit-learn linear models, include_bias=False is often clearer.
interaction_only
If True, the transformer includes only interaction terms, not squared powers of the same feature.
For two columns, that can produce x1, x2, and x1*x2, but not x1^2 or x2^2.
Use it inside a pipeline
The safest pattern is to keep the transformation inside a pipeline. That prevents training and inference from drifting apart.
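A pipeline is also the natural place for a scaling step. Polynomial expansion can create columns with very different numeric scales (squaring a feature measured in the hundreds yields values in the tens of thousands), and standardization often helps regularized models behave better.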
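One possible arrangement, sketched under assumptions: expand, then standardize, then fit Ridge. The data, the Ridge `alpha`, and the order of PolynomialFeatures and StandardScaler are illustrative choices, not the only valid setup. The second input column is deliberately on a much larger scale than the first.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
# Hypothetical data: two features on very different scales and a target with quadratic terms.
X = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 1000, 200)])
y = 3 * X[:, 0] ** 2 + 0.002 * X[:, 0] * X[:, 1] + rng.normal(0, 0.1, 200)

model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # expand first
    StandardScaler(),                                  # put every generated column on a comparable scale
    Ridge(alpha=1.0),                                  # regularized linear model on the scaled terms
)
model.fit(X[:150], y[:150])

# The same fitted steps are reused at prediction time, so new rows are expanded and scaled identically.
score = model.score(X[150:], y[150:])  # R^2 on held-out rows
print(score)
```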
Why feature count can explode
One common surprise is how quickly the output dimension grows. With many source features and a moderate degree, the transformed matrix becomes huge.
For example:
- 5 original features with degree 2 give 21 output columns (20 without the bias column), which is manageable
- 50 original features with degree 3 give 23,426 output columns (23,425 without the bias column)
This is why PolynomialFeatures is often best for small or moderate feature sets where you have a reason to believe nonlinear terms matter.
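These counts follow from a simple formula: with n input features and degree d, the default expansion (bias column included) has one column per monomial of total degree at most d, which is C(n + d, d). This sketch checks the formula against `n_output_features_` for both cases from the list.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

counts = []
for n_features, degree in [(5, 2), (50, 3)]:
    poly = PolynomialFeatures(degree=degree)  # include_bias=True: the count includes the column of ones
    poly.fit(np.zeros((1, n_features)))        # fitting only needs to know the number of input columns
    expected = comb(n_features + degree, degree)
    counts.append((poly.n_output_features_, expected))
    print(n_features, degree, poly.n_output_features_, expected)
# 5 2 21 21
# 50 3 23426 23426
```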
Common Pitfalls
The most common mistake is thinking PolynomialFeatures is itself a model. It is only a transformer. Another frequent problem is choosing a high degree without checking how many columns are created, which can cause overfitting and slow training. Developers also forget that polynomial expansion changes feature scale and then wonder why optimization becomes unstable. Using it outside a pipeline is another source of bugs, because training data may be transformed differently from prediction data. Finally, people often inspect only coefficients from the final linear model without first mapping them back to the generated feature names, which makes interpretation unnecessarily confusing.
Summary
- PolynomialFeatures expands input columns into polynomial and interaction terms.
- It does not fit a model by itself.
- A linear model fitted on the expanded features is still linear in its weights but can capture relationships that are nonlinear in the original variables.
- degree, include_bias, and interaction_only are the main behavior controls.
- Use it in a pipeline so transformation and prediction stay consistent.
- Be careful with feature explosion and overfitting as degree and input dimension grow.

