How to implement sklearn's PolynomialFeatures in tensorflow?
Introduction
Scikit-learn's PolynomialFeatures transformer expands each input row by adding powers and interaction terms. TensorFlow does not provide the same transformer out of the box, but you can reproduce the behavior with a small amount of tensor code or package it in a custom Keras layer.
What the Expansion Actually Produces
Before writing TensorFlow code, it helps to be precise about what scikit-learn does. For two input features, x1 and x2, a degree-2 expansion with a bias term produces six columns in this order: 1, x1, x2, x1^2, x1*x2, x2^2.
Three options matter:
- degree controls the highest polynomial degree
- include_bias decides whether to prepend a column of ones
- interaction_only skips powers such as x1^2 and keeps only cross-feature products
If you want TensorFlow output to line up with a scikit-learn pipeline, those rules must match exactly.
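The effect of these options on column order can be sketched in plain Python before any tensors are involved. The helper name term_indices is hypothetical; each tuple lists the feature indexes multiplied together for one output column, and the empty tuple stands for the bias column.

```python
from itertools import combinations, combinations_with_replacement


def term_indices(n_features, degree, include_bias=True, interaction_only=False):
    """Return one index tuple per output column, in scikit-learn's column order."""
    # interaction_only forbids repeating a feature inside a term (no x1^2).
    pick = combinations if interaction_only else combinations_with_replacement
    terms = [()] if include_bias else []  # () is the empty product, i.e. the bias
    for d in range(1, degree + 1):
        terms.extend(pick(range(n_features), d))
    return terms


print(term_indices(2, 2))
# [(), (0,), (1,), (0, 0), (0, 1), (1, 1)]
print(term_indices(2, 2, interaction_only=True))
# [(), (0,), (1,), (0, 1)]
```

Reading the first result left to right gives 1, x1, x2, x1^2, x1*x2, x2^2, which is the ordering any TensorFlow port has to reproduce.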
A Direct Degree-2 Implementation
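If degree 2 is all you need, the implementation is straightforward. Concatenate the original columns, the squared columns, and every pairwise product. If the result has to match scikit-learn column for column, interleave the squares and cross products in its order (x1^2, x1*x2, x2^2) rather than appending all squares first.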
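A minimal sketch of such a degree-2 function follows. The name poly_degree2 is illustrative, and the code assumes a rank-2 float tensor of shape [batch, n_features] whose feature count is known statically.

```python
import tensorflow as tf


def poly_degree2(x, include_bias=True):
    """Degree-2 polynomial expansion of a [batch, n_features] tensor."""
    n = x.shape[-1]  # assumption: the feature count is static, not None
    columns = []
    if include_bias:
        columns.append(tf.ones_like(x[:, :1]))  # column of ones
    columns.append(x)  # degree-1 terms
    # Degree-2 terms x_i * x_j for i <= j. Multiplying column i by columns
    # i..n-1 yields (i,i), (i,i+1), ... which matches scikit-learn's order.
    for i in range(n):
        columns.append(x[:, i:i + 1] * x[:, i:])
    return tf.concat(columns, axis=1)


x = tf.constant([[2.0, 3.0]])
print(poly_degree2(x).numpy())
# [[1. 2. 3. 4. 6. 9.]]
```

The slice-and-broadcast loop runs n small multiplications instead of one per pair, which keeps the traced graph compact for moderately wide inputs.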
This is enough for many tabular models. It is readable, easy to test, and avoids unnecessary complexity when the degree is fixed.
Build a More General Version
To mirror scikit-learn more closely, generate combinations of feature indexes for every degree from 1 to degree. The standard library already gives you the combination logic you need.
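One way to write this is sketched below, using itertools from the standard library for the index combinations. The function name polynomial_features is hypothetical, and the sketch favors readability over speed: it emits one gather and one product per output column.

```python
from itertools import combinations, combinations_with_replacement

import tensorflow as tf


def polynomial_features(x, degree=2, include_bias=True, interaction_only=False):
    """Expand a [batch, n_features] tensor with terms of degree 1..degree."""
    n = x.shape[-1]
    if n is None:
        raise ValueError("The feature dimension must be known statically.")
    # interaction_only forbids repeated indexes, so powers such as x1^2 vanish.
    pick = combinations if interaction_only else combinations_with_replacement
    columns = [tf.ones_like(x[:, :1])] if include_bias else []
    for d in range(1, degree + 1):
        for idx in pick(range(n), d):
            # Gather the chosen columns (repeats allowed) and multiply them.
            selected = tf.gather(x, list(idx), axis=1)
            columns.append(tf.reduce_prod(selected, axis=1, keepdims=True))
    return tf.concat(columns, axis=1)


x = tf.constant([[2.0, 3.0]])
print(polynomial_features(x, degree=3).numpy())
# [[ 1.  2.  3.  4.  6.  9.  8. 12. 18. 27.]]
```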
This version is conceptually very close to the transformer in scikit-learn. It is also a good reference implementation for tests, even if you later replace it with a more optimized version.
Put It Inside a Keras Layer
If the feature expansion should live inside your TensorFlow model, wrap the logic in a custom layer. That makes the preprocessing part of the graph and keeps training and serving behavior together.
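A sketch of such a layer is shown here. The class name PolynomialFeaturesLayer is an assumption made for illustration, and the layer reads the feature count from inputs.shape[-1], so it requires a fixed last dimension.

```python
from itertools import combinations, combinations_with_replacement

import tensorflow as tf


class PolynomialFeaturesLayer(tf.keras.layers.Layer):
    """Polynomial expansion of [batch, n_features] inputs (illustrative sketch)."""

    def __init__(self, degree=2, include_bias=True, interaction_only=False, **kwargs):
        super().__init__(**kwargs)
        self.degree = degree
        self.include_bias = include_bias
        self.interaction_only = interaction_only

    def call(self, inputs):
        n = inputs.shape[-1]
        if n is None:
            raise ValueError("PolynomialFeaturesLayer needs a fixed feature dimension.")
        pick = combinations if self.interaction_only else combinations_with_replacement
        columns = [tf.ones_like(inputs[:, :1])] if self.include_bias else []
        for d in range(1, self.degree + 1):
            for idx in pick(range(n), d):
                selected = tf.gather(inputs, list(idx), axis=1)
                columns.append(tf.reduce_prod(selected, axis=1, keepdims=True))
        return tf.concat(columns, axis=1)

    def get_config(self):
        # Expose the constructor arguments so the layer can be serialized.
        config = super().get_config()
        config.update({
            "degree": self.degree,
            "include_bias": self.include_bias,
            "interaction_only": self.interaction_only,
        })
        return config


layer = PolynomialFeaturesLayer(degree=2)
print(layer(tf.constant([[2.0, 3.0]])).numpy())
# [[1. 2. 3. 4. 6. 9.]]
```

The layer has no trainable weights; it only fixes the expansion rules so that whatever consumes the model sees the same columns during training and serving.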
This pattern is useful when you want exported models to contain the same preprocessing logic instead of relying on a separate Python preprocessing step outside the model.
Verify Against scikit-learn
The multiplication logic is easy. Matching behavior exactly is the harder part. Feature ordering, inclusion of the bias term, and the handling of interaction_only all affect the final matrix. If the TensorFlow ordering differs from scikit-learn, a model trained with one representation will not behave correctly with the other.
A good development habit is to compare a few sample rows against scikit-learn before using the TensorFlow version in training or serving.
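A comparison along these lines might look like the following sketch. It assumes scikit-learn and NumPy are installed next to TensorFlow, and it carries a compact TensorFlow expansion (the hypothetical tf_polynomial_features) so the check is self-contained; in practice you would import your own implementation instead.

```python
from itertools import combinations, combinations_with_replacement

import numpy as np
import tensorflow as tf
from sklearn.preprocessing import PolynomialFeatures


def tf_polynomial_features(x, degree=2, include_bias=True, interaction_only=False):
    """Compact TensorFlow expansion used as the implementation under test."""
    pick = combinations if interaction_only else combinations_with_replacement
    columns = [tf.ones_like(x[:, :1])] if include_bias else []
    for d in range(1, degree + 1):
        for idx in pick(range(x.shape[-1]), d):
            selected = tf.gather(x, list(idx), axis=1)
            columns.append(tf.reduce_prod(selected, axis=1, keepdims=True))
    return tf.concat(columns, axis=1)


def check_against_sklearn():
    """Compare both implementations on random rows for several settings."""
    x = np.random.default_rng(0).normal(size=(5, 3)).astype("float32")
    settings = [
        dict(degree=2, include_bias=True, interaction_only=False),
        dict(degree=3, include_bias=False, interaction_only=False),
        dict(degree=3, include_bias=True, interaction_only=True),
    ]
    for kwargs in settings:
        expected = PolynomialFeatures(**kwargs).fit_transform(x)
        actual = tf_polynomial_features(tf.constant(x), **kwargs).numpy()
        # Same shape means same column count; allclose also checks the order.
        assert actual.shape == expected.shape, kwargs
        np.testing.assert_allclose(actual, expected, rtol=1e-5, atol=1e-6)
    return True


check_against_sklearn()
```

Random inputs make an ordering mistake visible immediately, whereas inputs with repeated values (all ones, for example) can hide swapped columns.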
If the matrices match on test input, you can be much more confident that the port is correct.
Be Careful with Feature Explosion
Polynomial feature generation grows very quickly. More base features and higher degree mean a much larger derived feature matrix. That increases memory use, training time, and the chance of overfitting. Even a correct implementation can be the wrong engineering decision if it expands the data beyond what your model and hardware can handle.
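The growth can be estimated before building anything. The sketch below counts output columns with standard combinatorics: n features at degree d give C(n + d, d) columns including the bias, and with interaction_only the count is the sum of C(n, k) for k from 0 to d. The function name n_output_features is illustrative.

```python
from math import comb


def n_output_features(n_features, degree, include_bias=True, interaction_only=False):
    """Number of columns a polynomial expansion would produce."""
    if interaction_only:
        # Choose k distinct features for each degree k = 0..degree.
        total = sum(comb(n_features, k) for k in range(degree + 1))
    else:
        # Monomials of total degree <= degree in n_features variables.
        total = comb(n_features + degree, degree)
    return total if include_bias else total - 1


for n in (10, 100, 1000):
    print(n, [n_output_features(n, d) for d in (2, 3)])
# 10 [66, 286]
# 100 [5151, 176851]
# 1000 [501501, 167668501]
```

At 1000 input features, a degree-3 expansion already exceeds 167 million columns per row, which is why the output dimension is worth checking first.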
In deep learning, explicit polynomial expansion is often unnecessary because the network can already learn nonlinear interactions. It is most useful when you are porting a classical ML pipeline, building a linear or shallow model, or intentionally controlling the feature space.
Common Pitfalls
One common mistake is implementing only squared terms and forgetting interaction terms. That does not match PolynomialFeatures.
Another pitfall is forgetting the bias column. Scikit-learn includes it by default, so output comparisons will look wrong if include_bias does not match.
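A third problem is relying on inputs.shape[-1] when the feature count is unknown. If the last dimension is None when the graph is built, a Python loop over feature indexes cannot run. The pattern assumes a fixed tabular feature dimension, which is the common case for Keras models.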
Finally, feature explosion is easy to underestimate. Test the output dimension before wiring the expansion into a production pipeline.
Summary
- PolynomialFeatures adds powers and interaction terms, not just squared columns.
- A hand-written TensorFlow function is enough for a practical degree-2 expansion.
- A combination-based implementation is the clearest way to mirror scikit-learn behavior.
- Wrapping the logic in a custom Keras layer keeps preprocessing inside the model graph.
- Compare TensorFlow output with scikit-learn on sample data before using the implementation in production.

