How to implement sklearn's PolynomialFeatures in tensorflow?

Introduction

Scikit-learn's PolynomialFeatures transformer expands each input row by adding powers and interaction terms. TensorFlow does not provide the same transformer out of the box, but you can reproduce the behavior with a small amount of tensor code or package it in a custom Keras layer.

What the Expansion Actually Produces

Before writing TensorFlow code, it helps to be precise about what scikit-learn does. For two input features, x1 and x2, a degree-2 expansion with a bias term produces these columns, in this order:

text
1, x1, x2, x1^2, x1*x2, x2^2

Three options matter:

  • degree controls the highest polynomial degree.
  • include_bias decides whether to prepend a column of ones.
  • interaction_only skips powers such as x1^2 and keeps only cross-feature products.

If you want TensorFlow output to line up with a scikit-learn pipeline, those rules must match exactly.
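
The ordering rules can be checked without TensorFlow at all. The sketch below uses only the standard library; `term_names` is a hypothetical helper introduced here for illustration, and it assumes the ordering scikit-learn documents: terms grouped by total degree, and within each degree ordered by feature-index combination.

```python
import itertools


def term_names(num_features, degree=2, include_bias=True, interaction_only=False):
    """List expansion terms in output order (illustrative helper, not a library API)."""
    # combinations: each feature appears at most once per term.
    # combinations_with_replacement: repeated features (powers) are allowed.
    combo_fn = (itertools.combinations if interaction_only
                else itertools.combinations_with_replacement)
    names = ["1"] if include_bias else []
    for current_degree in range(1, degree + 1):
        for combo in combo_fn(range(num_features), current_degree):
            parts = []
            for index in sorted(set(combo)):
                power = combo.count(index)
                name = f"x{index + 1}"
                parts.append(name if power == 1 else f"{name}^{power}")
            names.append("*".join(parts))
    return names


print(term_names(2))
print(term_names(3, interaction_only=True))
```

Printing the names for a small feature count is a quick way to see which column of the expanded matrix holds which term before comparing numeric output.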

A Direct Degree-2 Implementation

If degree 2 is all you need, the implementation is straightforward. Concatenate the original columns, then every product x_i * x_j with j >= i. Starting the inner loop at i produces the squares and the pairwise products in the same order scikit-learn uses (x1^2, x1*x2, x2^2), rather than all squares followed by all interactions.

python
import tensorflow as tf


def polynomial_degree_2(x, include_bias=True):
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    outputs = []

    if include_bias:
        # Column of ones, one per input row.
        outputs.append(tf.ones((tf.shape(x)[0], 1), dtype=x.dtype))

    # Degree-1 terms: the original columns.
    outputs.append(x)

    # Degree-2 terms. Starting j at i emits x_i^2 before x_i * x_j,
    # which keeps the columns in scikit-learn's order.
    num_features = x.shape[-1]
    for i in range(num_features):
        for j in range(i, num_features):
            outputs.append(x[:, i:i + 1] * x[:, j:j + 1])

    return tf.concat(outputs, axis=1)


sample = tf.constant([[2.0, 3.0], [4.0, 5.0]])
print(polynomial_degree_2(sample).numpy())

This is enough for many tabular models. It is readable, easy to test, and avoids unnecessary complexity when the degree is fixed.

Build a More General Version

To mirror scikit-learn more closely, generate combinations of feature indexes for every degree from 1 to degree. The standard library already gives you the combination logic you need.

python
import itertools
import tensorflow as tf


def polynomial_features(x, degree=2, include_bias=True, interaction_only=False):
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    num_features = x.shape[-1]
    outputs = []

    if include_bias:
        outputs.append(tf.ones((tf.shape(x)[0], 1), dtype=x.dtype))

    # combinations skips repeated indexes (no powers);
    # combinations_with_replacement allows them (x1^2, x1^2 * x2, ...).
    combo_fn = (itertools.combinations if interaction_only
                else itertools.combinations_with_replacement)

    for current_degree in range(1, degree + 1):
        for combo in combo_fn(range(num_features), current_degree):
            # Multiply the selected columns together, keeping shape (batch, 1).
            term = tf.ones((tf.shape(x)[0], 1), dtype=x.dtype)
            for feature_index in combo:
                term = term * x[:, feature_index:feature_index + 1]
            outputs.append(term)

    return tf.concat(outputs, axis=1)


sample = tf.constant([[2.0, 3.0, 4.0]])
expanded = polynomial_features(sample, degree=2, include_bias=True, interaction_only=False)
print(expanded.numpy())

This version is conceptually very close to the transformer in scikit-learn. It is also a good reference implementation for tests, even if you later replace it with a more optimized version.

Put It Inside a Keras Layer

If the feature expansion should live inside your TensorFlow model, wrap the logic in a custom layer. That makes the preprocessing part of the graph and keeps training and serving behavior together.

python
import itertools
import tensorflow as tf


class PolynomialFeaturesLayer(tf.keras.layers.Layer):
    def __init__(self, degree=2, include_bias=True, interaction_only=False, **kwargs):
        super().__init__(**kwargs)
        self.degree = degree
        self.include_bias = include_bias
        self.interaction_only = interaction_only

    def call(self, inputs):
        # The static feature count is needed to enumerate combinations in Python.
        num_features = inputs.shape[-1]
        if num_features is None:
            raise ValueError("The last input dimension must be known.")
        combo_fn = (itertools.combinations if self.interaction_only
                    else itertools.combinations_with_replacement)
        outputs = []

        if self.include_bias:
            outputs.append(tf.ones((tf.shape(inputs)[0], 1), dtype=inputs.dtype))

        for current_degree in range(1, self.degree + 1):
            for combo in combo_fn(range(num_features), current_degree):
                term = tf.ones((tf.shape(inputs)[0], 1), dtype=inputs.dtype)
                for feature_index in combo:
                    term = term * inputs[:, feature_index:feature_index + 1]
                outputs.append(term)

        return tf.concat(outputs, axis=1)

    def get_config(self):
        # Record the constructor arguments so a saved model can rebuild the layer.
        config = super().get_config()
        config.update({
            "degree": self.degree,
            "include_bias": self.include_bias,
            "interaction_only": self.interaction_only,
        })
        return config


inputs = tf.keras.Input(shape=(2,))
x = PolynomialFeaturesLayer(degree=2)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

print(model(tf.constant([[1.0, 2.0]])).numpy())

This pattern is useful when you want exported models to contain the same preprocessing logic instead of relying on a separate Python preprocessing step outside the model.

Verify Against scikit-learn

The multiplication logic is easy. Matching behavior exactly is the harder part. Feature ordering, inclusion of the bias term, and the handling of interaction_only all affect the final matrix. If the TensorFlow ordering differs from scikit-learn, a model trained with one representation will not behave correctly with the other.

A good development habit is to compare a few sample rows against scikit-learn before using the TensorFlow version in training or serving.

python
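import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[2.0, 3.0]])
sk = PolynomialFeatures(degree=2, include_bias=True).fit_transform(x)
print(sk)

# polynomial_features is the TensorFlow function from the general version above.
tf_out = polynomial_features(x, degree=2, include_bias=True).numpy()
np.testing.assert_allclose(tf_out, sk, rtol=1e-6)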

If the matrices match on test input, you can be much more confident that the port is correct.
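
When scikit-learn is not available in the test or serving environment, a dependency-free reference can supply the expected values instead. The sketch below is one possible way to do that; `expand_row` is a hypothetical helper name, and it assumes the same degree-by-degree combination ordering described earlier.

```python
import itertools
import math


def expand_row(row, degree=2, include_bias=True, interaction_only=False):
    """Expand one row of floats in plain Python (illustrative test helper)."""
    combo_fn = (itertools.combinations if interaction_only
                else itertools.combinations_with_replacement)
    values = [1.0] if include_bias else []
    for current_degree in range(1, degree + 1):
        for combo in combo_fn(range(len(row)), current_degree):
            # Product of the selected features, e.g. (0, 1) -> row[0] * row[1].
            values.append(math.prod(row[i] for i in combo))
    return values


expected = expand_row([2.0, 3.0])
print(expected)
```

The resulting list can be hard-coded as a golden value in a unit test and compared against the TensorFlow output with a small floating-point tolerance.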

Be Careful with Feature Explosion

Polynomial feature generation grows combinatorially. With n base features and degree d, the full expansion has C(n + d, d) columns including the bias, so 20 features at degree 3 already produce 1,771 columns. A larger derived matrix increases memory use, training time, and the chance of overfitting. Even a correct implementation can be the wrong engineering decision if it expands the data beyond what your model and hardware can handle.
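
To estimate the output width before building anything, the count can be computed directly. This is a minimal sketch using the standard library; `num_output_features` is a hypothetical helper name, and the formulas assume the standard monomial count for the full expansion and one term per subset of distinct features for interaction_only.

```python
import math


def num_output_features(num_features, degree, include_bias=True, interaction_only=False):
    """Count expanded columns without building the matrix (illustrative helper)."""
    if interaction_only:
        # One term per subset of 1..degree distinct features.
        count = sum(math.comb(num_features, k) for k in range(1, degree + 1))
    else:
        # Monomials of total degree 1..degree in num_features variables.
        count = math.comb(num_features + degree, degree) - 1
    return count + (1 if include_bias else 0)


for n, d in [(2, 2), (10, 3), (100, 3)]:
    print(n, d, num_output_features(n, d))
```

Running a check like this against the real feature count makes it clear whether the expanded matrix will fit in memory at the chosen degree.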

In deep learning, explicit polynomial expansion is often unnecessary because the network can already learn nonlinear interactions. It is most useful when you are porting a classical ML pipeline, building a linear or shallow model, or intentionally controlling the feature space.

Common Pitfalls

One common mistake is implementing only squared terms and forgetting interaction terms. That does not match PolynomialFeatures.

Another pitfall is forgetting the bias column. Scikit-learn includes it by default, so output comparisons will look wrong if include_bias does not match.

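A third problem is relying on inputs.shape[-1] when the feature count is unknown. The static shape is read in Python while the layer is traced, so a last dimension of None makes the range over feature indexes fail. This pattern assumes a fixed tabular feature dimension, which is the common case for Keras models.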

Finally, feature explosion is easy to underestimate. Test the output dimension before wiring the expansion into a production pipeline.

Summary

  • PolynomialFeatures adds powers and interaction terms, not just squared columns.
  • A hand-written TensorFlow function is enough for a practical degree-2 expansion.
  • A combination-based implementation is the clearest way to mirror scikit-learn behavior.
  • Wrapping the logic in a custom Keras layer keeps preprocessing inside the model graph.
  • Compare TensorFlow output with scikit-learn on sample data before using the implementation in production.
