Implementing lasso regression using TensorFlow

Introduction

Lasso regression is linear regression with an L1 penalty on the weights. That penalty encourages sparsity, which means some coefficients can shrink toward zero and effectively drop less useful features from the model.

The Objective Behind Lasso

The lasso objective is:

  • prediction loss, often mean squared error
  • plus lambda * sum(abs(weights))

The L1 term is what makes lasso different from ordinary least squares. With a small lambda, the fit stays close to ordinary least squares. With a large lambda, more coefficients are pushed toward zero.
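To make the two terms concrete, here is a minimal NumPy sketch that evaluates the objective for a fixed set of weights. The function name lasso_objective, the toy data, and the lambda values are assumptions chosen for illustration; they do not come from any library.

```python
import numpy as np

def lasso_objective(X, y, w, b, lam):
    """Return (mse, l1_penalty, total) for the linear model X @ w + b."""
    residuals = X @ w + b - y
    mse = np.mean(residuals ** 2)
    l1_penalty = lam * np.sum(np.abs(w))  # the bias b is not penalized
    return mse, l1_penalty, mse + l1_penalty

# Toy data and weights, made up for illustration only.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
b = 0.0

# The prediction loss stays fixed; only the penalty grows with lambda.
for lam in (0.0, 0.1, 1.0):
    mse, penalty, total = lasso_objective(X, y, w, b, lam)
    print(f"lambda={lam}: mse={mse:.4f}, penalty={penalty:.4f}, total={total:.4f}")
```

Because the weights are held fixed here, the loop only shows how the penalty's share of the total changes. During training, the optimizer would respond to a larger lambda by shrinking the weights.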

The Simplest TensorFlow Approach

In TensorFlow, the cleanest implementation is usually a single dense layer with an L1 kernel regularizer.

python
import numpy as np
import tensorflow as tf

# Four samples with three features each.
X = np.array([
    [1.0, 2.0, 0.0],
    [2.0, 1.0, 1.0],
    [3.0, 3.0, 0.0],
    [4.0, 5.0, 1.0],
], dtype=np.float32)

y = np.array([[5.0], [6.0], [9.0], [13.0]], dtype=np.float32)

# A single Dense unit is a linear model; the L1 regularizer penalizes its kernel.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(
        1,
        kernel_regularizer=tf.keras.regularizers.L1(0.01)
    )
])

model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=200, verbose=0)

# The Input layer is not listed in model.layers, so index 0 is the Dense layer.
weights, bias = model.layers[0].get_weights()
print("weights:", weights.ravel())
print("bias:", bias)

This is idiomatic TensorFlow because the regularization penalty is added to the training loss automatically.

Where the Regularization Loss Goes

Each Keras layer exposes its regularization penalties through layer.losses, and the model collects them in model.losses. During fit, Keras adds those terms to the compiled loss automatically.

If you want to inspect them manually, you can:

python
import tensorflow as tf

layer = tf.keras.layers.Dense(
    1,
    kernel_regularizer=tf.keras.regularizers.L1(0.01)
)

# The layer creates its weights on the first call, so call it once before inspecting.
x = tf.ones((2, 3))
_ = layer(x)
print(layer.losses)

That is useful for debugging and for understanding how the penalty is incorporated.
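As a sanity check, the reported value can be compared against the penalty computed by hand. The sketch below assumes a TensorFlow 2.x install running eagerly; the variable names lam, reported, and expected are illustrative.

```python
import numpy as np
import tensorflow as tf

lam = 0.01  # illustrative L1 coefficient
layer = tf.keras.layers.Dense(1, kernel_regularizer=tf.keras.regularizers.L1(lam))
_ = layer(tf.ones((2, 3)))  # calling the layer once builds its kernel and bias

# Value reported by Keras versus lam * sum(abs(kernel)) computed by hand.
reported = float(layer.losses[0])
expected = lam * float(np.sum(np.abs(layer.kernel.numpy())))

print("reported:", reported)
print("expected:", expected)
print("number of penalty terms:", len(layer.losses))
```

Only one penalty term appears because kernel_regularizer applies to the kernel alone. The bias contributes nothing unless a bias_regularizer is also passed.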

Manual Training With GradientTape

If you want full control over the objective, write the L1 term yourself.

python
import tensorflow as tf

X = tf.constant([[1.0], [2.0], [3.0], [4.0]], dtype=tf.float32)
y = tf.constant([[2.0], [4.1], [5.9], [8.2]], dtype=tf.float32)

# Trainable parameters of the linear model y = X @ w + b.
w = tf.Variable([[0.0]], dtype=tf.float32)
b = tf.Variable([0.0], dtype=tf.float32)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.05)
lambda_l1 = 0.1

for _ in range(200):
    with tf.GradientTape() as tape:
        predictions = tf.matmul(X, w) + b
        mse = tf.reduce_mean(tf.square(predictions - y))
        # The L1 penalty covers the weights only; the bias is left unpenalized.
        l1_penalty = lambda_l1 * tf.reduce_sum(tf.abs(w))
        loss = mse + l1_penalty

    gradients = tape.gradient(loss, [w, b])
    optimizer.apply_gradients(zip(gradients, [w, b]))

print("w =", w.numpy().ravel())
print("b =", b.numpy())

This makes the lasso objective explicit and is useful when you want custom training logic.

Choosing the Penalty Strength

The most important hyperparameter is the L1 coefficient. Too small, and you get little sparsity. Too large, and the model underfits badly because important features are pushed too close to zero.

In practice, tune the penalty with validation data instead of guessing. Lasso is especially sensitive to feature scale, so standardizing inputs before training is usually a good idea.
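One way to sketch that tuning loop is shown below. To keep it short, it uses plain NumPy subgradient descent in place of the TensorFlow training code; the helper names standardize and fit_lasso, the synthetic data, the split sizes, and the lambda grid are all assumptions made for this example.

```python
import numpy as np

def standardize(train, other):
    """Scale both arrays using the mean and std of the training split only."""
    mean, std = train.mean(axis=0), train.std(axis=0)
    return (train - mean) / std, (other - mean) / std

def fit_lasso(X, y, lam, lr=0.05, steps=2000):
    """Subgradient descent on mean squared error + lam * sum(abs(w))."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        residuals = X @ w + b - y
        grad_w = 2.0 * X.T @ residuals / len(y) + lam * np.sign(w)
        grad_b = 2.0 * residuals.mean()  # the bias carries no penalty
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data: only the first two of five features carry signal,
# and the features sit on very different numeric scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([1.0, 10.0, 1.0, 100.0, 1.0])
y = 3.0 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_val = standardize(X[:150], X[150:])
y_train, y_val = y[:150], y[150:]

# Fit once per candidate lambda and score each fit on the held-out split.
results = {}
for lam in (0.001, 0.01, 0.1, 1.0, 10.0):
    w, b = fit_lasso(X_train, y_train, lam)
    results[lam] = np.mean((X_val @ w + b - y_val) ** 2)
    print(f"lambda={lam}: validation mse={results[lam]:.4f}, w={np.round(w, 3)}")

best_lam = min(results, key=results.get)
print("best lambda on the validation split:", best_lam)
```

The validation split is standardized with the training statistics so that no information from the held-out rows leaks into the fit. A single split is the simplest option; cross-validation follows the same pattern with more folds.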

When Lasso Works Best

Lasso is most useful when:

  • you suspect only a subset of features is important
  • interpretability matters
  • you want embedded feature selection in a linear model

If all features carry small amounts of signal, ridge regression or elastic net may behave better than pure lasso.

Common Pitfalls

The biggest pitfall is forgetting to scale features. Because the L1 penalty acts directly on coefficient magnitude, features on larger numeric scales can distort the regularization effect.
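A small NumPy sketch shows the effect. It is a hypothetical example: the same quantity is expressed in meters and then in millimeters, and the slope that fits the data is computed for each. The data and the lambda value are made up for illustration.

```python
import numpy as np

# Hypothetical example: one feature measured in meters, then in millimeters.
x_m = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x_m            # exact linear relationship, no noise
x_mm = x_m * 1000.0      # same quantity on a larger numeric scale

# Least-squares slope through the origin: w = (x . y) / (x . x)
w_m = x_m @ y / (x_m @ x_m)
w_mm = x_mm @ y / (x_mm @ x_mm)

lam = 0.1  # illustrative L1 coefficient
print("meters:      w =", w_m, " penalty =", lam * abs(w_m))
print("millimeters: w =", w_mm, " penalty =", lam * abs(w_mm))
```

Both versions describe the same relationship, yet the coefficient for the millimeter feature is a thousand times smaller and so is its penalty. Standardizing the feature first removes that difference, since both versions then become the same column.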

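Another issue is regularizing the wrong parameters. In most lasso setups, the penalty applies to the weights and the bias term is left unpenalized. In Keras, kernel_regularizer covers only the kernel; adding a bias_regularizer would penalize the intercept as well.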

Developers also tend to expect exact zeros too early in training. Depending on optimizer choice, learning rate, and dataset size, coefficients may hover near zero without landing exactly on it, especially in a short run.
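When inspecting trained weights, it can help to count coefficients that fall within a small tolerance of zero rather than testing for exact equality. The helper name summarize_sparsity, the tolerance of 1e-3, and the sample coefficients below are assumptions for illustration; a sensible tolerance depends on the scale of the features.

```python
import numpy as np

def summarize_sparsity(weights, tol=1e-3):
    """Count coefficients that are exactly zero and those within tol of zero."""
    weights = np.asarray(weights, dtype=float).ravel()
    exact = int(np.sum(weights == 0.0))
    near = int(np.sum(np.abs(weights) <= tol))  # includes the exact zeros
    return {"exact_zeros": exact, "within_tol": near, "total": int(weights.size)}

# Made-up coefficients standing in for the output of a short training run.
trained = [1.8, 3.2e-4, -0.9, -7.5e-5, 0.0]
print(summarize_sparsity(trained))
```

With Keras, the same function could be applied to the array returned by get_weights() for the dense layer.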

Finally, do not confuse lasso with generic sparse deep learning. This technique is naturally aligned with linear models; applying it blindly to deeper networks changes the optimization story considerably.

Summary

  • Lasso regression is linear regression with an L1 penalty on the weights.
  • In TensorFlow, the easiest implementation uses a dense layer with tf.keras.regularizers.L1(...).
  • GradientTape is useful when you want to write the lasso objective explicitly.
  • Tune the L1 coefficient with validation data instead of guessing.
  • Standardize features so the regularization effect is meaningful across dimensions.
