Linear Regression and Gradient Descent in scikit-learn

Introduction

In scikit-learn, linear regression can mean two different training styles: a direct least-squares solution with LinearRegression, or iterative optimization with SGDRegressor. Both produce linear models, but they differ in scaling behavior, tuning requirements, and when they make sense operationally.

Closed-Form Linear Regression

LinearRegression solves ordinary least squares directly. It is a strong default when the dataset fits in memory and you want a one-shot fit with minimal tuning.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.8, 3.0]) + rng.normal(0, 0.5, 2000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("R2:", r2_score(y_test, pred))
print("Coefficients:", model.coef_)
```

This approach is simple, stable, and often the right answer for moderate tabular problems.
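To make "directly" concrete, the sketch below solves the same least-squares problem with NumPy's `np.linalg.lstsq` (adding a column of ones for the intercept) and checks that it matches `LinearRegression`. The synthetic data is an assumption for illustration, and scikit-learn handles the intercept differently internally, so expect agreement up to floating-point tolerance rather than bit-for-bit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: three features, intercept 4.0, small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(0, 0.1, 200)

# Direct least squares: prepend a column of ones so the intercept is a parameter,
# then solve min ||A @ beta - y||^2 in one call, with no iterations or step size.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

model = LinearRegression().fit(X, y)
print("lstsq   intercept/coefs:", beta[0], beta[1:])
print("sklearn intercept/coefs:", model.intercept_, model.coef_)
```

Both routes arrive at the same coefficients without any learning rate or iteration count, which is why `LinearRegression` needs so little tuning.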

Gradient Descent With SGDRegressor

If you want gradient-based training in scikit-learn, the standard estimator is SGDRegressor. It updates the model iteratively and is better suited to large datasets or incremental learning.

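```python
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Reuses X_train, X_test, y_train, y_test and r2_score from the previous example.
# Scaling lives inside the pipeline, so it is fit on the training data only.
sgd_model = make_pipeline(
    StandardScaler(),
    SGDRegressor(
        loss="squared_error",
        alpha=0.0001,      # L2 regularization strength
        max_iter=3000,     # maximum number of passes (epochs) over the training data
        tol=1e-4,          # stop once the loss stops improving by at least this much
        random_state=42,
    ),
)

sgd_model.fit(X_train, y_train)
pred_sgd = sgd_model.predict(X_test)
print("R2 SGD:", r2_score(y_test, pred_sgd))
```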

This is still a linear model, but the fitting process is iterative rather than direct.
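To show what "iterative" means, here is a hand-rolled full-batch gradient descent on mean squared error. This is a simplified sketch, not how `SGDRegressor` is implemented: `SGDRegressor` updates on one sample at a time, follows a learning-rate schedule, and adds regularization. The data, the step size `lr = 0.1`, and the 500 iterations are illustrative assumptions.

```python
import numpy as np

# Hypothetical data with features that are already roughly standardized.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
true_w = np.array([1.5, -2.0, 0.8])
y = X @ true_w + 3.0 + rng.normal(0, 0.1, 500)

w = np.zeros(X.shape[1])  # weights start at zero
b = 0.0                   # intercept starts at zero
lr = 0.1                  # fixed step size (learning rate)

for _ in range(500):
    residual = X @ w + b - y
    # Gradients of the mean squared error (1/n) * sum(residual**2)
    grad_w = 2.0 * X.T @ residual / len(y)
    grad_b = 2.0 * residual.mean()
    # Take one small step downhill on the loss surface
    w -= lr * grad_w
    b -= lr * grad_b

print("weights:", w, "bias:", b)
```

Each pass only moves the parameters a little, so the result depends on the step size and the number of iterations, two knobs that the direct solution does not have.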

Why Feature Scaling Matters for Gradient Descent

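Gradient descent is sensitive to feature scale. If one feature ranges from 0 to 1 and another ranges from 0 to 100000, a single learning rate small enough to stay stable along the large-scale feature makes progress along the small-scale feature extremely slow. The step size becomes harder to tune and convergence becomes less predictable.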

That is why StandardScaler is in the pipeline above. Without scaling, SGDRegressor often converges slowly, underfits, or behaves erratically.
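One way to quantify this is the condition number of the squared-loss Hessian, which for centered features is proportional to Xᵀ X. For plain gradient descent with a fixed step size on a quadratic loss, the number of iterations needed for a given accuracy grows roughly in proportion to that number. The sketch below assumes hypothetical data with the 0 to 1 and 0 to 100000 ranges mentioned above and compares the condition number before and after `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: one feature in [0, 1], another in [0, 100000].
rng = np.random.default_rng(0)
X_raw = np.column_stack([rng.uniform(0, 1, 1000), rng.uniform(0, 100_000, 1000)])


def hessian_condition(X):
    """Condition number of X^T X / n for centered features (the squared-loss Hessian up to a factor of 2)."""
    Xc = X - X.mean(axis=0)
    return np.linalg.cond(Xc.T @ Xc / len(Xc))


X_scaled = StandardScaler().fit_transform(X_raw)
cond_raw = hessian_condition(X_raw)
cond_scaled = hessian_condition(X_scaled)
print(f"condition number raw: {cond_raw:.3g}, scaled: {cond_scaled:.3g}")
```

The raw data has a condition number on the order of 10^10, while the scaled data is close to 1, which is the well-conditioned case where one learning rate suits every direction.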

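LinearRegression does not share this sensitivity because it computes the least-squares solution directly instead of walking the loss surface with step updates. Rescaling a feature only rescales its coefficient; the predictions stay the same.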
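A quick check of that scale-invariance: fit `LinearRegression` on raw and on standardized versions of the same hypothetical data, then compare predictions. The data and coefficients here are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical data with wildly different feature scales.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 300), rng.uniform(0, 100_000, 300)])
y = 3.0 * X[:, 0] + 0.0002 * X[:, 1] + rng.normal(0, 0.1, 300)

raw = LinearRegression().fit(X, y)
scaler = StandardScaler().fit(X)
scaled = LinearRegression().fit(scaler.transform(X), y)

pred_raw = raw.predict(X)
pred_scaled = scaled.predict(scaler.transform(X))
# Coefficients change with the feature units; predictions do not.
print("raw coef:", raw.coef_, "scaled coef:", scaled.coef_)
print("max prediction gap:", np.abs(pred_raw - pred_scaled).max())
```

Scaling can still help `LinearRegression` for other reasons (for example, comparing coefficient magnitudes), but it is not needed for the fit itself.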

When to Use Each Estimator

Use LinearRegression when:

  • the dataset is moderate in size
  • a direct least-squares fit is affordable
  • you want low tuning overhead

Use SGDRegressor when:

  • the dataset is very large
  • you want incremental training with partial_fit
  • you need an online-learning style workflow

Many beginners hear "gradient descent" and assume it is always more advanced. In scikit-learn, it is often more operationally demanding, not automatically better.

Evaluate Models Fairly

When comparing these estimators, use the same train-test split and more than one metric.

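```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Evaluate both models from the examples above on the same held-out split.
# np.sqrt(mean_squared_error(...)) works on every scikit-learn version; the
# squared=False argument was deprecated in 1.4 and later removed.
for name, p in [("LinearRegression", pred), ("SGDRegressor", pred_sgd)]:
    mae = mean_absolute_error(y_test, p)
    rmse = np.sqrt(mean_squared_error(y_test, p))
    print(f"{name}: MAE={mae:.4f} RMSE={rmse:.4f}")
```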

Two models can have similar R2 values while behaving differently in the tails or showing different error distributions.
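A contrived sketch makes this concrete. The targets and both prediction sets below are hypothetical: model A is off by exactly 1 everywhere, while model B is perfect on 99 points and misses one point by 10. Both have the same sum of squared errors, so R2 and RMSE match, yet MAE and `max_error` tell very different stories.

```python
import numpy as np
from sklearn.metrics import max_error, mean_absolute_error, mean_squared_error, r2_score

# Hypothetical targets and two prediction sets with the same sum of squared
# errors (100) but very different error distributions.
y_true = np.linspace(0.0, 50.0, 100)
pred_a = y_true + 1.0   # every prediction is off by exactly 1
pred_b = y_true.copy()
pred_b[-1] += 10.0      # 99 perfect predictions, one miss of 10

for name, p in [("A", pred_a), ("B", pred_b)]:
    print(
        name,
        f"R2={r2_score(y_true, p):.4f}",
        f"RMSE={np.sqrt(mean_squared_error(y_true, p)):.2f}",
        f"MAE={mean_absolute_error(y_true, p):.2f}",
        f"max={max_error(y_true, p):.1f}",
    )
```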

Incremental Learning Is the Real Strength of SGD

One strong reason to choose gradient descent in scikit-learn is incremental updates. SGDRegressor exposes partial_fit, so it can train on batches as they arrive. LinearRegression has no partial_fit method and must be refit on the full dataset.

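```python
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Pipeline has no partial_fit, so update the scaler and the model separately.
scaler = StandardScaler()
online_sgd = SGDRegressor(random_state=42)

for start in range(0, len(X_train), 500):  # stream the training data in batches of 500
    X_batch, y_batch = X_train[start:start + 500], y_train[start:start + 500]
    scaler.partial_fit(X_batch)  # update the running mean and variance
    # Always scale a batch before the model sees it.
    online_sgd.partial_fit(scaler.transform(X_batch), y_batch)

pred_online = online_sgd.predict(scaler.transform(X_test))
```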

That kind of workflow matters in streaming or memory-constrained environments.

Common Pitfalls

The most common mistake is using SGDRegressor without feature scaling. That usually leads to poor or unstable results.

Another issue is comparing LinearRegression and SGDRegressor on different splits or with inconsistent preprocessing. That makes the comparison meaningless.

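Developers also sometimes expect SGDRegressor's default settings to behave well on every dataset. The learning-rate schedule (learning_rate, eta0), the regularization strength (alpha), and the iteration budget (max_iter, tol) all affect the result.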
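A common way to choose these settings is cross-validated search. The sketch below tunes `alpha` and `eta0` with `GridSearchCV` on hypothetical data; the grid values are illustrative assumptions, not recommended defaults. Parameter names follow the `<step name>__<parameter>` convention, and `make_pipeline` names the step `sgdregressor`.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data similar to the earlier examples.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.8, 3.0]) + rng.normal(0, 0.5, 1000)

# Scaling stays inside the pipeline so each CV fold is scaled on its own training part.
pipe = make_pipeline(StandardScaler(), SGDRegressor(max_iter=2000, random_state=0))
param_grid = {
    "sgdregressor__alpha": [1e-5, 1e-4, 1e-3, 1e-2],  # L2 regularization strength
    "sgdregressor__eta0": [0.001, 0.01, 0.1],         # initial learning rate
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV R2:", round(search.best_score_, 3))
```

The same pattern extends to `learning_rate` or `max_iter` if the defaults do not converge on your data.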

Finally, do not choose iterative optimization just because it sounds more "machine learning." For many ordinary regression tasks, LinearRegression is the simpler and better tool.

Summary

  • LinearRegression fits a direct least-squares linear model with minimal tuning.
  • SGDRegressor trains a linear model with gradient-based updates.
  • Feature scaling is essential for gradient descent methods.
  • Use the same preprocessing and evaluation split when comparing estimators.
  • Pick gradient descent when you need scale or incremental updates, not by default.

All Rights Reserved.