Implementation of Linear Regression Closed Form Solution
Introduction
Linear regression is one of the most fundamental algorithms in machine learning and statistics, used to model the relationship between a dependent variable and one or more independent variables. Unlike iterative methods such as gradient descent, the closed-form solution computes the optimal coefficients directly in a single step using matrix algebra. This article explains the derivation, implementation, and practical considerations of the closed-form solution.
Linear Regression Basics
Linear regression fits a line (or hyperplane in higher dimensions) to data by minimizing the sum of squared errors between observed and predicted values.
For univariate regression, the model is:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where $y$ is the dependent variable, $x$ is the independent variable, $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon$ is the error term.
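For multivariate regression with $n$ observations and $p$ features, the model generalizes to:
$$y = X\beta + \varepsilon$$
where:
- $y \in \mathbb{R}^{n}$ is the vector of observed outputs
- $X \in \mathbb{R}^{n \times (p+1)}$ is the design matrix (with a column of ones for the intercept)
- $\beta \in \mathbb{R}^{p+1}$ is the vector of coefficients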
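To make the matrix form concrete, the sketch below builds a design matrix and evaluates the noise-free part of the model. The toy feature values and the coefficient vector are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical toy data: n = 4 observations, p = 2 features.
features = np.array([[1.0, 2.0],
                     [2.0, 0.0],
                     [3.0, 1.0],
                     [4.0, 3.0]])

# Prepend a column of ones so the first coefficient acts as the intercept.
X = np.column_stack([np.ones(len(features)), features])

# Illustrative coefficients: intercept 0.5, then one weight per feature.
beta = np.array([0.5, 2.0, -1.0])

# Noise-free model output X @ beta (the error term is left out here).
y_pred = X @ beta

print(X.shape)  # (4, 3)
print(y_pred)   # [0.5 4.5 5.5 5.5]
```

Each row of `X` is one observation, so `X @ beta` computes every prediction in a single matrix-vector product.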
The Normal Equation
The closed-form solution minimizes the sum of squared residuals $\lVert y - X\beta \rVert^2$. Taking the gradient with respect to $\beta$ and setting it to zero yields the normal equation:
$$\hat{\beta} = (X^\top X)^{-1} X^\top y$$
Derivation
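Start with the cost function:
$$J(\beta) = \lVert y - X\beta \rVert^2 = (y - X\beta)^\top (y - X\beta)$$
Expanding:
$$J(\beta) = y^\top y - 2\beta^\top X^\top y + \beta^\top X^\top X \beta$$
Taking the gradient and setting it to zero:
$$\nabla_\beta J = -2X^\top y + 2X^\top X \beta = 0 \quad\Longrightarrow\quad X^\top X \beta = X^\top y$$
Solving for $\beta$ (assuming $X^\top X$ is invertible):
$$\hat{\beta} = (X^\top X)^{-1} X^\top y$$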
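The derivation can be checked numerically: at the closed-form solution, the gradient of the cost should vanish up to floating-point error. The sketch below assumes synthetic data and an illustrative helper named `gradient`; neither comes from a library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): n = 50, p = 3.
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
true_beta = np.array([1.0, 2.0, -3.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=n)

def gradient(beta):
    """Gradient of J(beta) = ||y - X beta||^2."""
    return -2 * X.T @ y + 2 * X.T @ X @ beta

# Solve the linear system X^T X beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

grad_norm = np.linalg.norm(gradient(beta_hat))
print(grad_norm)  # close to zero, up to floating-point error
```

Here `np.linalg.solve` is used instead of forming the inverse; both satisfy the same equation, but solving the system directly is the more common numerical choice.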
Step-by-Step Computation
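Given $n$ data points with $p$ features:
- Construct the design matrix $X \in \mathbb{R}^{n \times (p+1)}$ by prepending a column of ones to the feature matrix
- Compute $X^\top X$, a $(p+1) \times (p+1)$ square matrix
- Compute $(X^\top X)^{-1}$
- Compute $X^\top y$, a $(p+1)$-dimensional vector
- Multiply: $\hat{\beta} = (X^\top X)^{-1} (X^\top y)$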
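As a worked illustration of these steps, assume three hypothetical data points with a single feature: $(x, y) = (0, 1), (1, 2), (2, 4)$. The design matrix and target vector are:

$$X = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}, \qquad y = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}$$

The intermediate products are:

$$X^\top X = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}, \qquad (X^\top X)^{-1} = \frac{1}{6}\begin{bmatrix} 5 & -3 \\ -3 & 3 \end{bmatrix}, \qquad X^\top y = \begin{bmatrix} 7 \\ 10 \end{bmatrix}$$

Multiplying gives the coefficients:

$$\hat{\beta} = \frac{1}{6}\begin{bmatrix} 5 \cdot 7 - 3 \cdot 10 \\ -3 \cdot 7 + 3 \cdot 10 \end{bmatrix} = \begin{bmatrix} 5/6 \\ 3/2 \end{bmatrix}$$

The fitted line is therefore $\hat{y} = \tfrac{5}{6} + \tfrac{3}{2}x$. Its residuals are $\tfrac{1}{6}, -\tfrac{1}{3}, \tfrac{1}{6}$, which sum to zero, as expected for a least-squares fit with an intercept.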
Implementation
Python with NumPy
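A minimal sketch of the normal equation in NumPy is given below. The function names `fit_normal_equation` and `predict` and the toy data are illustrative assumptions, not part of any library. The code uses `np.linalg.inv` so that it mirrors the formula term by term.

```python
import numpy as np

def fit_normal_equation(features, y):
    """Return [intercept, coef_1, ..., coef_p] via the normal equation."""
    features = np.asarray(features, dtype=float)
    y = np.asarray(y, dtype=float)
    if features.ndim == 1:
        features = features.reshape(-1, 1)  # treat 1-D input as one feature
    # Design matrix: a column of ones followed by the features.
    X = np.column_stack([np.ones(features.shape[0]), features])
    # Direct translation of beta = (X^T X)^{-1} X^T y.
    return np.linalg.inv(X.T @ X) @ (X.T @ y)

def predict(features, beta):
    """Apply fitted coefficients to new feature rows."""
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features.reshape(-1, 1)
    return beta[0] + features @ beta[1:]

# Toy data: three points with one feature.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 4.0])

beta = fit_normal_equation(x, y)
print(beta)                    # approximately [0.8333 1.5]
print(predict([3.0], beta))    # approximately [5.3333]
```

This version raises `numpy.linalg.LinAlgError` if $X^\top X$ is exactly singular, for example when two feature columns are identical.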
More Numerically Stable Version
Using np.linalg.lstsq avoids explicitly computing the matrix inverse, which can be numerically unstable.
It uses the SVD (Singular Value Decomposition) internally, which is more robust when $X^\top X$ is nearly singular.
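The sketch below shows one way to call `np.linalg.lstsq` on a design matrix. The wrapper name `fit_lstsq`, the toy data, and the `rcond=1e-10` cutoff are assumptions for illustration. The data deliberately repeats a feature so that $X^\top X$ is singular and explicit inversion would fail.

```python
import numpy as np

def fit_lstsq(features, y):
    """Least-squares fit with an intercept, without forming an inverse."""
    features = np.asarray(features, dtype=float)
    y = np.asarray(y, dtype=float)
    if features.ndim == 1:
        features = features.reshape(-1, 1)
    X = np.column_stack([np.ones(features.shape[0]), features])
    # rcond sets the cutoff, relative to the largest singular value,
    # below which singular values are treated as zero.
    beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=1e-10)
    return beta, rank

# The second feature duplicates the first, so the columns are dependent.
features = np.array([[0.0, 0.0],
                     [1.0, 1.0],
                     [2.0, 2.0],
                     [3.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # y = 1 + 2 * x

beta, rank = fit_lstsq(features, y)
print(rank)  # 2: three columns, but only two are linearly independent
print(beta)  # approximately [1. 1. 1.], the minimum-norm solution
```

Because the two features are indistinguishable, infinitely many coefficient vectors fit equally well; `lstsq` returns the one with the smallest Euclidean norm, which splits the slope of 2 evenly between the duplicated columns.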
Advantages and Disadvantages
| Aspect | Closed-Form Solution | Gradient Descent |
| --- | --- | --- |
| Computation | Single matrix operation | Iterative, many steps |
| Hyperparameters | None | Learning rate, iterations |
| Complexity | $O(np^2 + p^3)$ | $O(np)$ per epoch |
| Large $p$ (features) | Impractical (matrix inverse) | Scales well |
| Large $n$ (samples) | Scales well | Each epoch is $O(np)$ |
| Numerical stability | Can be poor | Generally stable |
The closed-form solution is ideal when the number of features is small (up to a few thousand). For high-dimensional problems, iterative methods like gradient descent or stochastic gradient descent are preferred because the cost of matrix inversion becomes prohibitive.
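For comparison, the sketch below runs plain batch gradient descent on the same problem and checks that it converges to the closed-form coefficients. The synthetic data, learning rate, and iteration count are assumptions picked for this small, well-conditioned example; other data may need different settings.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (an assumption for illustration): n = 200, p = 3.
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.1 * rng.normal(size=n)

# Closed form: one linear solve, no hyperparameters.
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the mean squared error, which has the same
# minimizer as the sum of squared errors.
learning_rate = 0.1
beta_gd = np.zeros(p + 1)
for _ in range(500):
    grad = (2.0 / n) * X.T @ (X @ beta_gd - y)  # O(n * p) per step
    beta_gd -= learning_rate * grad

max_diff = np.max(np.abs(beta_gd - beta_closed))
print(max_diff)  # very small: both methods reach the same minimizer
```

The iterative version never forms or factors a $(p+1) \times (p+1)$ matrix, which is why it remains usable when $p$ is large.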
Regularized Variant: Ridge Regression
When $X^\top X$ is singular or nearly singular (multicollinearity), the inverse does not exist or is numerically unstable. Ridge regression adds a regularization term:
$$\hat{\beta}_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y$$
The term $\lambda I$ ensures the matrix is invertible and shrinks the coefficients toward zero, reducing overfitting. Here $\lambda > 0$ is the regularization strength.
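A minimal sketch of the ridge formula follows, assuming a hypothetical helper `fit_ridge` and toy data with a duplicated feature. It follows the formula as written, so the intercept is penalized along with the other coefficients; many practical implementations exclude the intercept from the penalty.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y for a design matrix X."""
    n_cols = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_cols), X.T @ y)

# The second feature duplicates the first, so X^T X alone is singular.
features = np.array([[0.0, 0.0],
                     [1.0, 1.0],
                     [2.0, 2.0],
                     [3.0, 3.0]])
X = np.column_stack([np.ones(len(features)), features])
y = np.array([1.0, 3.0, 5.0, 7.0])

beta_small = fit_ridge(X, y, lam=0.1)
beta_large = fit_ridge(X, y, lam=10.0)

print(beta_small)  # finite coefficients despite the duplicated column
print(np.linalg.norm(beta_small), np.linalg.norm(beta_large))
```

With any $\lambda > 0$ the system has a unique solution, and the duplicated columns receive equal weights. Increasing $\lambda$ shrinks the coefficient vector further, at the cost of more bias.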
Summary
The closed-form solution for linear regression provides exact coefficients in one computation step via the normal equation $\hat{\beta} = (X^\top X)^{-1} X^\top y$. It requires no hyperparameter tuning and is the preferred method when the feature count is manageable. For numerical stability, use SVD-based solvers rather than explicit matrix inversion. When features are highly correlated or numerous, ridge regression or iterative methods are better alternatives.

