scikit-learn statsmodels - which R-squared is correct?

Introduction

If scikit-learn and statsmodels give you different R^2 values for what looks like the same regression, the usual explanation is not that one library is wrong. The real issue is that the models are not actually identical, or the metric is being computed on different data with different assumptions. Once those assumptions line up, the numbers usually match.

What `R^2` Measures

R^2 describes how much of the variation in the target is explained by the model relative to predicting the mean every time. In ordinary least squares with an intercept, the familiar formula is:

R^2 = 1 - SS_res / SS_tot

Two consequences matter here. First, the definition depends on the predictions you compare against the observed target. Second, the presence or absence of an intercept changes the baseline and can change the reported value significantly.

scikit-learn typically exposes R^2 through model.score(X, y) or sklearn.metrics.r2_score. statsmodels reports results.rsquared and also results.rsquared_adj for adjusted R^2. Those metrics answer related but not identical questions.
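As a rough illustration of the formula above, the sketch below computes R^2 by hand with NumPy and compares it with sklearn.metrics.r2_score. The arrays y_true and y_pred are made-up values chosen only for this example; they are not taken from any real model.

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative observed targets and predictions (made-up numbers).
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.5, 8.5])

# Residual sum of squares: error left over after using the model.
ss_res = np.sum((y_true - y_pred) ** 2)
# Total sum of squares: error of always predicting the mean of y.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1.0 - ss_res / ss_tot

print("manual R^2:", r2_manual)
print("r2_score:  ", r2_score(y_true, y_pred))
```

Both lines should print the same value, because r2_score applies this same centered formula to whatever predictions you pass in.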

The Most Common Reason for a Mismatch

The biggest source of confusion is the intercept term.

In scikit-learn, LinearRegression() includes an intercept by default through fit_intercept=True.
In statsmodels, OLS does not add a constant automatically. You must add it yourself with sm.add_constant.

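If you forget that step, you are fitting two different models: scikit-learn fits a line with an intercept, while statsmodels fits a line forced through the origin. statsmodels also switches to the uncentered total sum of squares (the sum of y^2, with no mean subtracted) when the model has no constant, so the baseline in the R^2 formula changes as well. The resulting R^2 values can differ a lot, and both are still mathematically correct for their respective model definitions.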

Here is a minimal example that makes the two libraries agree:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.2, 5.1, 7.2, 8.9, 11.1])

# scikit-learn: the intercept is included by default (fit_intercept=True).
sk_model = LinearRegression()
sk_model.fit(X, y)
sk_pred = sk_model.predict(X)

# statsmodels: add the constant column explicitly before fitting.
X_sm = sm.add_constant(X)
sm_model = sm.OLS(y, X_sm).fit()
sm_pred = sm_model.predict(X_sm)

print("scikit-learn score:", sk_model.score(X, y))
print("scikit-learn r2_score:", r2_score(y, sk_pred))
print("statsmodels rsquared:", sm_model.rsquared)
print("Predictions equal:", np.allclose(sk_pred, sm_pred))
```

When the design matrix is the same, the fitted predictions should match up to floating-point noise, and so should the ordinary R^2.

When Different Answers Are Still Valid

There are several cases where the libraries legitimately report different values:

Adjusted `R^2` Versus Plain `R^2`

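statsmodels gives you both. Adjusted R^2 penalizes extra predictors and is useful for model comparison in classical regression analysis. scikit-learn estimators report only plain R^2, so if you want the adjusted value you have to compute it yourself from the score, the number of rows n, and the number of predictors p (not counting the intercept).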

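```python
# Reuses X, y, sk_model, and sm_model from the previous example.
n = len(y)       # number of observations
p = X.shape[1]   # number of predictors, excluding the intercept
r2 = sk_model.score(X, y)
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print("plain R^2:", r2)
print("adjusted R^2:", adjusted_r2)
print("statsmodels adjusted R^2:", sm_model.rsquared_adj)
```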

If you compare sk_model.score(...) to sm_model.rsquared_adj, you are comparing different metrics.

Training Data Versus Holdout Data

statsmodels summaries usually describe the fit on the data used to train the model. In scikit-learn, people often call .score(X_test, y_test) on a test set. Test-set R^2 can be much lower, and that does not indicate a bug.

Weighted or Transformed Models

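If one model uses weights, a transformed target, or regularization, the fitting objective differs; if it uses a formula expansion with interaction terms, the design matrix differs. In either case the two libraries are no longer estimating the same model, and the reported R^2 values are not expected to match.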
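As one illustration of this, covering only the regularization case, the sketch below fits ordinary least squares and a ridge regression to the same toy data. The penalty alpha=10.0 is an arbitrary value picked to make the difference visible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.2, 5.1, 7.2, 8.9, 11.1])

ols = LinearRegression().fit(X, y)
# Ridge adds an L2 penalty on the coefficients; alpha is arbitrary here.
ridge = Ridge(alpha=10.0).fit(X, y)

ols_r2 = ols.score(X, y)
ridge_r2 = ridge.score(X, y)

print("OLS slope:  ", ols.coef_[0], " R^2:", ols_r2)
print("Ridge slope:", ridge.coef_[0], " R^2:", ridge_r2)
print("Predictions equal:", np.allclose(ols.predict(X), ridge.predict(X)))
```

The ridge slope is shrunk toward zero, so its training R^2 falls below the OLS value. Comparing it with results.rsquared from an unpenalized statsmodels fit would show a mismatch that has nothing to do with either library being wrong.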

A Debugging Checklist

When numbers disagree, compare the following in order:

Are both models using the same input columns?
Did you add a constant in statsmodels when needed?
Are predictions being evaluated on the same rows?
Are you comparing plain R^2 to plain R^2, not adjusted R^2?
Is one library fitting a regularized model while the other fits ordinary least squares?

You can also compare the predictions directly. If sk_pred and sm_pred differ, the models are not the same regardless of what the scores say.

Which One Should You Trust

Trust the metric that matches the model and dataset you actually care about. For machine-learning workflows, scikit-learn is often the better fit because it integrates naturally with preprocessing, cross-validation, and test-set evaluation. For statistical interpretation, statsmodels is often preferable because it provides coefficient tables, standard errors, and diagnostics.

The right question is usually not "which library is correct?" but "am I comparing the same regression under the same scoring setup?"

Common Pitfalls

The classic pitfall is forgetting sm.add_constant(X) in statsmodels. That silently changes the model and is the fastest path to mismatched R^2 values.

Another pitfall is comparing results.rsquared_adj from statsmodels against model.score(...) from scikit-learn. Adjusted and unadjusted R^2 are different by design.

It is also common to compare a training score from one library with a test score from the other. Always check which dataset each value came from.

Finally, formula APIs can create extra columns automatically, such as dummy variables or interactions. If one model is built from a formula and the other from a raw matrix, verify the expanded design matrix before concluding that the metrics disagree.
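To see what a formula actually expands to, you can inspect the fitted model's design matrix. The sketch below uses synthetic data with made-up column names (x1, group, y) and an arbitrary formula, and reads the expanded columns from model.exog_names and model.exog on the statsmodels results object.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression

# Synthetic data: one numeric predictor and one two-level categorical.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(size=40),
    "group": np.tile(["a", "b"], 20),
})
df["y"] = (
    1.0 + 2.0 * df["x1"] + 0.5 * (df["group"] == "b")
    + rng.normal(scale=0.3, size=40)
)

# The formula adds an intercept, a dummy variable, and an interaction term.
formula_fit = smf.ols("y ~ x1 * C(group)", data=df).fit()
names = formula_fit.model.exog_names
exog = formula_fit.model.exog
print("expanded columns:", names)

# Feed scikit-learn the same expanded columns, minus the intercept column,
# because LinearRegression adds its own intercept.
X_expanded = np.delete(exog, names.index("Intercept"), axis=1)
sk_expanded = LinearRegression().fit(X_expanded, df["y"])
# A naive comparison that uses only the raw numeric column.
sk_raw = LinearRegression().fit(df[["x1"]], df["y"])

print("statsmodels formula R^2:     ", formula_fit.rsquared)
print("scikit-learn, expanded matrix:", sk_expanded.score(X_expanded, df["y"]))
print("scikit-learn, raw x1 only:    ", sk_raw.score(df[["x1"]], df["y"]))
```

Only the fit on the expanded matrix should reproduce the statsmodels value; the raw single-column fit is a smaller model and scores lower on the training data.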

Summary

Different R^2 values usually mean the two libraries are fitting or scoring different models.
statsmodels OLS needs an explicit constant if you want an intercept.
Plain R^2 and adjusted R^2 are not interchangeable.
Compare predictions on the same rows to confirm whether the fits are actually equivalent.
Use scikit-learn for pipeline-oriented evaluation and statsmodels for regression diagnostics and inference.
