scikit-learn statsmodels - which R-squared is correct?
Introduction
If scikit-learn and statsmodels give you different R^2 values for what looks like the same regression, the usual explanation is not that one library is wrong. The real issue is that the models are not actually identical, or the metric is being computed on different data with different assumptions. Once those assumptions line up, the numbers usually match.
What R^2 Measures
R^2 describes how much of the variation in the target is explained by the model relative to predicting the mean every time. In ordinary least squares with an intercept, the familiar formula is:
R^2 = 1 - SS_res / SS_tot
Two consequences matter here. First, the definition depends on the predictions you compare against the observed target. Second, the presence or absence of an intercept changes the baseline and can change the reported value significantly.
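To make the formula concrete, here is a small sketch that computes it by hand and checks it against sklearn.metrics.r2_score. The helper name manual_r2 and the toy arrays are illustrative assumptions, not something from either library.

```python
import numpy as np
from sklearn.metrics import r2_score


def manual_r2(y_true, y_pred):
    """Plain R^2 = 1 - SS_res / SS_tot, using the mean of y_true as the baseline."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot


# Toy data (hypothetical values chosen only for illustration).
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.5])

print(manual_r2(y_true, y_pred), r2_score(y_true, y_pred))
```

Both numbers depend only on the observed values and the predictions passed in, which is why the choice of rows and the choice of model both matter.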
scikit-learn typically exposes R^2 through model.score(X, y) or sklearn.metrics.r2_score. statsmodels reports results.rsquared and also results.rsquared_adj for adjusted R^2. Those metrics answer related but not identical questions.
The Most Common Reason for a Mismatch
The biggest source of confusion is the intercept term.
- In scikit-learn, LinearRegression() includes an intercept by default through fit_intercept=True.
- In statsmodels, OLS does not add a constant automatically. You must add it yourself with sm.add_constant.
If you forget that step, you are fitting two different models. The resulting R^2 values can differ a lot, and both can still be mathematically correct for their respective model definitions.
Here is a minimal example that makes the two libraries agree:
When the design matrix is the same, the fitted predictions should match up to floating-point noise, and so should the ordinary R^2.
When Different Answers Are Still Valid
There are several cases where the libraries legitimately report different values:
Adjusted R^2 Versus Plain R^2
statsmodels gives you both. Adjusted R^2 penalizes extra predictors and is useful for model comparison in classical regression analysis. scikit-learn estimators do not expose adjusted R^2; if you want it there, you have to compute it yourself from the plain value.
If you compare sk_model.score(...) to sm_model.rsquared_adj, you are comparing different metrics.
Training Data Versus Holdout Data
statsmodels summaries usually describe the fit on the data used to train the model. In scikit-learn, people often call .score(X_test, y_test) on a test set. Test-set R^2 can be much lower, and that does not indicate a bug.
Weighted or Transformed Models
If one model uses weights, a transformed target, regularization, or a formula expansion with interaction terms, then the design matrices differ. The reported R^2 values are no longer expected to match.
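Regularization is the easiest of these to demonstrate. The sketch below compares ordinary least squares with a ridge fit on the same synthetic data. The penalty alpha=50.0 is an arbitrary illustrative value, chosen large enough to make the difference visible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (assumed values, for illustration only).
rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=50.0).fit(X, y)  # penalized fit: coefficients are shrunk toward zero

ols_r2 = ols.score(X, y)
ridge_r2 = ridge.score(X, y)

print("OLS coefficients:  ", ols.coef_, "R^2:", ols_r2)
print("Ridge coefficients:", ridge.coef_, "R^2:", ridge_r2)
```

On the training data, the ridge score cannot exceed the ordinary least squares score, because least squares with an intercept minimizes the residual sum of squares among linear fits.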
A Debugging Checklist
When numbers disagree, compare the following in order:
- Are both models using the same input columns?
- Did you add a constant in statsmodels when needed?
- Are predictions being evaluated on the same rows?
- Are you comparing plain R^2 to plain R^2, not adjusted R^2?
- Is one library fitting a regularized model while the other fits ordinary least squares?
You can also compare the predictions directly. If sk_pred and sm_pred differ, the models are not the same regardless of what the scores say.
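One way to make that comparison routine is a small helper that reports the largest gap between two prediction arrays. The function name max_prediction_gap and the toy arrays are hypothetical, used here only to illustrate the check.

```python
import numpy as np


def max_prediction_gap(pred_a, pred_b):
    """Largest absolute difference between two prediction arrays for the same rows."""
    a = np.asarray(pred_a, dtype=float).ravel()
    b = np.asarray(pred_b, dtype=float).ravel()
    if a.shape != b.shape:
        # Different lengths mean the two models were not evaluated on the same rows.
        raise ValueError("predictions cover a different number of rows")
    return float(np.max(np.abs(a - b)))


# Toy predictions (hypothetical values); in practice, pass the real arrays.
sk_pred = np.array([1.0, 2.0, 3.0])
sm_pred = np.array([1.0, 2.0, 3.5])

gap = max_prediction_gap(sk_pred, sm_pred)
print("max gap:", gap, "equivalent fits:", gap < 1e-8)
```

A gap at the level of floating-point noise suggests the fits are equivalent; anything larger points to a difference in the model or the input rows.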
Which One Should You Trust
Trust the metric that matches the model and dataset you actually care about. For machine-learning workflows, scikit-learn is often the better fit because it integrates naturally with preprocessing, cross-validation, and test-set evaluation. For statistical interpretation, statsmodels is often preferable because it provides coefficient tables, standard errors, and diagnostics.
The right question is usually not "which library is correct?" but "am I comparing the same regression under the same scoring setup?"
Common Pitfalls
The classic pitfall is forgetting sm.add_constant(X) in statsmodels. That silently changes the model and is the fastest path to mismatched R^2 values.
Another pitfall is comparing results.rsquared_adj from statsmodels against model.score(...) from scikit-learn. Adjusted and unadjusted R^2 are different by design.
It is also common to compare a training score from one library with a test score from the other. Always check which dataset each value came from.
Finally, formula APIs can create extra columns automatically, such as dummy variables or interactions. If one model is built from a formula and the other from a raw matrix, verify the expanded design matrix before concluding that the metrics disagree.
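A quick way to inspect what a formula produced is to look at the column names of the design matrix before fitting. The sketch assumes a toy DataFrame with one numeric column and one three-level categorical column; the column names and formula are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data (assumed values, for illustration only).
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "x": rng.normal(size=60),
    "group": np.tile(["a", "b", "c"], 20),  # three-level categorical column
})
df["y"] = 1.0 + 2.0 * df["x"] + rng.normal(scale=0.2, size=60)

# The formula adds an intercept, dummy columns for group, and x-by-group interactions.
formula_model = smf.ols("y ~ x * C(group)", data=df)

print(formula_model.exog_names)   # names of the expanded design-matrix columns
print(formula_model.exog.shape)   # (rows, expanded columns)
```

Two input columns turn into six design-matrix columns here, so a scikit-learn model fitted on the raw two columns would not be the same regression.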
Summary
- Different R^2 values usually mean the two libraries are fitting or scoring different models.
- statsmodels.OLS needs an explicit constant if you want an intercept.
- Plain R^2 and adjusted R^2 are not interchangeable.
- Compare predictions on the same rows to confirm whether the fits are actually equivalent.
- Use scikit-learn for pipeline-oriented evaluation and statsmodels for regression diagnostics and inference.

