Residual plot of residuals vs predicted values in Python
Introduction
A residuals-versus-predicted plot is one of the fastest ways to check whether a regression model is behaving sensibly. The plot compares fitted values on the horizontal axis with residuals, which are actual values minus predicted values, on the vertical axis. A good fit usually produces residuals scattered randomly around zero rather than a visible pattern.
What Residuals Represent
For each observation:
- predicted value = what the model estimated
- residual = actual minus predicted
That simple quantity, the residual, is the foundation of the plot.
If the model is well specified, the residuals should not depend systematically on the fitted value.
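As a small illustration of the definition above, the residuals can be computed element-wise with NumPy. The numbers here are made up for the example and do not come from any real model.

```python
import numpy as np

# Hypothetical observed values and model predictions, for illustration only.
actual = np.array([3.0, 5.0, 7.5, 9.0])
predicted = np.array([2.5, 5.5, 7.0, 9.5])

# Residual = actual minus predicted, computed element-wise.
residuals = actual - predicted

# The residuals are 0.5, -0.5, 0.5, -0.5.
print(residuals)
```

A positive residual means the model predicted too low for that observation, and a negative residual means it predicted too high.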
Create A Residual Plot With scikit-learn And Matplotlib
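The following is a minimal sketch of such a plot, not a definitive recipe. It assumes synthetic data generated from a linear signal plus noise; the variable names, sample size, and coefficients are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): a linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.5, size=200)

# Fit the model, then compute fitted values and residuals.
model = LinearRegression().fit(X, y)
predicted = model.predict(X)
residuals = y - predicted

# Predicted values on the horizontal axis, residuals on the vertical axis.
fig, ax = plt.subplots(figsize=(7, 4))
ax.scatter(predicted, residuals, alpha=0.6)
ax.axhline(0, color="red", linestyle="--")  # zero reference line
ax.set_xlabel("Predicted value")
ax.set_ylabel("Residual (actual - predicted)")
ax.set_title("Residuals vs predicted values")
fig.tight_layout()
# Call plt.show() or fig.savefig("residuals.png") to view the figure.
```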
A scatter plot of residuals against predicted values, with a horizontal line at zero, is often enough for a first diagnostic check.
What A Good Plot Looks Like
A healthy residual plot usually shows:
- points centered around zero
- no strong curved pattern
- roughly similar vertical spread across the range of fitted values
That does not prove the model is perfect, but it suggests the linear fit is not obviously violating core assumptions such as linearity and constant error variance.
What Problem Patterns Look Like
Several visual patterns are warnings:
- curve shape: the relationship may be nonlinear
- funnel shape: residual variance may change with prediction size (heteroscedasticity)
- clusters: missing groups or interactions may exist
- isolated large points: potential outliers or influential observations
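If residuals trend upward or downward as predictions increase, the model is systematically missing structure. For ordinary least squares with an intercept, a straight-line trend of this kind cannot appear in the training residuals, which are uncorrelated with the fitted values by construction, so look for it on held-out data or with other model types.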
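The funnel pattern from the list above can also be checked numerically. This is a rough sketch on synthetic data whose noise is deliberately made to grow with x (an assumption for illustration): it compares the residual spread in the lower and upper halves of the predicted values. The median split is an arbitrary choice, not a formal test.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption): noise standard deviation grows with x.
rng = np.random.default_rng(4)
x = rng.uniform(1, 10, size=400)
y = 2.0 * x + rng.normal(0, 0.5 * x)

model = LinearRegression().fit(x.reshape(-1, 1), y)
predicted = model.predict(x.reshape(-1, 1))
residuals = y - predicted

# Compare residual spread for the lower and upper halves of the predictions.
cutoff = np.median(predicted)
spread_low = residuals[predicted <= cutoff].std()
spread_high = residuals[predicted > cutoff].std()

print(f"residual spread, low predictions:  {spread_low:.2f}")
print(f"residual spread, high predictions: {spread_high:.2f}")
```

A clearly larger spread in one half than in the other is the numeric counterpart of a funnel in the plot.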
Example Of A Nonlinear Misspecification
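The sketch below fits a straight line to data generated from a quadratic relationship. The data-generating process, coefficients, and variable names are assumptions chosen to make the effect easy to see.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y depends on x squared.
rng = np.random.default_rng(1)
x = rng.uniform(0, 5, size=300)
y = 1.0 + 0.5 * x**2 + rng.normal(0, 0.5, size=300)

# Deliberately misspecified: a straight line fitted to a curved relationship.
X = x.reshape(-1, 1)
linear_model = LinearRegression().fit(X, y)
predicted = linear_model.predict(X)
residuals = y - predicted

fig, ax = plt.subplots(figsize=(7, 4))
ax.scatter(predicted, residuals, alpha=0.6)
ax.axhline(0, color="red", linestyle="--")  # zero reference line
ax.set_xlabel("Predicted value")
ax.set_ylabel("Residual (actual - predicted)")
ax.set_title("Linear fit to quadratic data")
fig.tight_layout()
# Call plt.show() or fig.savefig("quadratic_residuals.png") to view the figure.
```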
When the true relationship is quadratic, a linear model leaves a visible curved pattern in the residuals: they are positive at both ends of the predicted range and negative in the middle, or the reverse.
Use The Plot As A Diagnostic, Not A Final Verdict
A residual plot is one diagnostic tool, not a proof of correctness. It works best alongside:
- residual histogram or Q-Q plot
- leverage or influence checks
- domain knowledge about feature construction
- train-test evaluation metrics
A random-looking residual plot is reassuring, but it does not guarantee the model is the right one for the business problem.
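As one possible companion check, a residual histogram and a normal Q-Q plot can be drawn side by side. This sketch assumes SciPy is available and uses placeholder residuals drawn from a normal distribution; in practice the residuals would come from a fitted model.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Placeholder residuals (an assumption): in practice use y - model.predict(X).
rng = np.random.default_rng(2)
residuals = rng.normal(0, 1.0, size=200)

fig, (ax_hist, ax_qq) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: look for rough symmetry around zero.
ax_hist.hist(residuals, bins=20, edgecolor="black")
ax_hist.set_xlabel("Residual")
ax_hist.set_ylabel("Count")
ax_hist.set_title("Residual histogram")

# Q-Q plot: points near the reference line suggest roughly normal residuals.
stats.probplot(residuals, dist="norm", plot=ax_qq)
ax_qq.set_title("Normal Q-Q plot of residuals")

fig.tight_layout()
# Call plt.show() or fig.savefig("residual_checks.png") to view the figure.
```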
Statsmodels Also Makes This Easy
If you are already using Statsmodels for regression summaries, the residual calculation is equally straightforward.
You can then plot resid on the vertical axis against pred on the horizontal axis in the same way with Matplotlib.
Common Pitfalls
The most common mistake is plotting residuals against the wrong quantity, such as the original input feature when the diagnostic you want is residuals versus fitted values. Another is forgetting the horizontal zero line, which makes interpretation harder. Developers also sometimes overinterpret a small sample of noisy points and declare the model broken without checking whether the apparent pattern is stable. Finally, a residual plot can suggest a problem, but fixing the model still requires domain-informed feature or model changes.
Summary
- A residuals-versus-predicted plot checks whether regression errors behave randomly around zero.
- Residuals are computed as actual values minus predicted values.
- Curves, funnels, and clusters often indicate model misspecification or changing variance.
- Matplotlib plus scikit-learn or Statsmodels is enough to generate the plot.
- Use the plot as a diagnostic aid, not as the only measure of model quality.

