Add a Trend Line to a pandas Plot

Introduction

Pandas is excellent for loading and reshaping data, but it does not draw a trend line by itself. The usual approach is to keep the data in a DataFrame, compute a fitted model with NumPy or a statistics library, and plot both the raw points and the fitted values with Matplotlib.

Build A Simple Linear Trend Line

A linear trend line is a straight line of the form y = mx + b, where m is the slope and b is the intercept. For many quick analyses, NumPy gives you everything you need.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data
df = pd.DataFrame(
    {
        "month": [1, 2, 3, 4, 5, 6],
        "sales": [10, 13, 15, 18, 20, 24],
    }
)

# Fit a first-degree polynomial, which is a straight line
slope, intercept = np.polyfit(df["month"], df["sales"], 1)
df["trend"] = slope * df["month"] + intercept

ax = df.plot.scatter(x="month", y="sales", label="actual")
df.plot(x="month", y="trend", ax=ax, color="crimson", label="trend")
plt.show()
```

This approach is practical because the fitted values are written back into the DataFrame. Once the trend column exists, you can export it, compare it with future observations, or add more visual layers.
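
To compare the trend with future observations, the fitted coefficients can be evaluated at x values beyond the sample. The following is a minimal sketch that reuses the sample data from above; the `future` DataFrame and its month values are hypothetical, and a straight-line extrapolation is only as reliable as the assumption that the trend continues.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "month": [1, 2, 3, 4, 5, 6],
        "sales": [10, 13, 15, 18, 20, 24],
    }
)

# Fit the same straight line as in the example above
slope, intercept = np.polyfit(df["month"], df["sales"], 1)

# Hypothetical future months, used only for illustration
future = pd.DataFrame({"month": [7, 8, 9]})
future["trend"] = slope * future["month"] + intercept

print(future)
```

When the real values for those months arrive, they can be added as another column and compared with `trend` row by row.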

Why Pandas Alone Is Not Enough

Pandas has plotting helpers, but those helpers rely on Matplotlib under the hood. The data structure and the charting library play different roles:

  • pandas manages tabular data and labels
  • NumPy computes the line coefficients
  • Matplotlib renders the chart

That separation matters when debugging. If the line looks wrong, the problem is usually in the fitting step or in the input data type, not in DataFrame.plot itself.

Fit Higher-Order Curves

Sometimes the data bends instead of following a straight line. In that case, you can fit a polynomial curve and evaluate it on the original x values.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 5, 9, 15, 24, 36])

df = pd.DataFrame({"x": x, "y": y})
coefficients = np.polyfit(df["x"], df["y"], 2)
curve = np.poly1d(coefficients)
df["trend"] = curve(df["x"])

ax = df.plot.scatter(x="x", y="y", label="data")
df.plot(x="x", y="trend", ax=ax, color="green", label="quadratic trend")
plt.show()
```

A second-degree fit can model curvature, but a higher degree does not automatically mean a better trend. If you increase the degree too far, the curve starts fitting noise instead of the underlying signal.
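
One way to see why a small residual alone does not justify a higher degree is to compare the in-sample error across several degrees. The sketch below uses the same six points and computes the sum of squared residuals for each fit; the choice of degrees is arbitrary. With six points, a fifth-degree polynomial passes through every one of them, so its error drops to roughly zero regardless of whether the curve is a sensible trend.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 5, 9, 15, 24, 36])

# Sum of squared residuals for each polynomial degree
sse = {}
for degree in (1, 2, 3, 5):
    coefficients = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coefficients, x)
    sse[degree] = float(np.sum(residuals**2))

for degree, error in sse.items():
    print(f"degree {degree}: sum of squared residuals = {error:.4f}")
```

Because the in-sample error can only shrink as the degree rises, it is a poor guide on its own. Checking the fit against points that were held out of the fitting step is a more honest test.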

Use Dates On The X Axis

A common pandas workflow uses timestamps instead of numeric positions. You still need numeric values for fitting, so convert the dates to an ordinal index or a simple sequence.

```python
import numpy as np
import pandas as pd

series = pd.DataFrame(
    {
        "date": pd.date_range("2025-01-01", periods=5, freq="D"),
        "value": [100, 103, 101, 108, 110],
    }
)

x = np.arange(len(series))
slope, intercept = np.polyfit(x, series["value"], 1)
series["trend"] = slope * x + intercept

print(series)
```

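Using np.arange is often easier than converting timestamps into large integer representations. It works when the observations are evenly spaced, as they are here, because the sequence then preserves both the ordering and the relative spacing. With irregular gaps, fit against elapsed time instead so that the spacing is not lost.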
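
When the timestamps are not evenly spaced, a simple counter would treat a one-day gap and a four-day gap as the same step. A minimal sketch of one alternative, assuming daily resolution is enough, is to fit against the number of days elapsed since the first observation. The dates below are hypothetical.

```python
import numpy as np
import pandas as pd

series = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2025-01-01", "2025-01-02", "2025-01-05", "2025-01-09", "2025-01-10"]
        ),
        "value": [100, 103, 101, 108, 110],
    }
)

# Elapsed days since the first observation keep the real spacing
x = (series["date"] - series["date"].min()).dt.days
slope, intercept = np.polyfit(x, series["value"], 1)
series["trend"] = slope * x + intercept

print(series)
```

For data recorded more than once per day, `.dt.total_seconds()` on the same timedelta column gives a finer-grained x value.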

When To Use Regression Libraries

For a visual aid, np.polyfit is usually enough. If you need confidence intervals or model diagnostics, use a statistics library such as statsmodels. If you need multiple predictors or want the fit to live inside a larger modeling pipeline, scikit-learn is a common choice. The line on the chart may look the same, but you get the supporting statistics along with it.

For example, scikit-learn can be useful when the same pipeline later feeds a model training workflow rather than a one-off plot.

Common Pitfalls

One common mistake is fitting on string data. If the x or y columns were read from CSV as text, polyfit will typically fail with a type error rather than produce a fit. Check df.dtypes before fitting and convert text columns to numbers explicitly.
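
A minimal sketch of that check, assuming the columns arrived as text and may contain entries that are not numbers. The data and the "n/a" marker are hypothetical.

```python
import numpy as np
import pandas as pd

# Values as they might arrive from a CSV read as text (hypothetical data)
df = pd.DataFrame(
    {
        "month": ["1", "2", "3", "4", "5"],
        "sales": ["10", "13", "n/a", "18", "20"],
    }
)

# errors="coerce" turns unparseable entries into NaN instead of raising
df["month"] = pd.to_numeric(df["month"], errors="coerce")
df["sales"] = pd.to_numeric(df["sales"], errors="coerce")
print(df.dtypes)

# polyfit does not skip NaN, so fit only on complete rows
clean = df.dropna(subset=["month", "sales"])
slope, intercept = np.polyfit(clean["month"], clean["sales"], 1)

# The trend can still be evaluated for every row with a valid month
df["trend"] = slope * df["month"] + intercept
print(df)
```

Coercing silently drops information, so it is worth counting the NaN values afterwards to see how many rows were affected.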

Another problem is plotting a trend line against unsorted data. Matplotlib connects points in row order, so if the x values are out of order, a curved fit or a line plot of the raw data may zigzag visually even when the fit is correct. Sort by the horizontal axis before plotting.
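
The fit itself does not care about row order, so the sort only needs to happen before drawing. A short sketch with deliberately shuffled, hypothetical rows:

```python
import numpy as np
import pandas as pd

# Rows deliberately out of order (hypothetical data)
df = pd.DataFrame({"x": [4, 1, 6, 2, 5, 3], "y": [15, 2, 36, 5, 24, 9]})

# The coefficients are the same whether or not the rows are sorted
curve = np.poly1d(np.polyfit(df["x"], df["y"], 2))
df["trend"] = curve(df["x"])

# Sort before drawing so the curve is connected left to right
df = df.sort_values("x").reset_index(drop=True)
print(df)
```

The sorted DataFrame can then be passed to the same df.plot.scatter and df.plot calls used in the earlier examples.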

The last frequent issue is overfitting. A curve that passes through every point can still be a poor trend model. Start with the simplest curve that answers the question you actually have.

Summary

  • Use pandas to hold the data, not to compute the trend line itself.
  • numpy.polyfit is the fastest way to add a linear or polynomial fit.
  • Store the fitted values in a new DataFrame column so they can be plotted or exported.
  • Convert dates to a simple numeric sequence before fitting.
  • Use regression libraries when you need statistical detail beyond a visual trend.
