ValueError x and y must be the same size

Introduction

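The error ValueError: x and y must be the same size is raised by Matplotlib's scatter when its x and y arguments contain different numbers of elements. plot fails on the same kind of input with a closely related message, ValueError: x and y must have same first dimension. In both cases the library expected one y value for every x value, but the two sequences have different lengths or incompatible shapes.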

What the Error Actually Means

In a basic plot, points are paired by position:

  • x[0] goes with y[0]
  • x[1] goes with y[1]
  • and so on

If one sequence is longer, Matplotlib has no unambiguous way to match points.

This code fails:

python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]   # four values
y = [10, 20, 30]   # only three values

plt.scatter(x, y)  # raises ValueError: x and y must be the same size
plt.show()

There are four x values and only three y values, so Matplotlib raises the size error before drawing anything.
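
To see why the extra value cannot simply be skipped, it helps to pair the same two lists by position in plain Python. This is only an illustrative sketch: zip stands in for positional pairing and is not how Matplotlib works internally.

```python
x = [1, 2, 3, 4]
y = [10, 20, 30]

# zip pairs items by position and stops at the shorter list,
# so the last x value is silently left without a partner.
pairs = list(zip(x, y))
unpaired = x[len(pairs):]

print(pairs)     # [(1, 10), (2, 20), (3, 30)]
print(unpaired)  # [4]
```

Raising an error, as Matplotlib does, avoids silently dropping that last point.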

Diagnose the Problem First

Before changing the data, inspect both length and shape.

python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30])

print(len(x), len(y))    # number of elements along the first axis
print(x.shape, y.shape)  # full shape, which also reveals extra dimensions

This quick check tells you whether the mismatch is obvious. In real projects, the arrays often look similar enough that the bug hides inside filtering, missing values, or a preprocessing step that touched only one side.

If you build arrays through pandas, inspect them right before plotting:

python
# df is the DataFrame you are about to plot from
print(df["time"].shape)
print(df["value"].shape)

That matters because earlier transformations may have changed the row counts.
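
For pandas objects, the row labels are worth checking as well as the shapes, because each Series carries its own index. The sketch below is one possible pre-plot check; the column names time and value follow the snippet above, while the sample data and the filtering step are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [1, 2, 3, 4],
    "value": [10.0, None, 30.0, 40.0],
})

x = df["time"]
y = df["value"].dropna()  # hypothetical earlier step that touched only y

print(len(x), len(y))           # 4 3
print(x.index.equals(y.index))  # False

# Row labels present in x but missing from y
missing = list(x.index.difference(y.index))
print(missing)                  # [1]
```

The missing labels point directly at the rows that an earlier step removed from one side only.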

Fix the Data at the Source

The best fix is to generate x and y from the same filtered dataset instead of trimming one side blindly.

For example, this pattern is correct:

python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "time": [1, 2, 3, 4],
    "value": [10, 20, 30, 40]
})

x = df["time"]
y = df["value"]

plt.plot(x, y)
plt.show()

Now both sequences come from the same rows, so the lengths match naturally.
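
The same idea extends to conditional filters. As a sketch, assuming a hypothetical requirement to keep only rows where value exceeds 15, the condition is applied to the DataFrame first and the columns are extracted afterwards:

```python
import pandas as pd

df = pd.DataFrame({
    "time": [1, 2, 3, 4],
    "value": [10, 20, 30, 40],
})

# Filter whole rows first (the threshold 15 is an arbitrary example value).
filtered = df[df["value"] > 15]

# Both columns now come from the same surviving rows.
x = filtered["time"]
y = filtered["value"]

print(len(x), len(y))  # 3 3
```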

A Common Real Bug: Filtering Only One Array

Many mismatches happen when you drop missing values or apply a condition to only one column.

Broken version:

python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [10, None, 30, 40]
})

x = df["x"]           # still four rows
y = df["y"].dropna()  # three rows: the missing value is dropped from y only

plt.scatter(x, y)     # raises ValueError
plt.show()

Here x still has four rows, but y has only three after dropna().

Correct version:

python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [10, None, 30, 40]
})

# Drop every row that has a missing value in either column
clean = df.dropna(subset=["x", "y"])

plt.scatter(clean["x"], clean["y"])
plt.show()

The filtering is applied to the dataset as a whole, so paired rows stay paired.
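
If y has already been filtered elsewhere and you only receive the resulting Series, one possible recovery is to select the matching x rows by index label. This sketch assumes both Series still carry the row labels of the original DataFrame; if the index was reset along the way, it does not apply.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [10, None, 30, 40],
})

y = df["y"].dropna()      # y keeps the original row labels 0, 2, 3
x = df["x"].loc[y.index]  # pick the x values with those same labels

print(list(x))  # [1, 3, 4]
print(list(y))  # [10.0, 30.0, 40.0]
```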

When You Only Have y

Sometimes you do not have an explicit x array and only want to plot values against their index. In that case, create an x array with the same length as y.

python
import numpy as np
import matplotlib.pyplot as plt

y = np.array([5, 8, 13, 21])
x = np.arange(len(y))

plt.plot(x, y, marker="o")
plt.show()

This works because np.arange(len(y)) creates exactly one x position per y value.
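
When the positions should carry real units rather than bare indices, the same length rule applies. A minimal sketch, assuming a hypothetical fixed sampling interval dt of 0.5 seconds:

```python
import numpy as np

y = np.array([5, 8, 13, 21])
dt = 0.5  # assumed sampling interval in seconds, for illustration only

# One timestamp per sample: 0.0, 0.5, 1.0, 1.5
x = np.arange(len(y)) * dt

print(x.shape == y.shape)  # True
```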

Shape Problems with NumPy Arrays

Length mismatches are the usual issue, but shape mismatches can also confuse debugging. For example, a column vector and a flat array may both contain the same count of numbers while still behaving differently in downstream code.

python
import numpy as np

x = np.array([[1], [2], [3], [4]])  # column vector
y = np.array([10, 20, 30, 40])      # flat array

print(x.shape)  # (4, 1)
print(y.shape)  # (4,)

Matplotlib often handles these cases, but if behavior looks odd, flatten the inputs deliberately:

python
x = x.ravel()
y = y.ravel()

That gives you a predictable one-dimensional representation.
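
One concrete case of shapes behaving differently downstream is NumPy broadcasting. In this sketch, adding the column vector to the flat array produces a 4 by 4 grid rather than four element-wise sums, while the flattened version behaves as expected:

```python
import numpy as np

x = np.array([[1], [2], [3], [4]])  # shape (4, 1)
y = np.array([10, 20, 30, 40])      # shape (4,)

# Broadcasting combines every x with every y.
grid = x + y
print(grid.shape)  # (4, 4)

# After flattening, the addition is element by element.
flat = x.ravel() + y
print(flat.shape)  # (4,)
print(flat)        # [11 22 33 44]
```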

Common Pitfalls

The most common mistake is fixing the error by slicing one array to match the other without asking why the mismatch happened. That can hide a data-quality bug rather than solve it.

Another problem is applying filtering, sorting, or grouping to only one variable. If x and y are supposed to describe the same observations, every row-level transformation must be applied consistently.

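People also forget to inspect shapes after converting from pandas to NumPy. Selecting a column as a Series, as in df["x"], converts to a flat array of shape (n,), while selecting it as a one-column DataFrame, as in df[["x"]], converts to a two-dimensional array of shape (n, 1). That matters if earlier code expected a flat vector.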
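
The difference is easy to see by converting the same column both ways. A short sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4]})

as_series = df["x"].to_numpy()   # single brackets select a Series
as_frame = df[["x"]].to_numpy()  # double brackets select a one-column DataFrame

print(as_series.shape)  # (4,)
print(as_frame.shape)   # (4, 1)
```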

Finally, if you are plotting in a function, print or assert lengths before calling Matplotlib. A short validation step is much cheaper than tracing a plotting error later.
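
One way to write such a validation step is a small guard function. The name check_same_size and its error message are hypothetical and not part of Matplotlib; this is a sketch rather than a recommended standard.

```python
import numpy as np

def check_same_size(x, y):
    """Raise a descriptive ValueError if x and y cannot be paired one to one."""
    x = np.asarray(x)
    y = np.asarray(y)
    if x.size != y.size:
        raise ValueError(
            f"x has {x.size} values with shape {x.shape}, "
            f"y has {y.size} values with shape {y.shape}"
        )
    return x, y

check_same_size([1, 2, 3], [10, 20, 30])  # passes silently

try:
    check_same_size([1, 2, 3, 4], [10, 20, 30])
except ValueError as err:
    message = str(err)
    print(message)
```

Calling it at the top of a plotting function reports both shapes at the point where the data enters, rather than inside the plotting call.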

Summary

  • The error means Matplotlib cannot pair each x value with a corresponding y value.
  • Check len() and .shape immediately before plotting.
  • Prefer fixing the mismatch at the data-preparation step, not by trimming arrays blindly.
  • Filter or clean the whole dataset so paired rows stay aligned.
  • If you only have y, create x with np.arange(len(y)).
