Bad input shape error on SVM training using scikit

Introduction

Most "bad input shape" errors in scikit-learn SVM code come from one rule: X must be a two-dimensional array of shape (n_samples, n_features), while y must usually be a one-dimensional target vector of length n_samples. If either side has an incompatible shape, SVC.fit typically raises a ValueError.

The confusing part is that NumPy, pandas, and slicing operations can quietly change dimensionality. A column can turn into a one-dimensional series, or a target can become a two-dimensional frame, and the SVM then rejects it.

What Shape the SVM Expects

For a normal classification task:

X is a matrix:
    each row is one sample
    each column is one feature
y is a vector with one label per sample

This is correct:

python

import numpy as np
from sklearn.svm import SVC

X = np.array([
    [1.0, 2.0],
    [1.5, 1.8],
    [5.0, 8.0],
    [6.0, 9.0],
])
y = np.array([0, 0, 1, 1])

model = SVC(kernel="linear")
model.fit(X, y)

Here X.shape is (4, 2) and y.shape is (4,), which is exactly what the estimator wants.

The Most Common Shape Mistake

The most frequent bug appears when you have a single feature and pass a flat array instead of a two-dimensional matrix.

This fails:

python

import numpy as np
from sklearn.svm import SVC

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])

model = SVC()
model.fit(X, y)

Typical error text looks like this:

text

Expected 2D array, got 1D array instead

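The full message goes on to suggest array.reshape(-1, 1) for data with a single feature and array.reshape(1, -1) for a single sample. Here the data has a single feature, so the fix is to reshape the input so each value becomes a row with one feature: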

python

X = X.reshape(-1, 1)

model = SVC()
model.fit(X, y)

The -1 tells NumPy to infer the number of rows, and 1 means one feature column.
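
To make the effect of the reshape visible, the following sketch compares the shapes before and after and confirms that the flat array is rejected. The names X_flat, X_col, X_alt, and rejected are illustrative only, and np.newaxis is shown as an alternative that is assumed to be acceptable in your code base.

```python
import numpy as np
from sklearn.svm import SVC

X_flat = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])

# The flat array is rejected because it has no feature axis.
rejected = False
try:
    SVC().fit(X_flat, y)
except ValueError:
    rejected = True

# One column, with the number of rows inferred from the data.
X_col = X_flat.reshape(-1, 1)

# Indexing with np.newaxis builds the same column layout.
X_alt = X_flat[:, np.newaxis]

print(X_flat.shape)  # (4,)
print(X_col.shape)   # (4, 1)

model = SVC(kernel="linear")
model.fit(X_col, y)
```

Both forms produce the same (4, 1) array, so the choice between them is mostly a matter of taste.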

pandas-Specific Issues

With pandas, the difference between df["feature"] and df[["feature"]] matters.

python

import pandas as pd
from sklearn.svm import SVC

df = pd.DataFrame({
    "height": [150, 160, 170, 180],
    "weight": [50, 60, 70, 85],
    "label": [0, 0, 1, 1],
})

X_bad = df["height"]
y = df["label"]

print(X_bad.shape)  # (4,)

df["height"] is a series, so it is one-dimensional. For an SVM feature matrix, use:

python

X = df[["height"]]
y = df["label"]

model = SVC()
model.fit(X, y)

df[["height"]] keeps the two-dimensional table shape.

The same idea applies when selecting multiple columns. Pass a list of column names inside the indexing brackets, as in df[["height", "weight"]], so the result is a frame with one column per feature.
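
As a minimal sketch of the multi-column case, the example below rebuilds the same illustrative frame and selects two feature columns at once. The data and the choice of a linear kernel are assumptions made only for the demonstration.

```python
import pandas as pd
from sklearn.svm import SVC

df = pd.DataFrame({
    "height": [150, 160, 170, 180],
    "weight": [50, 60, 70, 85],
    "label": [0, 0, 1, 1],
})

# A list of column names returns a DataFrame with one column per feature.
X = df[["height", "weight"]]
# A single column name returns a one-dimensional Series for the target.
y = df["label"]

print(X.shape)  # (4, 2)
print(y.shape)  # (4,)

model = SVC(kernel="linear")
model.fit(X, y)
```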

Mismatched Training and Prediction Shapes

You can also get shape errors during prediction. If the model was trained on two features, a future call to predict must also provide two features per sample.

python

import numpy as np
from sklearn.svm import SVC

X = np.array([
    [1.0, 2.0],
    [2.0, 3.0],
    [8.0, 8.0],
    [9.0, 10.0],
])
y = np.array([0, 0, 1, 1])

model = SVC()
model.fit(X, y)

sample = np.array([[4.0, 5.0]])
print(model.predict(sample))

This is correct because sample.shape is (1, 2): one sample, two features.

This is wrong:

python

sample = np.array([4.0, 5.0])
model.predict(sample)

That is a one-dimensional array. Wrap it in another list or reshape it:

python

sample = sample.reshape(1, -1)
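
The sketch below puts both fixes side by side and also shows the other failure mode from this section: a sample that is two-dimensional but has the wrong number of features. The variable names (flat, as_row, wrapped, too_wide, rejected) are illustrative, and the exact wording of the error differs between scikit-learn versions, so only the exception type is checked.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 8.0], [9.0, 10.0]])
y = np.array([0, 0, 1, 1])
model = SVC(kernel="linear").fit(X, y)

flat = np.array([4.0, 5.0])

# Two equivalent ways to turn one sample into a (1, n_features) array.
as_row = flat.reshape(1, -1)
wrapped = np.array([flat])
print(as_row.shape, wrapped.shape)  # (1, 2) (1, 2)

prediction = model.predict(as_row)

# A 2D sample with three features is still rejected,
# because the model was trained on two.
too_wide = np.array([[4.0, 5.0, 6.0]])
rejected = False
try:
    model.predict(too_wide)
except ValueError:
    rejected = True
```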

Practical Debugging Checklist

Before calling fit or predict, print shapes directly:

python

print("X:", X.shape)
print("y:", y.shape)

For classification with scikit-learn SVMs, a good default checklist is:

X.ndim == 2
y.ndim == 1
len(X) == len(y)
training and prediction inputs use the same number of feature columns
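
One way to apply this checklist automatically is a small guard function. The check_shapes helper below is hypothetical (it is not part of scikit-learn) and is only a sketch of how the first three checks could be enforced before training.

```python
import numpy as np

def check_shapes(X, y):
    """Hypothetical helper: fail early if X and y break the checklist."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.ndim != 2:
        raise ValueError(f"X must be 2D, got shape {X.shape}")
    if y.ndim != 1:
        raise ValueError(f"y must be 1D, got shape {y.shape}")
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} samples but y has {len(y)}")
    # Return the feature count so it can be compared at prediction time.
    return X.shape[1]

n_features = check_shapes([[1.0, 2.0], [3.0, 4.0]], [0, 1])

# A flat feature array is reported with its actual shape.
try:
    check_shapes(np.array([1.0, 2.0]), np.array([0, 1]))
except ValueError as exc:
    message = str(exc)
```

Keeping the returned feature count around makes the fourth check easy: compare it with sample.shape[1] before calling predict.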

If you use preprocessing pipelines, check the output of the transformer too. A custom preprocessing step can flatten arrays accidentally.
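
As a rough sketch of that check, the example below fits a pipeline and then runs only the preprocessing step to inspect what the SVM actually receives. StandardScaler stands in for whatever transformer you use, and the step name "standardscaler" is the one make_pipeline generates from the class name.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 8.0], [9.0, 10.0]])
y = np.array([0, 0, 1, 1])

pipe = make_pipeline(StandardScaler(), SVC(kernel="linear"))
pipe.fit(X, y)

# Apply only the fitted preprocessing step and confirm that its
# output is still a 2D array with one row per sample.
scaler = pipe.named_steps["standardscaler"]
transformed = scaler.transform(X)
print(transformed.shape)  # (4, 2)
```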

Common Pitfalls

The biggest pitfall is assuming a single feature can be passed as a simple list. In scikit-learn, one feature still needs a two-dimensional matrix with one column.

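Another common mistake is passing y as a data frame with shape (n, 1) instead of a series or flat array. scikit-learn typically accepts a single-column target with only a conversion warning, but a target with several columns is rejected with a ValueError, which in older releases is worded as the literal "bad input shape" message. Keeping the target explicitly one-dimensional avoids both cases.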
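
The difference is easy to see by printing shapes. In the sketch below, the frame and the variable names y_frame, y_series, and y_flat are illustrative; ravel() is one common way to flatten a single-column target.

```python
import pandas as pd

df = pd.DataFrame({
    "height": [150, 160, 170, 180],
    "label": [0, 0, 1, 1],
})

y_frame = df[["label"]]                # DataFrame, two-dimensional
y_series = df["label"]                 # Series, one-dimensional
y_flat = y_frame.to_numpy().ravel()    # NumPy array, flattened to one dimension

print(y_frame.shape)   # (4, 1)
print(y_series.shape)  # (4,)
print(y_flat.shape)    # (4,)
```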

Slicing is another source of bugs. In pandas, df["col"] and df[["col"]] are not interchangeable. One gives a series, the other gives a frame.

Finally, do not guess. Print X.shape and y.shape before training. Most shape bugs become obvious immediately once you inspect the arrays.

Summary

X for an SVM must usually have shape (n_samples, n_features).
y should usually be a one-dimensional label array.
A single feature still needs a two-dimensional input such as reshape(-1, 1).
In pandas, use df[["col"]] for feature matrices and df["label"] for targets.
Training and prediction must use the same number of feature columns.
Printing shapes is the fastest way to diagnose these errors.