Bad input shape error on SVM training using scikit

Introduction

Most "bad input shape" errors in scikit-learn SVM code come from one rule: X must be a two-dimensional array of shape (n_samples, n_features), while y must usually be a one-dimensional target vector. If either side has the wrong shape, SVC.fit raises a ValueError.

The confusing part is that NumPy, pandas, and slicing operations can quietly change dimensionality. A column can turn into a one-dimensional series, or a target can become a two-dimensional frame, and the SVM then rejects it.
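
As a small illustration of how slicing changes dimensionality, the sketch below selects the same column of a NumPy array in three ways. The array values are made up for the example; only the resulting shapes matter.

```python
import numpy as np

X = np.array([
    [1.0, 2.0],
    [1.5, 1.8],
    [5.0, 8.0],
])

col_1d = X[:, 0]        # integer index drops the column axis
col_2d = X[:, [0]]      # list index keeps the column axis
col_slice = X[:, 0:1]   # slice keeps the column axis too

print(col_1d.shape)     # (3,)
print(col_2d.shape)     # (3, 1)
print(col_slice.shape)  # (3, 1)
```

Only the last two forms are valid feature matrices for an SVM.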

What Shape the SVM Expects

For a normal classification task:

  • X is a matrix
  • each row is one sample
  • each column is one feature
  • y is a vector with one label per sample

This is correct:

python
import numpy as np
from sklearn.svm import SVC

X = np.array([
    [1.0, 2.0],
    [1.5, 1.8],
    [5.0, 8.0],
    [6.0, 9.0],
])
y = np.array([0, 0, 1, 1])

model = SVC(kernel="linear")
model.fit(X, y)

Here X.shape is (4, 2) and y.shape is (4,), which is exactly what the estimator wants.

The Most Common Shape Mistake

The most frequent bug appears when you have a single feature and pass a flat array instead of a two-dimensional matrix.

This fails:

python
import numpy as np
from sklearn.svm import SVC

X = np.array([1.0, 2.0, 3.0, 4.0])  # shape (4,): one-dimensional
y = np.array([0, 0, 1, 1])

model = SVC()
model.fit(X, y)  # raises ValueError

Typical error text looks like this:

text
Expected 2D array, got 1D array instead

The fix is to reshape the input so each value becomes a row with one feature:

python
X = X.reshape(-1, 1)

model = SVC()
model.fit(X, y)

The -1 tells NumPy to infer the number of rows, and 1 means one feature column.
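
The sketch below compares the two reshape directions, since they are easy to mix up. The variable names are illustrative: reshape(-1, 1) treats the values as many samples of one feature, while reshape(1, -1) treats them as one sample with many features.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

as_column = values.reshape(-1, 1)    # 4 samples, 1 feature
as_row = values.reshape(1, -1)       # 1 sample, 4 features
also_column = values[:, np.newaxis]  # same result as reshape(-1, 1)

print(as_column.shape)    # (4, 1)
print(as_row.shape)       # (1, 4)
print(also_column.shape)  # (4, 1)
```

For training on a single feature, the column form is the one that matches y with four labels.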

pandas-Specific Issues

With pandas, the difference between df["feature"] and df[["feature"]] matters.

python
import pandas as pd
from sklearn.svm import SVC

df = pd.DataFrame({
    "height": [150, 160, 170, 180],
    "weight": [50, 60, 70, 85],
    "label": [0, 0, 1, 1],
})

X_bad = df["height"]
y = df["label"]

print(X_bad.shape)  # (4,)

df["height"] is a series, so it is one-dimensional. For an SVM feature matrix, use:

python
X = df[["height"]]
y = df["label"]

model = SVC()
model.fit(X, y)

df[["height"]] keeps the two-dimensional table shape.

The same idea applies when selecting multiple columns. Pass a list of column names inside the indexing brackets, as in df[["height", "weight"]], and the result is always a frame, never a series.
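
A minimal sketch of the multi-column case, reusing the same example frame. The linear kernel is only a choice made for this illustration.

```python
import pandas as pd
from sklearn.svm import SVC

df = pd.DataFrame({
    "height": [150, 160, 170, 180],
    "weight": [50, 60, 70, 85],
    "label": [0, 0, 1, 1],
})

X = df[["height", "weight"]]  # list of names -> DataFrame
y = df["label"]               # single name -> Series

print(X.shape)  # (4, 2)
print(y.shape)  # (4,)

model = SVC(kernel="linear")
model.fit(X, y)
```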

Mismatched Training and Prediction Shapes

You can also get shape errors during prediction. If the model was trained on two features, a future call to predict must also provide two features per sample.

python
import numpy as np
from sklearn.svm import SVC

X = np.array([
    [1.0, 2.0],
    [2.0, 3.0],
    [8.0, 8.0],
    [9.0, 10.0],
])
y = np.array([0, 0, 1, 1])

model = SVC()
model.fit(X, y)

sample = np.array([[4.0, 5.0]])
print(model.predict(sample))

This is correct because sample.shape is (1, 2): one sample, two features.

This is wrong:

python
sample = np.array([4.0, 5.0])  # shape (2,): one-dimensional
model.predict(sample)  # raises ValueError: Expected 2D array, got 1D array instead

That is a one-dimensional array. Wrap it in another list or reshape it:

python
sample = sample.reshape(1, -1)
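
A correctly shaped two-dimensional sample can still fail if it has the wrong number of columns. The sketch below trains on two features and then passes three. The exact error text varies between scikit-learn versions, so the example only relies on the exception type. The fitted attribute n_features_in_ is available in recent scikit-learn releases.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([
    [1.0, 2.0],
    [2.0, 3.0],
    [8.0, 8.0],
    [9.0, 10.0],
])
y = np.array([0, 0, 1, 1])

model = SVC().fit(X, y)

too_wide = np.array([[4.0, 5.0, 6.0]])  # 3 features, model was trained on 2
try:
    model.predict(too_wide)
    mismatch_detected = False
except ValueError as exc:
    mismatch_detected = True
    print("predict failed:", exc)

# Number of feature columns seen during fit
print(model.n_features_in_)  # 2
```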

Practical Debugging Checklist

Before calling fit or predict, print shapes directly:

python
print("X:", X.shape)
print("y:", y.shape)

For classification with scikit-learn SVMs, a good default checklist is:

  • X.ndim == 2
  • y.ndim == 1
  • len(X) == len(y)
  • training and prediction inputs use the same number of feature columns
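
The first three checks above can be wrapped in a small helper. The function name check_svm_inputs is hypothetical and not part of scikit-learn. It is only a sketch of how the checklist might be automated before calling fit.

```python
import numpy as np

def check_svm_inputs(X, y):
    """Hypothetical helper: raise ValueError on the first shape problem found."""
    X = np.asarray(X)
    y = np.asarray(y)
    if X.ndim != 2:
        raise ValueError(f"X must be 2D (n_samples, n_features), got shape {X.shape}")
    if y.ndim != 1:
        raise ValueError(f"y must be 1D (n_samples,), got shape {y.shape}")
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} samples but y has {len(y)} labels")
    return X.shape, y.shape

print(check_svm_inputs([[1.0], [2.0], [3.0]], [0, 0, 1]))  # ((3, 1), (3,))
```

Calling it with a flat X such as [1.0, 2.0, 3.0] raises a ValueError that names the offending shape.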

If you use preprocessing pipelines, check the output of the transformer too. A custom preprocessing step can flatten arrays accidentally.
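
One way to inspect what the SVM actually receives is to run every pipeline step except the last one. The sketch below assumes a two-step pipeline with a StandardScaler. The step names and the flatten_step function are made up for illustration, with flatten_step standing in for a buggy custom step.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.array([
    [1.0, 2.0],
    [2.0, 3.0],
    [8.0, 8.0],
    [9.0, 10.0],
])
y = np.array([0, 0, 1, 1])

pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Slicing a Pipeline returns a sub-pipeline; here, everything before the SVM
features = pipe[:-1].fit_transform(X)
print(features.shape)  # (4, 2)

def flatten_step(arr):
    """Hypothetical buggy preprocessing step that loses the feature axis."""
    return arr.ravel()

flattened = flatten_step(features)
print(flattened.shape)  # (8,)

pipe.fit(X, y)  # the full pipeline trains because the scaler keeps X two-dimensional
```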

Common Pitfalls

The biggest pitfall is assuming a single feature can be passed as a simple list. In scikit-learn, one feature still needs a two-dimensional matrix with one column.

Another common mistake is passing y as a data frame with shape (n, 1) instead of a series or flat array. scikit-learn classifiers usually accept a single-column y, flatten it, and emit a DataConversionWarning, while a y with several columns is rejected. Making the target explicitly one-dimensional avoids both cases.
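
A short sketch of turning a single-column target frame into a one-dimensional target, using the column name label from the earlier example. Either form shown here produces a shape of (4,).

```python
import pandas as pd

df = pd.DataFrame({
    "height": [150, 160, 170, 180],
    "label": [0, 0, 1, 1],
})

y_frame = df[["label"]]                 # DataFrame, shape (4, 1)
y_series = y_frame.squeeze("columns")   # Series, shape (4,)
y_flat = y_frame.to_numpy().ravel()     # NumPy array, shape (4,)

print(y_frame.shape)   # (4, 1)
print(y_series.shape)  # (4,)
print(y_flat.shape)    # (4,)
```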

Slicing is another source of bugs. In pandas, df["col"] and df[["col"]] are not interchangeable. One gives a series, the other gives a frame.

Finally, do not guess. Print X.shape and y.shape before training. Most shape bugs become obvious immediately once you inspect the arrays.

Summary

  • X for an SVM must usually have shape (n_samples, n_features).
  • y should usually be a one-dimensional label array.
  • A single feature still needs a two-dimensional input such as reshape(-1, 1).
  • In pandas, use df[["col"]] for feature matrices and df["label"] for targets.
  • Training and prediction must use the same number of feature columns.
  • Printing shapes is the fastest way to diagnose these errors.
