Python
numpy
data manipulation
array reshaping
machine learning

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample

Introduction

If you work with NumPy, pandas, or scikit-learn, you will eventually hit an error that says your array has the wrong shape. The most common fix is either reshape(-1, 1) or reshape(1, -1), but those two calls mean very different things and are easy to swap by accident.

Why Shape Matters in Machine Learning

Most machine learning APIs expect input data in two dimensions: rows represent samples, and columns represent features. In other words, the standard layout is (n_samples, n_features).
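As a small illustration of that layout, the sketch below builds a made-up dataset (the values are placeholders, not from any real source) and unpacks its shape into a sample count and a feature count.

```python
import numpy as np

# Made-up dataset for illustration: 3 samples (rows), 2 features (columns).
X = np.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
])

n_samples, n_features = X.shape
print(X.ndim)                 # 2
print(n_samples, n_features)  # 3 2
```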

A plain NumPy array like this:

python
import numpy as np

x = np.array([10, 20, 30, 40])
print(x.shape)

produces:

python
(4,)

That is a one-dimensional array. A model usually cannot tell whether those four numbers mean:

  • four samples with one feature each
  • one sample with four features

Reshaping removes that ambiguity.

When to Use reshape(-1, 1)

Use reshape(-1, 1) when you have a single feature and many samples. It turns a flat array into a column.

python
import numpy as np

x = np.array([10, 20, 30, 40])
X = x.reshape(-1, 1)

print(X)
print(X.shape)

Output:

python
[[10]
 [20]
 [30]
 [40]]
(4, 1)

This shape means:

  • 4 samples
  • 1 feature

The -1 tells NumPy to infer the correct size automatically. Since there are four elements total and one column was requested, NumPy creates four rows.
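The inference is plain division: the total number of elements divided by the size you did specify. A minimal sketch, using a made-up 12-element array so that several column counts divide evenly:

```python
import numpy as np

values = np.arange(12)  # 12 elements: 0, 1, ..., 11

# One dimension is fixed; NumPy computes the other as 12 / fixed size.
print(values.reshape(-1, 1).shape)  # (12, 1)
print(values.reshape(-1, 3).shape)  # (4, 3)
print(values.reshape(2, -1).shape)  # (2, 6)
print(values.reshape(1, -1).shape)  # (1, 12)
```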

This is the form you usually want before fitting a simple scikit-learn model:

python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 5, 7, 9, 11])  # y = 2x + 1

# Five samples, one feature: shape (5, 1)
X = x.reshape(-1, 1)

model = LinearRegression()
model.fit(X, y)

# predict also expects 2-D input, even for a single value
prediction = model.predict(np.array([6]).reshape(-1, 1))
print(prediction)  # close to 13, since y = 2x + 1

Here, each row is one training example.

When to Use reshape(1, -1)

Use reshape(1, -1) when you have a single sample that contains multiple features. It turns a flat array into a row.

python
import numpy as np

sample = np.array([170, 65, 29])
row = sample.reshape(1, -1)

print(row)
print(row.shape)

Output:

python
[[170  65  29]]
(1, 3)

This shape means:

  • 1 sample
  • 3 features

That is the right layout when a trained model expects several input features for one observation.

For example:

python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Four samples, three features each: shape (4, 3)
X = np.array([
    [160, 55, 25],
    [175, 72, 31],
    [180, 80, 40],
    [155, 50, 22]
])
y = np.array([0, 1, 1, 0])

model = LogisticRegression()
model.fit(X, y)

# One new sample with the same three features: shape (1, 3)
new_person = np.array([172, 70, 30]).reshape(1, -1)
print(model.predict(new_person))

The model receives one row containing all feature values for that single prediction.

A Practical Way to Decide

Ask yourself one question: what does each number represent?

If each number is a separate observation of the same measurement, use reshape(-1, 1).

Examples:

  • temperatures recorded each day
  • house prices collected over time
  • one sensor reading per sample

If the numbers belong to one object and describe different properties of it, use reshape(1, -1).

Examples:

  • one customer with age, income, and score
  • one image represented by flattened pixel values
  • one record with several input fields
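The decision rule above can be sketched as a small helper. The function name `to_2d`, its `one_feature` flag, and the sample values are hypothetical, introduced only for illustration; they are not part of NumPy or scikit-learn.

```python
import numpy as np

def to_2d(values, one_feature):
    """Hypothetical helper: return a 2-D array in (n_samples, n_features) layout."""
    arr = np.asarray(values)
    if arr.ndim != 1:
        raise ValueError(f"expected a 1-D input, got shape {arr.shape}")
    # Many observations of one measurement -> column; one object -> row.
    return arr.reshape(-1, 1) if one_feature else arr.reshape(1, -1)

daily_temps = [21.5, 23.0, 19.8]  # three observations of one measurement
customer = [34, 52000, 0.7]       # one object described by three properties

print(to_2d(daily_temps, one_feature=True).shape)  # (3, 1)
print(to_2d(customer, one_feature=False).shape)    # (1, 3)
```

Making the caller state the intent explicitly is the point of the flag: the array alone cannot say which layout is meant.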

You can confirm your result with .shape:

python
import numpy as np

values = np.array([1, 2, 3, 4])

print(values.reshape(-1, 1).shape)  # (4, 1): four rows, one column
print(values.reshape(1, -1).shape)  # (1, 4): one row, four columns

Working with pandas Data

If your source data comes from pandas, you often extract one column and then reshape it before modeling:

python
import pandas as pd

df = pd.DataFrame({
    "hours": [1, 2, 3, 4],
    "score": [52, 60, 68, 75]
})

# A single column converts to a 1-D array, so reshape it into a column
X = df["hours"].to_numpy().reshape(-1, 1)
y = df["score"].to_numpy()

print(X.shape)  # (4, 1)
print(y.shape)  # (4,)

This is a common pattern because a single pandas column becomes a one-dimensional array after conversion.

Common Pitfalls

Using the wrong orientation is the main mistake. If you call reshape(1, -1) when your data contains many samples, the model will interpret your entire dataset as one sample with many features.
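One quick sanity check, sketched here with the small regression arrays used earlier, is to compare the first axis of the inputs against the number of targets. With the correct orientation the two counts agree; with the swapped one they do not.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])   # five samples of one feature
y = np.array([3, 5, 7, 9, 11])  # five targets

X_right = x.reshape(-1, 1)  # shape (5, 1): five samples
X_wrong = x.reshape(1, -1)  # shape (1, 5): one sample with five features

print(X_right.shape[0] == y.shape[0])  # True
print(X_wrong.shape[0] == y.shape[0])  # False
```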

Another issue is assuming -1 is magic. It only means “infer this dimension from the total number of elements.” The total must divide evenly by the dimension you do specify, or NumPy raises a ValueError.
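A short sketch of the failure modes, using a made-up four-element array: a column count that does not divide the total fails, and so does asking NumPy to infer more than one dimension. The exact error text depends on the NumPy version, so the code prints whatever message it receives.

```python
import numpy as np

values = np.array([1, 2, 3, 4])

outcomes = {}
for shape in [(-1, 2), (-1, 3), (-1, -1)]:
    try:
        outcomes[shape] = values.reshape(shape).shape
    except ValueError as exc:
        # 4 is not divisible by 3, and only one dimension may be -1
        outcomes[shape] = f"ValueError: {exc}"
    print(shape, "->", outcomes[shape])
```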

It is also common to pass a scalar or a flat Python list directly into predict. Neither is two-dimensional, so wrap the values in a NumPy array and reshape explicitly; that also makes the intended sample-feature layout obvious.
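A minimal sketch of that wrapping step, reusing values from the earlier examples. The variable names are illustrative only.

```python
import numpy as np

scalar = 6                 # one value of one feature
flat_list = [172, 70, 30]  # one sample with three features

# Neither input is two-dimensional yet.
print(np.ndim(scalar), np.ndim(flat_list))  # 0 1

one_value = np.array([scalar]).reshape(-1, 1)
one_sample = np.array(flat_list).reshape(1, -1)

print(one_value.shape)   # (1, 1)
print(one_sample.shape)  # (1, 3)
```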

Finally, do not skip checking .shape. Many debugging sessions become much shorter if you print shapes before fit and before predict.

Summary

  • Machine learning inputs are usually shaped as (n_samples, n_features).
  • Use reshape(-1, 1) for many samples with one feature each.
  • Use reshape(1, -1) for one sample with many features.
  • -1 tells NumPy to infer the missing dimension automatically.
  • Printing .shape is the fastest way to confirm that your data layout matches what the model expects.
