Convert Pandas dataframe to NumPy array

Introduction

Converting a Pandas DataFrame to a NumPy array is common when you need fast numerical operations or need to pass data into a library that expects raw arrays. The key detail is that the conversion keeps the cell values but drops DataFrame labels such as column names and the index.

The Modern Way: to_numpy()

The recommended method is DataFrame.to_numpy(), available since pandas 0.24.

python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3],
    "y": [4, 5, 6],
})

arr = df.to_numpy()

print(arr)
print(type(arr))

Output:

text
[[1 4]
 [2 5]
 [3 6]]
<class 'numpy.ndarray'>

This returns only the underlying values, not the column labels.

Choosing a Data Type

You can control the dtype during conversion:

python
arr = df.to_numpy(dtype="float32")
print(arr.dtype)

This is useful when downstream code expects a specific numeric type, such as machine learning code that prefers float32.

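If the columns have different dtypes, pandas picks a single dtype that can hold every value. Integer and float columns together become float64, while numeric columns mixed with text fall back to object. That is not wrong, but operations on an object array run element by element on Python objects, so they are much slower than on a native numeric array.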
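
As a minimal sketch of that widening, the example below builds a small frame with one integer column and one float column and inspects the dtype of the converted array. The column names `count` and `ratio` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: one integer column, one float column.
df = pd.DataFrame({
    "count": [1, 2, 3],
    "ratio": [0.5, 1.5, 2.5],
})

arr = df.to_numpy()

print(df.dtypes)   # one dtype per column (integer and float)
print(arr.dtype)   # float64: a single dtype for the whole array
print(arr)
```

The integers are stored as floats in the result, which is usually harmless but worth knowing if downstream code expects integer indices or counts.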

Mixed-Type Example

python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Linus"],
    "age": [36, 55],
})

arr = df.to_numpy()
print(arr)
print(arr.dtype)

Because one column is text and the other is numeric, the only common type that can represent both is object. arr.dtype therefore prints object, and every element of the array is an ordinary Python object.
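
If you already have such an array and only need the numbers, one option is to slice out the numeric column and cast it. This is a sketch that assumes the second column really does contain only integers; the cast would fail otherwise.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Linus"],
    "age": [36, 55],
})

arr = df.to_numpy()
print(arr.dtype)    # object

# Slice out the numeric column and cast it to a native integer dtype
# before doing numeric work on it.
ages = arr[:, 1].astype("int64")
print(ages.dtype)   # int64
print(ages.mean())  # 45.5
```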

If you need a purely numeric array, select only numeric columns first:

python
numeric = df.select_dtypes(include="number").to_numpy()
print(numeric)

values Versus to_numpy()

Older code often uses:

python
arr = df.values

That still works, but the pandas documentation recommends to_numpy() instead. It is also the clearer API, because it exposes options such as dtype and copy.

For new code, prefer:

python
arr = df.to_numpy()
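
For a plain numeric frame the two spellings produce equal arrays, so switching is low risk. The short comparison below is illustrative only; the variable names `old_style`, `new_style`, and `as_float` are not from any library.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

old_style = df.values
new_style = df.to_numpy()

print(np.array_equal(old_style, new_style))  # True

# Only to_numpy() accepts conversion options.
as_float = df.to_numpy(dtype="float64", copy=True)
print(as_float.dtype)  # float64
```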

Copy Behavior

If you need to ensure the array is an independent copy, ask for it explicitly:

python
arr = df.to_numpy(copy=True)

That matters when the array might otherwise share memory with the DataFrame, so that writing to one could change the other. In performance-sensitive code, an unnecessary copy also costs time and memory, so the choice depends on your use case.
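
A small sketch of what the explicit copy buys you: after copy=True, writing to the array leaves the DataFrame untouched.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

arr = df.to_numpy(copy=True)
arr[0, 0] = 99          # mutate the array only

print(arr[0, 0])        # 99
print(df.loc[0, "x"])   # 1, the DataFrame is unchanged
```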

Labels Are Lost in the Conversion

After conversion, the array no longer knows anything about:

  • column names
  • index labels
  • per-column dtypes (the array has a single dtype for all values)

If you still need the labels, keep them separately:

python
columns = df.columns.to_list()
arr = df.to_numpy()

print(columns)
print(arr)

This is a common pattern when you want NumPy speed but still need to interpret the array later.

The same idea applies if downstream code needs feature names for model inspection or debugging: keep the labels beside the array rather than expecting NumPy to preserve them.
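
One way to use the saved labels, sketched below, is to look up a column position by name and to rebuild a labeled DataFrame after the NumPy work is done. The `col_pos` dictionary and the index labels are illustrative choices, not part of the pandas API.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [4, 5, 6]},
    index=["a", "b", "c"],
)

# Save the labels before they are dropped by the conversion.
columns = df.columns.to_list()
index = df.index.to_list()
arr = df.to_numpy()

# Map column names to positions so array columns can be found by name.
col_pos = {name: i for i, name in enumerate(columns)}
y_values = arr[:, col_pos["y"]]
print(y_values)   # [4 5 6]

# Do the numeric work in NumPy, then reattach the labels.
scaled = arr * 10
result = pd.DataFrame(scaled, columns=columns, index=index)
print(result)
```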

Common Pitfalls

The most common mistake is forgetting that mixed-type DataFrames can produce an object array. That surprises people expecting a numeric matrix suitable for fast linear algebra.
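
A simple guard can catch this early. The helper below is a hypothetical function, not something pandas provides: it converts the frame and raises if the result has object dtype.

```python
import numpy as np
import pandas as pd


def to_numeric_matrix(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: convert to an array, failing loudly on object dtype."""
    arr = df.to_numpy()
    if arr.dtype == object:
        # Report which columns are not numeric to make the fix obvious.
        offenders = df.select_dtypes(exclude="number").columns.to_list()
        raise TypeError(f"non-numeric columns prevent a numeric array: {offenders}")
    return arr


numeric_df = pd.DataFrame({"x": [1, 2], "y": [3.0, 4.0]})
mixed_df = pd.DataFrame({"name": ["Ada", "Linus"], "age": [36, 55]})

print(to_numeric_matrix(numeric_df).dtype)  # float64

try:
    to_numeric_matrix(mixed_df)
except TypeError as exc:
    print(exc)
```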

Another issue is assuming the index or column names are preserved. A NumPy array contains values only.

A third pitfall is converting too early. If you still need label-aware filtering, joining, or group operations, keep the data as a DataFrame until the last practical moment.
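
As an illustration of converting late, the sketch below does the label-aware grouping in pandas and only converts the final, purely numeric result. The column names and data are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "b", "a", "b"],
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [10.0, 20.0, 30.0, 40.0],
})

# Label-aware work stays in pandas.
means = df.groupby("group")[["x", "y"]].mean()

# Convert only at the end, keeping the group labels beside the array.
group_labels = means.index.to_list()
arr = means.to_numpy()

print(group_labels)  # ['a', 'b']
print(arr)
```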

Finally, to_numpy(copy=False) is not a guarantee that no copy happened. It is a request that allows reuse when possible, not a promise in every case.
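
If it matters whether a copy was made, it is safer to check than to assume. The sketch below uses np.shares_memory on two no-copy conversions of the same frame, on the assumption that two views of the same underlying data would overlap. With mixed integer and float columns, pandas has to build a new array for each call, so the results do not overlap.

```python
import numpy as np
import pandas as pd

# Mixed int/float columns: producing one float64 array needs a new buffer.
mixed = pd.DataFrame({"i": [1, 2, 3], "f": [0.5, 1.5, 2.5]})
a = mixed.to_numpy(copy=False)
b = mixed.to_numpy(copy=False)
print(np.shares_memory(a, b))  # False: each call allocated its own array

# Single-dtype frame: reuse is possible, but the result depends on the
# pandas version and internal layout, so inspect it rather than assume.
uniform = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
c = uniform.to_numpy(copy=False)
d = uniform.to_numpy(copy=False)
print(np.shares_memory(c, d))
```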

Summary

  • Use DataFrame.to_numpy() to convert a Pandas DataFrame into a NumPy array.
  • The conversion keeps values but drops labels such as index and column names.
  • Mixed data types may produce an object array.
  • Select numeric columns first if you need a numeric matrix.
  • Use dtype and copy options when downstream code needs tighter control.
