How do I remove NaN values from a NumPy array?

When working with real-world datasets in Python, you will inevitably encounter missing or undefined values represented as NaN (Not a Number). NumPy, the foundational library for numerical computing, uses NaN as a floating-point sentinel value defined by the IEEE 754 standard. Knowing how to detect, remove, and work around NaN values is essential for producing correct analytical results.

Detecting NaN with np.isnan

Before removing NaN values you need to find them. The function np.isnan() returns a boolean array of the same shape as the input, with True wherever a NaN appears.

python
import numpy as np

arr = np.array([1.0, 2.0, np.nan, 4.0, np.nan, 6.0])
mask = np.isnan(arr)
print(mask)
# [False False  True False  True False]

This boolean mask is the building block for every removal technique that follows.
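
Before removing anything, it can help to know how many NaN values there are and where they sit. The following is a minimal sketch using the same sample array; the names nan_count, nan_positions, and has_nan are illustrative and not part of the original example.

```python
import numpy as np

arr = np.array([1.0, 2.0, np.nan, 4.0, np.nan, 6.0])
mask = np.isnan(arr)

nan_count = np.count_nonzero(mask)    # how many entries are NaN
nan_positions = np.flatnonzero(mask)  # indices of the NaN entries
has_nan = bool(mask.any())            # quick yes/no check

print(nan_count)      # 2
print(nan_positions)  # [2 4]
print(has_nan)        # True
```

A quick count like this is a cheap sanity check before deciding whether to drop or fill the missing entries.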

Boolean Indexing to Remove NaN

The most direct way to strip NaN values from a 1-D array is to invert the mask and use boolean indexing.

python
clean = arr[~np.isnan(arr)]
print(clean)
# [1. 2. 4. 6.]

The tilde operator (~) negates each boolean, so you keep only the elements that are not NaN. This creates a new array; the original is unchanged.
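
A common variation is filtering two aligned arrays, for example x and y coordinates, so that they stay the same length. The sketch below assumes two hypothetical arrays x and y and keeps only the positions where both hold a valid number.

```python
import numpy as np

# Hypothetical paired measurements; names and values are illustrative
x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
y = np.array([10.0, np.nan, 30.0, 40.0, 50.0])

# True only where neither array has a NaN at that position
valid = ~(np.isnan(x) | np.isnan(y))

x_clean = x[valid]
y_clean = y[valid]

print(x_clean)  # [1. 4. 5.]
print(y_clean)  # [10. 40. 50.]
```

Building one shared mask and applying it to both arrays keeps the pairs aligned, which filtering each array separately would not.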

For a 2-D array you can remove entire rows or columns that contain at least one NaN.

python
matrix = np.array([
    [1.0, 2.0, np.nan],
    [4.0, 5.0, 6.0],
    [np.nan, 8.0, 9.0]
])

# Remove rows containing any NaN
row_mask = ~np.any(np.isnan(matrix), axis=1)
clean_rows = matrix[row_mask]
print(clean_rows)
# [[4. 5. 6.]]

# Remove columns containing any NaN
col_mask = ~np.any(np.isnan(matrix), axis=0)
clean_cols = matrix[:, col_mask]
print(clean_cols)
# [[2.]
#  [5.]
#  [8.]]
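
Row and column masks are needed because an element-wise mask cannot leave holes in a rectangular array. As a small illustration on the same sample matrix, applying the 1-D technique directly to 2-D input returns a flattened 1-D result.

```python
import numpy as np

matrix = np.array([
    [1.0, 2.0, np.nan],
    [4.0, 5.0, 6.0],
    [np.nan, 8.0, 9.0],
])

# Element-wise boolean indexing on a 2-D array flattens the result
flat = matrix[~np.isnan(matrix)]

print(flat.shape)  # (7,)
print(flat)        # [1. 2. 4. 5. 6. 8. 9.]
```

This is fine when you only need the valid values, but the row and column structure is lost.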

Replacing NaN with np.nan_to_num

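Sometimes you want to keep the array shape intact rather than shrink it. np.nan_to_num() returns a new array in which each NaN is replaced with a value you choose through the nan keyword (default 0.0). Note that by default it also replaces positive and negative infinity with very large finite numbers; pass posinf and neginf if you need different values.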

python
filled = np.nan_to_num(arr, nan=0.0)
print(filled)
# [1. 2. 0. 4. 0. 6.]

# Replace with the mean of valid values
mean_val = np.nanmean(arr)
filled_mean = np.nan_to_num(arr, nan=mean_val)
print(filled_mean)
# [1.   2.   3.25 4.   3.25 6.  ]

This approach is useful when downstream code expects a fixed-length array, such as feeding features into a machine learning model.
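
For a 2-D feature matrix, a single fill value is often too coarse, and filling each column with its own mean is a common alternative. The sketch below uses np.where rather than np.nan_to_num, because the fill value differs per column; the sample matrix and variable names are illustrative.

```python
import numpy as np

matrix = np.array([
    [1.0, 2.0, np.nan],
    [4.0, 5.0, 6.0],
    [np.nan, 8.0, 9.0],
])

# Mean of each column, ignoring NaN: [2.5, 5.0, 7.5]
col_means = np.nanmean(matrix, axis=0)

# Where an entry is NaN, take that column's mean; otherwise keep the entry.
# col_means has shape (3,) and broadcasts across the rows.
filled = np.where(np.isnan(matrix), col_means, matrix)

print(filled.tolist())
# [[1.0, 2.0, 7.5], [4.0, 5.0, 6.0], [2.5, 8.0, 9.0]]
```

The result has the same shape as the input, so it can be passed to code that expects a fixed number of rows and columns.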

NaN-Safe Aggregation Functions

NumPy ships with aggregation functions that skip NaN entries automatically. These let you compute statistics without removing values first.

python
data = np.array([10.0, np.nan, 30.0, np.nan, 50.0])

print(np.nanmean(data))   # 30.0
print(np.nansum(data))    # 90.0
print(np.nanstd(data))    # 16.32993161855452
print(np.nanmax(data))    # 50.0
print(np.nanmin(data))    # 10.0

Compare this to the regular np.mean(data), which returns nan because any arithmetic involving NaN propagates NaN.
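
The difference is easy to see side by side. The short sketch below also shows that the NaN-safe functions accept an axis argument, using a small hypothetical 2-D array named grid.

```python
import numpy as np

data = np.array([10.0, np.nan, 30.0, np.nan, 50.0])

regular = np.mean(data)      # NaN propagates through the sum
nan_safe = np.nanmean(data)  # NaN entries are skipped

print(regular)   # nan
print(nan_safe)  # 30.0

# Hypothetical 2-D example: per-column and per-row means
grid = np.array([[1.0, np.nan],
                 [3.0, 4.0]])
col_mean = np.nanmean(grid, axis=0)
row_mean = np.nanmean(grid, axis=1)

print(col_mean.tolist())  # [2.0, 4.0]
print(row_mean.tolist())  # [1.0, 3.5]
```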

Pandas Integration for 2-D Data

When your data lives in a table-like structure, pandas offers higher-level methods that build on NumPy under the hood.

python
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, 5.0, 6.0],
    "C": [7.0, 8.0, 9.0]
})

# Drop rows with any NaN
print(df.dropna())
#      A    B    C
# 2  3.0  6.0  9.0

# Fill NaN with column means
print(df.fillna(df.mean()))
#      A    B    C
# 0  1.0  5.5  7.0
# 1  2.0  5.0  8.0
# 2  3.0  6.0  9.0

You can convert a cleaned DataFrame back to a NumPy array with df.to_numpy() whenever you need raw array performance again.
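
As a rough sketch of that round trip, the example below wraps a NumPy array in a DataFrame, drops incomplete rows or columns, and converts back. It assumes pandas is installed; the column labels and variable names are illustrative.

```python
import numpy as np
import pandas as pd

matrix = np.array([
    [1.0, np.nan, 7.0],
    [np.nan, 5.0, 8.0],
    [3.0, 6.0, 9.0],
])

df = pd.DataFrame(matrix, columns=["A", "B", "C"])

# Drop rows with any NaN, then go back to a plain ndarray
rows_kept = df.dropna().to_numpy()
print(rows_kept.tolist())  # [[3.0, 6.0, 9.0]]

# Drop columns with any NaN instead (axis=1)
cols_kept = df.dropna(axis=1).to_numpy()
print(cols_kept.tolist())  # [[7.0], [8.0], [9.0]]
```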

Common Pitfalls

  • Comparing NaN with ==: The expression np.nan == np.nan evaluates to False. Always use np.isnan() for detection.
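  • Integer arrays cannot hold NaN: NaN is a floating-point concept. Building an array from integers mixed with NaN, as in np.array([1, 2, np.nan]), yields a float64 array, which may change your data semantics. Assigning np.nan into an existing integer array does not upcast; it raises ValueError.
  • In-place mutation confusion: Boolean indexing used to read values, as in clean = arr[mask], returns a copy, not a view, so modifying clean leaves arr untouched. To change the original, assign through the mask directly: arr[np.isnan(arr)] = 0.0.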
  • Forgetting the axis parameter: When using np.any(np.isnan(matrix), axis=...), omitting axis collapses the entire array into a single boolean, which is not what you want for row-wise or column-wise filtering.
  • Mixing NaN-safe and regular functions: Calling np.sum() instead of np.nansum() on data with missing values silently returns nan, which can propagate undetected through a long computation pipeline.
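
Several of these pitfalls can be reproduced in a few lines. The following sketch is only an illustration with made-up arrays; the variable names are not from the original examples.

```python
import numpy as np

# NaN never compares equal, not even to itself
print(np.nan == np.nan)  # False

# Mixing integers with NaN at creation time produces a float array
mixed = np.array([1, 2, np.nan])
print(mixed.dtype)  # float64

# Assigning NaN into an existing integer array raises ValueError
ints = np.array([1, 2, 3])
try:
    ints[0] = np.nan
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# Omitting axis reduces the whole matrix to a single boolean
matrix = np.array([[1.0, np.nan], [3.0, 4.0]])
any_nan = bool(np.any(np.isnan(matrix)))
print(any_nan)  # True

# Assigning through a mask modifies the original array in place
arr = np.array([1.0, np.nan, 3.0])
arr[np.isnan(arr)] = 0.0
print(arr)  # [1. 0. 3.]
```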

Summary

  • Use np.isnan() combined with boolean indexing (arr[~np.isnan(arr)]) to remove NaN from 1-D arrays.
  • For 2-D arrays, apply np.any() along the appropriate axis to build row or column masks.
  • Use np.nan_to_num() when you need to preserve array shape by substituting a replacement value.
  • Prefer np.nanmean(), np.nansum(), and related functions for aggregation over arrays that may contain NaN.
  • For tabular data, pandas dropna() and fillna() provide convenient, expressive alternatives that interoperate seamlessly with NumPy.
