Find unique rows in numpy.array

Introduction

Duplicate rows show up constantly in data cleaning, feature engineering, and preprocessing pipelines. NumPy can remove them efficiently, but the exact approach depends on whether you only need the unique rows, or whether you also need original indexes, counts, or stable row order.

Use np.unique With axis=0

For a two-dimensional array, the standard solution is np.unique(..., axis=0). Setting axis=0 tells NumPy to treat each row as one value for uniqueness checks.

python
import numpy as np

data = np.array([
    [1, 2],
    [3, 4],
    [1, 2],
    [5, 6],
    [3, 4],
])

unique_rows = np.unique(data, axis=0)
print(unique_rows)

Output:

python
[[1 2]
 [3 4]
 [5 6]]

This is the cleanest solution on NumPy 1.13 or later, the release that added the axis parameter to np.unique.
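
To make the effect of axis=0 concrete, here is a small sketch that runs np.unique on the same array with and without the argument. The variable name unique_values is only illustrative.

```python
import numpy as np

data = np.array([
    [1, 2],
    [3, 4],
    [1, 2],
    [5, 6],
    [3, 4],
])

# Default call: the array is flattened, so uniqueness is checked per element
unique_values = np.unique(data)
print(unique_values)        # [1 2 3 4 5 6]

# axis=0: each row is compared as a whole
unique_rows = np.unique(data, axis=0)
print(unique_rows.shape)    # (3, 2)
```

The first call returns a one-dimensional array of scalars, which is rarely what you want when the rows are records.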

Get the Original Row Indexes

Sometimes you need more than the deduplicated array. For example, you may want to know where each unique row first appeared in the original dataset.

python
import numpy as np

data = np.array([
    [7, 1],
    [2, 9],
    [7, 1],
    [4, 3],
])

unique_rows, first_indexes = np.unique(data, axis=0, return_index=True)

print(unique_rows)
print(first_indexes)

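return_index=True gives the first position of each unique row in the source array. For the data above, the sorted unique rows are [2 9], [4 3], and [7 1], so first_indexes is [1 3 0]. That is useful when you need to map deduplicated results back to the original data.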
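
One way to use those positions, sketched under the assumption that you keep a parallel array of per-row metadata: the labels array below is hypothetical and stands in for IDs, timestamps, or anything else aligned with the rows.

```python
import numpy as np

data = np.array([
    [7, 1],
    [2, 9],
    [7, 1],
    [4, 3],
])
# Hypothetical per-row metadata, aligned with data by position
labels = np.array(["a", "b", "c", "d"])

unique_rows, first_indexes = np.unique(data, axis=0, return_index=True)

# Indexing the source with first_indexes reproduces the unique rows
recovered = data[first_indexes]
print(np.array_equal(recovered, unique_rows))  # True

# The same indexes pick the metadata of each row's first occurrence
first_labels = labels[first_indexes]
print(first_labels)  # ['b' 'd' 'a']
```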

Preserve the Original Appearance Order

One detail surprises many people: np.unique sorts the result. If you want unique rows in the order they first appeared, sort the returned indexes and reselect from the original array.

python
import numpy as np

data = np.array([
    [3, 4],
    [1, 2],
    [3, 4],
    [5, 6],
])

_, first_indexes = np.unique(data, axis=0, return_index=True)
ordered_unique_rows = data[np.sort(first_indexes)]

print(ordered_unique_rows)

That pattern is common in data pipelines where row order carries meaning, such as logs, time-adjacent features, or user-facing exports.

Count Duplicate Rows

If you want to know how often each row appears, ask NumPy for counts at the same time.

python
import numpy as np

data = np.array([
    [1, 2],
    [1, 2],
    [3, 4],
    [3, 4],
    [3, 4],
])

unique_rows, counts = np.unique(data, axis=0, return_counts=True)

print(unique_rows)
print(counts)

This is useful for spotting repeated measurements, duplicate records, or dominant categories already encoded as rows.
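
As a rough illustration of that use, the counts can act as a filter. The sketch below adds a non-repeated row to the sample data and uses illustrative names (duplicated_rows, most_common_row) that are not part of NumPy.

```python
import numpy as np

data = np.array([
    [1, 2],
    [1, 2],
    [3, 4],
    [3, 4],
    [3, 4],
    [5, 6],
])

unique_rows, counts = np.unique(data, axis=0, return_counts=True)

# Keep only the rows that occur more than once
duplicated_rows = unique_rows[counts > 1]
print(duplicated_rows)

# The row with the highest count (the first one wins on ties)
most_common_row = unique_rows[np.argmax(counts)]
print(most_common_row)
```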

Older Workarounds and Structured Views

Before np.unique supported axis=0, developers often converted each row into a structured dtype or a byte view so np.unique could treat the full row as one comparable element. Those tricks still appear in old codebases, but they are harder to read and easier to misuse.

For normal two-dimensional numeric arrays, prefer the direct axis=0 form. Reach for structured views only if you are dealing with unusual compatibility constraints or specialized memory layouts.

That preference matters in teams because the straightforward form is easier to review and much less surprising to someone maintaining the code later.
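
If you run into the older pattern in existing code, it usually looks something like the sketch below. This is one possible form of the structured-view workaround, not a canonical one; the field names f0, f1, and so on are arbitrary, and it assumes a C-contiguous two-dimensional numeric array.

```python
import numpy as np

data = np.array([
    [1, 2],
    [3, 4],
    [1, 2],
    [5, 6],
    [3, 4],
])

# The view below requires contiguous memory
contiguous = np.ascontiguousarray(data)
n_cols = contiguous.shape[1]

# One structured field per column, so each row becomes a single element
row_dtype = np.dtype([(f"f{i}", contiguous.dtype) for i in range(n_cols)])
row_view = contiguous.view(row_dtype)      # shape (n_rows, 1)

# np.unique flattens its input and compares the structured elements
unique_structured = np.unique(row_view)

# Convert back to a plain two-dimensional array
legacy_unique_rows = unique_structured.view(contiguous.dtype).reshape(-1, n_cols)
print(legacy_unique_rows)
```

The extra steps (forcing contiguity, building a dtype, viewing, and reshaping) are where mistakes tend to creep in, which is why the axis=0 form is easier to maintain.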

Common Pitfalls

  • Forgetting axis=0 makes np.unique operate on individual elements instead of rows.
  • Expecting the result to preserve row order is a mistake because np.unique sorts by default.
  • Applying np.unique to floating-point data can treat rows as different when tiny numeric noise is present.
  • Object arrays are a poor fit for row deduplication: np.unique(..., axis=0) rejects object dtype with a TypeError, so convert the data to a numeric dtype first.
  • Assuming return_index gives every occurrence is wrong; it returns only the first occurrence of each unique row.
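
The floating-point pitfall can be sketched as follows. Rounding before deduplication is one common mitigation, not the only one, and the choice of 8 decimals here is an assumption that you should adapt to the precision of your data.

```python
import numpy as np

data = np.array([
    [0.1 + 0.2, 1.0],   # 0.30000000000000004 because of float rounding
    [0.3, 1.0],
    [0.5, 2.0],
])

# Exact comparison keeps the two nearly identical rows apart
exact_unique = np.unique(data, axis=0)
print(exact_unique.shape[0])    # 3

# Rounding first merges rows that differ only by tiny noise
rounded_unique = np.unique(np.round(data, decimals=8), axis=0)
print(rounded_unique.shape[0])  # 2
```

Rounding is a blunt tool: two values that sit on opposite sides of a rounding boundary will still land in different rows, so treat it as a heuristic.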

Summary

  • Use np.unique(array, axis=0) to get unique rows from a two-dimensional NumPy array.
  • Add return_index=True or return_counts=True when you need row positions or frequencies.
  • Reorder by the first indexes if you want first-seen order instead of sorted order.
  • Be careful with float noise and object arrays when deduplicating rows.
