Find unique rows in numpy.array
Introduction
Duplicate rows show up constantly in data cleaning, feature engineering, and preprocessing pipelines. NumPy can remove them efficiently, but the exact approach depends on whether you only need the unique rows, or whether you also need original indexes, counts, or stable row order.
Use np.unique With axis=0
For a two-dimensional array, the standard solution is np.unique(..., axis=0). Setting axis=0 tells NumPy to treat each row as one value for uniqueness checks.
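A minimal sketch of that call follows. The array contents and the variable names rows and unique_rows are made up for illustration and are not from the original article.

```python
import numpy as np

# Illustrative data: rows 0 and 2 are identical, as are rows 1 and 4.
rows = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [1, 2, 3],
    [7, 8, 9],
    [4, 5, 6],
])

# axis=0 compares whole rows instead of individual elements.
unique_rows = np.unique(rows, axis=0)
print(unique_rows)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
```

The result comes back sorted row by row (lexicographically), which here happens to match the order of first appearance.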
This is the cleanest solution for most modern NumPy code.
Get the Original Row Indexes
Sometimes you need more than the deduplicated array. For example, you may want to know where each unique row first appeared in the original dataset.
return_index=True gives the first position of each unique row in the source array. That is useful when you need to map deduplicated results back to the original data.
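As a rough illustration, assuming a small made-up array, the indexes can be requested alongside the unique rows like this (first_idx is just an illustrative name):

```python
import numpy as np

# Illustrative data: [4, 5, 6] first appears at row 0, [1, 2, 3] at row 1.
rows = np.array([
    [4, 5, 6],
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [1, 2, 3],
])

unique_rows, first_idx = np.unique(rows, axis=0, return_index=True)

print(unique_rows)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
print(first_idx)
# [1 0 3]

# Indexing the source with first_idx reproduces the unique rows.
assert np.array_equal(rows[first_idx], unique_rows)
```

Each entry of first_idx lines up with the row at the same position in unique_rows, so first_idx[0] is where [1, 2, 3] first occurs in rows.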
Preserve the Original Appearance Order
One detail surprises many people: np.unique sorts the result. If you want unique rows in the order they first appeared, sort the returned indexes and reselect from the original array.
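One way to sketch this, again with made-up data and illustrative variable names:

```python
import numpy as np

# Illustrative data: first-seen order is [4,5,6], [1,2,3], [7,8,9].
rows = np.array([
    [4, 5, 6],
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [1, 2, 3],
])

# Only the first-occurrence indexes are needed; the sorted rows are discarded.
_, first_idx = np.unique(rows, axis=0, return_index=True)

# Sorting the indexes restores the order in which rows first appeared.
ordered_unique = rows[np.sort(first_idx)]
print(ordered_unique)
# [[4 5 6]
#  [1 2 3]
#  [7 8 9]]
```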
That pattern is common in data pipelines where row order carries meaning, such as logs, time-adjacent features, or user-facing exports.
Count Duplicate Rows
If you want to know how often each row appears, ask NumPy for counts at the same time.
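A short sketch with return_counts=True, using illustrative data; the final filtering step is an optional extra, not something the original article prescribes:

```python
import numpy as np

# Illustrative data: [1, 2] appears three times, [3, 4] twice, [5, 6] once.
rows = np.array([
    [1, 2],
    [3, 4],
    [1, 2],
    [1, 2],
    [3, 4],
    [5, 6],
])

unique_rows, counts = np.unique(rows, axis=0, return_counts=True)
print(counts)
# [3 2 1]

# Optional: keep only the rows that occur more than once.
duplicated_rows = unique_rows[counts > 1]
print(duplicated_rows)
# [[1 2]
#  [3 4]]
```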
This is useful for spotting repeated measurements, duplicate records, or dominant categories already encoded as rows.
Older Workarounds and Structured Views
Before np.unique gained the axis parameter (added in NumPy 1.13), developers often converted each row into a structured dtype or a byte view so np.unique could treat the full row as one comparable element. Those tricks still appear in old codebases, but they are harder to read and easier to misuse.
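For readers who meet this in legacy code, here is one possible shape of the structured-dtype variant. The helper name unique_rows_structured and the field names f0, f1, ... are hypothetical, and the sketch assumes a two-dimensional numeric array with a single plain dtype.

```python
import numpy as np

def unique_rows_structured(a):
    """Legacy-style row deduplication via a structured view (hypothetical helper)."""
    # The view trick needs each row to be laid out contiguously in memory.
    a = np.ascontiguousarray(a)
    n_cols = a.shape[1]

    # One field per column, so a whole row becomes a single structured element.
    row_dtype = np.dtype([(f"f{i}", a.dtype) for i in range(n_cols)])
    as_records = a.view(row_dtype).ravel()  # shape: (n_rows,)

    # np.unique now compares complete rows, field by field.
    unique_records = np.unique(as_records)

    # Convert the structured result back into a plain 2-D array.
    return unique_records.view(a.dtype).reshape(-1, n_cols)

rows = np.array([[4, 5, 6], [1, 2, 3], [4, 5, 6], [7, 8, 9]])
legacy_result = unique_rows_structured(rows)

# For this input the legacy form agrees with the direct axis=0 form.
assert np.array_equal(legacy_result, np.unique(rows, axis=0))
```

The extra steps (forcing contiguity, building a dtype, viewing back and reshaping) are exactly where mistakes tend to creep in.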
For normal two-dimensional numeric arrays, prefer the direct axis=0 form. Reach for structured views only if you are dealing with unusual compatibility constraints or specialized memory layouts.
That preference matters in teams because the straightforward form is easier to review and much less surprising to someone maintaining the code later.
Common Pitfalls
- Forgetting axis=0 makes np.unique operate on individual elements instead of rows.
- Expecting the result to preserve row order is a mistake because np.unique sorts by default.
- Applying np.unique to floating-point data can treat rows as different when tiny numeric noise is present.
- Mixing object arrays with row deduplication can be slower and less predictable than working with numeric arrays.
- Assuming return_index gives every occurrence is wrong; it returns only the first occurrence of each unique row.
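The floating-point pitfall can be illustrated with a small sketch. Rounding before deduplication is one possible mitigation, not something the article prescribes, and the choice of 9 decimals is an arbitrary assumption that would need tuning for real data.

```python
import numpy as np

# 0.1 + 0.2 is not exactly 0.3 in binary floating point,
# so the first two rows differ by a tiny amount.
rows = np.array([
    [0.1 + 0.2, 1.0],
    [0.3,       1.0],
    [2.0,       3.0],
])

exact_unique = np.unique(rows, axis=0)
print(len(exact_unique))
# 3  (the noisy rows are treated as different)

# Assumption: values that agree to 9 decimal places count as equal.
rounded_unique = np.unique(np.round(rows, decimals=9), axis=0)
print(len(rounded_unique))
# 2
```

Rounding is a coarse tolerance: two values sitting on opposite sides of a rounding boundary can still end up in different rows, so treat this as a sketch rather than a general fix.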
Summary
- Use np.unique(array, axis=0) to get unique rows from a two-dimensional NumPy array.
- Add return_index=True or return_counts=True when you need row positions or frequencies.
- Reorder by the first indexes if you want first-seen order instead of sorted order.
- Be careful with float noise and object arrays when deduplicating rows.

