Arrays of Int arrays. Storing duplicates in order only
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Keeping only duplicate integer arrays while preserving their original order sounds simple, but raw arrays make it awkward. In languages such as Java, arrays compare by reference, not by contents, so two separate arrays with the same numbers are not automatically treated as duplicates.
Define What "Duplicate" Means
Before choosing an algorithm, decide which of these behaviors you want:
- keep every row whose contents appear more than once
- keep only the second and later appearances
- keep one representative of each duplicated value
Those are different outputs. The data structure should follow the rule, not the other way around.
In Java, the first issue is equality. int[] does not have value-based equals() and hashCode(), so a HashSet<int[]> will only detect the same array object, not another array with the same elements.
The first check is false because a and b are different objects. The second check is true because it compares contents.
Build a Stable Key for Each Row
To detect content duplicates, convert each row into a key type with useful equality. A simple approach is to turn int[] into a boxed List<Integer>.
That key can be used in a Map or Set because List implements value-based equality. It is not the only option, but it is readable and correct for most everyday inputs.
Keep All Duplicated Rows in Original Order
If you want to keep every row whose contents occur more than once, use a two-pass algorithm:
- count how many times each key appears
- walk the rows again and keep those with a count greater than one
This preserves the original order because the second pass iterates through the source data exactly once, in sequence.
Keep Only Later Duplicate Occurrences
Sometimes the desired output is "show me repeated rows after the first sighting". That is a different rule and works with a one-pass scan plus a seen set.
With this version, the first appearance is not returned. Only repeats are.
Choosing a Better Key for Large Inputs
Boxing integers into List<Integer> is easy to understand, but it creates extra objects. If performance becomes a real issue, there are other choices:
- a small wrapper class around
int[]with customequals()andhashCode() - a normalized string key such as
Arrays.toString(row) - a primitive-oriented collection library
Do not optimize too early. The most common bug in this problem is not speed. It is using reference equality by accident and silently getting the wrong answer.
Common Pitfalls
- Using
HashSet<int[]>and expecting arrays with the same values to match. - Forgetting to define whether the first occurrence of a duplicate should be kept.
- Grouping rows and then losing the original order.
- Recomputing keys inconsistently between counting and filtering steps.
- Choosing a compact custom key without testing collisions or correctness.
Summary
- Raw arrays usually need content-based comparison, not reference comparison.
- Convert each
int[]into a stable key before using a map or set. - Use a two-pass count-and-filter approach when you want all duplicated rows in order.
- Use a seen-set approach when you want only later duplicate occurrences.
- Make the duplicate rule explicit before writing the algorithm.

