Arrays of Int arrays. Storing duplicates in order only

Introduction

Keeping only duplicate integer arrays while preserving their original order sounds simple, but raw arrays make it awkward. In languages such as Java, arrays compare by reference, not by contents, so two separate arrays with the same numbers are not automatically treated as duplicates.

Define What "Duplicate" Means

Before choosing an algorithm, decide which of these behaviors you want:

keep every row whose contents appear more than once
keep only the second and later appearances
keep one representative of each duplicated value

Those are different outputs. The data structure should follow the rule, not the other way around.
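To make the difference concrete, take the rows {1, 2}, {3, 4}, {1, 2}, {5, 6}, {3, 4}, {1, 2}. The first rule returns [1, 2], [3, 4], [1, 2], [3, 4], [1, 2]. The second returns [1, 2], [3, 4], [1, 2]. The third returns [1, 2], [3, 4]. The later sections implement the first two rules. The minimal sketch below covers the third one. The class and method names are hypothetical, and it borrows the List<Integer> key idea discussed in the next section.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class OneRepresentative {
    // Content-based key: a boxed copy of the row's values.
    static List<Integer> keyOf(int[] row) {
        return Arrays.stream(row).boxed().collect(Collectors.toList());
    }

    // Rule three (hypothetical helper): one row per duplicated value, ordered by first appearance.
    static List<int[]> oneRepresentativePerDuplicate(int[][] rows) {
        // LinkedHashMap iterates in insertion order, i.e. first-appearance order.
        Map<List<Integer>, int[]> firstSeen = new LinkedHashMap<>();
        Map<List<Integer>, Integer> counts = new HashMap<>();

        for (int[] row : rows) {
            List<Integer> key = keyOf(row);
            firstSeen.putIfAbsent(key, row);
            counts.merge(key, 1, Integer::sum);
        }

        // Keep the first row of every key that occurred more than once.
        List<int[]> result = new ArrayList<>();
        for (Map.Entry<List<Integer>, int[]> entry : firstSeen.entrySet()) {
            if (counts.get(entry.getKey()) > 1) {
                result.add(entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] rows = { {1, 2}, {3, 4}, {1, 2}, {5, 6}, {3, 4}, {1, 2} };
        // Prints [1, 2] and then [3, 4].
        for (int[] row : oneRepresentativePerDuplicate(rows)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Using LinkedHashMap for the first-seen rows is what keeps the output in input order. A plain HashMap would iterate in an unspecified order.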

In Java, the first issue is equality. int[] does not have value-based equals() and hashCode(), so a HashSet<int[]> will only detect the same array object, not another array with the same elements.

```java
import java.util.Arrays;

public class EqualityDemo {
    public static void main(String[] args) {
        int[] a = {1, 2};
        int[] b = {1, 2};

        System.out.println(a == b);              // false: compares object identity
        System.out.println(Arrays.equals(a, b)); // true: compares elements
    }
}
```

The first check is false because a and b are different objects. The second check is true because it compares contents.
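The same identity rule applies inside hash-based collections. A minimal sketch (class and method names are hypothetical) adds the same two arrays to a HashSet<int[]> and to a HashSet<List<Integer>>:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SetEqualityDemo {
    // Stores the arrays themselves: int[] inherits identity-based equals/hashCode from Object.
    static int distinctByReference(int[]... rows) {
        Set<int[]> set = new HashSet<>();
        for (int[] row : rows) {
            set.add(row);
        }
        return set.size();
    }

    // Stores a boxed copy of each row: List<Integer> compares element by element.
    static int distinctByContents(int[]... rows) {
        Set<List<Integer>> set = new HashSet<>();
        for (int[] row : rows) {
            set.add(Arrays.stream(row).boxed().collect(Collectors.toList()));
        }
        return set.size();
    }

    public static void main(String[] args) {
        int[] a = {1, 2};
        int[] b = {1, 2};
        System.out.println(distinctByReference(a, b)); // 2: different objects
        System.out.println(distinctByContents(a, b));  // 1: same contents
        System.out.println(distinctByReference(a, a)); // 1: same object twice
    }
}
```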

Build a Stable Key for Each Row

To detect content duplicates, convert each row into a key type with useful equality. A simple approach is to turn int[] into a boxed List<Integer>.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Keys {
    static List<Integer> keyOf(int[] row) {
        return Arrays.stream(row)
                .boxed()
                .collect(Collectors.toList());
    }
}
```

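That key can be used in a Map or Set because List implements value-based equals() and hashCode(): two lists are equal when they hold equal elements in the same order. It is not the only option, but it is readable and correct for any input. Its main cost is that every element gets boxed, which only matters once inputs grow large.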
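A quick check of what this key treats as equal. Note that it is order-sensitive: {1, 2} and {2, 1} produce different keys. Whether that is right depends on your duplicate rule. The class name below is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class KeyDemo {
    static List<Integer> keyOf(int[] row) {
        return Arrays.stream(row).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> k1 = keyOf(new int[] {1, 2});
        List<Integer> k2 = keyOf(new int[] {1, 2});
        List<Integer> k3 = keyOf(new int[] {2, 1});

        System.out.println(k1.equals(k2));                  // true: same values, same order
        System.out.println(k1.hashCode() == k2.hashCode()); // true: equal lists must hash equally
        System.out.println(k1.equals(k3));                  // false: element order is part of the key
    }
}
```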

Keep All Duplicated Rows in Original Order

If you want to keep every row whose contents occur more than once, use a two-pass algorithm:

count how many times each key appears
walk the rows again and keep those with a count greater than one

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateRowsInOrder {
    // Content-based key: a boxed copy of the row.
    static List<Integer> keyOf(int[] row) {
        return Arrays.stream(row)
                .boxed()
                .collect(Collectors.toList());
    }

    static List<int[]> keepAllDuplicates(int[][] rows) {
        Map<List<Integer>, Integer> counts = new HashMap<>();

        // Pass 1: count how many times each distinct row occurs.
        for (int[] row : rows) {
            counts.merge(keyOf(row), 1, Integer::sum);
        }

        // Pass 2: keep rows whose contents occur more than once, in input order.
        List<int[]> result = new ArrayList<>();
        for (int[] row : rows) {
            if (counts.get(keyOf(row)) > 1) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] rows = {
            {1, 2},
            {3, 4},
            {1, 2},
            {5, 6},
            {3, 4}
        };

        for (int[] row : keepAllDuplicates(rows)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

This preserves the original order because the second pass iterates through the source data exactly once, in sequence.
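With the sample rows in main, the program prints [1, 2], [3, 4], [1, 2] and [3, 4], each on its own line. {5, 6} is dropped because it occurs only once. The version above builds each key twice, once per pass. One possible refinement, sketched below with a hypothetical class name, builds each key once and reuses it in the second pass. That avoids the repeated boxing and guarantees both passes look up exactly the same key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateRowsCachedKeys {
    static List<Integer> keyOf(int[] row) {
        return Arrays.stream(row).boxed().collect(Collectors.toList());
    }

    // Same two-pass rule, but each row's key is built once and reused.
    static List<int[]> keepAllDuplicates(int[][] rows) {
        List<List<Integer>> keys = new ArrayList<>(rows.length);
        Map<List<Integer>, Integer> counts = new HashMap<>();

        // Pass 1: build each key, remember it by position, and count it.
        for (int[] row : rows) {
            List<Integer> key = keyOf(row);
            keys.add(key);
            counts.merge(key, 1, Integer::sum);
        }

        // Pass 2: walk the rows in original order, looking up the cached key.
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            if (counts.get(keys.get(i)) > 1) {
                result.add(rows[i]);
            }
        }
        return result;
    }
}
```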

Keep Only Later Duplicate Occurrences

Sometimes the desired output is "show me repeated rows after the first sighting". That is a different rule and works with a one-pass scan plus a seen set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class LaterDuplicates {
    static List<Integer> keyOf(int[] row) {
        return Arrays.stream(row)
                .boxed()
                .collect(Collectors.toList());
    }

    static List<int[]> laterDuplicatesOnly(int[][] rows) {
        Set<List<Integer>> seen = new HashSet<>();
        List<int[]> result = new ArrayList<>();

        for (int[] row : rows) {
            List<Integer> key = keyOf(row);
            // Set.add returns false when the key is already present,
            // so only the second and later occurrences are collected.
            if (!seen.add(key)) {
                result.add(row);
            }
        }
        return result;
    }
}
```

With this version, the first appearance is not returned. Only repeats are.
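On the same five sample rows used earlier, laterDuplicatesOnly returns [1, 2] and [3, 4], the copies at positions 2 and 4. When debugging, it can help to know which earlier row each repeat matches. The hypothetical variant below keeps the same one-pass rule but stores the first index of each key in a Map instead of a Set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LaterDuplicatesWithSource {
    static List<Integer> keyOf(int[] row) {
        return Arrays.stream(row).boxed().collect(Collectors.toList());
    }

    // For each later occurrence, returns {laterIndex, firstIndex}, in input order.
    static List<int[]> laterDuplicatePositions(int[][] rows) {
        Map<List<Integer>, Integer> firstIndex = new HashMap<>();
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            // putIfAbsent returns null on the first sighting, otherwise the earlier index.
            Integer earlier = firstIndex.putIfAbsent(keyOf(rows[i]), i);
            if (earlier != null) {
                pairs.add(new int[] {i, earlier});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        int[][] rows = { {1, 2}, {3, 4}, {1, 2}, {5, 6}, {3, 4} };
        // Prints "row 2 repeats row 0" and "row 4 repeats row 1".
        for (int[] p : laterDuplicatePositions(rows)) {
            System.out.println("row " + p[0] + " repeats row " + p[1]);
        }
    }
}
```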

Choosing a Better Key for Large Inputs

Boxing integers into List<Integer> is easy to understand, but it creates extra objects. If performance becomes a real issue, there are other choices:

a small wrapper class around int[] with custom equals() and hashCode()
a normalized string key such as Arrays.toString(row)
a primitive-oriented collection library

Do not optimize too early. The most common bug in this problem is not speed. It is using reference equality by accident and silently getting the wrong answer.

Common Pitfalls

Using HashSet<int[]> and expecting arrays with the same values to match.
Forgetting to define whether the first occurrence of a duplicate should be kept.
Grouping rows and then losing the original order.
Recomputing keys inconsistently between counting and filtering steps.
Choosing a compact custom key without testing collisions or correctness.
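The last pitfall is easy to hit with a string key. The hypothetical naiveKey below joins the numbers with no separator, so {1, 23} and {12, 3} both become "123" and would be reported as duplicates even though they differ. Arrays.toString adds brackets and separators, so it keeps the two rows apart.

```java
import java.util.Arrays;

public class KeyCollisionDemo {
    // Hypothetical "compact" key that concatenates numbers with no separator. Do not use.
    static String naiveKey(int[] row) {
        StringBuilder sb = new StringBuilder();
        for (int v : row) {
            sb.append(v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] x = {1, 23};
        int[] y = {12, 3};

        System.out.println(naiveKey(x).equals(naiveKey(y)));               // true: both are "123"
        System.out.println(Arrays.toString(x).equals(Arrays.toString(y))); // false: "[1, 23]" vs "[12, 3]"
        System.out.println(Arrays.equals(x, y));                           // false: the rows differ
    }
}
```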

Summary

Raw arrays usually need content-based comparison, not reference comparison.
Convert each int[] into a stable key before using a map or set.
Use a two-pass count-and-filter approach when you want all duplicated rows in order.
Use a seen-set approach when you want only later duplicate occurrences.
Make the duplicate rule explicit before writing the algorithm.