Java
int array
duplicate detection
algorithm
programming tutorial

Finding the first duplicate in an int array, java

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Finding the first duplicate in an integer array is a common interview and production data-validation task, but the exact definition of first duplicate matters. Most formulations mean the value whose second occurrence appears earliest while scanning from left to right. Choosing the right algorithm depends on constraints such as input range, memory limits, and whether array mutation is allowed.

Clarify the Requirement First

Given array [2, 1, 3, 5, 3, 2], first duplicate is 3 because second 3 appears before second 2.

This definition differs from:

  • smallest duplicated value
  • first repeated by sorted order
  • first value that appears twice overall regardless of position

Write tests around definition before implementing.

HashSet Approach: Fast and Clear

For general-purpose Java code, HashSet is the most practical solution.

java
1import java.util.HashSet;
2import java.util.Set;
3
4public class FirstDuplicate {
5    public static int firstDuplicate(int[] arr) {
6        Set<Integer> seen = new HashSet<>();
7
8        for (int value : arr) {
9            if (!seen.add(value)) {
10                return value;
11            }
12        }
13
14        return -1;
15    }
16
17    public static void main(String[] args) {
18        int[] data = {2, 1, 3, 5, 3, 2};
19        System.out.println(firstDuplicate(data)); // 3
20    }
21}

Complexity:

  • time around O(n) average
  • space around O(n) in worst case

This is usually the best tradeoff for readability and speed.

Brute Force Approach: Simple but Slow

Brute force checks every pair and is often too slow for large arrays.

java
1public static int firstDuplicateBruteForce(int[] arr) {
2    int bestSecondIndex = Integer.MAX_VALUE;
3    int bestValue = -1;
4
5    for (int i = 0; i < arr.length; i++) {
6        for (int j = i + 1; j < arr.length; j++) {
7            if (arr[i] == arr[j] && j < bestSecondIndex) {
8                bestSecondIndex = j;
9                bestValue = arr[i];
10            }
11        }
12    }
13
14    return bestValue;
15}

Use this only for tiny inputs or teaching.

In-Place Marker Technique for Restricted Domains

If values are guaranteed in range 1..n and array mutation is allowed, you can detect duplicates in O(n) time with O(1) extra space by flipping sign at indexed positions.

java
1public static int firstDuplicateInPlace(int[] arr) {
2    for (int i = 0; i < arr.length; i++) {
3        int idx = Math.abs(arr[i]) - 1;
4
5        if (idx < 0 || idx >= arr.length) {
6            throw new IllegalArgumentException("Values must be in 1..n");
7        }
8
9        if (arr[idx] < 0) {
10            return Math.abs(arr[i]);
11        }
12
13        arr[idx] = -arr[idx];
14    }
15
16    return -1;
17}

This is efficient but less readable and modifies input. It is appropriate only when constraints are explicit.

Boolean Array Technique for Bounded Values

If values are non-negative and bounded by known max, a boolean presence table can outperform hashing.

java
1public static int firstDuplicateBounded(int[] arr, int maxValue) {
2    boolean[] seen = new boolean[maxValue + 1];
3
4    for (int value : arr) {
5        if (value < 0 || value > maxValue) {
6            throw new IllegalArgumentException("Value out of range");
7        }
8
9        if (seen[value]) {
10            return value;
11        }
12
13        seen[value] = true;
14    }
15
16    return -1;
17}

This is fast but memory depends on value range, not array length.

Production Considerations

In real services, duplicate detection often runs on untrusted input. Add safeguards:

  • null and empty array handling
  • documented sentinel return value or exception strategy
  • clear behavior for negative values and large domains
  • logging context when duplicates signal data quality issues

Prefer explicit contracts over implicit assumptions.

Testing for Correctness

Useful test cases:

  • no duplicates
  • duplicate at beginning
  • duplicate at end
  • multiple duplicated values where second occurrence order decides output
  • single element and empty arrays

Example expected behavior table:

  • '[1, 2, 3] returns -1'
  • '[1, 1, 2] returns 1'
  • '[2, 1, 3, 5, 3, 2] returns 3'

These tests lock requirement semantics.

Common Pitfalls

  • Misinterpreting first duplicate definition and returning smallest duplicated value.
  • Using in-place sign trick without value range validation.
  • Mutating input array unexpectedly in shared code paths.
  • Assuming hash-based approach preserves insertion order semantics for result logic.
  • Forgetting to define behavior for no-duplicate case.

Summary

  • 'HashSet scanning is the best default for first duplicate detection in Java.'
  • In-place and boolean-table methods are useful under specific constraints.
  • Requirement definition must be explicit before coding.
  • Input validation and tests are essential for reliable behavior.
  • Pick algorithm based on data domain, memory budget, and mutation rules.

Course illustration
Course illustration

All Rights Reserved.