Finding the first duplicate in an int array, java
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Finding the first duplicate in an integer array is a common interview and production data-validation task, but the exact definition of first duplicate matters. Most formulations mean the value whose second occurrence appears earliest while scanning from left to right. Choosing the right algorithm depends on constraints such as input range, memory limits, and whether array mutation is allowed.
Clarify the Requirement First
Given array [2, 1, 3, 5, 3, 2], first duplicate is 3 because second 3 appears before second 2.
This definition differs from:
- smallest duplicated value
- first repeated by sorted order
- first value that appears twice overall regardless of position
Write tests around definition before implementing.
HashSet Approach: Fast and Clear
For general-purpose Java code, HashSet is the most practical solution.
Complexity:
- time around
O(n)average - space around
O(n)in worst case
This is usually the best tradeoff for readability and speed.
Brute Force Approach: Simple but Slow
Brute force checks every pair and is often too slow for large arrays.
Use this only for tiny inputs or teaching.
In-Place Marker Technique for Restricted Domains
If values are guaranteed in range 1..n and array mutation is allowed, you can detect duplicates in O(n) time with O(1) extra space by flipping sign at indexed positions.
This is efficient but less readable and modifies input. It is appropriate only when constraints are explicit.
Boolean Array Technique for Bounded Values
If values are non-negative and bounded by known max, a boolean presence table can outperform hashing.
This is fast but memory depends on value range, not array length.
Production Considerations
In real services, duplicate detection often runs on untrusted input. Add safeguards:
- null and empty array handling
- documented sentinel return value or exception strategy
- clear behavior for negative values and large domains
- logging context when duplicates signal data quality issues
Prefer explicit contracts over implicit assumptions.
Testing for Correctness
Useful test cases:
- no duplicates
- duplicate at beginning
- duplicate at end
- multiple duplicated values where second occurrence order decides output
- single element and empty arrays
Example expected behavior table:
- '
[1, 2, 3]returns-1' - '
[1, 1, 2]returns1' - '
[2, 1, 3, 5, 3, 2]returns3'
These tests lock requirement semantics.
Common Pitfalls
- Misinterpreting first duplicate definition and returning smallest duplicated value.
- Using in-place sign trick without value range validation.
- Mutating input array unexpectedly in shared code paths.
- Assuming hash-based approach preserves insertion order semantics for result logic.
- Forgetting to define behavior for no-duplicate case.
Summary
- '
HashSetscanning is the best default for first duplicate detection in Java.' - In-place and boolean-table methods are useful under specific constraints.
- Requirement definition must be explicit before coding.
- Input validation and tests are essential for reliable behavior.
- Pick algorithm based on data domain, memory budget, and mutation rules.

