Algorithm to find duplicate in an array
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Finding duplicates in an array is a common interview problem and a common production problem, but the right algorithm depends on what “find duplicate” actually means. You might only need to know whether any duplicate exists, you might need the repeated values themselves, or you might be under special constraints such as constant extra space. The best solution changes with those requirements.
In ordinary application code, a hash set is usually the cleanest answer. Under stricter interview-style constraints, sorting or Floyd's cycle-detection trick can also be appropriate.
Use a Set for the Clearest General Solution
If you want to detect duplicates in an arbitrary array, a set gives you a fast and easy approach. Walk through the array, remember what you have seen, and report anything that appears twice.
This solution is usually O(n) time with O(n) extra space. That is a very good tradeoff for most real programs because the code is short, readable, and correct.
If you only need a yes-or-no answer, you can return early instead of collecting all duplicates.
Sort First When Extra Memory Must Stay Small
If you want to reduce extra memory, sorting the array lets you detect duplicates by comparing neighbors.
This usually runs in O(n log n) time because sorting dominates the cost. The advantage is that you do not need a hash-based memory structure for every value.
Be careful with in-place sorting. If the original order matters, sort a copy instead of mutating the caller's array.
Use Floyd's Algorithm Only Under Specific Constraints
Some versions of the problem say the array has length n + 1, every value is in the range 1..n, and there is guaranteed to be at least one duplicate. Under those exact constraints, you can model the array as a linked structure and use Floyd's cycle detection.
This is clever and uses O(1) extra space, but it only works because the input follows strict rules. It is not a general duplicate-finding method for arbitrary arrays.
Choose the Algorithm by the Actual Question
Before you code, answer these questions:
- do you need one duplicate or all duplicates
- may the input be modified
- is extra memory allowed
- are the values constrained to a special range
Most confusion disappears once those requirements are explicit. Developers often over-engineer the problem because they reach for a clever algorithm before understanding the input contract.
Common Pitfalls
The most common mistake is using Floyd's algorithm on a general array where the required value-range guarantees do not hold. In that case, the reasoning breaks.
Another issue is sorting the input in place when other code still depends on the original order.
A third problem is collecting duplicates in a list without deduplicating the duplicates themselves. An item that repeats three times should usually still be reported once.
Summary
- A hash set is the clearest general-purpose solution for duplicate detection.
- Sorting is a good alternative when you want lower auxiliary memory and can accept
O(n log n)time. - Floyd's cycle detection is powerful but only for special constrained inputs.
- Decide whether you need existence, one duplicate, or all duplicates before choosing the algorithm.
- Be explicit about whether the input array may be modified.

