Check stdvector has duplicates
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Checking whether a std::vector contains duplicates is a common problem, but the best solution depends on whether you care only about existence, also need the duplicate values, or must preserve the original order. In practice, the main tradeoff is between sorting in place and using an auxiliary hash set.
Decide what constraints matter
Before writing code, decide which of these conditions apply:
- you only need a yes or no answer
- you must not modify the original vector
- element order matters later
- the type has a hash function
- memory usage matters more than raw speed
Those constraints determine the implementation more than the container itself.
Fast check with std::unordered_set
If the element type is hashable, an unordered_set gives a direct and readable solution. As you iterate through the vector, insert each value into the set. If insertion fails, you found a duplicate.
This is usually the best default when average-case speed matters and extra memory is acceptable.
Sort and scan when hashing is not ideal
If you can mutate a copy or do not have a hash function, sorting is a good alternative. After sorting, equal values become adjacent, so duplicate detection becomes a linear scan.
Passing by value keeps the caller's vector unchanged. If copying is expensive and mutation is allowed, pass by non-const reference instead.
When to use a quadratic scan
For very small vectors, a nested loop is sometimes acceptable. It avoids extra memory and does not require hashing or sorting support.
This is usually only reasonable when the vector is tiny or the code is in a constrained environment where allocation is undesirable.
Custom types need equality and sometimes hashing
For user-defined types, unordered_set requires equality and hashing. A minimal example:
If equality should depend only on id, encode that consistently in both operator== and the hash function.
Returning the duplicate values
Sometimes you need more than a boolean. You may want the repeated items for logging or validation feedback.
That pattern avoids reporting the same duplicate more than once.
Common Pitfalls
The most common mistake is sorting the original vector without realizing later code depends on the original order. Another is using unordered_set for a custom type without defining matching equality and hash behavior. Developers also sometimes benchmark only the happy path and ignore how duplicate detection behaves with large inputs or expensive copies. A subtler issue is returning only a boolean when the caller really needs to know which values were duplicated, which leads to a second pass and redundant work. Finally, quadratic scans are often kept in place after prototypes grow into production code, where they become unexpectedly slow.
Summary
- Use
unordered_setwhen you want a fast existence check and extra memory is acceptable. - Use sort plus
std::adjacent_findwhen hashing is unavailable or mutation of a copy is fine. - Use a nested loop only for tiny vectors or highly constrained situations.
- Define equality and hashing carefully for custom element types.
- Decide early whether you need only a boolean or also the duplicate values themselves.
- Pick the approach that matches your constraints, not just the shortest snippet.

