Check stdvector has duplicates

C++

std::vector

check duplicates

data structures

programming tips

Check stdvector has duplicates

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

Checking whether a std::vector contains duplicates is a common problem, but the best solution depends on whether you care only about existence, also need the duplicate values, or must preserve the original order. In practice, the main tradeoff is between sorting in place and using an auxiliary hash set.

Decide what constraints matter

Before writing code, decide which of these conditions apply:

you only need a yes or no answer
you must not modify the original vector
element order matters later
the type has a hash function
memory usage matters more than raw speed

Those constraints determine the implementation more than the container itself.

Fast check with `std::unordered_set`

If the element type is hashable, an unordered_set gives a direct and readable solution. As you iterate through the vector, insert each value into the set. If insertion fails, you found a duplicate.

cpp

1#include <iostream>
2#include <unordered_set>
3#include <vector>
4
5bool has_duplicates(const std::vector<int>& values) {
6    std::unordered_set<int> seen;
7
8    for (int value : values) {
9        auto [it, inserted] = seen.insert(value);
10        if (!inserted) {
11            return true;
12        }
13    }
14
15    return false;
16}
17
18int main() {
19    std::vector<int> a{1, 2, 3, 4};
20    std::vector<int> b{1, 2, 3, 2};
21
22    std::cout << std::boolalpha;
23    std::cout << has_duplicates(a) << '\n';
24    std::cout << has_duplicates(b) << '\n';
25}

This is usually the best default when average-case speed matters and extra memory is acceptable.

Sort and scan when hashing is not ideal

If you can mutate a copy or do not have a hash function, sorting is a good alternative. After sorting, equal values become adjacent, so duplicate detection becomes a linear scan.

cpp

1#include <algorithm>
2#include <iostream>
3#include <vector>
4
5bool has_duplicates_sorted(std::vector<int> values) {
6    std::sort(values.begin(), values.end());
7    return std::adjacent_find(values.begin(), values.end()) != values.end();
8}
9
10int main() {
11    std::vector<int> values{7, 3, 9, 7, 1};
12    std::cout << std::boolalpha << has_duplicates_sorted(values) << '\n';
13}

Passing by value keeps the caller's vector unchanged. If copying is expensive and mutation is allowed, pass by non-const reference instead.

When to use a quadratic scan

For very small vectors, a nested loop is sometimes acceptable. It avoids extra memory and does not require hashing or sorting support.

cpp

1#include <iostream>
2#include <vector>
3
4bool has_duplicates_bruteforce(const std::vector<int>& values) {
5    for (std::size_t i = 0; i < values.size(); ++i) {
6        for (std::size_t j = i + 1; j < values.size(); ++j) {
7            if (values[i] == values[j]) {
8                return true;
9            }
10        }
11    }
12    return false;
13}
14
15int main() {
16    std::vector<int> values{4, 8, 15, 16, 23, 42};
17    std::cout << std::boolalpha << has_duplicates_bruteforce(values) << '\n';
18}

This is usually only reasonable when the vector is tiny or the code is in a constrained environment where allocation is undesirable.

Custom types need equality and sometimes hashing

For user-defined types, unordered_set requires equality and hashing. A minimal example:

cpp

1#include <iostream>
2#include <string>
3#include <unordered_set>
4#include <vector>
5
6struct User {
7    int id;
8    std::string name;
9
10    bool operator==(const User& other) const {
11        return id == other.id && name == other.name;
12    }
13};
14
15struct UserHash {
16    std::size_t operator()(const User& user) const {
17        return std::hash<int>{}(user.id) ^ (std::hash<std::string>{}(user.name) << 1);
18    }
19};
20
21bool has_duplicates(const std::vector<User>& users) {
22    std::unordered_set<User, UserHash> seen;
23    for (const auto& user : users) {
24        if (!seen.insert(user).second) {
25            return true;
26        }
27    }
28    return false;
29}
30
31int main() {
32    std::vector<User> users{{1, "Ada"}, {2, "Linus"}, {1, "Ada"}};
33    std::cout << std::boolalpha << has_duplicates(users) << '\n';
34}

If equality should depend only on id, encode that consistently in both operator== and the hash function.

Returning the duplicate values

Sometimes you need more than a boolean. You may want the repeated items for logging or validation feedback.

cpp

1#include <iostream>
2#include <unordered_set>
3#include <vector>
4
5std::vector<int> duplicate_values(const std::vector<int>& values) {
6    std::unordered_set<int> seen;
7    std::unordered_set<int> duplicates;
8
9    for (int value : values) {
10        if (!seen.insert(value).second) {
11            duplicates.insert(value);
12        }
13    }
14
15    return std::vector<int>(duplicates.begin(), duplicates.end());
16}

That pattern avoids reporting the same duplicate more than once.

Common Pitfalls

The most common mistake is sorting the original vector without realizing later code depends on the original order. Another is using unordered_set for a custom type without defining matching equality and hash behavior. Developers also sometimes benchmark only the happy path and ignore how duplicate detection behaves with large inputs or expensive copies. A subtler issue is returning only a boolean when the caller really needs to know which values were duplicated, which leads to a second pass and redundant work. Finally, quadratic scans are often kept in place after prototypes grow into production code, where they become unexpectedly slow.

Summary

Use unordered_set when you want a fast existence check and extra memory is acceptable.
Use sort plus std::adjacent_find when hashing is unavailable or mutation of a copy is fine.
Use a nested loop only for tiny vectors or highly constrained situations.
Define equality and hashing carefully for custom element types.
Decide early whether you need only a boolean or also the duplicate values themselves.
Pick the approach that matches your constraints, not just the shortest snippet.

Check stdvector has duplicates

Master System Design with Codemia

Introduction

Decide what constraints matter

Fast check with std::unordered_set

Sort and scan when hashing is not ideal

When to use a quadratic scan

Custom types need equality and sometimes hashing

Returning the duplicate values

Common Pitfalls

Summary

Fast check with `std::unordered_set`