Algorithm to tell if two arrays have identical members

arrays

algorithms

data comparison

coding techniques

programming

Algorithm to tell if two arrays have identical members

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

To decide whether two arrays have identical members, you first need to define what "identical" means. In most programming discussions, it means the arrays contain the same values with the same multiplicities, regardless of order. In other words, you are comparing multisets, not sequences.

The Fastest General-Purpose Approach: Count Frequencies

If order does not matter and duplicates do matter, a frequency table is usually the best algorithm. Count how many times each value appears in the first array, subtract counts using the second array, and verify that everything balances back to zero.

python

1from collections import Counter
2
3
4def same_members(a, b):
5    if len(a) != len(b):
6        return False
7    return Counter(a) == Counter(b)
8
9
10print(same_members([1, 2, 2, 3], [3, 2, 1, 2]))
11print(same_members([1, 2, 2], [1, 2, 3]))

Output:

text

True
False

This runs in linear time on average and preserves the important rule that repeated elements count.

Sorting Is Simpler but Usually Slower

A second common solution is to sort both arrays and compare them element by element.

python

1def same_members_sorted(a, b):
2    if len(a) != len(b):
3        return False
4    return sorted(a) == sorted(b)

This is easy to read and often perfectly acceptable for modest input sizes. The tradeoff is complexity: sorting costs O(n log n) rather than the average O(n) of a hash-based counting approach.

Sorting can still be a good choice when:

the element type is easy to sort
memory is tight and in-place sorting is acceptable
code simplicity matters more than absolute performance

Why a Set Is Not Enough

A common mistake is to compare set(a) == set(b). That only checks unique members and ignores how many times each value appears.

python

print(set([1, 1, 2]) == set([1, 2, 2]))

Output:

text

True

That answer is wrong for multiset comparison because the duplicate counts are different.

A Manual Counting Version

If you want to see the algorithm without relying on Counter, here is a manual implementation:

python

1def same_members_manual(a, b):
2    if len(a) != len(b):
3        return False
4
5    counts = {}
6
7    for value in a:
8        counts[value] = counts.get(value, 0) + 1
9
10    for value in b:
11        if value not in counts:
12            return False
13        counts[value] -= 1
14        if counts[value] < 0:
15            return False
16
17    return all(count == 0 for count in counts.values())

This is the same underlying idea as Counter, just written out explicitly.

Special Cases to Think About

If order matters, the problem becomes much simpler: just compare the arrays directly. If duplicates do not matter, then set comparison is appropriate. If the arrays can contain unhashable values such as nested lists, the frequency-table approach needs adaptation because those elements cannot be used as dictionary keys directly.

So the best algorithm depends on the exact definition of equality. For the usual "same members including repeats" definition, counting wins.

Common Pitfalls

The first pitfall is skipping the length check. If the arrays have different lengths, they cannot have identical members with matching multiplicities.

Another mistake is using sets and accidentally throwing away duplicate information. That solves a different problem.

Sorting can also mislead people when the elements are expensive to compare or not naturally sortable. A counting approach may still work fine there if the values are hashable.

Finally, be careful with mixed data types. In some languages or datasets, values that look similar may compare or hash differently, so make sure the equality semantics match what the application expects.

Summary

For multiset comparison, use a frequency table or Counter.
This approach is usually O(n) on average and handles duplicates correctly.
Sorting both arrays also works but is typically O(n log n).
Set comparison is wrong when duplicate counts matter.
Define clearly whether order and multiplicity are part of "identical members" before choosing the algorithm.