Algorithm to tell if two arrays have identical members
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
To decide whether two arrays have identical members, you first need to define what "identical" means. In most programming discussions, it means the arrays contain the same values with the same multiplicities, regardless of order. In other words, you are comparing multisets, not sequences.
The Fastest General-Purpose Approach: Count Frequencies
If order does not matter and duplicates do matter, a frequency table is usually the best algorithm. Count how many times each value appears in the first array, subtract counts using the second array, and verify that everything balances back to zero.
Output:
This runs in linear time on average and preserves the important rule that repeated elements count.
Sorting Is Simpler but Usually Slower
A second common solution is to sort both arrays and compare them element by element.
This is easy to read and often perfectly acceptable for modest input sizes. The tradeoff is complexity: sorting costs O(n log n) rather than the average O(n) of a hash-based counting approach.
Sorting can still be a good choice when:
- the element type is easy to sort
- memory is tight and in-place sorting is acceptable
- code simplicity matters more than absolute performance
Why a Set Is Not Enough
A common mistake is to compare set(a) == set(b). That only checks unique members and ignores how many times each value appears.
Output:
That answer is wrong for multiset comparison because the duplicate counts are different.
A Manual Counting Version
If you want to see the algorithm without relying on Counter, here is a manual implementation:
This is the same underlying idea as Counter, just written out explicitly.
Special Cases to Think About
If order matters, the problem becomes much simpler: just compare the arrays directly. If duplicates do not matter, then set comparison is appropriate. If the arrays can contain unhashable values such as nested lists, the frequency-table approach needs adaptation because those elements cannot be used as dictionary keys directly.
So the best algorithm depends on the exact definition of equality. For the usual "same members including repeats" definition, counting wins.
Common Pitfalls
The first pitfall is skipping the length check. If the arrays have different lengths, they cannot have identical members with matching multiplicities.
Another mistake is using sets and accidentally throwing away duplicate information. That solves a different problem.
Sorting can also mislead people when the elements are expensive to compare or not naturally sortable. A counting approach may still work fine there if the values are hashable.
Finally, be careful with mixed data types. In some languages or datasets, values that look similar may compare or hash differently, so make sure the equality semantics match what the application expects.
Summary
- For multiset comparison, use a frequency table or
Counter. - This approach is usually
O(n)on average and handles duplicates correctly. - Sorting both arrays also works but is typically
O(n log n). - Set comparison is wrong when duplicate counts matter.
- Define clearly whether order and multiplicity are part of "identical members" before choosing the algorithm.

