Algorithm to find added/removed elements in an array
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Finding added and removed elements between two arrays is a small diff problem. The right algorithm depends on whether order matters and whether duplicate values matter, because a simple set difference is fast but loses multiplicity information immediately.
When a Set Difference Is Enough
If the arrays represent unique values and you only care whether an element exists or not, sets are the simplest solution.
This runs in roughly linear time and is easy to read. It is a good answer for tags, feature flags, or any collection where duplicates are not meaningful.
Why Sets Can Be Wrong
Suppose the original array is:
A set says both are {1, 2}, so nothing changed. But that is false if duplicates matter: one 1 was removed and one 2 was added.
As soon as multiplicity matters, treat the problem as a frequency-diff problem rather than as a pure set-diff problem.
Count-Based Diff with Counter
Python's collections.Counter is a clean way to compare arrays while preserving duplicate counts.
This is usually the best general-purpose answer when arrays may contain repeated elements.
Preserving Order of Changes
If you also want to preserve the order in which removed or added items appear, use a count map manually:
This version is still efficient, but it keeps the output in the encounter order of each input list.
Sorting and Two Pointers
Another option is to sort both arrays and walk them with two pointers. That works well in languages without a convenient hash map, or when the data is already sorted.
The tradeoff is the sort cost, which makes it O(n log n + m log m) instead of near-linear hashing.
Choosing the Right Algorithm
Use these rules:
- use sets when duplicates do not matter
- use frequency maps when duplicates do matter
- use sorting when ordered traversal of sorted data is natural or hashing is not ideal
There is no single best algorithm without clarifying the data model first.
Common Pitfalls
The biggest mistake is reaching for sets by reflex. They are elegant, but they silently discard duplicate counts, which can turn a correct diff into a misleading one.
Another issue is forgetting to define whether order matters. Some applications want mathematical additions and removals only, while others want the first removed item and the first added item in a predictable sequence.
Finally, do not overcomplicate tiny datasets. For five-element arrays in a one-off script, readability often matters more than asymptotic perfection. The optimization discussion becomes important when arrays are large or the operation happens repeatedly.
Summary
- Set difference is fine only when duplicates do not matter.
- Use
Counteror another frequency map when repeated values are significant. - A manual count-based pass can preserve the order of additions and removals.
- Sorting with two pointers is a valid alternative when sorted traversal is convenient.
- Define the problem clearly before choosing the algorithm, especially around duplicates and order.

