array comparison
element detection
added elements
removed elements
algorithm development

Algorithm to find added/removed elements in an array

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Finding added and removed elements between two arrays is a small diff problem. The right algorithm depends on whether order matters and whether duplicate values matter, because a simple set difference is fast but loses multiplicity information immediately.

When a Set Difference Is Enough

If the arrays represent unique values and you only care whether an element exists or not, sets are the simplest solution.

python
1old = [1, 2, 3, 4]
2new = [2, 3, 5, 6]
3
4removed = set(old) - set(new)
5added = set(new) - set(old)
6
7print("removed:", removed)
8print("added:", added)

This runs in roughly linear time and is easy to read. It is a good answer for tags, feature flags, or any collection where duplicates are not meaningful.

Why Sets Can Be Wrong

Suppose the original array is:

python
old = [1, 1, 2]
new = [1, 2, 2]

A set says both are {1, 2}, so nothing changed. But that is false if duplicates matter: one 1 was removed and one 2 was added.

As soon as multiplicity matters, treat the problem as a frequency-diff problem rather than as a pure set-diff problem.

Count-Based Diff with Counter

Python's collections.Counter is a clean way to compare arrays while preserving duplicate counts.

python
1from collections import Counter
2
3old = [1, 1, 2, 4]
4new = [1, 2, 2, 5]
5
6old_counts = Counter(old)
7new_counts = Counter(new)
8
9removed = list((old_counts - new_counts).elements())
10added = list((new_counts - old_counts).elements())
11
12print("removed:", removed)  # [1, 4]
13print("added:", added)      # [2, 5]

This is usually the best general-purpose answer when arrays may contain repeated elements.

Preserving Order of Changes

If you also want to preserve the order in which removed or added items appear, use a count map manually:

python
1from collections import Counter
2
3
4def diff_arrays(old, new):
5    new_remaining = Counter(new)
6    removed = []
7
8    for item in old:
9        if new_remaining[item] > 0:
10            new_remaining[item] -= 1
11        else:
12            removed.append(item)
13
14    old_remaining = Counter(old)
15    added = []
16
17    for item in new:
18        if old_remaining[item] > 0:
19            old_remaining[item] -= 1
20        else:
21            added.append(item)
22
23    return added, removed
24
25
26added, removed = diff_arrays([1, 1, 2, 4], [1, 2, 2, 5])
27print("added:", added)      # [2, 5]
28print("removed:", removed)  # [1, 4]

This version is still efficient, but it keeps the output in the encounter order of each input list.

Sorting and Two Pointers

Another option is to sort both arrays and walk them with two pointers. That works well in languages without a convenient hash map, or when the data is already sorted.

python
1def sorted_diff(old, new):
2    old = sorted(old)
3    new = sorted(new)
4    i = j = 0
5    added = []
6    removed = []
7
8    while i < len(old) and j < len(new):
9        if old[i] == new[j]:
10            i += 1
11            j += 1
12        elif old[i] < new[j]:
13            removed.append(old[i])
14            i += 1
15        else:
16            added.append(new[j])
17            j += 1
18
19    removed.extend(old[i:])
20    added.extend(new[j:])
21    return added, removed

The tradeoff is the sort cost, which makes it O(n log n + m log m) instead of near-linear hashing.

Choosing the Right Algorithm

Use these rules:

  • use sets when duplicates do not matter
  • use frequency maps when duplicates do matter
  • use sorting when ordered traversal of sorted data is natural or hashing is not ideal

There is no single best algorithm without clarifying the data model first.

Common Pitfalls

The biggest mistake is reaching for sets by reflex. They are elegant, but they silently discard duplicate counts, which can turn a correct diff into a misleading one.

Another issue is forgetting to define whether order matters. Some applications want mathematical additions and removals only, while others want the first removed item and the first added item in a predictable sequence.

Finally, do not overcomplicate tiny datasets. For five-element arrays in a one-off script, readability often matters more than asymptotic perfection. The optimization discussion becomes important when arrays are large or the operation happens repeatedly.

Summary

  • Set difference is fine only when duplicates do not matter.
  • Use Counter or another frequency map when repeated values are significant.
  • A manual count-based pass can preserve the order of additions and removals.
  • Sorting with two pointers is a valid alternative when sorted traversal is convenient.
  • Define the problem clearly before choosing the algorithm, especially around duplicates and order.

Course illustration
Course illustration

All Rights Reserved.