Algorithm to find added/removed elements in an array

array comparison

element detection

added elements

removed elements

algorithm development

Algorithm to find added/removed elements in an array

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Introduction

Finding added and removed elements between two arrays is a small diff problem. The right algorithm depends on whether order matters and whether duplicate values matter, because a simple set difference is fast but loses multiplicity information immediately.

When a Set Difference Is Enough

If the arrays represent unique values and you only care whether an element exists or not, sets are the simplest solution.

python

1old = [1, 2, 3, 4]
2new = [2, 3, 5, 6]
3
4removed = set(old) - set(new)
5added = set(new) - set(old)
6
7print("removed:", removed)
8print("added:", added)

This runs in roughly linear time and is easy to read. It is a good answer for tags, feature flags, or any collection where duplicates are not meaningful.

Why Sets Can Be Wrong

Suppose the original array is:

python

old = [1, 1, 2]
new = [1, 2, 2]

A set says both are {1, 2}, so nothing changed. But that is false if duplicates matter: one 1 was removed and one 2 was added.

As soon as multiplicity matters, treat the problem as a frequency-diff problem rather than as a pure set-diff problem.

Count-Based Diff with `Counter`

Python's collections.Counter is a clean way to compare arrays while preserving duplicate counts.

python

1from collections import Counter
2
3old = [1, 1, 2, 4]
4new = [1, 2, 2, 5]
5
6old_counts = Counter(old)
7new_counts = Counter(new)
8
9removed = list((old_counts - new_counts).elements())
10added = list((new_counts - old_counts).elements())
11
12print("removed:", removed)  # [1, 4]
13print("added:", added)      # [2, 5]

This is usually the best general-purpose answer when arrays may contain repeated elements.

Preserving Order of Changes

If you also want to preserve the order in which removed or added items appear, use a count map manually:

python

1from collections import Counter
2
3
4def diff_arrays(old, new):
5    new_remaining = Counter(new)
6    removed = []
7
8    for item in old:
9        if new_remaining[item] > 0:
10            new_remaining[item] -= 1
11        else:
12            removed.append(item)
13
14    old_remaining = Counter(old)
15    added = []
16
17    for item in new:
18        if old_remaining[item] > 0:
19            old_remaining[item] -= 1
20        else:
21            added.append(item)
22
23    return added, removed
24
25
26added, removed = diff_arrays([1, 1, 2, 4], [1, 2, 2, 5])
27print("added:", added)      # [2, 5]
28print("removed:", removed)  # [1, 4]

This version is still efficient, but it keeps the output in the encounter order of each input list.

Sorting and Two Pointers

Another option is to sort both arrays and walk them with two pointers. That works well in languages without a convenient hash map, or when the data is already sorted.

python

1def sorted_diff(old, new):
2    old = sorted(old)
3    new = sorted(new)
4    i = j = 0
5    added = []
6    removed = []
7
8    while i < len(old) and j < len(new):
9        if old[i] == new[j]:
10            i += 1
11            j += 1
12        elif old[i] < new[j]:
13            removed.append(old[i])
14            i += 1
15        else:
16            added.append(new[j])
17            j += 1
18
19    removed.extend(old[i:])
20    added.extend(new[j:])
21    return added, removed

The tradeoff is the sort cost, which makes it O(n log n + m log m) instead of near-linear hashing.

Choosing the Right Algorithm

Use these rules:

use sets when duplicates do not matter
use frequency maps when duplicates do matter
use sorting when ordered traversal of sorted data is natural or hashing is not ideal

There is no single best algorithm without clarifying the data model first.

Common Pitfalls

The biggest mistake is reaching for sets by reflex. They are elegant, but they silently discard duplicate counts, which can turn a correct diff into a misleading one.

Another issue is forgetting to define whether order matters. Some applications want mathematical additions and removals only, while others want the first removed item and the first added item in a predictable sequence.

Finally, do not overcomplicate tiny datasets. For five-element arrays in a one-off script, readability often matters more than asymptotic perfection. The optimization discussion becomes important when arrays are large or the operation happens repeatedly.

Summary

Set difference is fine only when duplicates do not matter.
Use Counter or another frequency map when repeated values are significant.
A manual count-based pass can preserve the order of additions and removals.
Sorting with two pointers is a valid alternative when sorted traversal is convenient.
Define the problem clearly before choosing the algorithm, especially around duplicates and order.

Algorithm to find added/removed elements in an array

Master System Design with Codemia

Introduction

When a Set Difference Is Enough

Why Sets Can Be Wrong

Count-Based Diff with Counter

Preserving Order of Changes

Sorting and Two Pointers

Choosing the Right Algorithm

Common Pitfalls

Summary

Count-Based Diff with `Counter`