Compute list difference
Introduction
Computing the difference between two lists sounds simple until duplicates become relevant. If one list contains repeated values, you have to decide whether you want mathematical set difference, order-preserving filtering, or multiset subtraction where each matching duplicate is removed only once.
Those are different problems, and they produce different answers. The correct approach depends on whether duplicate counts matter, whether original order matters, and how large the inputs are.
Use Set Difference When Uniqueness Is All You Need
If you only care about unique values, convert both lists to sets and subtract them: set(a) - set(b).
This gives the distinct values from a that do not appear in b. It is concise and usually fast, but it throws away:
- original order
- duplicate counts
That makes it useful for membership-style problems, but not for cases where the list itself carries meaning through order or repetition.
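As a minimal sketch of the set-based approach described above, assuming two small sample lists chosen purely for illustration. The result is passed through `sorted()` only to make the printed order predictable, since sets are unordered.

```python
# Illustrative inputs (hypothetical sample data).
a = [1, 2, 2, 3, 4]
b = [2, 4]

# Set difference: the distinct values of a that never appear in b.
unique_difference = set(a) - set(b)

# Sets have no defined order, so sort before displaying.
print(sorted(unique_difference))  # [1, 3]
```

Note that the duplicate 2 in `a` plays no role here: converting to a set collapses it before the subtraction happens.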
Preserve Order While Removing All Matching Values
If you want to keep the order from the first list and remove every value that appears in the second list, use a lookup set and a comprehension.
For example, with a = [1, 2, 3, 2, 4, 2] and b = [2], the result is [1, 3, 4]. Every 2 is removed because 2 exists in b. That is often the intended meaning in application code, such as filtering out disabled feature flags or excluding blocked identifiers from an ordered result set.
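The lookup-set-plus-comprehension idea could be sketched as follows. The function name `difference_keep_order` and the sample data are hypothetical, introduced here only for illustration.

```python
def difference_keep_order(a, b):
    """Return the items of a, in their original order, whose value never appears in b."""
    blocked = set(b)  # set membership checks are O(1) on average
    return [item for item in a if item not in blocked]


# Illustrative inputs (hypothetical sample data).
a = [1, 2, 3, 2, 4, 2]
b = [2]

print(difference_keep_order(a, b))  # [1, 3, 4]
```

This sketch assumes the elements are hashable. Unhashable items such as lists or dicts cannot be placed in a set, so they would need a different lookup strategy.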
Use Counter When Duplicate Counts Matter
Sometimes you want multiset subtraction. In other words, if b contains two copies of a value, only two copies should be removed from a.
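For example, with a = [1, 2, 2, 2, 3] and b = [2, 2], subtracting Counter(b) from Counter(a) and expanding the result with elements() gives [1, 2, 3]. One 2 remains because the first list had three copies and the second list had two. That is the key difference between duplicate-aware subtraction and simple filtering.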
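A minimal sketch of multiset subtraction with `collections.Counter`, using sample lists chosen here for illustration. The `-` operator on counters keeps only the entries whose resulting count is positive.

```python
from collections import Counter

# Illustrative inputs (hypothetical sample data).
a = [1, 2, 2, 2, 3]
b = [2, 2]

# Subtract per-value counts; entries that drop to zero or below are discarded.
remaining = Counter(a) - Counter(b)

# elements() repeats each value as many times as its remaining count.
result = list(remaining.elements())
print(result)  # [1, 2, 3]
```

Values that appear in `b` but not in `a` are simply ignored, because their count would be negative and the subtraction drops it.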
Preserve Both Order and Duplicate Counts
Counter handles counts well, but elements() groups equal values together (in first-seen order on Python 3.7+), so it does not reproduce the left-to-right order of the original list. If you need both duplicate-aware subtraction and stable order, build a removal counter from the second list and walk the first list manually.
For example, with a = [2, 1, 2, 3, 2] and b = [2, 2], the walk skips the first two 2s and returns [1, 3, 2]. That is the behavior many people really need:
- remove only as many duplicates as the second list contains
- keep the surviving elements in their original order
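One way to sketch the removal-counter walk described above. The function name `difference_multiset_ordered` and the sample data are hypothetical, and this version removes the earliest matching copies, which is one reasonable choice among several.

```python
from collections import Counter


def difference_multiset_ordered(a, b):
    """Remove from a one copy per occurrence in b, keeping survivors in order."""
    to_remove = Counter(b)  # how many copies of each value may still be dropped
    result = []
    for item in a:
        if to_remove[item] > 0:
            # This copy is cancelled out by one occurrence in b.
            to_remove[item] -= 1
        else:
            result.append(item)
    return result


# Illustrative inputs (hypothetical sample data).
a = [2, 1, 2, 3, 2]
b = [2, 2]

print(difference_multiset_ordered(a, b))  # [1, 3, 2]
```

Looking up a missing key in a `Counter` returns 0 without inserting it, so values absent from `b` fall through to the `else` branch without any special handling.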
Think About Complexity and Semantics Together
The naive pattern, [x for x in a if x not in b] with b left as a plain list, works but becomes slow on larger lists: each membership check against a list is linear, so the whole filter costs O(len(a) * len(b)).
If order-preserving full filtering is the goal, converting b to a set usually gives a better time profile. If duplicate counts matter, Counter is cleaner and safer than repeatedly calling list.remove() inside a loop.
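To make the comparison concrete, here is a sketch that puts the list-based check next to the set-based one and confirms they agree. The function names and input sizes are hypothetical choices for illustration.

```python
def difference_naive(a, b):
    # b stays a list, so every "not in" check scans b: O(len(a) * len(b)) overall.
    return [item for item in a if item not in b]


def difference_with_set(a, b):
    # Build the lookup set once: O(len(a) + len(b)) on average.
    blocked = set(b)
    return [item for item in a if item not in blocked]


# Illustrative inputs: all integers below 2000, minus the even ones.
a = list(range(2000))
b = list(range(0, 2000, 2))

# Both versions implement the same semantics; only the cost differs.
assert difference_naive(a, b) == difference_with_set(a, b)
print(difference_with_set(a, b)[:5])  # [1, 3, 5, 7, 9]
```

Timing both functions on your own data with the standard `timeit` module is the most reliable way to see how large the gap is in practice.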
Performance matters, but semantics matter first. A fast solution that removes too many duplicates or scrambles order is still wrong.
Common Pitfalls
The most common mistake is using set subtraction when duplicates or order matter. Another is writing a list comprehension against a plain list for membership checks on large inputs, which is correct but slower than using a set lookup. Developers also confuse "remove every matching value" with "remove only one matching copy for each appearance in the second list," even though those are different operations.
It is also easy to forget that Counter solves counts, not stable order. If both requirements matter, you need one more pass through the original list.
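Running the approaches on the same input shows how easily they can be confused. The sketch below uses sample data picked for illustration, and the order shown for the `Counter` result assumes Python 3.7 or later.

```python
from collections import Counter

# Illustrative inputs (hypothetical sample data).
a = [2, 1, 2, 3, 2]
b = [2, 2]

# 1. Set difference: unique values only (sorted for a predictable display).
as_set = sorted(set(a) - set(b))

# 2. Order-preserving filter: every matching value is removed.
blocked = set(b)
filtered = [x for x in a if x not in blocked]

# 3. Counter subtraction: counts respected, equal values grouped together.
counted = list((Counter(a) - Counter(b)).elements())

# 4. Removal-counter walk: counts respected and original order kept.
budget = Counter(b)
ordered = []
for x in a:
    if budget[x] > 0:
        budget[x] -= 1
    else:
        ordered.append(x)

print(as_set)    # [1, 3]
print(filtered)  # [1, 3]
print(counted)   # [2, 1, 3]
print(ordered)   # [1, 3, 2]
```

Only the last two keep the surviving 2, and only the last one leaves it in its original position.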
Summary
- Use set subtraction when you only care about unique values.
- Use an order-preserving filter when every matching value should be removed.
- Use Counter subtraction when duplicate counts matter.
- Walk the first list with a removal counter if you need both duplicate-aware subtraction and original order.
- Choose the algorithm based on semantics first, then optimize performance.

