Intersection and union of ArrayLists in Java
Introduction
When Java developers ask for the intersection or union of two ArrayList values, the real question is usually about semantics, not syntax. Do duplicates matter, should order be preserved, and are you doing mathematical set logic or list logic?
The right implementation depends on those answers. Java gives you quick collection methods such as retainAll and addAll, but you need to know exactly what behavior they produce before you use them.
Start by Defining the Result
There are two common interpretations:
- set-style operations, where each value appears at most once
- list-style operations, where duplicates may remain
For example, if list a is [1, 2, 2, 3] and list b is [2, 3, 4], then:
- a list-style intersection based on a might be [2, 2, 3]
- a set-style intersection would be [2, 3]
- a list-style union might be [1, 2, 2, 3, 2, 3, 4]
- a set-style union would be [1, 2, 3, 4]
If you skip this decision, it is very easy to write code that is technically correct but wrong for the real requirement.
Intersection With retainAll
If you want the elements from the first list that also appear in the second list, retainAll is the most direct tool. Copy the list first so you do not mutate the original input.
With a as [1, 2, 2, 3] and b as [2, 3, 4], the copied list ends up as [2, 2, 3].
Notice what happened: duplicates from the first list were preserved because retainAll keeps every element from the target list that is also contained in the other collection.
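The copy-then-retainAll approach can be sketched as follows. The class and method names are illustrative assumptions, not part of any library; only ArrayList and retainAll come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;

class RetainAllIntersection {

    // Returns the elements of `first` that also occur in `second`,
    // keeping the order and duplicates of `first`. Neither input is modified.
    static <T> List<T> intersect(List<T> first, List<T> second) {
        List<T> result = new ArrayList<>(first); // defensive copy
        result.retainAll(second);                // mutates only the copy
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 2, 3);
        List<Integer> b = List.of(2, 3, 4);
        System.out.println(intersect(a, b)); // [2, 2, 3]
    }
}
```

Because the work happens on a copy, the helper also accepts unmodifiable inputs such as those created by List.of.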
Union With addAll
If by "union" you simply mean "put both lists together," then you want concatenation: copy the first list and call addAll with the second.
That is list-style union, not mathematical set union. Duplicates stay in the result.
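A minimal sketch of that concatenation, assuming a hypothetical helper named concat that leaves both inputs untouched:

```java
import java.util.ArrayList;
import java.util.List;

class ListConcatenation {

    // Appends every element of `second` after the elements of `first`.
    // Duplicates are kept; neither input is modified.
    static <T> List<T> concat(List<T> first, List<T> second) {
        List<T> result = new ArrayList<>(first); // defensive copy
        result.addAll(second);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 2, 3);
        List<Integer> b = List.of(2, 3, 4);
        System.out.println(concat(a, b)); // [1, 2, 2, 3, 2, 3, 4]
    }
}
```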
Set-Style Union and Intersection
If uniqueness matters, use a set. LinkedHashSet is often better than HashSet because it preserves insertion order.
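For a set-style union, one possible sketch builds a LinkedHashSet from the first list and adds the second. The helper name union is an assumption made for illustration.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetStyleUnion {

    // Each value appears once, in the order it was first seen.
    static <T> Set<T> union(Collection<T> first, Collection<T> second) {
        Set<T> result = new LinkedHashSet<>(first);
        result.addAll(second); // values already present are ignored
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 2, 3);
        List<Integer> b = List.of(2, 3, 4);
        System.out.println(union(a, b)); // [1, 2, 3, 4]
    }
}
```

If callers need a list rather than a set, wrapping the result in new ArrayList<>(...) keeps the same order.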
For a set-style intersection, build a LinkedHashSet from the first list and call retainAll with the second.
This gives unique values only.
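The set-style intersection might look like the sketch below. As before, the class and method names are hypothetical; only LinkedHashSet and retainAll are JDK APIs.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetStyleIntersection {

    // Unique values of `first` that also occur in `second`,
    // in the order they first appear in `first`.
    static <T> Set<T> intersect(Collection<T> first, Collection<T> second) {
        Set<T> result = new LinkedHashSet<>(first); // drops duplicates
        result.retainAll(second);                   // keeps only shared values
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 2, 3);
        List<Integer> b = List.of(2, 3, 4);
        System.out.println(intersect(a, b)); // [2, 3]
    }
}
```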
Streams Can Be Useful, but They Do Not Change the Semantics
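Streams are a fine option when you want the rules to be explicit in the pipeline. For example, filtering the first list by membership in the second gives a list-style intersection.
Adding distinct() to that pipeline changes the result from list-style to set-style behavior. That is the important part. Streams are just another way to express the rule.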
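To make the effect of distinct() concrete, here is a small sketch with two hypothetical helpers that differ only in that one call:

```java
import java.util.List;
import java.util.stream.Collectors;

class StreamIntersection {

    // List-style: duplicates from `first` survive the filter.
    static <T> List<T> listStyle(List<T> first, List<T> second) {
        return first.stream()
                .filter(second::contains)
                .collect(Collectors.toList());
    }

    // Set-style: distinct() keeps only the first occurrence of each value.
    static <T> List<T> setStyle(List<T> first, List<T> second) {
        return first.stream()
                .filter(second::contains)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 2, 3);
        List<Integer> b = List.of(2, 3, 4);
        System.out.println(listStyle(a, b)); // [2, 2, 3]
        System.out.println(setStyle(a, b));  // [2, 3]
    }
}
```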
Performance Notes
If one side is large and you are doing repeated membership checks, convert one collection to a HashSet or LinkedHashSet first. A hash-based contains runs in roughly constant time on average, while contains on a list scans the elements one by one.
But do not optimize by changing the data structure unless you are also okay with the semantic change. Converting to a set removes duplicates by definition.
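One way to get the faster lookups without the semantic change is to convert only the lookup side to a set and keep iterating over the original list. The sketch below assumes that approach; the helper name is illustrative.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FastListIntersection {

    // List-style intersection: duplicates and order of `first` are preserved.
    // `second` is copied into a HashSet purely to speed up membership checks.
    static <T> List<T> intersect(List<T> first, Collection<T> second) {
        Set<T> lookup = new HashSet<>(second);
        List<T> result = new ArrayList<>();
        for (T item : first) {
            if (lookup.contains(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 2, 3);
        List<Integer> b = List.of(2, 3, 4);
        System.out.println(intersect(a, b)); // [2, 2, 3]
    }
}
```

Duplicates in the lookup side do not affect a membership test, so dropping them there leaves the result unchanged.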
Common Pitfalls
The most common mistake is calling concatenation a union even when duplicates should have been removed. If uniqueness matters, use a set-based approach.
Another common bug is forgetting that retainAll mutates the list it is called on. Always copy the input first if other code still depends on the original data.
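The mutation pitfall is easy to reproduce. In this sketch, a hypothetical helper calls retainAll directly on its argument, so the caller's list is changed:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class RetainAllMutation {

    // Pitfall: filters `target` in place, so the caller's list changes.
    static <T> boolean intersectInPlace(List<T> target, Collection<?> other) {
        return target.retainAll(other); // true if any element was removed
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(List.of(1, 2, 2, 3));
        List<Integer> b = List.of(2, 3, 4);

        boolean changed = intersectInPlace(a, b);
        System.out.println(changed); // true
        System.out.println(a);       // [2, 2, 3]; the 1 is gone from the original
    }
}
```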
Developers also get surprised when HashSet changes iteration order. If order matters, prefer LinkedHashSet.
Finally, choose the operation based on meaning, not on whichever collection method looks shortest.
Summary
- Decide first whether you need set semantics or list semantics.
- Use retainAll on a copied list for a simple intersection.
- Use addAll for list-style union that preserves duplicates.
- Use LinkedHashSet when you want unique results without losing insertion order.
- Be explicit about mutation, duplicates, and order so the result matches the real requirement.

