python
algorithm
set intersection
programming
python sets

What's the algorithm of 'set.intersection' in python?

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

At a high level, Python set intersection works by iterating over one set and checking membership in the others using hash-table lookups. The important optimization is that Python generally prefers to iterate over the smaller set first, because membership checks in a hash table are usually O(1) on average.

The Core Idea

For two sets A and B, the intersection contains elements that are present in both.

A practical algorithm is:

  1. choose the smaller set to iterate over
  2. for each element in that set, test membership in the other set
  3. add matching elements to the result set

That is much better than nested scanning because hash-table membership is fast on average.

A Simplified Python Version

python
1
2def simple_intersection(a, b):
3    if len(a) > len(b):
4        a, b = b, a
5
6    result = set()
7    for item in a:
8        if item in b:
9            result.add(item)
10    return result
11
12
13print(simple_intersection({1, 2, 3}, {2, 3, 4}))

This captures the essential idea behind efficient set intersection.

Why Iterating the Smaller Set Helps

If A has 10 elements and B has 10 million, iterating A and checking membership in B is dramatically cheaper than iterating B and checking membership in A.

That is why size-based choice matters.

For two sets, the average-case time is roughly proportional to the size of the smaller set, assuming hash lookups behave normally.

More Than Two Sets

For multiple sets, the idea generalizes:

  • start with the smallest set or a small intermediate result
  • repeatedly filter against the remaining sets

The exact implementation details are interpreter-specific, but the principle is still about minimizing the number of membership checks by starting from the smallest candidate pool.

Hash Table Behavior Matters

Python sets are hash tables. Average membership testing is O(1), which is why the intersection operation is efficient in practice.

In pathological cases with terrible hash behavior, performance can degrade, but for ordinary well-behaved hashable objects, the average-case model is what matters.

intersection Versus intersection_update

set.intersection(...) creates a new set.

python
a = {1, 2, 3}
b = {2, 3, 4}
print(a.intersection(b))

set.intersection_update(...) modifies the original set in place.

python
1a = {1, 2, 3}
2b = {2, 3, 4}
3a.intersection_update(b)
4print(a)

The algorithmic idea is similar, but the memory behavior differs because one creates a new object and the other mutates an existing one.

Why Order Does Not Matter Semantically

Set intersection is commutative mathematically, so the result does not depend on the order of operands. But the implementation can still choose a different execution strategy based on sizes, because it is free to optimize how the same mathematical result is produced.

Common Pitfalls

A common mistake is thinking intersection compares every element against every other element with a nested loop. Python sets do not work that way in the average case because membership checks are hash-based.

Another mistake is assuming the exact internal algorithm is guaranteed identically across all Python implementations. The high-level strategy is stable, but fine details are implementation-specific.

Developers also sometimes forget that all elements must be hashable. If an item is unhashable, it cannot participate in a set at all.

Summary

  • Python set intersection usually iterates the smaller set and does membership checks in the larger one.
  • Hash-table membership is what makes it efficient.
  • For two sets, average work is roughly proportional to the smaller set size.
  • 'intersection returns a new set, while intersection_update mutates in place.'
  • The exact low-level implementation may vary, but the small-set-plus-membership-check strategy is the core idea.

Course illustration
Course illustration