How is set implemented?
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
The `set()` data structure in Python is a fundamental collection type, known for its ability to store unique elements much like the mathematical concept of a set. Understanding how `set()` is implemented can illuminate its characteristics and performance profile. Let's explore its implementation, focusing on its properties, usage, and underlying mechanics.
Characteristics of `set()`
Unique Elements
A `set` ensures that each element is unique, eliminating duplicates automatically. When an element is added, the `set` checks for its presence and only stores it if it isn't already included.
Unordered Collection
Sets are unordered, meaning there is no guaranteed order of elements. This is distinct from a `list` or `tuple`, where order is preserved. The lack of order is a byproduct of the underlying data structure, which we'll explore later.
Mutable and Immutable Variants
Python provides two types of sets:
- `set`: Mutable, allowing elements to be added or removed.
- `frozenset`: Immutable, preventing modification after creation.
Implementation Details
`Hash` Table Usage
At the core of the `set` implementation is a hash table, which provides efficient operations for adding, removing, and checking for the presence of elements.
Why `Hash` Tables?
- Fast Lookup: `Hash` tables offer average time complexity of for lookup operations.
- Efficiency in Modification: Insertion and deletion also average time complexity.
`Hash` Function
Each element in a `set` must be hashable, meaning it must support a method called `hash()`. This hash function generates a hash value, an integer that acts as a unique key for each element.
- Open Addressing: Finding the next available slot in the hash table.
- Chaining: Storing a list of elements in the same slot.
- Union: `A | B` combines elements from both sets.
- Intersection: `A & B` finds elements present in both sets.
- Difference: `A - B` gets elements in A not in B.
- Symmetric Difference: `A ^ B` finds elements in either A or B, but not both.
- Removing duplicates.
- Membership checks.
- Calculating unions and intersections efficiently.
- Elements must be hashable, which generally excludes mutable types like lists or dictionaries.
- No order preservation, so if order is vital to use case, a set may not be suitable.

