What's a correct and good way to implement __hash__?
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
In Python, a correct __hash__ implementation is not about inventing a clever formula. It is about preserving the contract between hashing and equality. If two objects compare equal, they must produce the same hash value, and the attributes used for that hash must not change while the object is being used as a dictionary key or set member.
Start with the Equality Rule
The core rule is simple:
- if
a == b, thenhash(a) == hash(b)must also be true
The reverse is not required. Different objects can share the same hash. Collisions are normal. What matters is consistency with __eq__.
A good implementation usually hashes the same immutable fields that __eq__ uses.
This is the normal pattern. Hash the tuple of equality-defining fields and let Python's built-in tuple hashing do the work.
Why Mutable Fields Are Dangerous
If an object's hash changes after it has been inserted into a set or used as a dictionary key, lookups can break in subtle ways.
Bad idea:
If email changes after the object is already in a set, the set still stores it based on the old hash bucket. That leads to confusing behavior.
So the practical rule is: only implement __hash__ when the fields involved are effectively immutable for the lifetime of the hashed object.
Use dataclass Support When It Fits
For value objects, dataclasses often give you the cleanest solution.
With frozen=True, the instance is immutable and hashable in a way that matches the generated equality behavior. That is often better than writing both methods manually.
If the class is mutable and still uses generated equality, Python deliberately avoids making it hashable by default. That is a good guardrail, not an inconvenience.
When __hash__ = None Is Correct
Sometimes the right implementation is no implementation at all. If your object has value-based equality but mutable state, it should usually be unhashable.
This prevents accidental use as a dictionary key and makes the class behavior honest.
Do Not Hand-Roll Bit Mixing Without a Reason
A lot of bad examples build custom formulas with primes, XOR, or shifting because the author assumes hashing must be manually optimized. In Python, that is usually unnecessary. hash() on tuples already handles multiple fields well and keeps the intent obvious.
Good:
Usually unnecessary:
The second version is harder to read and rarely buys anything meaningful.
Identity Hashing Versus Value Hashing
If you do not override __eq__, Python's default object behavior uses identity semantics. In that case, the default hash is also identity-based, which is fine for objects whose meaning is their identity.
The problem begins when you override __eq__ for value semantics but forget that hashing must change too. In Python 3, defining __eq__ without a compatible __hash__ usually makes the class unhashable, which is safer than silently getting it wrong.
Common Pitfalls
- Hashing fields that can change after the object is inserted into a set or dictionary.
- Making
__eq__and__hash__depend on different attributes. - Writing a custom arithmetic hash formula when
hash((field1, field2))is already correct and clear. - Forcing hashability onto mutable value objects that should really be unhashable.
- Forgetting that
@dataclass(frozen=True)often solves the problem cleanly.
Summary
- A correct
__hash__must be consistent with__eq__. - Hash the same immutable fields that define equality.
- For most value objects,
hash((field1, field2, ...))is the right implementation. - Mutable value objects should usually set
__hash__ = None. - Prefer built-in Python patterns over hand-written hash formulas.

