Python
hash function
object hashing
programming
best practices

What's a correct and good way to implement __hash__?

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

In Python, a correct __hash__ implementation is not about inventing a clever formula. It is about preserving the contract between hashing and equality. If two objects compare equal, they must produce the same hash value, and the attributes used for that hash must not change while the object is being used as a dictionary key or set member.

Start with the Equality Rule

The core rule is simple:

  • if a == b, then hash(a) == hash(b) must also be true

The reverse is not required. Different objects can share the same hash. Collisions are normal. What matters is consistency with __eq__.

A good implementation usually hashes the same immutable fields that __eq__ uses.

python
1class Point:
2    def __init__(self, x: int, y: int):
3        self.x = x
4        self.y = y
5
6    def __eq__(self, other):
7        if not isinstance(other, Point):
8            return NotImplemented
9        return (self.x, self.y) == (other.x, other.y)
10
11    def __hash__(self):
12        return hash((self.x, self.y))

This is the normal pattern. Hash the tuple of equality-defining fields and let Python's built-in tuple hashing do the work.

Why Mutable Fields Are Dangerous

If an object's hash changes after it has been inserted into a set or used as a dictionary key, lookups can break in subtle ways.

Bad idea:

python
1class BadUser:
2    def __init__(self, email: str):
3        self.email = email
4
5    def __eq__(self, other):
6        return isinstance(other, BadUser) and self.email == other.email
7
8    def __hash__(self):
9        return hash(self.email)

If email changes after the object is already in a set, the set still stores it based on the old hash bucket. That leads to confusing behavior.

So the practical rule is: only implement __hash__ when the fields involved are effectively immutable for the lifetime of the hashed object.

Use dataclass Support When It Fits

For value objects, dataclasses often give you the cleanest solution.

python
1from dataclasses import dataclass
2
3@dataclass(frozen=True)
4class Point:
5    x: int
6    y: int

With frozen=True, the instance is immutable and hashable in a way that matches the generated equality behavior. That is often better than writing both methods manually.

If the class is mutable and still uses generated equality, Python deliberately avoids making it hashable by default. That is a good guardrail, not an inconvenience.

When __hash__ = None Is Correct

Sometimes the right implementation is no implementation at all. If your object has value-based equality but mutable state, it should usually be unhashable.

python
1class MutableOrder:
2    def __init__(self, order_id: str, status: str):
3        self.order_id = order_id
4        self.status = status
5
6    def __eq__(self, other):
7        if not isinstance(other, MutableOrder):
8            return NotImplemented
9        return (self.order_id, self.status) == (other.order_id, other.status)
10
11    __hash__ = None

This prevents accidental use as a dictionary key and makes the class behavior honest.

Do Not Hand-Roll Bit Mixing Without a Reason

A lot of bad examples build custom formulas with primes, XOR, or shifting because the author assumes hashing must be manually optimized. In Python, that is usually unnecessary. hash() on tuples already handles multiple fields well and keeps the intent obvious.

Good:

python
def __hash__(self):
    return hash((self.x, self.y, self.label))

Usually unnecessary:

python
def __hash__(self):
    return hash(self.x) * 31 ^ hash(self.y) * 17 ^ hash(self.label)

The second version is harder to read and rarely buys anything meaningful.

Identity Hashing Versus Value Hashing

If you do not override __eq__, Python's default object behavior uses identity semantics. In that case, the default hash is also identity-based, which is fine for objects whose meaning is their identity.

The problem begins when you override __eq__ for value semantics but forget that hashing must change too. In Python 3, defining __eq__ without a compatible __hash__ usually makes the class unhashable, which is safer than silently getting it wrong.

Common Pitfalls

  • Hashing fields that can change after the object is inserted into a set or dictionary.
  • Making __eq__ and __hash__ depend on different attributes.
  • Writing a custom arithmetic hash formula when hash((field1, field2)) is already correct and clear.
  • Forcing hashability onto mutable value objects that should really be unhashable.
  • Forgetting that @dataclass(frozen=True) often solves the problem cleanly.

Summary

  • A correct __hash__ must be consistent with __eq__.
  • Hash the same immutable fields that define equality.
  • For most value objects, hash((field1, field2, ...)) is the right implementation.
  • Mutable value objects should usually set __hash__ = None.
  • Prefer built-in Python patterns over hand-written hash formulas.

Course illustration
Course illustration

All Rights Reserved.