Distinct not working with LINQ to Objects
Introduction
When Distinct() in LINQ appears to keep duplicates, the method is usually behaving correctly. The problem is that custom reference types compare by reference unless you define value-based equality, so two objects with the same visible data are still treated as different if they are different instances.
Core Sections
Why Distinct() keeps objects that look identical
LINQ Distinct() relies on equality and hashing. For reference types, the default behavior comes from object identity, not from matching property values. That means this code returns two results, not one:
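A minimal sketch of that situation, assuming a hypothetical Person class with Name and Email properties (the type and its values are illustrative only):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<Person>
{
    new Person { Name = "Ada", Email = "ada@example.com" },
    new Person { Name = "Ada", Email = "ada@example.com" },
};

// Person does not override Equals/GetHashCode, so Distinct()
// falls back to reference equality and keeps both instances.
int distinctCount = people.Distinct().Count();
Console.WriteLine(distinctCount); // 2

// Hypothetical type used for illustration only.
public class Person
{
    public string Name { get; set; } = "";
    public string Email { get; set; } = "";
}
```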
To a human, the objects look duplicated. To Distinct(), they are just two different references.
Use an IEqualityComparer when the uniqueness rule is query-specific
If the deduplication rule depends on one or more fields in a specific query, pass a comparer explicitly.
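One way this could look, assuming the same hypothetical Person class and a rule that two people are duplicates when their emails match ignoring case. The PersonEmailComparer name and the case-insensitive rule are assumptions made for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<Person>
{
    new Person { Name = "Ada", Email = "ada@example.com" },
    new Person { Name = "Ada L.", Email = "ADA@example.com" },
    new Person { Name = "Grace", Email = "grace@example.com" },
};

// The comparer is passed explicitly, so the rule applies to this query only.
var unique = people.Distinct(new PersonEmailComparer()).ToList();
Console.WriteLine(unique.Count); // 2

// Hypothetical type used for illustration only.
public class Person
{
    public string Name { get; set; } = "";
    public string Email { get; set; } = "";
}

// Hypothetical comparer: equality is defined by Email, ignoring case.
public sealed class PersonEmailComparer : IEqualityComparer<Person>
{
    public bool Equals(Person? x, Person? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return StringComparer.OrdinalIgnoreCase.Equals(x.Email, y.Email);
    }

    // Must use the same rule as Equals so equal items share a hash code.
    public int GetHashCode(Person obj) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Email);
}
```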
This is a good fit when the type itself does not have one universal identity rule, or when you want the deduplication logic to stay local to a specific use case.
Put equality on the type when it is intrinsic
If value-based identity is part of what the type means everywhere, put the equality logic on the type itself. Modern C# records are especially convenient here.
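A short sketch using a positional record, assuming C# 9 or later. PersonRecord is a hypothetical name chosen for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<PersonRecord>
{
    new PersonRecord("Ada", "ada@example.com"),
    new PersonRecord("Ada", "ada@example.com"),
    new PersonRecord("Grace", "grace@example.com"),
};

// No comparer needed: the compiler-generated Equals/GetHashCode
// compare Name and Email by value.
int distinctCount = people.Distinct().Count();
Console.WriteLine(distinctCount); // 2

// Hypothetical record used for illustration only.
public record PersonRecord(string Name, string Email);
```

Note that the generated equality compares strings with the default, case-sensitive rule, so "ADA@example.com" and "ada@example.com" would count as different values here.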
Records generate value-based equality automatically, which makes them a strong default for immutable data objects.
Use DistinctBy() when only one key matters
In .NET 6 and later, DistinctBy() can be simpler than writing a full comparer class for one-field deduplication.
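A minimal sketch, assuming .NET 6 or later and the same hypothetical Person class. The case-insensitive key comparer is an optional choice made for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<Person>
{
    new Person { Name = "Ada", Email = "ada@example.com" },
    new Person { Name = "Ada L.", Email = "ADA@example.com" },
    new Person { Name = "Grace", Email = "grace@example.com" },
};

// Deduplicate by a single key; the second argument controls how keys compare.
var unique = people
    .DistinctBy(p => p.Email, StringComparer.OrdinalIgnoreCase)
    .ToList();

Console.WriteLine(unique.Count); // 2

// Hypothetical type used for illustration only.
public class Person
{
    public string Name { get; set; } = "";
    public string Email { get; set; } = "";
}
```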
That keeps the intent obvious when a single key defines uniqueness for the query.
Watch out for mutable equality keys
Hash-based operations assume the key values do not change while the objects are participating in equality-based collections or queries. If you use mutable properties for equality, subtle bugs appear later.
For example, if email defines equality and the email can be mutated, a previously deduplicated object may no longer behave consistently in sets, dictionaries, or repeated queries. Immutable key fields or value objects make this much safer.
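The effect can be sketched with a HashSet, assuming a hypothetical MutablePerson class whose equality depends on a settable Email property:

```csharp
using System;
using System.Collections.Generic;

var ada = new MutablePerson { Email = "ada@example.com" };
var set = new HashSet<MutablePerson> { ada };

bool foundBefore = set.Contains(ada);
Console.WriteLine(foundBefore); // True

// Mutating the key changes the hash code while the object is in the set.
ada.Email = "ada@new.example.com";

// The set stored the old hash code, so the lookup with the new one
// misses (barring an unlikely hash collision), even though the
// object is still physically in the set.
bool foundAfter = set.Contains(ada);
Console.WriteLine(foundAfter);
Console.WriteLine(set.Count); // 1

// Hypothetical type used for illustration only.
public class MutablePerson
{
    public string Email { get; set; } = "";

    public override bool Equals(object? obj) =>
        obj is MutablePerson other && Email == other.Email;

    public override int GetHashCode() => Email.GetHashCode();
}
```

Making Email get-only (or using a record with init-only properties) would rule out this class of bug.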
Common Pitfalls
- Calling Distinct() on custom classes without defining value-based equality leaves the operation using reference equality.
- Overriding Equals() without a matching GetHashCode() breaks hash-based behavior and causes inconsistent results.
- Using mutable properties as equality keys creates unstable deduplication rules.
- Applying different case-sensitivity rules in different parts of the codebase makes duplicate handling inconsistent.
- Repeating ad hoc comparer logic across files instead of centralizing it causes drift over time.
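As a rough illustration of avoiding several of these pitfalls at once, the sketch below assumes a hypothetical Customer class that keeps Equals() and GetHashCode() on the same case-insensitive rule and exposes its key as a get-only property:

```csharp
using System;
using System.Linq;

var customers = new[]
{
    new Customer("ada@example.com"),
    new Customer("ADA@example.com"),
    new Customer("grace@example.com"),
};

// Distinct() picks up IEquatable<Customer> through the default comparer.
int distinctCount = customers.Distinct().Count();
Console.WriteLine(distinctCount); // 2

// Hypothetical type used for illustration only.
public sealed class Customer : IEquatable<Customer>
{
    public Customer(string email) => Email = email;

    // Get-only key: it cannot change after construction.
    public string Email { get; }

    public bool Equals(Customer? other) =>
        other is not null &&
        StringComparer.OrdinalIgnoreCase.Equals(Email, other.Email);

    public override bool Equals(object? obj) => Equals(obj as Customer);

    // Same case-insensitive rule as Equals, so equal objects hash alike.
    public override int GetHashCode() =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(Email);
}
```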
Summary
- Distinct() depends on equality semantics, not on whether two objects look the same to a reader.
- For reference types, default equality is usually by object identity.
- Use IEqualityComparer, DistinctBy(), or value-based types such as records to define the real uniqueness rule.
- Keep Equals() and GetHashCode() consistent.
- Prefer stable, preferably immutable key fields when deduplicating custom objects.

