Distinct not working with LINQ to Objects

Introduction

When Distinct() in LINQ appears to keep duplicates, the method is usually behaving correctly. The problem is that custom reference types compare by reference unless you define value-based equality, so two objects with the same visible data are still treated as different if they are different instances.

Core Sections

Why Distinct() keeps objects that look identical

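LINQ's Distinct() relies on equality and hashing. Without a comparer argument it uses EqualityComparer<T>.Default, which calls the element type's Equals() and GetHashCode(). A class that does not override those methods inherits them from System.Object, and those compare object identity, not property values. That means this code prints 2, not 1: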

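```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two separate instances that carry the same data.
var people = new List<Person>
{
    new Person { Email = "ada@example.com" },
    new Person { Email = "ada@example.com" }
};

// Person does not override Equals/GetHashCode, so Distinct() compares references.
Console.WriteLine(people.Distinct().Count()); // prints 2

// In a top-level program, type declarations must come after the top-level statements.
class Person
{
    public string Email { get; set; } = "";
}
```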

To a human, the objects look duplicated. To Distinct(), they are just two different references.
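For contrast, Distinct() does collapse elements when equality actually says they match: the same reference appearing twice, or a type such as string that overrides Equals() and GetHashCode() to compare contents. A short sketch reusing the same Person shape as above; the email value is only a placeholder.

```csharp
using System;
using System.Linq;

var ada = new Person { Email = "ada@example.com" };

// The same reference twice: reference equality treats them as equal.
var sameReferenceTwice = new[] { ada, ada };
int sameRefCount = sameReferenceTwice.Distinct().Count();
Console.WriteLine(sameRefCount); // prints 1

// Built at run time so they are two separate string instances;
// string compares characters, so equal text still collapses.
string domain = "example.com";
var emails = new[] { "ada@" + domain, "ada@" + domain };
int emailCount = emails.Distinct().Count();
Console.WriteLine(emailCount); // prints 1

// Two Person instances with equal data: still two different references.
var copies = new[]
{
    new Person { Email = "ada@example.com" },
    new Person { Email = "ada@example.com" }
};
int copyCount = copies.Distinct().Count();
Console.WriteLine(copyCount); // prints 2

class Person
{
    public string Email { get; set; } = "";
}
```

The deciding factor is never what the data looks like, only what Equals() and GetHashCode() report.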

Use an IEqualityComparer when the uniqueness rule is query-specific

If the deduplication rule depends on one or more fields in a specific query, pass a comparer explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<Person>
{
    new Person { Email = "ada@example.com" },
    new Person { Email = "ada@example.com" }
};

// Pass the comparer to the Distinct overload that accepts an IEqualityComparer<T>.
var unique = people.Distinct(new PersonEmailComparer()).ToList();
Console.WriteLine(unique.Count); // prints 1

class Person
{
    public string Email { get; set; } = "";
}

// Treats two people as duplicates when their emails match, ignoring case.
class PersonEmailComparer : IEqualityComparer<Person>
{
    public bool Equals(Person? x, Person? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return StringComparer.OrdinalIgnoreCase.Equals(x.Email, y.Email);
    }

    // Must agree with Equals: emails that match ignoring case produce the same hash.
    public int GetHashCode(Person obj)
        => StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Email ?? "");
}
```

This is a good fit when the type itself does not have one universal identity rule, or when you want the deduplication logic to stay local to a specific use case.
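When uniqueness depends on more than one field, the comparer has to combine those fields the same way in both methods. A minimal sketch, assuming a hypothetical Contact type with FirstName and LastName properties and a case-insensitive matching rule:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var contacts = new List<Contact>
{
    new Contact { FirstName = "Ada", LastName = "Lovelace" },
    new Contact { FirstName = "ADA", LastName = "lovelace" },
    new Contact { FirstName = "Ada", LastName = "Byron" }
};

// The first two match on both names ignoring case; the third differs by LastName.
var unique = contacts.Distinct(new ContactNameComparer()).ToList();
Console.WriteLine(unique.Count); // prints 2

// Hypothetical type used only for this illustration.
class Contact
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
}

// Two contacts are duplicates when both name fields match, ignoring case.
class ContactNameComparer : IEqualityComparer<Contact>
{
    private static readonly StringComparer Names = StringComparer.OrdinalIgnoreCase;

    public bool Equals(Contact? x, Contact? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return Names.Equals(x.FirstName, y.FirstName)
            && Names.Equals(x.LastName, y.LastName);
    }

    // Hash exactly the fields Equals compares, with the same case rule.
    public int GetHashCode(Contact obj)
        => HashCode.Combine(Names.GetHashCode(obj.FirstName), Names.GetHashCode(obj.LastName));
}
```

Using one shared StringComparer for both Equals and GetHashCode keeps the two methods from drifting apart.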

Put equality on the type when it is intrinsic

If value-based identity is part of what the type means everywhere, put the equality logic on the type itself. Records, introduced in C# 9, are especially convenient here.

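```csharp
using System;
using System.Linq;

var users = new[]
{
    new User("ada@example.com"),
    new User("ada@example.com")
};

// Records compare by value, so the two instances count as one.
Console.WriteLine(users.Distinct().Count()); // prints 1

// Declared after the top-level statements, as a top-level program requires.
public record User(string Email);
```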

Records generate Equals(), GetHashCode(), and the == and != operators automatically, comparing every field, which makes them a strong default for immutable data objects.
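Records compare string members with ordinary case-sensitive equality, so they cannot express a rule such as "emails match regardless of case" on their own. A regular class can, by implementing IEquatable<T> and overriding Equals(object) and GetHashCode() together. A sketch, assuming a hypothetical immutable Customer class:

```csharp
using System;
using System.Linq;

var customers = new[]
{
    new Customer("ada@example.com"),
    new Customer("ADA@example.com")
};

// Distinct() picks up the type's own equality through EqualityComparer<T>.Default.
int count = customers.Distinct().Count();
Console.WriteLine(count); // prints 1

// Hypothetical type: case-insensitive email equality is part of what a Customer means.
public sealed class Customer : IEquatable<Customer>
{
    public Customer(string email) => Email = email;

    // Read-only, so the equality key cannot change after construction.
    public string Email { get; }

    public bool Equals(Customer? other)
        => other is not null && StringComparer.OrdinalIgnoreCase.Equals(Email, other.Email);

    public override bool Equals(object? obj) => Equals(obj as Customer);

    // Same comparer as Equals, so customers that are equal always hash identically.
    public override int GetHashCode() => StringComparer.OrdinalIgnoreCase.GetHashCode(Email);
}
```

Because the rule lives on the type, every Distinct(), HashSet<T>, and Dictionary key lookup applies it without a separate comparer.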

Use DistinctBy() when only one key matters

Starting with .NET 6, DistinctBy() can be simpler than writing a full comparer class for one-field deduplication.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<Person>
{
    new Person { Email = "ada@example.com" },
    new Person { Email = "ada@example.com" }
};

// DistinctBy (.NET 6+) deduplicates on the selected key using the given key comparer.
var unique = people.DistinctBy(p => p.Email, StringComparer.OrdinalIgnoreCase).ToList();
Console.WriteLine(unique.Count); // prints 1

class Person
{
    public string Email { get; set; } = "";
}
```

That keeps the intent obvious when a single key defines uniqueness for the query.
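DistinctBy() keeps the first element it sees for each key and drops later ones, so input order decides which object survives. On targets older than .NET 6, grouping on the key and taking the first element of each group is a common substitute. A sketch, with a hypothetical Name property added to Person only to show which instance survives:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<Person>
{
    new Person { Name = "First", Email = "ada@example.com" },
    new Person { Name = "Second", Email = "ADA@example.com" },
    new Person { Name = "Third", Email = "grace@example.com" }
};

// .NET 6+: keeps the first Person for each email, ignoring case.
var byDistinctBy = people
    .DistinctBy(p => p.Email, StringComparer.OrdinalIgnoreCase)
    .Select(p => p.Name)
    .ToList();

// Older targets: group on the same key and take the first member of each group.
var byGroupBy = people
    .GroupBy(p => p.Email, StringComparer.OrdinalIgnoreCase)
    .Select(g => g.First().Name)
    .ToList();

Console.WriteLine(string.Join(", ", byDistinctBy)); // prints First, Third
Console.WriteLine(string.Join(", ", byGroupBy));    // prints First, Third

class Person
{
    public string Name { get; set; } = "";
    public string Email { get; set; } = "";
}
```

Both produce the same survivors here, but GroupBy() builds every group before yielding anything, while DistinctBy() streams results, so DistinctBy() is the leaner choice where it is available.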

Watch out for mutable equality keys

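Hash-based operations assume that an element's equality key does not change while the element sits in a HashSet<T>, serves as a Dictionary<TKey, TValue> key, or flows through a running query. If equality depends on mutable properties, changing one afterwards changes the hash code, and lookups start searching the wrong bucket.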

For example, if email defines equality and the email can be mutated, a previously deduplicated object may no longer behave consistently in sets, dictionaries, or repeated queries. Immutable key fields or value objects make this much safer.
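The failure is easiest to see with a HashSet<T>. A minimal sketch, assuming a hypothetical MutablePerson class whose equality is keyed on a settable Email property:

```csharp
using System;
using System.Collections.Generic;

var ada = new MutablePerson { Email = "ada@example.com" };
var set = new HashSet<MutablePerson> { ada };

// The lookup hashes the same Email that was stored, so it finds the entry.
bool foundBefore = set.Contains(ada);

// Changing the key changes GetHashCode(), but the set still files ada under the old hash.
ada.Email = "ada@another.example";

// The lookup now searches using the new hash and, barring a hash collision,
// misses the entry even though the very same reference is still in the set.
bool foundAfter = set.Contains(ada);

Console.WriteLine(foundBefore); // prints True
Console.WriteLine(foundAfter);
Console.WriteLine(set.Count);   // prints 1

// Hypothetical type: equality depends on a property that can be reassigned.
class MutablePerson : IEquatable<MutablePerson>
{
    public string Email { get; set; } = "";

    public bool Equals(MutablePerson? other)
        => other is not null && StringComparer.OrdinalIgnoreCase.Equals(Email, other.Email);

    public override bool Equals(object? obj) => Equals(obj as MutablePerson);

    public override int GetHashCode() => StringComparer.OrdinalIgnoreCase.GetHashCode(Email);
}
```

Making Email read-only, as in an immutable value object, removes this failure mode entirely.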

Common Pitfalls

  • Calling Distinct() on custom classes without defining value-based equality leaves the operation using reference equality.
  • Overriding Equals() without a matching GetHashCode() breaks hash-based behavior and causes inconsistent results.
  • Using mutable properties as equality keys creates unstable deduplication rules.
  • Applying different case-sensitivity rules in different parts of the codebase makes duplicate handling inconsistent.
  • Repeating ad hoc comparer logic across files instead of centralizing it causes drift over time.
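The Equals()/GetHashCode() mismatch is easy to miss because Distinct() is hash-based: elements only reach Equals() when their hash codes match. In this sketch, a hypothetical HalfEqualPerson overrides only Equals(); the C# compiler reports warning CS0659 for this, and Distinct() still keeps both copies because the inherited, per-instance hash codes almost never coincide.

```csharp
using System;
using System.Linq;

var people = new[]
{
    new HalfEqualPerson { Email = "ada@example.com" },
    new HalfEqualPerson { Email = "ada@example.com" }
};

// Equals() says the two objects match...
bool equalByEquals = people[0].Equals(people[1]);

// ...but Distinct() buckets by GetHashCode() first, and the inherited
// object.GetHashCode() differs per instance, so the override is normally never consulted.
int distinctCount = people.Distinct().Count();

Console.WriteLine(equalByEquals); // prints True
Console.WriteLine(distinctCount);

// Hypothetical broken type: overrides Equals() without GetHashCode() (warning CS0659).
class HalfEqualPerson
{
    public string Email { get; set; } = "";

    public override bool Equals(object? obj)
        => obj is HalfEqualPerson other && Email == other.Email;
}
```

Adding `public override int GetHashCode() => Email.GetHashCode();` to the class would make the two methods agree and let Distinct() collapse the pair.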

Summary

  • Distinct() depends on equality semantics, not on whether two objects look the same to a reader.
  • For classes that do not override Equals() and GetHashCode(), default equality is object identity.
  • Use IEqualityComparer, DistinctBy(), or value-based types such as records to define the real uniqueness rule.
  • Keep Equals() and GetHashCode() consistent.
  • Prefer stable, ideally immutable, key fields when deduplicating custom objects.
