Java 8 Distinct by property

Java 8 introduced a number of powerful new features, and among them, the Streams API revolutionized how Java developers work with collections and data processing. One interesting challenge that often arises when using streams is the need for filtering a collection of objects by the uniqueness of a specific property, often referred to as "distinct by property".

Understanding Distinct Operation

In simple terms, the distinct() method of the Stream interface removes duplicates from a stream. It compares elements with equals() (and in practice also relies on a consistent hashCode()), so the result depends on how the element class defines equality: a class that does not override equals() is compared by identity, while a class that does override it typically compares several or all of its fields. This is limiting when you want uniqueness based on one particular property (such as an ID or a name).
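To make the dependence on equals() concrete, here is a minimal sketch. The classes PlainPerson and NamedPerson are hypothetical and exist only for this illustration: the first inherits identity-based equality from Object, the second overrides equals() and hashCode() to compare names only.

```java
import java.util.Objects;
import java.util.stream.Stream;

public class DistinctEqualsDemo {

    // Hypothetical class with no equals()/hashCode() override: compared by identity.
    static class PlainPerson {
        final String name;
        PlainPerson(String name) { this.name = name; }
    }

    // Hypothetical class whose equals()/hashCode() consider only the name.
    static class NamedPerson {
        final String name;
        NamedPerson(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof NamedPerson
                    && Objects.equals(name, ((NamedPerson) o).name);
        }

        @Override
        public int hashCode() { return Objects.hashCode(name); }
    }

    // Two separate instances with the same name are NOT equal by identity.
    public static long countDistinctPlain() {
        return Stream.of(new PlainPerson("Alice"), new PlainPerson("Alice"))
                .distinct()
                .count();
    }

    // Two instances with the same name ARE equal once equals() is overridden.
    public static long countDistinctNamed() {
        return Stream.of(new NamedPerson("Alice"), new NamedPerson("Alice"))
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countDistinctPlain()); // prints 2
        System.out.println(countDistinctNamed()); // prints 1
    }
}
```

Overriding equals() just to get distinct-by-name behavior changes equality for the whole class, which is usually not what you want; that is the motivation for a key-based filter instead.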

Custom Distinct by Property

To achieve a distinct operation by a specific property in Java 8, you need to implement a custom approach. The main idea is to maintain a set of the properties that have been seen so far while filtering the stream.

Example Implementation

Consider a collection of Person objects where each Person has a name and an age. To keep only the first Person for each name, you can pass a custom stateful predicate to filter():

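```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
            new Person("Alice", 30),
            new Person("Bob", 20),
            new Person("Alice", 22),
            new Person("Charlie", 40)
        );

        // Keep only the first Person encountered for each name.
        List<Person> uniqueByName = people.stream()
            .filter(distinctByKey(Person::getName))
            .collect(Collectors.toList());

        uniqueByName.forEach(System.out::println);
    }

    // Returns a stateful predicate that is true only the first time a key is seen.
    private static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }
}

// Minimal Person class, assumed here so the example compiles.
class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String getName() { return name; }
    int getAge() { return age; }

    @Override
    public String toString() { return name + " (" + age + ")"; }
}
```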

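In this example, distinctByKey returns a predicate that tracks the keys seen so far in a set. Set.add returns true only the first time a key is added, so only the first element for each key passes the filter. The set comes from ConcurrentHashMap.newKeySet(), so it can be used safely from a parallel stream. Thread safety does not make the result deterministic, though: the predicate is stateful, so in a parallel stream which of several elements sharing a key is kept may vary between runs. The set also rejects null keys, so the key extractor must not return null, and each call to distinctByKey creates a fresh set, so a predicate should not be reused across streams.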
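When it matters which duplicate survives, one alternative is to avoid the stateful predicate and collect into a map keyed by the property. The sketch below is an illustration rather than the only way to do this; the class and method name distinctByKeyOrdered are hypothetical. It uses Collectors.toMap with a merge function that keeps the earlier element and a LinkedHashMap to preserve encounter order, which should hold for ordered streams even when they run in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DistinctByKeyOrdered {

    // Keeps the first element (in encounter order) for each key.
    // Keys and elements are assumed to be non-null.
    public static <T, K> List<T> distinctByKeyOrdered(
            Stream<T> stream, Function<? super T, ? extends K> keyExtractor) {
        Map<K, T> firstByKey = stream.collect(Collectors.toMap(
                keyExtractor,
                Function.identity(),
                (first, second) -> first,   // on a key collision, keep the earlier element
                LinkedHashMap::new));       // preserve the order in which keys first appear
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<String> words =
                Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");

        // Distinct by first letter.
        List<String> result = distinctByKeyOrdered(
                words.parallelStream(), s -> s.substring(0, 1));

        System.out.println(result); // prints [apple, banana, cherry]
    }
}
```

The trade-off is that this is a terminal operation that materializes the whole result, whereas the filter-based predicate can sit in the middle of a longer pipeline.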

Technical Breakdown

  • Streams: Java 8 streams represent a sequence of elements supporting sequential and parallel aggregate operations.
  • Predicate: A functional interface that represents a condition (returns boolean). Used here to test whether an element should be included or not.
  • Function: A functional interface in Java that takes one argument and produces a result. Used here to extract a key property from an object.
  • ConcurrentHashMap.newKeySet(): A thread-safe set backed by a ConcurrentHashMap used for maintaining a set of seen keys.

Table: Key Concepts and Their Usage

| Concept | Usage | Description |
|---|---|---|
| Stream | people.stream() | Creates a stream from the people list. |
| distinctByKey | filter(distinctByKey(Person::getName)) | Custom method to filter the stream by the 'name' property. |
| Function | Person::getName | Method reference that serves as the key extractor. |
| Predicate | seen.add | Used in distinctByKey to determine whether an element's key has been encountered before. |
| Collectors.toList() | collect(Collectors.toList()) | Collects the filtered stream into a list. |

Conclusion and Additional Points

A custom method like distinctByKey gives you flexibility both in which property defines distinctness and in how uniqueness is determined (for example, comparing names in lower case). The approach also extends to more complex scenarios, such as composite keys built from multiple properties, and a plain HashSet can replace the concurrent set when the stream is known to be sequential.
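As a rough sketch of those two variations, the example below reuses the distinctByKey helper with a lower-cased name as the key and with a composite (name, age) key. The nested Person class and the method names are assumptions made for this illustration; the composite key is built with Arrays.asList because lists compare by value, which is one simple option among several.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class DistinctByKeyVariants {

    // Minimal Person stand-in (assumed shape: a name and an age).
    static class Person {
        private final String name;
        private final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
        @Override public String toString() { return name + " (" + age + ")"; }
    }

    // Same helper as in the article: true only the first time a key is seen.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Treats "Alice" and "alice" as the same name.
    public static List<Person> uniqueByNameIgnoringCase(List<Person> people) {
        return people.stream()
                .filter(distinctByKey(p -> p.getName().toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }

    // Composite key: two people are duplicates only if name AND age match.
    public static List<Person> uniqueByNameAndAge(List<Person> people) {
        return people.stream()
                .filter(distinctByKey(p -> Arrays.asList(p.getName(), p.getAge())))
                .collect(Collectors.toList());
    }

    // Sample data used by main.
    public static List<Person> samplePeople() {
        return Arrays.asList(
                new Person("Alice", 30),
                new Person("alice", 22),
                new Person("Bob", 20),
                new Person("Alice", 30));
    }

    public static void main(String[] args) {
        List<Person> people = samplePeople();
        System.out.println(uniqueByNameIgnoringCase(people)); // [Alice (30), Bob (20)]
        System.out.println(uniqueByNameAndAge(people));       // [Alice (30), alice (22), Bob (20)]
    }
}
```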

Understanding and leveraging these Java 8 features not only streamlines the code but also opens up possibilities for more readable, maintainable, and efficient implementations in handling data collections.

