Optimization
Non-dominated Sorting
Multi-objective Algorithms
Computational Geometry
Pareto Efficiency

Algorithm to find non-dominated pairs

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Finding non-dominated pairs means finding the points that are not strictly worse than some other point on every objective. This shows up in Pareto optimization, skyline queries, and ranking systems where you care about trade-offs instead of a single score.

For a two-dimensional minimization problem, a pair p = (x, y) is dominated by q = (a, b) when a <= x and b <= y, with at least one strict improvement. The goal is to return every pair that survives that test.

What Dominance Means

Suppose each pair represents cost and latency, and smaller values are better. If one option is cheaper and no slower, then the slower, more expensive option should never appear in the final answer.

For example, in this list:

text
(4, 7), (2, 9), (3, 5), (5, 4), (2, 6)

the pair (4, 7) is dominated by (3, 5) because 3 < 4 and 5 < 7. The pair (2, 9) is not dominated by (3, 5) because it is cheaper, even though it is slower. Non-dominated points form the Pareto frontier.

Naive Algorithm

The simplest algorithm compares every point against every other point.

python
1def non_dominated_naive(points):
2    result = []
3
4    for i, p in enumerate(points):
5        dominated = False
6
7        for j, q in enumerate(points):
8            if i == j:
9                continue
10
11            if ((q[0] <= p[0] and q[1] <= p[1]) and
12                (q[0] < p[0] or q[1] < p[1])):
13                dominated = True
14                break
15
16        if not dominated:
17            result.append(p)
18
19    return result
20
21
22points = [(4, 7), (2, 9), (3, 5), (5, 4), (2, 6)]
23print(non_dominated_naive(points))

This is easy to understand and correct for small inputs, but it costs O(n^2) comparisons. That becomes expensive fast if you are processing thousands of points.

Faster Algorithm for Two Dimensions

In two dimensions, there is a standard optimization. Sort by the first coordinate, then sweep from best to worst while tracking the smallest second coordinate seen so far.

For minimization:

  1. Sort by x ascending.
  2. If two points have the same x, sort by y ascending.
  3. Walk left to right.
  4. Keep a running minimum of y.
  5. A point is non-dominated if its y is smaller than every y seen before with smaller or equal x.
python
1def non_dominated_2d(points):
2    ordered = sorted(points, key=lambda p: (p[0], p[1]))
3    frontier = []
4    best_y = float("inf")
5
6    for x, y in ordered:
7        if y < best_y:
8            frontier.append((x, y))
9            best_y = y
10
11    return frontier
12
13
14points = [(4, 7), (2, 9), (3, 5), (5, 4), (2, 6)]
15print(non_dominated_2d(points))

This runs in O(n log n) because the sort dominates the cost. The scan itself is linear.

The result for the example is:

text
[(2, 6), (3, 5), (5, 4)]

Each of those points improves one objective without being worse on the other than some earlier survivor.

Why the Sweep Works

After sorting by x, every later point has x greater than or equal to the current point. That means the only question left is whether an earlier point already had a y that was less than or equal to the current y. If yes, the current point is dominated. If no, it starts a new step on the Pareto frontier.

This logic is specific to the two-dimensional case. Once you move to three or more dimensions, a single running minimum is no longer enough. Then you need data structures such as range trees, divide-and-conquer skyline algorithms, or full non-dominated sorting.

Handling Duplicates and Ties

Ties matter. If the same pair appears twice, you may want to:

  • keep both, if the input is a multiset and duplicates are meaningful
  • keep one, if you only care about unique coordinate values

If two points share the same x, you should sort by y ascending so the better candidate comes first. Otherwise your sweep can keep the wrong representative and misclassify the rest.

Here is a version that returns unique frontier points:

python
1def non_dominated_unique(points):
2    ordered = sorted(set(points), key=lambda p: (p[0], p[1]))
3    frontier = []
4    best_y = float("inf")
5
6    for point in ordered:
7        if point[1] < best_y:
8            frontier.append(point)
9            best_y = point[1]
10
11    return frontier

Common Pitfalls

The most common mistake is forgetting whether the problem is a minimization or maximization problem. If larger values are better, the comparison signs must flip.

Another mistake is using a strict comparison in both coordinates. Dominance requires "better or equal" on all objectives and "strictly better" on at least one. If you require both coordinates to be strictly better, some dominated points will incorrectly survive.

Tied first coordinates are another source of bugs. If you sort only by the first coordinate and leave equal values in arbitrary order, the sweep can depend on input order. Sort by both coordinates explicitly.

Finally, do not overuse the O(n^2) version. It is fine for debugging and unit tests, but production code should use the O(n log n) sweep for two dimensions when possible.

Summary

  • A non-dominated pair is not worse than another pair on every objective.
  • The naive algorithm compares every point with every other point and costs O(n^2).
  • In two dimensions, sorting plus a sweep gives an O(n log n) solution.
  • Correct dominance checks use "better or equal" in all coordinates and "strictly better" in at least one.
  • Ties and duplicate points need an explicit policy so the frontier is stable and predictable.

Course illustration
Course illustration

All Rights Reserved.