Algorithms
Data Structures
Sorting
Programming
Computer Science

Select top N elements of related objects

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Selecting the top N objects is easy when you only have plain numbers. It becomes more interesting when the objects are related and the ranking depends on a nested property, a parent-child association, or a derived score. The main design choice is whether you need the top N globally, per related group, or in a streaming context where sorting everything would be wasteful.

Define The Ranking Rule Explicitly

Before writing code, make the ranking criteria concrete. Suppose each Order belongs to a Customer, and we want the top three orders by total amount.

python
1from dataclasses import dataclass
2
3
4@dataclass
5class Customer:
6    id: int
7    name: str
8
9
10@dataclass
11class Order:
12    id: int
13    customer: Customer
14    total: float

The algorithm should not guess what "top" means. It should rank by an explicit key such as order.total, possibly with a tie-breaker like order.id.

Full Sorting Is The Clearest Baseline

If the dataset is modest and clarity matters more than absolute efficiency, sort the full list and take the first N.

python
1customers = [
2    Customer(1, "Ana"),
3    Customer(2, "Ben"),
4]
5
6orders = [
7    Order(101, customers[0], 120.0),
8    Order(102, customers[1], 310.0),
9    Order(103, customers[0], 200.0),
10    Order(104, customers[1], 180.0),
11]
12
13top_orders = sorted(
14    orders,
15    key=lambda order: (-order.total, order.id)
16)[:3]
17
18for order in top_orders:
19    print(order.id, order.customer.name, order.total)

This is usually the right first implementation because the intent is obvious and tie-breaking is easy to see.

Use A Heap When N Is Small

If the collection is very large and you only need a small top slice, a min-heap avoids sorting everything.

python
1import heapq
2
3
4def top_n_orders(orders, n):
5    heap = []
6
7    for order in orders:
8        item = (order.total, -order.id, order)
9        if len(heap) < n:
10            heapq.heappush(heap, item)
11        else:
12            heapq.heappushpop(heap, item)
13
14    return [item[2] for item in sorted(heap, reverse=True)]
15
16
17for order in top_n_orders(orders, 2):
18    print(order.id, order.total)

This keeps only N elements in memory instead of sorting the entire input. It is especially useful for streams or very large collections.

Sometimes the requirement is not top N overall, but top N children for each parent object. That changes the problem from one global selection to grouped ranking.

python
1from collections import defaultdict
2
3
4def top_n_per_customer(orders, n):
5    grouped = defaultdict(list)
6
7    for order in orders:
8        grouped[order.customer.id].append(order)
9
10    result = {}
11    for customer_id, customer_orders in grouped.items():
12        result[customer_id] = sorted(
13            customer_orders,
14            key=lambda order: (-order.total, order.id)
15        )[:n]
16
17    return result
18
19
20result = top_n_per_customer(orders, 1)
21for customer_id, best_orders in result.items():
22    print(customer_id, [order.id for order in best_orders])

This grouped version is common in recommendation systems, reporting, and ranking problems where each parent entity needs its own winners.

Tie-Breaking Prevents Surprises

If two related objects have the same score, the output can become unstable unless you add a secondary ordering rule. For reproducible results, sort by:

  • primary ranking value,
  • then a stable identifier,
  • or another deterministic field.

That matters in tests and in user-facing ranking outputs where order changes can look like bugs.

Complexity Tradeoffs

The best method depends on the data shape.

  • Full sorting costs more work overall but is simple.
  • A heap is efficient when N is small relative to the full collection.
  • Grouped selection can combine either strategy inside each group.

So the right question is not "what is the most advanced algorithm?" It is "how large is the input, and is the ranking global or grouped?"

Common Pitfalls

  • Failing to define the ranking key on the related objects clearly.
  • Sorting the entire collection when only a tiny top slice is needed from a huge input.
  • Forgetting tie-breakers and then getting unstable output order.
  • Solving the global top N problem when the requirement was really top N per parent group.
  • Ignoring that related-object access may itself be expensive if data is loaded lazily from a database or API.

Summary

  • Start by defining whether "top" is global or per related group.
  • Full sorting is the clearest baseline for moderate datasets.
  • A heap is better when N is small and the input is large or streaming.
  • Add deterministic tie-breakers so output stays stable.
  • Related-object ranking is mostly about choosing the right grouping and selection strategy.

Course illustration
Course illustration

All Rights Reserved.