How does the Top-K sort algorithm work in MongoDB
Introduction
In MongoDB, top-k sort is a useful optimization behind many queries that combine sort with limit. Instead of sorting every matching document and then throwing most of them away, the engine can conceptually keep only the best k candidates seen so far. That reduces memory usage and can make the query much cheaper.
What "Top-K" Means
Suppose you want the 10 most expensive books: sort by price in descending order and limit the result to 10.
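A query of that kind could be sketched roughly as follows. The collection name books and the field name price are assumptions made for illustration, and the query is wrapped in a function that receives the shell's db handle so the snippet also loads outside mongosh.

```javascript
// Hypothetical names: a `books` collection with a numeric `price` field.
const topBooksQuery = {
  filter: {},           // match every book
  sort: { price: -1 },  // most expensive first
  limit: 10,            // keep only the top 10
};

// Builds the cursor from whatever `db` handle the caller passes in.
function tenMostExpensiveBooks(db) {
  return db.books
    .find(topBooksQuery.filter)
    .sort(topBooksQuery.sort)
    .limit(topBooksQuery.limit);
}
```

In mongosh this would be invoked as tenMostExpensiveBooks(db).toArray().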
The naive strategy is:
- find every matching book
- sort the entire result set
- return the first 10
The top-k idea is different. While scanning candidates, MongoDB can keep a bounded working set of only the best 10 documents for the requested order. A worse candidate can be discarded immediately once the working set is full.
Conceptually, that is similar to maintaining a heap of size k.
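The heap analogy can be made concrete with a small in-memory sketch. This is plain JavaScript for illustration only, not MongoDB's actual sorter; the function name topK and the sample data are hypothetical. For a descending sort, a min-heap of size k keeps the weakest retained candidate at the root, so each new document needs only one comparison to decide whether it can be discarded.

```javascript
// Keep the k largest items by `key` using a min-heap of size k.
// heap[0] is always the weakest candidate currently kept.
function topK(items, k, key) {
  if (k <= 0) return [];
  const heap = [];
  const swap = (i, j) => { [heap[i], heap[j]] = [heap[j], heap[i]]; };

  const siftUp = (i) => {
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (key(heap[parent]) <= key(heap[i])) break;
      swap(parent, i);
      i = parent;
    }
  };

  const siftDown = (i) => {
    for (;;) {
      const left = 2 * i + 1;
      const right = left + 1;
      let smallest = i;
      if (left < heap.length && key(heap[left]) < key(heap[smallest])) smallest = left;
      if (right < heap.length && key(heap[right]) < key(heap[smallest])) smallest = right;
      if (smallest === i) break;
      swap(i, smallest);
      i = smallest;
    }
  };

  for (const item of items) {
    if (heap.length < k) {
      heap.push(item);
      siftUp(heap.length - 1);
    } else if (key(item) > key(heap[0])) {
      heap[0] = item; // evict the weakest kept candidate
      siftDown(0);
    }
    // Otherwise the item is discarded; the working set never exceeds k.
  }

  // Only k items are fully ordered at the end, best first.
  return heap.sort((a, b) => key(b) - key(a));
}

// Hypothetical sample data.
const sampleBooks = [
  { title: "A", price: 12 }, { title: "B", price: 40 }, { title: "C", price: 7 },
  { title: "D", price: 35 }, { title: "E", price: 30 }, { title: "F", price: 19 },
];
const top3 = topK(sampleBooks, 3, (b) => b.price);
console.log(top3.map((b) => b.title)); // [ 'B', 'D', 'E' ]
```

Memory stays proportional to k rather than to the number of matching documents, which is the property the article is describing.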
The Best Case Is Still an Index
The strongest optimization is not top-k sorting at all. It is an index that already provides the requested order.
With a compound index on the filter field followed by the sort field (for example, category and then price), MongoDB can scan only the matching category range in sorted order and stop after 10 results. That is usually faster and more memory-efficient than any in-memory sort strategy.
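A minimal sketch of such an index and a query it could serve, assuming a hypothetical books collection with category and price fields. The key order (equality field first, then the sort field) is what lets the index supply the requested order.

```javascript
// Hypothetical schema: `books` documents with `category` and `price` fields.
const categoryPriceIndex = { category: 1, price: -1 };

// Creates the compound index on the handle passed in.
function createCategoryPriceIndex(db) {
  return db.books.createIndex(categoryPriceIndex);
}

// Equality match on `category`, ordered by `price` descending, top 10 only.
function topTenInCategory(db, category) {
  return db.books.find({ category }).sort({ price: -1 }).limit(10);
}
```

In mongosh, createCategoryPriceIndex(db) would be run once, followed by topTenInCategory(db, "databases").toArray(), where "databases" is only an example value.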
So the ranking of solutions is:
- index satisfies filter and sort
- bounded sort with limit
- full sort of all matches
Where Top-K Helps Most
Top-k style behavior matters when:
- the query includes sort
- the query also includes limit
- no suitable index fully satisfies the sort order
In that situation, MongoDB may still avoid retaining every matching document in sorter memory. The engine can combine the sort and limit so only the top k candidates are kept.
This does not make the query free. MongoDB may still need to inspect many documents. The win is that sorter state can stay bounded rather than growing with the full match set.
skip Changes the Effective K
If the query also uses skip, the working set is effectively larger. Suppose the query sorts, skips the first 20 documents, and returns the next 10.
To answer this correctly, MongoDB needs the top 30 documents in sorted order before it can drop the first 20 and return the next 10. So the optimization is still useful, but the bounded set is now closer to skip + limit than to limit alone.
This is one reason large skip values can become expensive even when limit is small.
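The arithmetic can be sketched as follows. The books collection and price field are again hypothetical, and pageFromTopK uses an ordinary full sort as a stand-in for the bounded sorter, purely to show which slice of the ordering has to be retained.

```javascript
// Hypothetical paging parameters: skip 20, return the next 10.
const page = { sort: { price: -1 }, skip: 20, limit: 10 };

// The mongosh form of the query, built from a `db` handle.
function thirdPageOfBooks(db) {
  return db.books.find({}).sort(page.sort).skip(page.skip).limit(page.limit);
}

// Number of sorted candidates that must be retained to answer the page.
const effectiveK = page.skip + page.limit;

// In-memory illustration: keep the top (skip + limit), then drop the first `skip`.
function pageFromTopK(items, skip, limit, key) {
  const kept = [...items]
    .sort((a, b) => key(b) - key(a)) // stand-in for a bounded top-k selection
    .slice(0, skip + limit);
  return kept.slice(skip);
}

const prices = Array.from({ length: 50 }, (_, i) => i + 1); // 1..50
const pageOfPrices = pageFromTopK(prices, page.skip, page.limit, (p) => p);
console.log(effectiveK);   // 30
console.log(pageOfPrices); // [ 30, 29, 28, 27, 26, 25, 24, 23, 22, 21 ]
```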
Aggregation Pipelines Follow the Same Idea
The same pattern appears in aggregation when a $sort stage is followed by a $limit stage.
When the planner can coalesce the sort and limit stages internally, the pipeline behaves more like a bounded top-k selection than a full materialized sort.
That is one reason stage order matters. Putting $limit immediately after $sort gives the planner the best chance to use the optimization.
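A pipeline following that advice might look like the sketch below. The collection, field names, and filter value are hypothetical. The helper limitFollowsSort is also hypothetical: it is a deliberately conservative check of the stage-order advice above, not a model of the planner's real rules.

```javascript
// Hypothetical pipeline: filter, sort by price descending, keep 10.
const topBooksPipeline = [
  { $match: { category: "databases" } },
  { $sort: { price: -1 } },
  { $limit: 10 },
];

// Runs the pipeline against the handle passed in.
function runTopBooksPipeline(db) {
  return db.books.aggregate(topBooksPipeline);
}

// Conservative check: is every $sort stage directly followed by a $limit stage?
function limitFollowsSort(pipeline) {
  return pipeline.every((stage, i) =>
    !("$sort" in stage) ||
    (i + 1 < pipeline.length && "$limit" in pipeline[i + 1]));
}

console.log(limitFollowsSort(topBooksPipeline)); // true
```

A pipeline that places another stage between $sort and $limit fails this check, which is a prompt to confirm the real plan rather than proof that the optimization was lost.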
Use explain() Instead of Guessing
The actual plan depends on indexes, filter shape, document count, and server version. So do not assume. Check the plan with explain().
If the plan shows an index scan satisfying the order, that is ideal. If it shows a sort stage, check whether the sort is bounded by the limit and whether memory use is acceptable.
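One way to make that check less manual is sketched below. The query, collection, and field names are hypothetical. The helpers assume plan nodes carry a stage name with children under inputStage or inputStages; the exact shape of explain output varies by server version, so inspect the raw output before relying on a helper like this. The sample plans are hand-written illustrations, not real server output.

```javascript
// Asks the server for the plan of a hypothetical sort + limit query.
function explainTopBooks(db) {
  return db.books
    .find({ category: "databases" })
    .sort({ price: -1 })
    .limit(10)
    .explain("executionStats");
}

// Collects stage names from a plan tree, parents before children.
function collectStages(node, out = []) {
  if (!node || typeof node !== "object") return out;
  if (node.stage) out.push(node.stage);
  if (node.inputStage) collectStages(node.inputStage, out);
  for (const child of node.inputStages || []) collectStages(child, out);
  return out;
}

// Rough classification of a winning plan for a query that requested a sort.
function describePlan(winningPlan) {
  const stages = collectStages(winningPlan);
  if (stages.includes("SORT")) return "sort stage present";
  if (stages.includes("IXSCAN")) return "index scan, no sort stage";
  return "no sort or index scan found";
}

// Hand-written sample plans for illustration only.
const sortedInMemory = { stage: "SORT", inputStage: { stage: "COLLSCAN" } };
const orderedByIndex = {
  stage: "LIMIT",
  inputStage: { stage: "FETCH", inputStage: { stage: "IXSCAN" } },
};

console.log(describePlan(sortedInMemory)); // sort stage present
console.log(describePlan(orderedByIndex)); // index scan, no sort stage
```

In mongosh, the plan tree would typically be read from explainTopBooks(db).queryPlanner.winningPlan, though some versions nest it one level deeper.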
Common Pitfalls
- Assuming every sort forces a full sort of all matching documents.
- Ignoring the fact that a proper index is still much better than bounded sorting.
- Forgetting that large skip values increase the effective top-k working set.
- Using aggregation stage order that prevents the planner from combining sort and limit efficiently.
- Talking about internal sort behavior without checking the real explain() output.
Summary
- Top-k sort in MongoDB matters when a query sorts and also limits the result.
- The engine can conceptually keep only the best k candidates instead of sorting everything.
- The best optimization is still an index that satisfies both the filter and the order.
- skip increases the effective number of sorted candidates.
- Always verify the real plan with explain() instead of assuming the optimization happened.

