Algorithm to find k smallest numbers in array of n items
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Finding the k smallest values in an unsorted array is a classic selection problem. The best algorithm depends on what you need afterward: a sorted result, support for streaming input, or the best average-case speed when the full array does not need to be sorted.
Sort the Whole Array for the Simplest Solution
The easiest approach is to sort the entire array and take the first k values.
This is straightforward and returns the answer already sorted. The cost is O(n log n), which is often acceptable for moderate arrays but does more work than necessary when k is small.
Use a Max-Heap When k Is Small
If k is much smaller than n, a heap is usually a better fit. Keep only the current k smallest values, and discard anything larger.
In Python, heapq is a min-heap, so one simple trick is to store negated values and treat it like a max-heap of size k.
This runs in O(n log k) time and uses O(k) extra space. It is especially useful for streaming data because you never need to keep the whole input sorted.
Use Quickselect for Average Linear Time
If you only need the subset and care about average speed, Quickselect is a strong option. It partitions the array like Quicksort but recurses into only the side that contains the k smallest values.
Quickselect has average O(n) time, but its worst case can degrade to O(n²) if pivots are consistently bad.
Choose Based on the Real Requirement
A practical rule of thumb is:
- use full sort when simplicity matters most
- use a heap when
kis small or input is streaming - use Quickselect when average-case speed matters and full sorting is unnecessary
If you need the final k values sorted after Quickselect, sort just those k values instead of sorting the whole array. That still tends to be cheaper than a complete sort when k is much smaller than n.
Watch the Edge Cases
Always define how the function behaves when:
- '
kis0' - '
kis greater thann' - the array contains duplicates
Good selection code is not only about asymptotic complexity. It is also about predictable behavior at the boundaries.
Common Pitfalls
- Sorting the full array by habit when
kis tiny and a heap would do less work. - Using a min-heap directly when what you need to track is the largest value among the current
ksmallest. - Forgetting that Quickselect does not automatically return a sorted result.
- Ignoring worst-case behavior when deterministic performance matters.
- Not handling
k <= 0ork > nexplicitly.
Summary
- Sorting is the simplest answer but costs
O(n log n). - A max-heap of size
kgivesO(n log k)and works well for streaming data. - Quickselect gives average linear time when you only need the subset.
- The right choice depends on output order, input size, and whether the data arrives all at once.
- Good implementations should define edge-case behavior clearly, not only the happy path.

