Algorithm
k smallest numbers
array processing
data structures
computational efficiency

Algorithm to find k smallest numbers in array of n items

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Finding the k smallest values in an unsorted array is a classic selection problem. The best algorithm depends on what you need afterward: a sorted result, support for streaming input, or the best average-case speed when the full array does not need to be sorted.

Sort the Whole Array for the Simplest Solution

The easiest approach is to sort the entire array and take the first k values.

python
1def k_smallest_sort(values, k):
2    return sorted(values)[:k]
3
4print(k_smallest_sort([9, 4, 1, 7, 3, 6], 3))

This is straightforward and returns the answer already sorted. The cost is O(n log n), which is often acceptable for moderate arrays but does more work than necessary when k is small.

Use a Max-Heap When k Is Small

If k is much smaller than n, a heap is usually a better fit. Keep only the current k smallest values, and discard anything larger.

In Python, heapq is a min-heap, so one simple trick is to store negated values and treat it like a max-heap of size k.

python
1import heapq
2
3def k_smallest_heap(values, k):
4    if k <= 0:
5        return []
6
7    heap = []
8
9    for value in values:
10        if len(heap) < k:
11            heapq.heappush(heap, -value)
12        elif value < -heap[0]:
13            heapq.heapreplace(heap, -value)
14
15    return sorted(-x for x in heap)
16
17print(k_smallest_heap([9, 4, 1, 7, 3, 6], 3))

This runs in O(n log k) time and uses O(k) extra space. It is especially useful for streaming data because you never need to keep the whole input sorted.

Use Quickselect for Average Linear Time

If you only need the subset and care about average speed, Quickselect is a strong option. It partitions the array like Quicksort but recurses into only the side that contains the k smallest values.

python
1def quickselect(values, k):
2    if len(values) <= 1:
3        return values
4
5    pivot = values[len(values) // 2]
6    lows = [x for x in values if x < pivot]
7    highs = [x for x in values if x > pivot]
8    pivots = [x for x in values if x == pivot]
9
10    if k <= len(lows):
11        return quickselect(lows, k)
12    if k <= len(lows) + len(pivots):
13        return lows + pivots[:k - len(lows)]
14    return lows + pivots + quickselect(highs, k - len(lows) - len(pivots))
15
16result = quickselect([9, 4, 1, 7, 3, 6], 3)
17print(sorted(result))

Quickselect has average O(n) time, but its worst case can degrade to O(n²) if pivots are consistently bad.

Choose Based on the Real Requirement

A practical rule of thumb is:

  • use full sort when simplicity matters most
  • use a heap when k is small or input is streaming
  • use Quickselect when average-case speed matters and full sorting is unnecessary

If you need the final k values sorted after Quickselect, sort just those k values instead of sorting the whole array. That still tends to be cheaper than a complete sort when k is much smaller than n.

Watch the Edge Cases

Always define how the function behaves when:

  • 'k is 0'
  • 'k is greater than n'
  • the array contains duplicates

Good selection code is not only about asymptotic complexity. It is also about predictable behavior at the boundaries.

Common Pitfalls

  • Sorting the full array by habit when k is tiny and a heap would do less work.
  • Using a min-heap directly when what you need to track is the largest value among the current k smallest.
  • Forgetting that Quickselect does not automatically return a sorted result.
  • Ignoring worst-case behavior when deterministic performance matters.
  • Not handling k <= 0 or k > n explicitly.

Summary

  • Sorting is the simplest answer but costs O(n log n).
  • A max-heap of size k gives O(n log k) and works well for streaming data.
  • Quickselect gives average linear time when you only need the subset.
  • The right choice depends on output order, input size, and whether the data arrives all at once.
  • Good implementations should define edge-case behavior clearly, not only the happy path.

Course illustration
Course illustration

All Rights Reserved.