random numbers
algorithm
number generation
programming
distinct values

Generating m distinct random numbers in the range 0..n-1

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

Generating m distinct random numbers from the range 0 through n - 1 is a sampling problem without replacement. The right algorithm depends on the relative sizes of m and n, because a method that is fine for a tiny sample can become wasteful when the sample is large.

The Simple and Correct Built-In Option

In Python, the easiest correct solution is random.sample. It already performs sampling without replacement.

python
1import random
2
3
4def pick_numbers(n: int, m: int) -> list[int]:
5    if not 0 <= m <= n:
6        raise ValueError("m must be between 0 and n")
7    return random.sample(range(n), m)
8
9
10print(pick_numbers(10, 4))

If you only need working code and the language provides a sampling primitive, use it. The interesting part starts when you need to understand the underlying algorithms or implement one manually.

Why Repeated Random Picks Can Be Slow

A common first attempt is:

  1. draw a random number in the range
  2. store it in a set if it is new
  3. keep going until the set has m values

That works, but duplicates become more frequent as the set fills up. When m is close to n, the algorithm spends more and more time redrawing values it already has.

python
1import random
2
3
4def pick_with_set(n: int, m: int) -> list[int]:
5    if not 0 <= m <= n:
6        raise ValueError("m must be between 0 and n")
7
8    seen = set()
9    while len(seen) < m:
10        seen.add(random.randrange(n))
11    return list(seen)

This is acceptable for small samples, but it is not the most predictable solution.

Partial Fisher-Yates Shuffle

When m is large, a partial Fisher-Yates shuffle is a strong choice. Start with the whole range, then swap random elements into the front m positions.

python
1import random
2
3
4def pick_with_partial_shuffle(n: int, m: int) -> list[int]:
5    if not 0 <= m <= n:
6        raise ValueError("m must be between 0 and n")
7
8    values = list(range(n))
9    for i in range(m):
10        j = random.randrange(i, n)
11        values[i], values[j] = values[j], values[i]
12    return values[:m]
13
14
15print(pick_with_partial_shuffle(10, 4))

This runs in linear time with respect to the number of elements touched. It is simple, unbiased, and efficient when you can afford an array of size n.

Floyd's Algorithm for Smaller Samples

If n is huge but m is relatively small, storing the full range can waste memory. Floyd's algorithm generates a sample without replacement using only a set of selected values.

python
1import random
2
3
4def pick_with_floyd(n: int, m: int) -> list[int]:
5    if not 0 <= m <= n:
6        raise ValueError("m must be between 0 and n")
7
8    chosen = set()
9    for j in range(n - m, n):
10        t = random.randrange(j + 1)
11        if t in chosen:
12            chosen.add(j)
13        else:
14            chosen.add(t)
15    return list(chosen)
16
17
18print(sorted(pick_with_floyd(1_000_000, 5)))

The key advantage is memory: it stores only the selected values, not the entire range. That makes it attractive for very large domains.

Which Method Should You Use

Choose based on constraints:

  • use random.sample when available and you do not need custom behavior
  • use the set-based loop for quick prototypes or very small samples
  • use partial Fisher-Yates when m is a significant fraction of n
  • use Floyd's algorithm when n is very large and m is much smaller

There is no single best algorithm for every case. The best one is the one that matches your size and memory profile.

Common Pitfalls

Using repeated random draws without checking uniqueness does not solve the problem because duplicates will appear.

The set-based retry loop can become slow when m approaches n, because collisions become common.

Allocating an array of length n for a partial shuffle is expensive when n is huge and the sample is tiny.

Poor bounds checking is another common bug. m must not be negative, and it must not exceed n.

If output order matters, remember that sets do not preserve a guaranteed random ordering semantics for your algorithm. Sort or shuffle the result explicitly if you need a specific presentation.

Summary

  • The problem is sampling without replacement from 0 through n - 1.
  • 'random.sample is the simplest correct solution in Python.'
  • Repeated draws with a set work, but performance degrades as the sample fills.
  • Partial Fisher-Yates is efficient when you can materialize the full range.
  • Floyd's algorithm is a strong option when n is large and m is small.

Course illustration
Course illustration