Generating m distinct random numbers in the range 0..n-1
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Generating m distinct random numbers from the range 0 through n - 1 is a sampling problem without replacement. The right algorithm depends on the relative sizes of m and n, because a method that is fine for a tiny sample can become wasteful when the sample is large.
The Simple and Correct Built-In Option
In Python, the easiest correct solution is random.sample. It already performs sampling without replacement.
If you only need working code and the language provides a sampling primitive, use it. The interesting part starts when you need to understand the underlying algorithms or implement one manually.
Why Repeated Random Picks Can Be Slow
A common first attempt is:
- draw a random number in the range
- store it in a set if it is new
- keep going until the set has
mvalues
That works, but duplicates become more frequent as the set fills up. When m is close to n, the algorithm spends more and more time redrawing values it already has.
This is acceptable for small samples, but it is not the most predictable solution.
Partial Fisher-Yates Shuffle
When m is large, a partial Fisher-Yates shuffle is a strong choice. Start with the whole range, then swap random elements into the front m positions.
This runs in linear time with respect to the number of elements touched. It is simple, unbiased, and efficient when you can afford an array of size n.
Floyd's Algorithm for Smaller Samples
If n is huge but m is relatively small, storing the full range can waste memory. Floyd's algorithm generates a sample without replacement using only a set of selected values.
The key advantage is memory: it stores only the selected values, not the entire range. That makes it attractive for very large domains.
Which Method Should You Use
Choose based on constraints:
- use
random.samplewhen available and you do not need custom behavior - use the set-based loop for quick prototypes or very small samples
- use partial Fisher-Yates when
mis a significant fraction ofn - use Floyd's algorithm when
nis very large andmis much smaller
There is no single best algorithm for every case. The best one is the one that matches your size and memory profile.
Common Pitfalls
Using repeated random draws without checking uniqueness does not solve the problem because duplicates will appear.
The set-based retry loop can become slow when m approaches n, because collisions become common.
Allocating an array of length n for a partial shuffle is expensive when n is huge and the sample is tiny.
Poor bounds checking is another common bug. m must not be negative, and it must not exceed n.
If output order matters, remember that sets do not preserve a guaranteed random ordering semantics for your algorithm. Sort or shuffle the result explicitly if you need a specific presentation.
Summary
- The problem is sampling without replacement from
0throughn - 1. - '
random.sampleis the simplest correct solution in Python.' - Repeated draws with a set work, but performance degrades as the sample fills.
- Partial Fisher-Yates is efficient when you can materialize the full range.
- Floyd's algorithm is a strong option when
nis large andmis small.

