How can I generate sorted uniformly distributed random numbers efficiently in C?
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Introduction
Generating a large sequence of sorted uniform random numbers is a common task in simulations, Monte Carlo methods, and statistical sampling. The naive approach of generating all values and then sorting them works, but it costs O(n log n) time for the sort step. When n is in the millions, that sorting overhead becomes a bottleneck. There is a mathematically elegant O(n) technique based on exponential spacings that produces sorted uniform random numbers directly, skipping the sort entirely.
The Naive Approach: Generate Then Sort
The straightforward method generates n uniform random numbers and sorts them afterward:
This runs in O(n log n) due to the sort. For small n (under a few thousand), this is perfectly fine. But for large n, we can do better.
The Exponential Spacings Method: O(n) Generation
This technique exploits a property of order statistics. If you take n uniform random variables on [0, 1] and sort them, the gaps between consecutive values follow a specific distribution. Specifically, the gaps are proportional to exponential random variables.
The algorithm works as follows. Generate n + 1 exponential random variables, compute their cumulative sum, then divide each partial sum by the total. The result is n sorted uniform values on [0, 1]:
This runs in O(n) time with a single pass to generate spacings and a single pass to compute the cumulative sums. No sorting is needed because the cumulative sum is monotonically increasing by construction.
Why This Works: The Mathematical Intuition
Consider n + 1 points placed uniformly on a circle of circumference 1. The arc lengths between consecutive points are symmetric Dirichlet-distributed, which is equivalent to normalized exponential spacings. When you unwrap the circle into a line segment [0, 1], the cumulative spacings give you the sorted order statistics of a uniform distribution.
The key insight is that generating exponential variates and normalizing them is mathematically identical to generating uniform variates and sorting them, but computationally cheaper.
C++ Version with the Standard Random Library
For C++ projects, use the standard library's random facilities for better quality random numbers:
Using std::exponential_distribution directly is cleaner and more numerically robust than manually computing -log(U), since it handles edge cases around zero properly.
Benchmark Comparison
Here is a simple benchmark to compare the two approaches:
Typical results on modern hardware show that for n = 10 million, the exponential method is roughly 3 to 5 times faster than generate-then-sort, and the gap widens as n increases because O(n) versus O(n log n) diverges.
Memory Optimization
The basic exponential method requires O(n) extra memory for the spacings array. You can reduce this to O(1) extra memory with a two-pass approach. First compute the total, then regenerate the spacings with the same seed:
This trades one extra pass over the RNG for eliminating the spacings array, which matters when n is so large that memory is the constraint.
Common Pitfalls
- Using
rand()for serious work: The Crand()function has poor statistical properties and limited range (often only 15 bits). Usestd::mt19937or another quality generator for simulations. - Forgetting the +1 in spacings count: You need n + 1 exponential variates to produce n sorted uniforms. Using exactly n spacings produces values that do not cover [0, 1] correctly.
- Division by zero when U is 0: Computing
-log(U)where U isrand() / RAND_MAXcan produce infinity whenrand()returns 0. Always shift the range slightly or use a library distribution that handles this. - Assuming uniform quality for the sorted output: The exponential method produces mathematically exact uniform order statistics, but the quality depends on your underlying RNG. A poor generator produces poor sorted uniforms regardless of the method.
- Ignoring numerical precision for very large n: When n exceeds roughly 10 to 100 million, accumulated floating-point rounding in the cumulative sum can degrade the uniformity of the last few values. Using compensated (Kahan) summation mitigates this.
Summary
- The naive generate-then-sort approach runs in O(n log n) and is fine for small n.
- The exponential spacings method generates sorted uniform random numbers in O(n) by exploiting the relationship between exponential variates and uniform order statistics.
- Use
std::exponential_distributionin C++ for numerically robust exponential variate generation. - A two-pass approach can reduce extra memory usage from O(n) to O(1) at the cost of regenerating the random sequence.
- For large n (millions or more), the exponential method is 3 to 5 times faster and the advantage grows with n.
- Always use a quality random number generator like Mersenne Twister rather than
rand()for any serious numerical work.

