Sampling without replacement from a given non-uniform distribution in TensorFlow
Introduction
Sampling without replacement from a weighted distribution means two things at once: heavier items should be more likely to be chosen, and no item may appear twice. TensorFlow has no single built-in op that covers this case (tf.random.categorical, for example, draws with replacement), but there are good practical strategies, especially the Gumbel-top-k trick.
Why This Is Different From Ordinary Weighted Sampling
Ordinary weighted sampling with replacement is easy: each draw is independent, so the same item can appear more than once. Without replacement, each selection changes the remaining distribution.
That is why you cannot just call a categorical sampler k times and expect the result to be correct unless you also remove the chosen item or mask it between draws.
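To see the problem concretely, here is a small sketch that draws k indices per trial with tf.random.categorical and counts how many trials contain a repeated index. The weights, k, and the number of trials are illustrative values, not taken from any real workload.

```python
import tensorflow as tf

weights = tf.constant([0.7, 0.1, 0.1, 0.1])  # illustrative weights
k = 3
trials = 1000

# tf.random.categorical expects unnormalized log-probabilities with shape
# [batch, num_classes]; each row here is one independent trial of k draws.
logits = tf.tile(tf.math.log(weights)[tf.newaxis, :], [trials, 1])
draws = tf.random.categorical(logits, num_samples=k).numpy()

# A trial contains a repeat when it has fewer than k unique indices.
trials_with_repeats = sum(len(set(row)) < k for row in draws)
print(f"{trials_with_repeats} of {trials} trials repeated an index")
```

With one weight this dominant, most trials should repeat the heavy index, which is exactly what sampling without replacement must prevent.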
A Practical TensorFlow Approach: Gumbel-Top-k
A standard trick for weighted sampling without replacement is to add Gumbel noise to the log-weights and then take the top k values.
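Because top-k returns k different positions, the indices are distinct by construction. The resulting ordered sample also has the same distribution as drawing items one at a time in proportion to the remaining weights, so the relative weighting is respected.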
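The trick can be sketched in a few lines. This is a minimal version, assuming a 1-D tensor of strictly positive weights; the function name gumbel_top_k and the small lower bound on the uniform samples (used to avoid log(0) in the noise) are choices made for this example.

```python
import tensorflow as tf

def gumbel_top_k(weights, k, seed=None):
    """Return k distinct indices, favouring larger weights.

    Hypothetical helper for illustration; assumes strictly positive weights.
    """
    weights = tf.convert_to_tensor(weights, dtype=tf.float32)
    log_w = tf.math.log(weights)

    # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1).
    # The lower bound keeps U away from 0 so the inner log stays finite.
    u = tf.random.uniform(tf.shape(weights), minval=1e-20, maxval=1.0, seed=seed)
    gumbel = -tf.math.log(-tf.math.log(u))

    # The k largest perturbed log-weights give the sample, in draw order.
    _, indices = tf.math.top_k(log_w + gumbel, k=k)
    return indices

sample = gumbel_top_k([0.5, 0.3, 0.1, 0.1], k=3)
print(sample.numpy())
```

Because tf.math.top_k works along the last axis, the same idea should extend to a batch of weight vectors with shape [batch, n], with fresh noise drawn for every entry.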
Handle Zero Weights Carefully
The simple log(weights) step breaks for zero-weight items because log(0) is negative infinity. That may be fine if those elements should never be sampled, but it should be treated intentionally.
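Assigning zero-weight items a log-weight of negative infinity (or a very large negative constant) on purpose, rather than letting log(0) happen implicitly, keeps them effectively excluded without producing undefined math.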
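One way to make that handling explicit is to mask before taking the logarithm. The sketch below is an assumption about how this could look, not a library API: masked_log_weights and sample_positive_only are hypothetical names, and it assumes the caller asks for no more items than there are positive weights.

```python
import tensorflow as tf

def masked_log_weights(weights):
    """Log-weights with zero-weight entries set to -inf on purpose."""
    weights = tf.convert_to_tensor(weights, dtype=tf.float32)
    positive = weights > 0
    # Swap zeros for 1 before the log so log never sees a zero.
    safe = tf.where(positive, weights, tf.ones_like(weights))
    neg_inf = tf.fill(tf.shape(weights), float("-inf"))
    return tf.where(positive, tf.math.log(safe), neg_inf)

def sample_positive_only(weights, k):
    """Gumbel-top-k over the masked log-weights (hypothetical helper)."""
    logits = masked_log_weights(weights)
    u = tf.random.uniform(tf.shape(logits), minval=1e-20, maxval=1.0)
    keys = logits - tf.math.log(-tf.math.log(u))  # -inf stays -inf
    _, indices = tf.math.top_k(keys, k=k)
    return indices

print(sample_positive_only([0.6, 0.0, 0.4, 0.0], k=2).numpy())
```

Entries with a key of negative infinity can only reach the top k when k exceeds the number of positive weights, which is why the count check described under input validation matters.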
Iterative Masking Is Simpler to Understand
Another valid approach is to sample one item at a time and mask the chosen index before the next draw.
This is easier to explain, but it needs k sequential sampling steps, so it vectorizes less well than the single top-k call in the Gumbel-top-k method.
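A minimal eager-mode sketch of the masking loop follows. It assumes strictly positive 1-D weights and a Python integer k; the name sample_by_masking and the large negative constant used as the mask value are choices made for this example.

```python
import tensorflow as tf

MASKED_LOGIT = -1e9  # large negative logit, so the probability underflows to 0

def sample_by_masking(weights, k):
    """Draw k distinct indices one at a time, masking each chosen index."""
    weights = tf.convert_to_tensor(weights, dtype=tf.float32)
    positions = tf.range(tf.shape(weights)[0], dtype=tf.int64)
    logits = tf.math.log(weights)  # assumes strictly positive weights
    masked = tf.constant(MASKED_LOGIT, dtype=logits.dtype)

    chosen = []
    for _ in range(k):
        # tf.random.categorical expects logits of shape [batch, num_classes].
        draw = tf.random.categorical(logits[tf.newaxis, :], num_samples=1)
        idx = draw[0, 0]  # int64 scalar
        chosen.append(idx)
        # Remove the chosen index from all later draws.
        logits = tf.where(tf.equal(positions, idx), masked, logits)
    return tf.stack(chosen)

print(sample_by_masking([0.5, 0.3, 0.1, 0.1], k=3).numpy())
```

The categorical sampler renormalizes over whatever logits remain, so each step draws in proportion to the weights of the items not yet chosen.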
Validate the Inputs
Weighted sampling functions should validate at least three things:
- weights are non-negative
- the number of positive-weight items is at least k
- the shape is what the caller expects
If there are fewer eligible items than requested samples, “without replacement” is impossible and the function should fail clearly instead of returning a misleading partial result.
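These checks could be sketched with TensorFlow's debugging assertions. The helper name validate_sampling_inputs is hypothetical, and the expectation of a 1-D weight vector is an assumption made for this example; a batched sampler would check a different rank.

```python
import tensorflow as tf

def validate_sampling_inputs(weights, k):
    """Check weights and k before sampling without replacement.

    Hypothetical helper; assumes a single 1-D weight vector.
    """
    weights = tf.convert_to_tensor(weights, dtype=tf.float32)

    # Shape check: this sketch expects exactly one weight vector.
    if weights.shape.rank != 1:
        raise ValueError(f"expected 1-D weights, got rank {weights.shape.rank}")

    # Non-negativity check (NaN entries also fail this assertion).
    tf.debugging.assert_non_negative(
        weights, message="weights must be non-negative")

    # There must be at least k items that can actually be drawn.
    num_positive = tf.reduce_sum(tf.cast(weights > 0, tf.int32))
    tf.debugging.assert_greater_equal(
        num_positive, k,
        message="fewer positive-weight items than requested samples")
    return weights

validate_sampling_inputs([0.2, 0.0, 0.8], k=2)  # passes silently
```

In eager mode a failed assertion raises tf.errors.InvalidArgumentError, so the caller gets an explicit error rather than a partial sample.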
Think About the Real Goal
If this sampling happens inside a training loop, ask whether it must be pure TensorFlow graph code or whether preprocessing outside TensorFlow would be simpler. Not every probabilistic data-preparation step belongs inside the model graph.
For small or occasional sampling jobs, a NumPy or Python-side solution may be more readable. For large-batch TensorFlow pipelines, staying inside TensorFlow is often worth it.
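For the Python-side route, NumPy's random Generator already supports weighted draws without replacement. The wrapper below is a small sketch; the function name and the normalization step are choices made for this example.

```python
import numpy as np

def sample_without_replacement_np(weights, k, seed=None):
    """Draw k distinct indices with NumPy (hypothetical helper)."""
    weights = np.asarray(weights, dtype=np.float64)
    rng = np.random.default_rng(seed)
    probs = weights / weights.sum()  # choice() expects probabilities summing to 1
    # Raises ValueError if fewer than k entries have non-zero probability.
    return rng.choice(len(weights), size=k, replace=False, p=probs)

print(sample_without_replacement_np([0.5, 0.3, 0.0, 0.2], k=2, seed=0))
```

The resulting indices can then be fed into the TensorFlow pipeline as ordinary input data.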
Common Pitfalls
The most common mistake is sampling repeatedly from a weighted distribution without removing already selected items.
Another common issue is ignoring zero weights and then running into log(0) problems. Developers also often forget to validate that there are enough positive-weight items to support sampling without replacement for the requested k.
Summary
- Weighted sampling without replacement is not the same as repeated weighted draws with replacement.
- The Gumbel-top-k trick is a strong TensorFlow-friendly solution.
- Zero weights need explicit handling when logarithms are involved.
- Iterative masking is simpler conceptually but less elegant for vectorized workflows.
- Validate the weight vector and the requested sample count before sampling.

