Sampling without replacement from a given non-uniform distribution in TensorFlow

Introduction

Sampling without replacement from a weighted distribution means two things at once: heavier items should be more likely to be chosen, and no item may appear twice. TensorFlow does not expose one simple high-level helper for every version of this problem, but there are good practical strategies, especially the Gumbel-top-k trick.

Why This Is Different From Ordinary Weighted Sampling

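Ordinary weighted sampling with replacement is easy: each draw is independent, so the same item can appear more than once. Without replacement, each selection changes the remaining distribution. For example, with weights 0.1, 0.2, 0.5, 0.2, once the 0.5 item has been taken, the other three are drawn with probabilities 0.2, 0.4, 0.4.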

That is why you cannot just call a categorical sampler k times and expect the result to be correct unless you also remove the chosen item or mask it between draws.
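
To make the problem concrete, here is a minimal sketch of what happens when a categorical sampler is called several times with no masking. The skewed weights and the trial count are illustrative assumptions, not values from this article.

```python
import tensorflow as tf

# Heavily skewed example weights (illustrative values only).
weights = tf.constant([0.98, 0.01, 0.01], dtype=tf.float32)
logits = tf.math.log(weights)[None, :]  # shape [1, 3]

# 200 independent trials, each making 3 draws *with* replacement.
trials = tf.random.categorical(tf.tile(logits, [200, 1]), num_samples=3)

# A trial contains a repeat when it has fewer than 3 distinct indices.
rows_with_repeats = sum(len(set(row)) < 3 for row in trials.numpy().tolist())
print(f"{rows_with_repeats} of 200 trials repeated an index")
```

With weights this uneven, almost every trial should contain a repeated index, which is exactly what sampling without replacement has to rule out.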

A Practical TensorFlow Approach: Gumbel-Top-k

A standard trick for weighted sampling without replacement is to add Gumbel noise to the log-weights and then take the top k values.

python

import tensorflow as tf


def sample_without_replacement(weights, k, seed=None):
    """Return k distinct indices via Gumbel-top-k (eager mode, positive weights)."""
    weights = tf.convert_to_tensor(weights, dtype=tf.float32)
    if tf.reduce_any(weights < 0):
        raise ValueError("weights must be non-negative")

    logits = tf.math.log(weights)
    # Keep the uniform samples strictly above 0 so the inner log stays finite.
    uniform = tf.random.uniform(tf.shape(logits), minval=1e-20, maxval=1.0, seed=seed)
    gumbels = -tf.math.log(-tf.math.log(uniform))
    scores = logits + gumbels
    # The k largest perturbed scores give k distinct indices.
    return tf.math.top_k(scores, k=k).indices


weights = tf.constant([0.1, 0.2, 0.5, 0.2], dtype=tf.float32)
print(sample_without_replacement(weights, k=2).numpy())

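The indices are distinct because top_k never returns the same position twice. The weighting is respected too: the result follows the same distribution as drawing items one at a time in proportion to their remaining weights.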
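
One way to gain confidence in the trick is an empirical check. The sketch below repeats the Gumbel-top-k draw many times and compares how often each index lands in the first position against the normalized weights. The trial count of 20000 is an arbitrary assumption.

```python
import tensorflow as tf

weights = tf.constant([0.1, 0.2, 0.5, 0.2], dtype=tf.float32)
num_trials = 20000

# One row of Gumbel noise per trial; minval keeps log() away from zero.
uniform = tf.random.uniform([num_trials, 4], minval=1e-20, maxval=1.0)
scores = tf.math.log(weights) - tf.math.log(-tf.math.log(uniform))
samples = tf.math.top_k(scores, k=2).indices  # shape [num_trials, 2]

# Fraction of trials in which each index was picked first.
first_counts = tf.math.bincount(samples[:, 0], minlength=4)
first_freq = tf.cast(first_counts, tf.float32) / num_trials
print(first_freq.numpy())
```

The printed frequencies should land close to 0.1, 0.2, 0.5, 0.2, and the two indices in every row of `samples` are always different.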

Handle Zero Weights Carefully

The simple log(weights) step maps every zero-weight item to negative infinity, because log(0) is -inf. That may be fine if those elements should never be sampled, but it should be a deliberate choice rather than an accident.

python

import tensorflow as tf


def sample_without_replacement_safe(weights, k, seed=None):
    weights = tf.convert_to_tensor(weights, dtype=tf.float32)
    if tf.reduce_any(weights < 0):
        raise ValueError("weights must be non-negative")

    # Replace log(0) with a large negative finite score.
    logits = tf.where(
        weights > 0,
        tf.math.log(weights),
        tf.fill(tf.shape(weights), tf.constant(-1e30, dtype=tf.float32)),
    )

    # Keep the uniform samples strictly above 0 so the inner log stays finite.
    uniform = tf.random.uniform(tf.shape(logits), minval=1e-20, maxval=1.0, seed=seed)
    gumbels = -tf.math.log(-tf.math.log(uniform))
    scores = logits + gumbels
    return tf.math.top_k(scores, k=k).indices

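That keeps the scores finite and pushes zero-weight items to the bottom of the ranking. They are only truly excluded when k does not exceed the number of positive-weight items; with a larger k, top_k will still return them.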

Iterative Masking Is Simpler to Understand

Another valid approach is to sample one item at a time and mask the chosen index before the next draw.

python

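import tensorflow as tf


def sample_iterative(weights, k):
    """Draw k distinct indices one at a time (eager mode, 1-D weights)."""
    weights = tf.cast(weights, tf.float32)
    chosen = []

    for _ in range(k):
        # log(0) = -inf gives an already masked item zero probability.
        logits = tf.math.log(weights[None, :])
        index = tf.random.categorical(logits, 1)[0, 0]
        chosen.append(index)
        # Mask the chosen index before the next draw.
        weights = tf.tensor_scatter_nd_update(weights, tf.reshape(index, [1, 1]), [0.0])

    return tf.stack(chosen)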

This is easier to explain, but it runs k dependent sampling steps in sequence, whereas the Gumbel-top-k method needs a single vectorized top_k call.
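
The vectorization advantage of Gumbel-top-k shows up most clearly with batched input. The sketch below assumes a 2-D tensor of strictly positive weights, one distribution per row; the helper name `sample_without_replacement_batched` is hypothetical.

```python
import tensorflow as tf


def sample_without_replacement_batched(weights, k):
    """Hypothetical helper: one Gumbel-top-k draw per row of `weights`."""
    weights = tf.convert_to_tensor(weights, dtype=tf.float32)  # [batch, n]
    uniform = tf.random.uniform(tf.shape(weights), minval=1e-20, maxval=1.0)
    gumbels = -tf.math.log(-tf.math.log(uniform))
    # top_k works along the last axis, so each row is sampled independently.
    return tf.math.top_k(tf.math.log(weights) + gumbels, k=k).indices


batch = tf.constant([[0.1, 0.2, 0.5, 0.2],
                     [0.7, 0.1, 0.1, 0.1],
                     [0.25, 0.25, 0.25, 0.25]])
picks = sample_without_replacement_batched(batch, k=2)  # shape [3, 2]
print(picks.numpy())
```

A single call handles all three rows, whereas the iterative version would need its loop repeated or rewritten for each one.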

Validate the Inputs

Weighted sampling functions should validate at least three things:

weights are non-negative
the number of positive-weight items is at least k
the shape is what the caller expects

If there are fewer eligible items than requested samples, “without replacement” is impossible and the function should fail clearly instead of returning a misleading partial result.
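
The three checks above could be sketched as a small eager-mode helper. The name `validate_sampling_inputs`, the 1-D shape requirement, and the error messages are assumptions made for illustration.

```python
import tensorflow as tf


def validate_sampling_inputs(weights, k):
    """Hypothetical eager-mode checks; returns the weights as a float32 tensor."""
    weights = tf.convert_to_tensor(weights, dtype=tf.float32)
    # Shape check: this sketch assumes a single 1-D weight vector.
    if weights.shape.rank != 1:
        raise ValueError(f"expected a 1-D weight vector, got shape {weights.shape}")
    if bool(tf.reduce_any(weights < 0)):
        raise ValueError("weights must be non-negative")
    # Only positive-weight items are eligible to be sampled.
    eligible = int(tf.math.count_nonzero(weights > 0))
    if k > eligible:
        raise ValueError(f"requested k={k} samples but only {eligible} weights are positive")
    return weights


validate_sampling_inputs([0.1, 0.0, 0.5, 0.4], k=3)  # passes: 3 eligible items
try:
    validate_sampling_inputs([0.1, 0.0, 0.0, 0.9], k=3)  # only 2 eligible items
except ValueError as err:
    print(err)
```

Calling a helper like this before sampling turns an impossible request into an explicit error. Inside a tf.function the Python-level checks would need to be replaced with graph-compatible assertions.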

Think About the Real Goal

If this sampling happens inside a training loop, ask whether it must be pure TensorFlow graph code or whether preprocessing outside TensorFlow would be simpler. Not every probabilistic data-preparation step belongs inside the model graph.

For small or occasional sampling jobs, a NumPy or Python-side solution may be more readable. For large-batch TensorFlow pipelines, staying inside TensorFlow is often worth it.
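
For the Python-side route, a minimal NumPy sketch might look like the following. It assumes the weights fit comfortably in memory as a 1-D array, and the seed is fixed only to make the run repeatable.

```python
import numpy as np

weights = np.array([0.1, 0.2, 0.5, 0.2])
rng = np.random.default_rng(seed=0)  # seed chosen only for repeatability

# `p` must sum to 1, so normalize first; replace=False forbids duplicates.
indices = rng.choice(len(weights), size=2, replace=False, p=weights / weights.sum())
print(indices)
```

The resulting indices can then be passed into the TensorFlow pipeline as ordinary input data.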

Common Pitfalls

The most common mistake is sampling repeatedly from a weighted distribution without removing already selected items.

Another common issue is ignoring zero weights and then running into log(0) problems. Developers also often forget to validate that there are enough positive-weight items to support sampling without replacement for the requested k.

Summary

Weighted sampling without replacement is not the same as repeated weighted draws with replacement.
The Gumbel-top-k trick is a strong TensorFlow-friendly solution.
Zero weights need explicit handling when logarithms are involved.
Iterative masking is simpler conceptually but less elegant for vectorized workflows.
Validate the weight vector and the requested sample count before sampling.