Find Unique Values in a 2D Tensor Using TensorFlow
Introduction
Finding unique values in TensorFlow is straightforward once you understand one important limitation: tf.unique expects a one-dimensional tensor. For a 2D tensor, the usual solution is to flatten the data first, then apply the unique operation to the flattened vector.
Why a 2D Tensor Needs Reshaping First
A 2D tensor stores scalar values in rows and columns, but tf.unique is designed to operate on a flat sequence. If your goal is "which scalar values appear anywhere in this matrix," flattening is the correct transformation because it preserves every element while removing the row and column structure.
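The flatten-then-unique step can be sketched as follows. This is a minimal example, assuming TensorFlow 2.x in eager mode; the matrix contents are illustrative, and the names values and index_map are simply the labels used in the explanation that follows.

```python
import tensorflow as tf

# Illustrative 2D tensor; the numbers are made up for this sketch.
matrix = tf.constant([[3, 1, 2],
                      [1, 3, 4]])

# Flatten to 1-D. The -1 lets TensorFlow infer the length (6 here).
flat = tf.reshape(matrix, [-1])

# tf.unique returns the distinct values plus, for every flattened
# element, the index of that element inside `values`.
values, index_map = tf.unique(flat)

print(values.numpy())     # [3 1 2 4]
print(index_map.numpy())  # [0 1 2 1 0 3]

# Gathering `values` by `index_map` rebuilds the flattened input.
rebuilt = tf.gather(values, index_map)
```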
tf.unique returns two tensors, referred to here as values and index_map. The values tensor contains each distinct scalar once, in the order TensorFlow first encountered it. The index_map tensor tells you, for every flattened element, which position in values it corresponds to.
A Reusable Helper Function
For preprocessing code, wrapping the reshape and unique logic in a helper keeps the call site clearer.
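One possible shape for such a helper is sketched below. The function name unique_scalars and the sample label matrix are hypothetical, chosen only for illustration.

```python
import tensorflow as tf

def unique_scalars(tensor):
    """Return the distinct scalar values of a tensor, in first-seen order.

    Hypothetical helper: flattens the input to 1-D, then applies tf.unique.
    """
    flat = tf.reshape(tensor, [-1])
    values, _ = tf.unique(flat)  # the index map is not needed here
    return values

# Illustrative label matrix.
labels = tf.constant([[2, 0, 2],
                      [1, 0, 1]])

found = unique_scalars(labels)
print(found.numpy())  # [2 0 1]

# Example check: every label must belong to an allowed set.
allowed = {0, 1, 2}
all_allowed = set(found.numpy().tolist()) <= allowed
```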
This is a good pattern when you are checking labels, extracting categorical IDs, or validating that a matrix only contains allowed values.
Counting Unique Values
Sometimes the distinct values are not enough and you also need frequencies. TensorFlow provides tf.unique_with_counts for that.
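A short sketch of that call, again flattening first. The batch of labels is made up for the example, and the variable names are illustrative.

```python
import tensorflow as tf

# Illustrative batch of class labels.
batch_labels = tf.constant([[0, 2, 2],
                            [1, 2, 0]])

flat = tf.reshape(batch_labels, [-1])

# Like tf.unique, but with a third output holding how often each
# distinct value occurs.
values, index_map, counts = tf.unique_with_counts(flat)

for value, count in zip(values.numpy(), counts.numpy()):
    print(f"label {value}: {count} occurrence(s)")
```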
This is especially useful in data inspection tasks, where you want to know not only which classes appear in a batch, but also how heavily they are represented.
Unique Scalars vs Unique Rows
This is where many bugs start. Flattening answers the question "which scalar values exist anywhere in the tensor?" It does not answer "which rows are unique?"
For example, suppose the matrix has the rows [1, 2], [3, 4], and [1, 2].
Flattening and calling tf.unique returns 1, 2, 3, 4, which is correct for scalar uniqueness. But it loses the fact that the row [1, 2] appears twice. If you actually need unique rows, that is a different operation and you should solve it as a row-wise deduplication problem rather than flattening.
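The difference can be seen side by side in the sketch below. The matrix is an assumed example consistent with the description above. For the row-wise case, this sketch uses the low-level tf.raw_ops.UniqueV2 op with an axis argument; that is one option among several, not necessarily the approach the original author had in mind.

```python
import tensorflow as tf

# Assumed example matrix: the row [1, 2] appears twice.
matrix = tf.constant([[1, 2],
                      [3, 4],
                      [1, 2]])

# Scalar uniqueness: flatten, then tf.unique.
scalar_values, _ = tf.unique(tf.reshape(matrix, [-1]))
print(scalar_values.numpy())  # [1 2 3 4]

# Row uniqueness: deduplicate along axis 0 with the raw UniqueV2 op.
# The axis is passed as a 1-D tensor.
unique_rows, row_index = tf.raw_ops.UniqueV2(x=matrix, axis=[0])
print(unique_rows.numpy())  # distinct rows
print(row_index.numpy())    # for each input row, its index in unique_rows
```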
Ordering and Sorting
tf.unique preserves first-seen order. That behavior is often useful because it keeps the output stable relative to the original data stream. If you need sorted unique values instead, sort the result explicitly.
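A minimal sketch of the two steps kept separate, using illustrative data:

```python
import tensorflow as tf

# Illustrative matrix whose values are not in numeric order.
matrix = tf.constant([[7, 3],
                      [3, 5],
                      [1, 7]])

# Step 1: distinct values in first-seen order.
values, _ = tf.unique(tf.reshape(matrix, [-1]))
print(values.numpy())  # [7 3 5 1]

# Step 2: sort explicitly (tf.sort is ascending by default).
sorted_values = tf.sort(values)
print(sorted_values.numpy())  # [1 3 5 7]
```

Passing direction='DESCENDING' to tf.sort reverses the order.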
That separation is helpful because order-of-appearance and numeric sorting solve different problems.
Common Pitfalls
The most common mistake is calling tf.unique directly on a 2D tensor and expecting TensorFlow to interpret rows or columns automatically. It does not; you need to decide what kind of uniqueness you want first.
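The sketch below shows the failure and the fix, assuming TensorFlow 2.x. The exact exception type can depend on whether the call runs eagerly or during graph tracing, so both likely types are caught.

```python
import tensorflow as tf

matrix = tf.constant([[1, 2],
                      [2, 3]])

# Calling tf.unique directly on a 2D tensor is rejected.
try:
    tf.unique(matrix)
    rejected = False
except (tf.errors.InvalidArgumentError, ValueError) as err:
    rejected = True
    print("tf.unique rejected the 2D input:", type(err).__name__)

# Deciding on scalar uniqueness and flattening first works.
values, _ = tf.unique(tf.reshape(matrix, [-1]))
print(values.numpy())  # [1 2 3]
```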
Another pitfall is flattening when the actual requirement was row uniqueness. Flattening is correct only when the result should be a set of scalar values, not structured rows.
Developers also often assume the result is sorted. tf.unique preserves first-seen order, so if sorted output matters, apply tf.sort afterward.
Finally, avoid converting tensors to NumPy too early in a TensorFlow data pipeline. That may be fine for debugging, but inside production graph or dataset code it usually adds unnecessary boundary crossings.
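As a rough illustration of staying inside TensorFlow, the sketch below applies the flatten-and-unique step within a tf.data pipeline. The function name unique_in_matrix and the input data are hypothetical; NumPy conversion happens only at the very end, for inspection.

```python
import tensorflow as tf

# Two illustrative 2x2 matrices, stacked into one tensor.
matrices = tf.constant([[[1, 1], [2, 1]],
                        [[5, 4], [4, 4]]])

def unique_in_matrix(matrix):
    """Hypothetical map function: distinct scalars of one matrix."""
    flat = tf.reshape(matrix, [-1])
    values, _ = tf.unique(flat)
    return values

# The map function runs as TensorFlow ops, with no NumPy round trip.
dataset = tf.data.Dataset.from_tensor_slices(matrices).map(unique_in_matrix)

# Convert to Python lists only when inspecting the results.
results = [values.numpy().tolist() for values in dataset]
print(results)  # [[1, 2], [5, 4]]
```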
Summary
- tf.unique works on one-dimensional tensors, so a 2D tensor must usually be flattened first.
- Use tf.reshape(tensor, [-1]) when you want unique scalar values from the entire matrix.
- Use tf.unique_with_counts when frequencies matter as well as distinct values.
- Keep scalar uniqueness separate from row-wise uniqueness; they are different problems.
- Sort the result explicitly if you need numeric or lexical order instead of first-seen order.

