
Find Unique Values in a 2D Tensor Using TensorFlow

Introduction

Finding unique values in TensorFlow is straightforward once you understand one important limitation: tf.unique expects a one-dimensional tensor. For a 2D tensor, the usual solution is to flatten the data first, then apply the unique operation to the flattened vector.

Why a 2D Tensor Needs Reshaping First

A 2D tensor stores scalar values in rows and columns, but tf.unique is designed to operate on a flat sequence. If your goal is "which scalar values appear anywhere in this matrix," flattening is the correct transformation because it preserves every element while removing the row and column structure.

```python
import tensorflow as tf

matrix = tf.constant([
    [1, 2, 2],
    [3, 1, 4],
    [4, 5, 3],
])

flat = tf.reshape(matrix, [-1])
values, index_map = tf.unique(flat)

print("flat:", flat.numpy())
print("unique:", values.numpy())
print("index map:", index_map.numpy())
```

The values tensor contains each distinct scalar once, in the order TensorFlow first encountered it. The index_map tensor tells you, for every flattened element, which position in values it corresponds to.
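The relationship between the two outputs can be checked directly. This sketch reuses the same sample matrix: gathering `values` at each position of `index_map` rebuilds the flattened tensor, and reshaping `index_map` back to the matrix's shape gives each cell's position in `values` laid out as rows and columns again.

```python
import tensorflow as tf

matrix = tf.constant([
    [1, 2, 2],
    [3, 1, 4],
    [4, 5, 3],
])

flat = tf.reshape(matrix, [-1])
values, index_map = tf.unique(flat)

# Looking up each index in `values` rebuilds the flattened tensor.
rebuilt = tf.gather(values, index_map)

# Reshaping the index map to the original shape gives, for every cell,
# its position in `values`, laid out as rows and columns again.
index_grid = tf.reshape(index_map, tf.shape(matrix))

print("rebuilt matches flat:", bool(tf.reduce_all(tf.equal(rebuilt, flat))))
print("index grid:\n", index_grid.numpy())
```

This makes `index_map` useful as a compact encoding: it replaces each original value with a small integer code.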

A Reusable Helper Function

For preprocessing code, wrapping the reshape and unique logic in a helper keeps the call site clearer.

```python
import tensorflow as tf


def unique_values_2d(tensor):
    flat = tf.reshape(tensor, [-1])
    values, _ = tf.unique(flat)
    return values


matrix = tf.constant([
    [7, 8, 8],
    [9, 7, 10],
])

print(unique_values_2d(matrix).numpy())
```

This is a good pattern when you are checking labels, extracting categorical IDs, or validating that a matrix only contains allowed values.
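For the validation case, a minimal sketch might compare the distinct values against an allowed set using broadcasting. The helper name `disallowed_values_2d` and the sample label matrix are hypothetical; the helper returns whichever distinct values are not in `allowed`, so an empty result means the matrix is valid.

```python
import tensorflow as tf


def disallowed_values_2d(tensor, allowed):
    """Hypothetical helper: distinct values in `tensor` that are not in `allowed`."""
    values, _ = tf.unique(tf.reshape(tensor, [-1]))
    # in_allowed[k] is True when values[k] matches at least one allowed value.
    in_allowed = tf.reduce_any(
        tf.equal(values[:, tf.newaxis], allowed[tf.newaxis, :]), axis=1
    )
    return tf.boolean_mask(values, tf.logical_not(in_allowed))


labels = tf.constant([
    [0, 1, 1],
    [2, 0, 7],
])
allowed = tf.constant([0, 1, 2])

bad = disallowed_values_2d(labels, allowed)
print("disallowed:", bad.numpy())
print("valid:", bool(tf.equal(tf.size(bad), 0)))
```

Deduplicating first keeps the comparison matrix small: its size depends on the number of distinct values, not the number of cells.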

Counting Unique Values

Sometimes the distinct values are not enough and you also need frequencies. TensorFlow provides tf.unique_with_counts for that.

```python
import tensorflow as tf

matrix = tf.constant([
    [1, 1, 2],
    [2, 3, 1],
])

flat = tf.reshape(matrix, [-1])
values, _, counts = tf.unique_with_counts(flat)

print("values:", values.numpy())
print("counts:", counts.numpy())
```

This is especially useful in data inspection tasks, where you want to know not only which classes appear in a batch, but also how heavily they are represented.
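To express "how heavily represented" as a share of the batch, one option is to divide each count by the total number of elements. This sketch reuses the sample matrix above; `values` and `counts` line up position by position, so each fraction belongs to the value at the same index.

```python
import tensorflow as tf

matrix = tf.constant([
    [1, 1, 2],
    [2, 3, 1],
])

flat = tf.reshape(matrix, [-1])
values, _, counts = tf.unique_with_counts(flat)

# Divide each count by the total number of elements to get its share of the batch.
total = tf.cast(tf.size(flat), tf.float32)
fractions = tf.cast(counts, tf.float32) / total

for value, fraction in zip(values.numpy(), fractions.numpy()):
    print(f"value {value}: {fraction:.2f}")
```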

Unique Scalars vs Unique Rows

This is where many bugs start. Flattening answers the question "which scalar values exist anywhere in the tensor?" It does not answer "which rows are unique?"

For example, if the matrix is:

```python
import tensorflow as tf

matrix = tf.constant([
    [1, 2],
    [1, 2],
    [3, 4],
])
```

Flattening and calling tf.unique returns 1, 2, 3, 4, which is correct for scalar uniqueness. But it loses the fact that the row [1, 2] appears twice. If you actually need unique rows, that is a different operation and you should solve it as a row-wise deduplication problem rather than flattening.
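One possible row-wise approach is sketched here as a hypothetical helper named `unique_rows`; it is not a built-in TensorFlow function. It compares every row with every other row through broadcasting and drops any row that matches an earlier one, so the first occurrence of each row survives in its original order. Because it builds an n-by-n comparison matrix, it is only suitable for a modest number of rows.

```python
import tensorflow as tf


def unique_rows(matrix):
    """Hypothetical helper: distinct rows of a 2D tensor in first-seen order.

    Uses O(n^2) memory in the number of rows, so it suits small matrices.
    """
    # eq[i, j] is True when row i and row j match element for element.
    eq = tf.reduce_all(
        tf.equal(matrix[:, tf.newaxis, :], matrix[tf.newaxis, :, :]), axis=-1
    )
    positions = tf.range(tf.shape(matrix)[0])
    # earlier[i, j] is True when row j comes before row i.
    earlier = positions[tf.newaxis, :] < positions[:, tf.newaxis]
    # A row is a duplicate if it equals some row that appeared earlier.
    is_duplicate = tf.reduce_any(tf.logical_and(eq, earlier), axis=1)
    return tf.boolean_mask(matrix, tf.logical_not(is_duplicate))


matrix = tf.constant([
    [1, 2],
    [1, 2],
    [3, 4],
])

print(unique_rows(matrix).numpy())
```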

Ordering and Sorting

tf.unique preserves first-seen order. That behavior is often useful because it keeps the output stable relative to the original data stream. If you need sorted unique values instead, sort the result explicitly.

```python
import tensorflow as tf

matrix = tf.constant([
    [3, 1],
    [2, 3],
])

flat = tf.reshape(matrix, [-1])
values, _ = tf.unique(flat)
sorted_values = tf.sort(values)

print(sorted_values.numpy())
```

That separation is helpful because order-of-appearance and numeric sorting solve different problems.
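One consequence worth handling: after sorting, the index map from `tf.unique` still points into the unsorted `values`. A minimal sketch of keeping the two in sync sorts through `tf.argsort` and remaps the indices with `tf.math.invert_permutation`, so every element of the flattened tensor points at its value's position in the sorted output.

```python
import tensorflow as tf

matrix = tf.constant([
    [3, 1],
    [2, 3],
])

flat = tf.reshape(matrix, [-1])
values, index_map = tf.unique(flat)       # values in first-seen order: [3, 1, 2]

order = tf.argsort(values)                # positions of values in ascending order
sorted_values = tf.gather(values, order)  # same result as tf.sort(values)

# invert_permutation maps each old position in `values` to its new sorted position.
new_position = tf.math.invert_permutation(order)
sorted_index_map = tf.gather(new_position, index_map)

print("sorted values:", sorted_values.numpy())
print("sorted index map:", sorted_index_map.numpy())
```

Gathering `sorted_values` at `sorted_index_map` reproduces the flattened tensor, just as gathering `values` at `index_map` does.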

Common Pitfalls

The most common mistake is calling tf.unique directly on a 2D tensor and expecting TensorFlow to interpret rows or columns automatically. It does not; you need to decide what kind of uniqueness you want first.
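A quick way to see the limitation is to pass a 2D tensor straight to `tf.unique` and catch the error. This sketch assumes eager execution, where the rejection typically surfaces as `InvalidArgumentError`; it catches `ValueError` too, since the exact exception type may differ in other execution modes.

```python
import tensorflow as tf

matrix = tf.constant([
    [1, 2],
    [2, 3],
])

try:
    tf.unique(matrix)  # rejected: the input is not a 1D vector
except (tf.errors.InvalidArgumentError, ValueError) as err:
    # Eager execution typically reports InvalidArgumentError; the exact
    # exception type may differ in other execution modes.
    print("tf.unique rejected the 2D input:", type(err).__name__)

# Decide what "unique" means first. For scalar values, flatten explicitly.
values, _ = tf.unique(tf.reshape(matrix, [-1]))
print("unique scalars:", values.numpy())
```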

Another pitfall is flattening when the actual requirement was row uniqueness. Flattening is correct only when the result should be a set of scalar values, not structured rows.

Developers also assume the result is sorted. tf.unique preserves discovery order, so if sorted output matters, add tf.sort afterward.

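Finally, avoid converting tensors to NumPy too early in a TensorFlow data pipeline. Calling .numpy() is fine for debugging in eager mode, but inside a tf.function or a tf.data map function the tensors are symbolic and .numpy() is not available. Staying with TensorFlow ops such as tf.reshape and tf.unique keeps the same code usable in both eager and graph execution.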
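As a sketch of keeping the work inside TensorFlow, the helper below mirrors the article's `unique_values_2d` and is passed directly to `tf.data.Dataset.map`, which traces it as a graph function. The two stacked 2x2 sample matrices are made up for illustration; NumPy conversion happens only at the end, when the results are printed.

```python
import tensorflow as tf


def unique_values_2d(tensor):
    # Uses only TensorFlow ops, so it can run inside Dataset.map.
    flat = tf.reshape(tensor, [-1])
    values, _ = tf.unique(flat)
    return values


# Two 2x2 matrices stacked along the first axis; from_tensor_slices yields them one at a time.
batches = tf.data.Dataset.from_tensor_slices(tf.constant([
    [[1, 2], [2, 3]],
    [[4, 4], [5, 4]],
]))

unique_per_matrix = batches.map(unique_values_2d)

# Convert to NumPy only at the edge of the pipeline, for printing.
for values in unique_per_matrix:
    print(values.numpy())
```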

Summary

  • tf.unique works on one-dimensional tensors, so a 2D tensor must usually be flattened first.
  • Use tf.reshape(tensor, [-1]) when you want unique scalar values from the entire matrix.
  • Use tf.unique_with_counts when frequencies matter as well as distinct values.
  • Keep scalar uniqueness separate from row-wise uniqueness; they are different problems.
  • Sort the result explicitly if you need numeric or lexical order instead of first-seen order.
