TensorFlow
tf.nn.embedding_lookup_sparse
machine learning
Python
embeddings

How to use tf.nn.embedding_lookup_sparse in TensorFlow?

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Introduction

tf.nn.embedding_lookup_sparse is useful when each example contains a variable number of token or feature ids. Instead of converting every example into a dense multi-hot vector, you pass sparse ids and TensorFlow gathers only the embedding rows that matter. That keeps the input representation compact and makes the model code closer to how recommendation and text features are often stored.

Model the Input as a SparseTensor

The function expects sp_ids as a SparseTensor. Each row represents one example in the batch, and each nonzero entry stores a feature id to look up in the embedding table.

python
1import tensorflow as tf
2
3embeddings = tf.constant([
4    [1.0, 0.0],
5    [0.0, 1.0],
6    [1.0, 1.0],
7    [2.0, 0.5],
8], dtype=tf.float32)
9
10sp_ids = tf.sparse.SparseTensor(
11    indices=[[0, 0], [0, 1], [1, 0]],
12    values=[0, 2, 1],
13    dense_shape=[2, 2],
14)
15
16pooled = tf.nn.embedding_lookup_sparse(
17    params=embeddings,
18    sp_ids=sp_ids,
19    sp_weights=None,
20    combiner="mean",
21)
22
23print(pooled.numpy())

In this example, batch item 0 contains ids 0 and 2, while batch item 1 contains only id 1. With combiner="mean", TensorFlow gathers the relevant embedding rows for each example and averages them into one dense vector per row.

The important detail is the row structure. indices does not describe the embedding table. It describes where ids appear in the sparse batch input.

Add Feature Weights When the Features Are Not Equal

Some models need weighted sparse features. Typical cases are term frequencies, confidence scores, or normalized counts. In that case, pass a second SparseTensor in sp_weights with the same indices and dense_shape as sp_ids.

python
1import tensorflow as tf
2
3embeddings = tf.constant([
4    [1.0, 0.0],
5    [0.0, 1.0],
6    [1.0, 1.0],
7], dtype=tf.float32)
8
9sp_ids = tf.sparse.SparseTensor(
10    indices=[[0, 0], [0, 1], [1, 0]],
11    values=[0, 2, 1],
12    dense_shape=[2, 2],
13)
14
15sp_weights = tf.sparse.SparseTensor(
16    indices=[[0, 0], [0, 1], [1, 0]],
17    values=[0.25, 0.75, 1.0],
18    dense_shape=[2, 2],
19)
20
21pooled = tf.nn.embedding_lookup_sparse(
22    params=embeddings,
23    sp_ids=sp_ids,
24    sp_weights=sp_weights,
25    combiner="sum",
26)
27
28print(pooled.numpy())

Here the first example gives more influence to id 2 than id 0. If you omit sp_weights, all ids contribute equally.

Choose the Right Combiner

The combiner argument controls how multiple embeddings from one row are reduced into a single output vector.

  • 'sum: add the selected vectors directly.'
  • 'mean: divide by the number of features or effective weight total.'
  • 'sqrtn: divide by the square root of the summed squared weights, which is often used to stabilize scaling.'

If you want a document embedding that does not grow with token count, mean is often the simplest choice. If raw counts should matter, sum may be appropriate. sqrtn is a compromise that reduces the impact of examples with many active features without making them identical to short examples.

Understand What the Output Shape Means

The output is always dense. If the embedding table has shape vocab_size x embedding_dim, and the sparse batch contains batch_size rows, then the result has shape batch_size x embedding_dim.

That makes the function especially convenient when the next layer expects one dense vector per example. You do not need to manually gather rows, pad variable-length features, or write a custom reduction loop.

It is also worth noting that this operation is a pooled lookup. If you need the embedding for every token position separately, use a regular embedding lookup on a dense or ragged sequence input instead.

When This Function Fits Best

embedding_lookup_sparse is a good fit when features are unordered or naturally represented as bags of ids. Common examples are ad clicks, user interests, tags, or words in a bag-of-words model.

If order matters, such as in a transformer or recurrent model, this is usually not the right tool. Pooling several ids into one vector discards positional structure by design.

Common Pitfalls

  • Building sp_ids with the wrong row layout, so ids from different examples collapse into one batch row.
  • Passing sp_weights with mismatched indices or dense_shape.
  • Expecting token order to be preserved even though the operation pools embeddings into one vector per example.
  • Choosing sum and then wondering why long examples produce much larger magnitudes.
  • Forgetting that sp_ids.values must be integer ids that are valid row indices in the embedding table.

Summary

  • 'tf.nn.embedding_lookup_sparse is for pooled embedding lookup from sparse batch features.'
  • Represent feature ids as a SparseTensor, with one row per example.
  • Use sp_weights when different ids should contribute unequally.
  • Pick mean, sum, or sqrtn based on how you want feature count to affect the output.
  • Use this operation for bag-style sparse features, not sequence models where order matters.

Course illustration
Course illustration

All Rights Reserved.