
Difference between feature_column.embedding_column and keras.layers.Embedding in TensorFlow

Introduction

TensorFlow provides two distinct mechanisms for converting categorical data into dense vector representations: tf.feature_column.embedding_column and tf.keras.layers.Embedding. Both create learnable embedding matrices, but they belong to different API paradigms and are designed for different workflows. Choosing the wrong one can lead to unnecessary complexity or integration headaches.

This article explains what each one does, when to use it, and provides working code examples for both.

What tf.feature_column.embedding_column Does

tf.feature_column.embedding_column is part of TensorFlow's feature columns API. Feature columns were designed to work with tf.estimator, TensorFlow's high-level training API for structured/tabular data. An embedding column wraps a categorical column and maps each category to a dense vector.

How It Works

You first define a categorical column (either with a known vocabulary or with hash buckets), then wrap it in an embedding column that specifies the embedding dimension.

python
import tensorflow as tf

# Define a categorical column
category_col = tf.feature_column.categorical_column_with_vocabulary_list(
    key="color",
    vocabulary_list=["red", "blue", "green", "yellow"]
)

# Wrap it in an embedding column
embedding_col = tf.feature_column.embedding_column(
    categorical_column=category_col,
    dimension=8
)

# Use with a DenseFeatures layer or an Estimator
feature_layer = tf.keras.layers.DenseFeatures([embedding_col])
sample_input = {"color": tf.constant(["red", "blue", "green"])}
output = feature_layer(sample_input)
print(output.shape)  # (3, 8)

The key characteristics:

  • Input is a dictionary that maps feature names to tensors, which matches the tf.estimator input function pattern.
  • It handles the vocabulary lookup internally. You pass raw string values, and the column manages the mapping to integer indices.
  • The embedding matrix is created and trained as part of the model's variables.
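
Conceptually, the column performs two steps: a vocabulary lookup, then a row selection from a trainable matrix. The plain-Python sketch below illustrates that idea only. It is not how TensorFlow implements feature columns, and the names VOCAB, EMBEDDING_TABLE, and embed_colors are hypothetical. The random values stand in for weights that training would adjust.

```python
import random

# Hypothetical stand-ins for what the feature column manages internally.
VOCAB = ["red", "blue", "green", "yellow"]
DIMENSION = 8

random.seed(0)
# One row of DIMENSION values per vocabulary entry.
EMBEDDING_TABLE = [
    [random.uniform(-0.05, 0.05) for _ in range(DIMENSION)]
    for _ in VOCAB
]


def embed_colors(values):
    """Map raw strings to vectors: vocabulary lookup, then row selection."""
    indices = [VOCAB.index(v) for v in values]     # string -> integer index
    return [EMBEDDING_TABLE[i] for i in indices]   # integer index -> vector


batch = embed_colors(["red", "blue", "green"])
print(len(batch), len(batch[0]))  # 3 8
```

This sketch raises ValueError for a value outside the vocabulary, whereas the real categorical column has configurable handling for unseen values.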

What tf.keras.layers.Embedding Does

tf.keras.layers.Embedding is a standard Keras layer that maps integer indices to dense vectors. It can be used like any other layer in Sequential and Functional Keras models.

python
import tensorflow as tf

# Create an embedding layer
# input_dim = vocabulary size, output_dim = embedding dimension
# (input_length is omitted: recent Keras releases deprecate it,
#  and the sequence length is taken from the input itself)
embedding_layer = tf.keras.layers.Embedding(
    input_dim=1000,
    output_dim=64
)

# Input must be integer indices
sample_input = tf.constant([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
output = embedding_layer(sample_input)
print(output.shape)  # (1, 10, 64)

Key characteristics:

  • Input must be pre-encoded as integer indices. The layer does not handle vocabulary lookup.
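  • It is most often used on batches of token-ID sequences, but it also accepts other integer shapes, such as the 1-D batch of category IDs used later in this article.
  • It integrates naturally with other Keras layers like LSTM, Dense, and Conv1D.
  • The output shape is the input shape plus a trailing embedding axis, so a (batch, sequence_length) input becomes (batch, sequence_length, output_dim). This makes it suitable for NLP tasks.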
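
The shape behavior can be summarized as: the layer appends one axis of size output_dim to the integer input it receives. The sketch below mimics that rule with nested Python lists instead of tensors. The helpers embed and shape and the zero-filled table are hypothetical illustrations, not TensorFlow APIs.

```python
def embed(indices, table):
    """Replace every integer in a nested list with its row from `table`."""
    if isinstance(indices, int):
        return table[indices]
    return [embed(item, table) for item in indices]


def shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Hypothetical table: 1000 rows (input_dim), each with 64 values (output_dim).
table = [[0.0] * 64 for _ in range(1000)]

flat_ids = [3, 7]                                  # shape (2,)
sequence_ids = [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]   # shape (1, 10)

print(shape(embed(flat_ids, table)))       # (2, 64)
print(shape(embed(sequence_ids, table)))   # (1, 10, 64)

# Averaging over the sequence axis leaves one vector per example.
pooled = [
    [sum(column) / len(column) for column in zip(*example)]
    for example in embed(sequence_ids, table)
]
print(shape(pooled))                       # (1, 64)
```

The final step shows the idea behind average pooling over the sequence axis, which is one way to turn the 3D output for sequences into a 2D tensor.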

Side-by-Side Comparison

Here is a practical example showing both approaches on the same task: embedding a product category.

Using feature_column.embedding_column

python
import tensorflow as tf

category = tf.feature_column.categorical_column_with_vocabulary_list(
    "product_type", ["electronics", "clothing", "food", "furniture"]
)
embedded = tf.feature_column.embedding_column(category, dimension=4)

feature_layer = tf.keras.layers.DenseFeatures([embedded])
inputs = {"product_type": tf.constant(["electronics", "food"])}
result = feature_layer(inputs)
print(result.shape)  # (2, 4)

Using keras.layers.Embedding

python
import tensorflow as tf

# Must map categories to integers first
category_to_id = {"electronics": 0, "clothing": 1, "food": 2, "furniture": 3}

embedding = tf.keras.layers.Embedding(input_dim=4, output_dim=4)
inputs = tf.constant([category_to_id["electronics"], category_to_id["food"]])
result = embedding(inputs)
print(result.shape)  # (2, 4)

The feature_column approach handles the string-to-integer mapping for you, while the Keras layer expects you to do it yourself.
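
When the mapping is done by hand, it is usually derived from the data rather than typed out. A minimal sketch of one way to do that follows, assuming the categories fit in memory. The function build_category_ids and the sample column are hypothetical.

```python
def build_category_ids(values):
    """Assign an integer ID to each distinct category, in sorted order."""
    return {
        category: index
        for index, category in enumerate(sorted(set(values)))
    }


# Hypothetical raw column from a training set.
raw_column = ["food", "electronics", "food", "clothing", "furniture"]

category_to_id = build_category_ids(raw_column)
encoded = [category_to_id[value] for value in raw_column]

print(category_to_id)
# {'clothing': 0, 'electronics': 1, 'food': 2, 'furniture': 3}
print(encoded)              # [2, 1, 2, 0, 3]
print(len(category_to_id))  # 4, the value to pass as input_dim
```

Sorting the categories keeps the IDs identical across runs, which matters because a trained embedding matrix is only meaningful with the same index assignment it was trained on.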

When to Use Each

Use feature_column.embedding_column when:

  • You are working with tf.estimator pipelines.
  • Your data is structured or tabular with many categorical features.
  • You want the column API to manage vocabulary lookup and preprocessing.
  • You are building a wide-and-deep model with tf.estimator.DNNLinearCombinedClassifier.

Use tf.keras.layers.Embedding when:

  • You are building models with the Keras Sequential or Functional API.
  • Your input data is already tokenized into integer sequences (common in NLP).
  • You need the embedding output to feed into recurrent or convolutional layers.
  • You are using eager execution and want straightforward debugging.

Migration Note

The tf.estimator API and feature columns are legacy in TensorFlow 2.x. The TensorFlow documentation recommends migrating to Keras-based workflows, and tf.estimator was removed in TensorFlow 2.16, so Estimator code requires version 2.15 or earlier. If you are starting a new project, tf.keras.layers.Embedding combined with tf.keras.layers.StringLookup (for vocabulary mapping) is the modern approach.

python
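import tensorflow as tf

# Modern approach: StringLookup + Embedding
lookup = tf.keras.layers.StringLookup(
    vocabulary=["electronics", "clothing", "food", "furniture"]
)
# vocabulary_size() includes the OOV slot: 4 terms + 1 OOV = 5
embedding = tf.keras.layers.Embedding(
    input_dim=lookup.vocabulary_size(), output_dim=4
)

inputs = tf.constant(["electronics", "food"])
indices = lookup(inputs)     # string -> integer index
result = embedding(indices)  # integer index -> dense vector
print(result.shape)  # (2, 4)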

Common Pitfalls

  1. Passing strings to Keras Embedding. tf.keras.layers.Embedding only accepts integer inputs. Passing string tensors raises a type error. Use StringLookup or a manual mapping first.
  2. Off-by-one in input_dim. With its default settings, StringLookup reserves index 0 for out-of-vocabulary (OOV) values, so a 4-item vocabulary produces indices 0 through 4. Set input_dim to the vocabulary size plus 1.
  3. Mixing feature columns with Keras functional models. While DenseFeatures can bridge feature columns into Keras, it adds complexity. For new projects, stick entirely with Keras layers.
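  4. Ignoring the sequence dimension. For a batch of sequences, keras.layers.Embedding outputs a 3D tensor (batch, sequence_length, embedding_dim). If the next layer needs a 2D input, apply tf.keras.layers.Flatten() or tf.keras.layers.GlobalAveragePooling1D() after the embedding. A 1-D batch of IDs, as in the product-category example, already yields a 2D output.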
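
The off-by-one in pitfall 2 is easy to see with a small plain-Python sketch that mimics the default indexing described there: index 0 for unknown values, and known terms starting at 1. The names TERM_TO_INDEX, OOV_INDEX, and lookup are hypothetical, and this is not TensorFlow's implementation.

```python
VOCAB = ["electronics", "clothing", "food", "furniture"]

# Index 0 is reserved for unknown values, so known terms start at 1.
OOV_INDEX = 0
TERM_TO_INDEX = {term: i + 1 for i, term in enumerate(VOCAB)}


def lookup(values):
    """Map strings to indices, sending unseen values to the OOV index."""
    return [TERM_TO_INDEX.get(v, OOV_INDEX) for v in values]


indices = lookup(["electronics", "furniture", "toys"])
print(indices)  # [1, 4, 0]

too_small = len(VOCAB)      # 4 rows: valid indices are 0..3
correct = len(VOCAB) + 1    # 5 rows: valid indices are 0..4

print(max(indices) < too_small)  # False: index 4 has no row
print(max(indices) < correct)    # True
```

With only 4 rows, the last vocabulary term ("furniture", index 4) would fall outside the embedding matrix, which is why input_dim needs the extra slot.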

Summary

tf.feature_column.embedding_column and tf.keras.layers.Embedding both produce learned dense representations from categorical data, but they serve different APIs. Feature columns work with the Estimator API and handle vocabulary management internally. Keras Embedding works with integer-encoded inputs and integrates with the broader Keras layer ecosystem. For new TensorFlow projects, the Keras Embedding layer combined with preprocessing layers like StringLookup is the recommended path forward.

