Creating a TensorFlow dataset that outputs a dict

Introduction

tf.data.Dataset elements do not have to be plain tensors or tuples. A dataset can yield dictionaries, and that is often the cleanest format when your model has named inputs or when you want feature names to stay attached throughout the pipeline.

Basic Dictionary Dataset

The simplest approach is to pass a dictionary to from_tensor_slices:

python

import tensorflow as tf

features = {
    "age": tf.constant([25, 31, 42], dtype=tf.int32),
    "income": tf.constant([50000.0, 72000.0, 61000.0], dtype=tf.float32),
}

dataset = tf.data.Dataset.from_tensor_slices(features)

for item in dataset:
    print(item)

Each element is now a dictionary with keys "age" and "income", each mapped to a scalar tensor.

Features Plus Labels

For training, you usually want a pair:

feature dictionary
label tensor

That looks like this:

python

import tensorflow as tf

features = {
    "age": tf.constant([25, 31, 42], dtype=tf.int32),
    "income": tf.constant([50000.0, 72000.0, 61000.0], dtype=tf.float32),
}
labels = tf.constant([0, 1, 0], dtype=tf.int32)

dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

for batch_features, batch_labels in dataset:
    print(batch_features)
    print(batch_labels)

A dataset of (features, labels) pairs can be passed directly to model.fit.

If your labels are also multi-field, the second element can be a dictionary too. TensorFlow accepts nested tuples and dictionaries as long as every element has the same structure, with matching dtypes and compatible shapes.

That flexibility is especially useful in multitask models.
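As a minimal sketch of the multitask case, the labels below are a dictionary as well. The label names "clicked" and "spend" and their values are made up for illustration.

```python
import tensorflow as tf

features = {
    "age": tf.constant([25, 31, 42], dtype=tf.int32),
    "income": tf.constant([50000.0, 72000.0, 61000.0], dtype=tf.float32),
}
# Hypothetical label names, one entry per task.
labels = {
    "clicked": tf.constant([0, 1, 0], dtype=tf.int32),
    "spend": tf.constant([12.5, 80.0, 3.2], dtype=tf.float32),
}

multitask_ds = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

for batch_features, batch_labels in multitask_ds:
    # Both halves of each element are dictionaries keyed by name.
    print(sorted(batch_features.keys()), sorted(batch_labels.keys()))
```

When a dataset like this feeds a multi-output Keras model, the label keys are typically chosen to match the model's output names.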

Why Dictionary Outputs Are Useful

Dictionary outputs are helpful when:

your model has multiple named inputs
feature order should not be implicit
you want clearer preprocessing code

Using names instead of positional tuples makes pipelines easier to read and less fragile.

Matching Keras Input Names

If you build a Keras model with named Input layers, dictionary keys can map directly to those input names:

python

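import tensorflow as tf

age_input = tf.keras.Input(shape=(), name="age", dtype=tf.int32)
income_input = tf.keras.Input(shape=(), name="income", dtype=tf.float32)

# Raw tf.* ops go inside a Lambda layer so the model builds under both
# Keras 2 and Keras 3 (Keras 3 rejects tf.* calls on symbolic inputs).
def combine(inputs):
    age, income = inputs
    x = tf.cast(age, tf.float32) + income / 10000.0
    return tf.expand_dims(x, -1)  # shape (batch, 1) for the Dense layer

x = tf.keras.layers.Lambda(combine)([age_input, income_input])
output = tf.keras.layers.Dense(1)(x)

model = tf.keras.Model(
    inputs={"age": age_input, "income": income_input},
    outputs=output,
)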

As long as the dataset keys match the input names, Keras can route the tensors correctly.
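The sketch below shows the routing end to end: a dataset of (feature_dict, label) pairs is passed to model.fit on a small model whose Input names match the dictionary keys. The layer arrangement, optimizer, and loss are arbitrary choices for illustration, and the Lambda wrapper assumes the code should run under both Keras 2 and Keras 3.

```python
import tensorflow as tf

features = {
    "age": tf.constant([25, 31, 42], dtype=tf.int32),
    "income": tf.constant([50000.0, 72000.0, 61000.0], dtype=tf.float32),
}
labels = tf.constant([0.0, 1.0, 0.0], dtype=tf.float32)
train_ds = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# Input names match the dataset keys "age" and "income".
age_input = tf.keras.Input(shape=(), name="age", dtype=tf.int32)
income_input = tf.keras.Input(shape=(), name="income", dtype=tf.float32)

def combine(inputs):
    # Cast and combine the two scalar features into a (batch, 1) tensor.
    age, income = inputs
    x = tf.cast(age, tf.float32) + income / 10000.0
    return tf.expand_dims(x, -1)

x = tf.keras.layers.Lambda(combine)([age_input, income_input])
output = tf.keras.layers.Dense(1, activation="sigmoid")(x)

fit_model = tf.keras.Model(
    inputs={"age": age_input, "income": income_input},
    outputs=output,
)
fit_model.compile(optimizer="adam", loss="binary_crossentropy")

# Keras looks up each feature tensor by its dictionary key.
history = fit_model.fit(train_ds, epochs=1, verbose=0)
print(history.history["loss"])
```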

Creating the Dictionary in `map`

You can also start from tuples and convert later:

python

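import tensorflow as tf

# Positional source: (age, income, label) tuples.
raw = tf.data.Dataset.from_tensor_slices((
    tf.constant([25, 31, 42]),
    tf.constant([50000.0, 72000.0, 61000.0]),
    tf.constant([0, 1, 0]),
))

# Attach names to the features; keep the label as the second element.
dataset = raw.map(lambda age, income, label: (
    {"age": age, "income": income},
    label,
))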

This is useful when your raw source format is positional but the model-facing format should be named.

Inspecting the Structure

When debugging a tf.data pipeline, inspect element_spec:

python

print(dataset.element_spec)

This quickly shows you the key names, dtypes, and shapes TensorFlow thinks your dataset is producing.
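For a batched (feature_dict, label) dataset, the spec mirrors the nesting of the elements, so it can also be unpacked and checked programmatically. A small sketch, with toy values chosen only for illustration:

```python
import tensorflow as tf

spec_ds = tf.data.Dataset.from_tensor_slices((
    {
        "age": tf.constant([25, 31, 42], dtype=tf.int32),
        "income": tf.constant([50000.0, 72000.0, 61000.0], dtype=tf.float32),
    },
    tf.constant([0, 1, 0], dtype=tf.int32),
)).batch(2)

# element_spec has the same (dict, tensor) nesting as the elements.
feature_spec, label_spec = spec_ds.element_spec

# Each leaf is a tf.TensorSpec; batching adds a leading dimension of unknown size.
for name, spec in feature_spec.items():
    print(name, spec.dtype.name, spec.shape)
print("label", label_spec.dtype.name, label_spec.shape)
```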

Performance Features Still Apply

Dictionary outputs still work with the normal tf.data operations:

shuffle
batch
map
prefetch

The structure is richer, but the performance model is the same.
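A sketch of a pipeline that chains these operations over dictionary elements follows. The scale_income helper and the 10000.0 scaling factor are illustrative assumptions, not part of any TensorFlow API.

```python
import tensorflow as tf

features = {
    "age": tf.constant([25, 31, 42], dtype=tf.int32),
    "income": tf.constant([50000.0, 72000.0, 61000.0], dtype=tf.float32),
}
labels = tf.constant([0, 1, 0], dtype=tf.int32)

def scale_income(feats, label):
    # Build a new dict instead of mutating the argument; keys stay attached.
    scaled = dict(feats)
    scaled["income"] = feats["income"] / 10000.0
    return scaled, label

pipeline = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=3, seed=0)  # buffer covers the whole toy dataset
    .map(scale_income, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(2)
    .prefetch(tf.data.AUTOTUNE)
)

for batch_features, batch_labels in pipeline:
    print(batch_features["income"].numpy(), batch_labels.numpy())
```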

Common Pitfalls

The biggest mistake is using dictionary keys that do not match the names expected by the model inputs.

Another mistake is giving dictionary values with inconsistent first-dimension lengths. from_tensor_slices expects matching leading dimensions across the structure.
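The mismatch is easy to reproduce. In the sketch below, "age" has three rows and "income" has two, and the call is expected to be rejected when the dataset is constructed. Both exception types are caught because the exact type raised is an assumption that may vary across TensorFlow versions.

```python
import tensorflow as tf

bad_features = {
    "age": tf.constant([25, 31, 42], dtype=tf.int32),             # 3 rows
    "income": tf.constant([50000.0, 72000.0], dtype=tf.float32),  # 2 rows
}

mismatch_error = None
try:
    tf.data.Dataset.from_tensor_slices(bad_features)
except (ValueError, tf.errors.InvalidArgumentError) as err:
    # The leading dimensions disagree, so no dataset is built.
    mismatch_error = err
    print(f"Rejected: {err}")
```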

A map function must also return a structure TensorFlow can represent: tensors, or nested tuples and dictionaries of tensors. Arbitrary Python objects generally cannot be returned.

Finally, inspect element_spec instead of guessing. It is one of the fastest ways to diagnose dataset structure bugs.

Summary

A tf.data.Dataset can yield dictionaries, not just tensors or tuples.
Use from_tensor_slices with a dictionary when your features are already named.
For training, a common pattern is (feature_dict, label).
Dictionary keys should match Keras input names when feeding a model.
element_spec is the quickest way to verify the dataset structure.
