Creating a TensorFlow dataset that outputs a dict
Introduction
tf.data.Dataset elements do not have to be plain tensors or tuples. A dataset can yield dictionaries, and that is often the cleanest format when your model has named inputs or when you want feature names to stay attached throughout the pipeline.
Basic Dictionary Dataset
The simplest approach is to pass a dictionary to from_tensor_slices:
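A minimal sketch of that call, assuming two made-up feature columns named age and income (the values are illustrative, not taken from a real dataset):

```python
import tensorflow as tf

# Hypothetical feature columns; every list has the same length.
features = {
    "age": [25, 32, 47],
    "income": [40000.0, 52000.0, 81000.0],
}

# Slicing along the first dimension yields one dictionary per element.
dataset = tf.data.Dataset.from_tensor_slices(features)

first = next(iter(dataset))
print(sorted(first.keys()))   # ['age', 'income']
print(first["age"].numpy())   # 25
```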
Each element is now a dictionary with keys "age" and "income".
Features Plus Labels
For training, you usually want a pair:
- feature dictionary
- label tensor
That looks like this:
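One way to build such a pair is to pass a (features, labels) tuple to from_tensor_slices. The column names and values below are placeholders chosen for illustration:

```python
import tensorflow as tf

# Hypothetical features and binary labels with matching lengths.
features = {
    "age": [25, 32, 47, 51],
    "income": [40000.0, 52000.0, 81000.0, 67000.0],
}
labels = [0, 1, 1, 0]

# Each element is a (feature_dict, label) tuple.
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

for feature_dict, label in dataset.take(1):
    print(feature_dict["age"].numpy(), label.numpy())  # 25 0
```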
This structure works naturally with model.fit.
If your labels are also multi-field, the second element can be a dictionary too. tf.data accepts nested structures of tuples and dictionaries, as long as every element has the same structure and each component keeps a consistent dtype and a compatible shape.
That flexibility is especially useful in multitask models.
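As a rough sketch of the multitask case, the label side can be a dictionary with one entry per task. The task names churned and spend are hypothetical:

```python
import tensorflow as tf

features = {
    "age": [25, 32, 47],
    "income": [40000.0, 52000.0, 81000.0],
}
# Hypothetical targets for two tasks: one classification, one regression.
labels = {
    "churned": [0, 1, 0],
    "spend": [120.5, 80.0, 310.25],
}

multitask_ds = tf.data.Dataset.from_tensor_slices((features, labels))

# Both sides of the tuple are dictionaries of TensorSpec objects.
feature_spec, label_spec = multitask_ds.element_spec
print(sorted(label_spec.keys()))  # ['churned', 'spend']
```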
Why Dictionary Outputs Are Useful
Dictionary outputs are helpful when:
- your model has multiple named inputs
- feature order should not be implicit
- you want clearer preprocessing code
Using names instead of positional tuples makes pipelines easier to read and less fragile.
Matching Keras Input Names
If you build a Keras model with named Input layers, dictionary keys can map directly to those input names:
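The sketch below assumes a tiny functional model with two inputs named age and income. The layer sizes, the scaled income values, and the labels are all invented for the example:

```python
import tensorflow as tf

# Input names are chosen to match the dataset keys used below.
age_in = tf.keras.Input(shape=(1,), name="age")
income_in = tf.keras.Input(shape=(1,), name="income")

x = tf.keras.layers.Concatenate()([age_in, income_in])
output = tf.keras.layers.Dense(1, activation="sigmoid")(x)

# Passing the inputs as a dict keeps the name-to-tensor mapping explicit.
model = tf.keras.Model(
    inputs={"age": age_in, "income": income_in}, outputs=output
)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Toy data shaped (num_examples, 1) to match the Input shapes.
features = {
    "age": [[25.0], [32.0], [47.0], [51.0]],
    "income": [[0.40], [0.52], [0.81], [0.67]],
}
labels = [[0.0], [1.0], [1.0], [0.0]]

dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)
history = model.fit(dataset, epochs=1, verbose=0)
```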
As long as the dataset keys match the input names, Keras can route the tensors correctly.
Creating the Dictionary in map
You can also start from tuples and convert later:
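For instance, assuming the raw source yields (age, income, label) tuples, a small map function can attach the names. The helper to_named is a hypothetical name for this example:

```python
import tensorflow as tf

# Positional source data: (age, income, label).
raw = tf.data.Dataset.from_tensor_slices(
    ([25, 32, 47], [40000.0, 52000.0, 81000.0], [0, 1, 1])
)

def to_named(age, income, label):
    # tf.data unpacks the tuple into positional arguments.
    return {"age": age, "income": income}, label

named = raw.map(to_named)

feature_dict, label = next(iter(named))
print(feature_dict["age"].numpy(), label.numpy())  # 25 0
```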
This is useful when your raw source format is positional but the model-facing format should be named.
Inspecting the Structure
When debugging a tf.data pipeline, inspect element_spec:
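A short sketch, using a made-up batched dataset of (feature_dict, label) pairs:

```python
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices(
    (
        {"age": [25, 32, 47], "income": [40000.0, 52000.0, 81000.0]},
        [0, 1, 1],
    )
).batch(2)

# element_spec mirrors the nesting of the elements, with a TensorSpec
# in place of each tensor.
feature_spec, label_spec = dataset.element_spec
print(dataset.element_spec)
print(feature_spec["age"].dtype, feature_spec["age"].shape)
```

After batch, the leading dimension of each spec is unknown (None), because the final batch may be smaller than the requested size.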
This quickly shows you the key names, dtypes, and shapes TensorFlow thinks your dataset is producing.
Performance Features Still Apply
Dictionary outputs still work with the normal tf.data operations:
- shuffle
- batch
- map
- prefetch
The structure is richer, but the performance model is the same.
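As an illustration, the pipeline below chains all four operations on a dictionary dataset. The scale_income helper and the buffer and batch sizes are arbitrary choices for the example, not recommendations:

```python
import tensorflow as tf

features = {
    "age": [25, 32, 47, 51],
    "income": [40000.0, 52000.0, 81000.0, 67000.0],
}
labels = [0, 1, 1, 0]

def scale_income(feature_dict, label):
    # Build a new dict instead of mutating the one passed in.
    scaled = dict(feature_dict)
    scaled["income"] = feature_dict["income"] / 1000.0
    return scaled, label

pipeline = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=4, seed=0)
    .map(scale_income, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(2)
    .prefetch(tf.data.AUTOTUNE)
)

for batch_features, batch_labels in pipeline:
    print(batch_features["income"].shape, batch_labels.shape)
```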
Common Pitfalls
The biggest mistake is using dictionary keys that do not match the names expected by the model inputs.
Another mistake is giving dictionary values with inconsistent first-dimension lengths. from_tensor_slices expects matching leading dimensions across the structure.
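The following sketch shows the length mismatch being rejected. The exact exception type and message may differ between TensorFlow versions, so the example catches both of the likely candidates:

```python
import tensorflow as tf

mismatched = {
    "age": [25, 32, 47],           # three rows
    "income": [40000.0, 52000.0],  # only two rows
}

try:
    tf.data.Dataset.from_tensor_slices(mismatched)
    failed = False
except (ValueError, tf.errors.InvalidArgumentError) as err:
    # The leading dimensions disagree, so the dataset cannot be built.
    failed = True
    print("Rejected:", type(err).__name__)
```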
People also forget that map functions must return TensorFlow-friendly nested structures. Arbitrary Python objects usually do not belong there.
Finally, inspect element_spec instead of guessing. It is one of the fastest ways to diagnose dataset structure bugs.
Summary
- A tf.data.Dataset can yield dictionaries, not just tensors or tuples.
- Use from_tensor_slices with a dictionary when your features are already named.
- For training, a common pattern is (feature_dict, label).
- Dictionary keys should match Keras input names when feeding a model.
- element_spec is the quickest way to verify the dataset structure.

