How to feed input with changing size in TensorFlow

Introduction

Variable-size input is common in text, audio, and vision tasks, but many TensorFlow layers expect fixed-shape batches. The solution is not one technique but a combination of ragged tensors, padding, masking, and shape-aware model design. This guide covers practical ways to feed changing-size data safely.

Option 1: Use Padded Batches

A common approach is padding each batch to the maximum sequence length in that batch.

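```python
import tensorflow as tf

sequences = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9],
]
labels = [0, 1, 0]


# from_tensor_slices cannot convert non-rectangular Python lists into a
# tensor, so yield the examples one at a time from a generator instead.
def gen():
    for seq, label in zip(sequences, labels):
        yield seq, label


ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.int32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)

# Pad every sequence to the longest one in its batch, using 0 as filler.
ds = ds.padded_batch(
    batch_size=2,
    padded_shapes=([None], []),
    padding_values=(0, 0),
)

for x, y in ds.take(2):
    print(x)
    print(y)
```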

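Within each batch, lengths are uniform after padding, which keeps tensor shapes valid for most layers. Each batch is padded only to its own longest sequence, so batch widths can differ: here the first batch has shape (2, 3) and the second (1, 4).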

Add Masking So the Model Ignores Padding

When padding tokens are not real data, use masking-aware layers.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # mask_zero=True reserves token id 0 for padding.
    tf.keras.layers.Embedding(input_dim=1000, output_dim=32, mask_zero=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(2, activation="softmax"),
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```

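mask_zero=True makes the Embedding layer treat index 0 as padding and pass a mask to downstream layers, so the LSTM skips padded timesteps. Index 0 is then unavailable as a real token id.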
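To see what the mask looks like, the following minimal sketch calls the layer's compute_mask method on a small padded batch. The token ids and layer sizes are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# Illustrative padded batch: 0 is the padding id, other ids are arbitrary.
padded = tf.constant([
    [1, 2, 3, 0],
    [4, 5, 0, 0],
])

embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=4, mask_zero=True)

# compute_mask marks real tokens as True and padding positions as False.
mask = np.asarray(embedding.compute_mask(padded))
print(mask)
```

Mask-aware layers placed after the embedding receive this boolean tensor automatically, so it rarely needs to be computed by hand outside of debugging.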

Option 2: Ragged Tensors for Native Variable Length

TensorFlow supports ragged tensors, which represent uneven inner dimensions without padding.

```python
import tensorflow as tf

rt = tf.ragged.constant([
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9],
])

print(rt)
print(rt.shape)  # the ragged dimension is reported as None
```

Some Keras layers can consume ragged inputs directly, but support varies by layer and by TensorFlow and Keras version. If a layer does not support ragged tensors, convert to a dense tensor with padding (for example with to_tensor) before that layer.
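The conversion might look like the sketch below. It pads with zeros and keeps the original row lengths so a mask can be rebuilt afterwards; choosing 0 as the padding value is an assumption that matches the padding examples above.

```python
import tensorflow as tf

rt = tf.ragged.constant([
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9],
])

# Pad every row with zeros up to the length of the longest row.
dense = rt.to_tensor(default_value=0)

# Keep the true lengths so a boolean mask can be rebuilt after conversion.
lengths = rt.row_lengths()
mask = tf.sequence_mask(lengths)

print(dense.numpy())
print(lengths.numpy())
print(mask.numpy())
```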

Option 3: Variable Image Sizes

For images with varying resolution, common strategies are:

resize all images to a fixed shape
use random crop or center crop then resize
use models with global pooling to reduce strict shape dependence

```python
import tensorflow as tf


def preprocess(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    # Resize to a fixed shape; the result is float32.
    image = tf.image.resize(image, [224, 224])
    # Scale pixel values to [0, 1].
    image = image / 255.0
    return image
```

Even when original image sizes vary, consistent model input shape is restored after preprocessing.
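A plain resize distorts the aspect ratio of non-square images. The crop-then-resize strategy from the list above could be sketched as follows; the helper name center_crop_and_resize and the random test images are hypothetical stand-ins, not part of the original pipeline.

```python
import tensorflow as tf


def center_crop_and_resize(image, size=224):
    """Crop the largest centered square, then resize it to size x size."""
    shape = tf.shape(image)
    height, width = shape[0], shape[1]
    side = tf.minimum(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    square = image[top:top + side, left:left + side, :]
    resized = tf.image.resize(square, [size, size])  # returns float32
    return resized / 255.0


# Synthetic stand-ins for decoded images of different resolutions.
wide = tf.cast(tf.random.uniform([100, 300, 3], maxval=256, dtype=tf.int32), tf.uint8)
tall = tf.cast(tf.random.uniform([400, 250, 3], maxval=256, dtype=tf.int32), tf.uint8)

out_wide = center_crop_and_resize(wide)
out_tall = center_crop_and_resize(tall)
print(out_wide.shape, out_tall.shape)
```

Because the helper reads the size with tf.shape rather than the static shape, it should also work inside a tf.data map call where image dimensions are unknown until runtime.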

Dynamic Signatures in `tf.data`

When creating datasets from generators, declare each variable dimension as None in the TensorSpec passed to output_signature.

```python
import tensorflow as tf
import numpy as np


def gen():
    rng = np.random.default_rng(5)
    for length in [3, 5, 2, 7]:
        x = rng.integers(1, 50, size=(length,), dtype=np.int32)
        y = np.int32(length % 2)
        yield x, y


# None marks the sequence dimension as variable; the label is a scalar.
output_signature = (
    tf.TensorSpec(shape=(None,), dtype=tf.int32),
    tf.TensorSpec(shape=(), dtype=tf.int32),
)

ds = tf.data.Dataset.from_generator(gen, output_signature=output_signature)
ds = ds.padded_batch(2, padded_shapes=([None], []), padding_values=(0, 0))
```

This allows changing sequence lengths while preserving schema consistency.

Model Design for Variable Length

A robust variable-input pipeline usually combines:

flexible dataset signature
padding or ragged handling
masking-aware sequence layers
pooling layers for variable feature-map sizes

For text classification, a reliable stack is embedding with mask, recurrent or transformer block, then global pooling and dense output.
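One possible version of that stack is sketched below. The vocabulary size, layer widths, and sample token ids are arbitrary assumptions; the explicit Input with a None sequence dimension is what lets the same model accept batches of different lengths.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # None leaves the sequence length unspecified.
    tf.keras.Input(shape=(None,), dtype="int32"),
    tf.keras.layers.Embedding(input_dim=1000, output_dim=32, mask_zero=True),
    # return_sequences=True keeps one vector per timestep for pooling.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# Two padded batches with different sequence lengths (0 is padding).
short_batch = tf.constant([[5, 8, 0], [7, 0, 0]])
long_batch = tf.constant([[5, 8, 2, 9, 4, 0], [7, 3, 3, 1, 6, 2]])

short_out = model(short_batch)
long_out = model(long_batch)
print(short_out.shape, long_out.shape)
```

Both calls produce one prediction row per example, regardless of how long the padded sequences are.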

For variable-size image tasks, convolution plus global average pooling avoids strict flatten dimensions tied to one fixed resolution.
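As a rough illustration, the sketch below builds a small convolutional model whose spatial dimensions are left as None and runs it on two differently sized inputs. The filter counts and the five output classes are arbitrary assumptions.

```python
import tensorflow as tf

# Height and width are None, so any spatial size is accepted.
inputs = tf.keras.Input(shape=(None, None, 3))
x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
# Global pooling collapses the spatial dimensions to one vector per image.
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

small = tf.random.uniform([1, 64, 96, 3])
large = tf.random.uniform([1, 128, 80, 3])

small_out = model(small)
large_out = model(large)
print(small_out.shape, large_out.shape)
```

Images inside a single batch still have to share one size, so this design pairs naturally with batch size 1 or with batches grouped by resolution.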

Debugging Shape Errors

Most shape failures surface where examples are batched or where a batch is handed to the model. Debug systematically:

print the shapes of one batch before training
inspect model input spec
verify label shapes match loss expectations
check ragged to dense conversion points

Small shape mismatches can produce opaque graph errors, so early batch inspection saves time.
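A quick inspection pass along those lines might look like this sketch. The tiny generator is a stand-in for a real input pipeline; element_spec gives the static view and iterating gives the concrete shapes.

```python
import tensorflow as tf


def gen():
    # Stand-in data: three sequences of different lengths with scalar labels.
    yield [1, 2, 3], 0
    yield [4, 5], 1
    yield [6, 7, 8, 9], 0


ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.int32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
).padded_batch(2, padded_shapes=([None], []))

# Static view: None marks dimensions that may change between batches.
print(ds.element_spec)

# Dynamic view: concrete shapes and dtypes of the first batches.
batch_shapes = []
for x, y in ds.take(2):
    print("features:", x.shape, x.dtype, "labels:", y.shape, y.dtype)
    batch_shapes.append((tuple(x.shape), tuple(y.shape)))
```

Comparing the printed label shape and dtype against what the chosen loss expects catches many mismatches before model.fit is ever called.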

Common Pitfalls

Feeding variable-length lists directly to layers that require dense fixed-rank tensors.
Padding inputs but forgetting masking, causing the model to learn padding artifacts.
Mixing ragged and non-ragged layers without explicit conversion.
Setting overly rigid TensorSpec shapes in generator-based pipelines.
Ignoring batch-level shape inspection before calling model.fit.

Summary

Variable-size inputs in TensorFlow are handled with padding, ragged tensors, and masking.
padded_batch is the most broadly compatible workflow.
Ragged tensors are powerful when downstream layers support them.
Shape-flexible preprocessing is essential for mixed-size image and sequence data.
Validate shapes and masks early to prevent hard-to-debug training errors.
