How to feed input with changing size in TensorFlow

Introduction

Variable-size input is common in text, audio, and vision tasks, but many TensorFlow layers expect fixed-shape batches. The solution is not one technique but a combination of ragged tensors, padding, masking, and shape-aware model design. This guide covers practical ways to feed changing-size data safely.

Option 1: Use Padded Batches

A common approach is padding each batch to the maximum sequence length in that batch.

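```python
import tensorflow as tf

sequences = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9],
]
labels = [0, 1, 0]


# from_tensor_slices cannot convert a list of unequal-length rows into a
# dense tensor, so yield the examples one at a time instead.
def gen():
    yield from zip(sequences, labels)


ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.int32),  # variable-length sequence
        tf.TensorSpec(shape=(), dtype=tf.int32),       # scalar label
    ),
)

# Pad every sequence to the longest one in its batch, using 0 as filler.
ds = ds.padded_batch(
    batch_size=2,
    padded_shapes=([None], []),
    padding_values=(0, 0),
)

for x, y in ds.take(2):
    print(x)
    print(y)
```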

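Within each batch, all sequences share one length after padding, which keeps tensor shapes valid for most layers. Lengths can still differ between batches: here the first batch has shape (2, 3), and the second, which holds the single remaining sequence, has shape (1, 4).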

Add Masking So Model Ignores Padding

Padding tokens are not real data, so use mask-aware layers to keep them out of the computation.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # mask_zero=True treats token id 0 as padding and emits a mask.
    tf.keras.layers.Embedding(input_dim=1000, output_dim=32, mask_zero=True),
    # LSTM consumes the mask and skips the padded timesteps.
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(2, activation="softmax"),
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```

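mask_zero=True makes the Embedding layer emit a mask that marks every position holding token id 0 as padding. Mask-aware layers downstream, such as LSTM, skip those timesteps, so padded zeros do not influence sequence learning. Because id 0 is reserved for padding, real token ids should start at 1.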
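
To see what the mask looks like, the sketch below calls compute_mask on an Embedding layer directly. The two padded sequences are made-up illustration data, not part of the pipeline above.

```python
import tensorflow as tf

# Two sequences already padded with 0 to a common length of 4 (illustrative data).
padded = tf.constant([[1, 2, 3, 0],
                      [4, 5, 0, 0]])

embedding = tf.keras.layers.Embedding(input_dim=1000, output_dim=32, mask_zero=True)

# True marks a real token, False marks a padded position (token id 0).
mask = embedding.compute_mask(padded)
print(mask.numpy())
```

In a Keras model the mask is passed along automatically; calling compute_mask by hand is only a way to inspect it.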

Option 2: Ragged Tensors for Native Variable Length

TensorFlow supports ragged tensors, which represent uneven inner dimensions without padding.

```python
import tensorflow as tf

rt = tf.ragged.constant([
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9],
])

print(rt)
print(rt.shape)  # (3, None): three rows, variable row length
```

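Some Keras layers can consume ragged inputs directly, but support depends on the Keras version: the Keras 2 API bundled with TensorFlow releases up to 2.15 accepts ragged input in several layers, while Keras 3 (the default from TensorFlow 2.16) has more limited ragged support. If a layer does not support ragged tensors, convert to dense with padding before that layer.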
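
A minimal sketch of that ragged-to-dense conversion, assuming 0 is an acceptable padding value for the data. It also recovers the true row lengths so a mask can be rebuilt after padding.

```python
import tensorflow as tf

rt = tf.ragged.constant([[1, 2, 3], [4, 5], [6, 7, 8, 9]])

# Pad every row with 0 up to the longest row.
dense = rt.to_tensor(default_value=0)

# Keep the original lengths so padded positions can be identified later.
lengths = rt.row_lengths()
mask = tf.sequence_mask(lengths)  # True where a real value is present

print(dense)
print(lengths)
print(mask)
```

Keeping the lengths next to the dense tensor is useful when a later layer needs an explicit mask rather than one derived from a reserved padding id.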

Option 3: Variable Image Sizes

For images with varying resolution, common strategies are:

  • resize all images to a fixed shape
  • use random crop or center crop then resize
  • use models with global pooling to reduce strict shape dependence

```python
import tensorflow as tf


def preprocess(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    # Bilinear resize returns float32 and does not preserve aspect ratio.
    image = tf.image.resize(image, [224, 224])
    image = image / 255.0  # scale pixel values to [0, 1]
    return image
```

Even when original image sizes vary, consistent model input shape is restored after preprocessing.
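
The sketch below illustrates this on synthetic data. The generator and its three resolutions are hypothetical stand-ins for decoded image files; once every image has been resized, a plain batch call succeeds.

```python
import numpy as np
import tensorflow as tf


def gen_images():
    # Hypothetical stand-ins for decoded images of different resolutions.
    for height, width in [(100, 300), (400, 150), (224, 224)]:
        yield np.zeros((height, width, 3), dtype=np.float32)


ds = tf.data.Dataset.from_generator(
    gen_images,
    output_signature=tf.TensorSpec(shape=(None, None, 3), dtype=tf.float32),
)

# Resize each image to a fixed shape, then batch normally.
ds = ds.map(lambda img: tf.image.resize(img, [224, 224]) / 255.0).batch(3)

batch = next(iter(ds))
print(batch.shape)
```

Without the resize step, batch would fail on these elements because their heights and widths differ.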

Dynamic Signatures in tf.data

When creating datasets from generators, set flexible dimensions with TensorSpec.

```python
import tensorflow as tf
import numpy as np


def gen():
    rng = np.random.default_rng(5)
    for length in [3, 5, 2, 7]:
        x = rng.integers(1, 50, size=(length,), dtype=np.int32)
        y = np.int32(length % 2)
        yield x, y


# None marks the dimension that may change from one example to the next.
output_signature = (
    tf.TensorSpec(shape=(None,), dtype=tf.int32),
    tf.TensorSpec(shape=(), dtype=tf.int32),
)

ds = tf.data.Dataset.from_generator(gen, output_signature=output_signature)
ds = ds.padded_batch(2, padded_shapes=([None], []), padding_values=(0, 0))
```

The None dimension lets each example have a different length, while the dtype and rank stay fixed, so batching and model code downstream see a consistent schema.

Model Design for Variable Length

A robust variable-input pipeline usually combines:

  • flexible dataset signature
  • padding or ragged handling
  • masking-aware sequence layers
  • pooling layers for variable feature-map sizes

For text classification, a reliable stack is embedding with mask, recurrent or transformer block, then global pooling and dense output.
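
One possible version of that stack is sketched below, using an LSTM as the recurrent block. The layer sizes, the vocabulary size of 1000, and the two example batches are arbitrary assumptions chosen for illustration.

```python
import tensorflow as tf

# shape=(None,) leaves the sequence length unspecified.
inputs = tf.keras.Input(shape=(None,), dtype="int32")
x = tf.keras.layers.Embedding(1000, 32, mask_zero=True)(inputs)
x = tf.keras.layers.LSTM(32, return_sequences=True)(x)
# Global average pooling collapses the time axis, whatever its length.
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
text_model = tf.keras.Model(inputs, outputs)

# Batches padded to different lengths go through the same model.
short_batch = tf.constant([[1, 2, 3], [4, 5, 0]])
long_batch = tf.constant([[6, 7, 8, 9, 1, 2], [3, 4, 0, 0, 0, 0]])

short_out = text_model(short_batch)
long_out = text_model(long_batch)
print(short_out.shape, long_out.shape)
```

Both calls return one probability vector per sequence, so the output shape does not depend on how far each batch was padded.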

For variable-size image tasks, convolution plus global average pooling avoids strict flatten dimensions tied to one fixed resolution.
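
A minimal sketch of such a model, with arbitrary filter counts and input sizes chosen only for illustration. Height and width are left as None in the input spec, and global average pooling removes the spatial dimensions before the dense layer.

```python
import tensorflow as tf

# Height and width are unspecified; only the channel count is fixed.
inputs = tf.keras.Input(shape=(None, None, 3))
x = tf.keras.layers.Conv2D(16, 3, activation="relu")(inputs)
# Averages over height and width, leaving one value per filter.
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
image_model = tf.keras.Model(inputs, outputs)

small = tf.zeros([1, 64, 64, 3])
large = tf.zeros([1, 200, 120, 3])

small_out = image_model(small)
large_out = image_model(large)
print(small_out.shape, large_out.shape)
```

Replacing the pooling layer with Flatten would tie the dense layer's weight shape to one resolution, which is the dependence this design avoids. Images within a single batch still need a common size.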

Debugging Shape Errors

Most failures occur at batch boundaries. Debug systematically:

  • print the shapes of one batch before training
  • inspect model input spec
  • verify label shapes match loss expectations
  • check ragged to dense conversion points

Small shape mismatches can produce opaque graph errors, so early batch inspection saves time.
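
A small inspection routine along these lines can run before training. The toy generator below is a hypothetical stand-in for a real data source.

```python
import tensorflow as tf


def gen():
    # Toy examples: (variable-length sequence, scalar label).
    yield [1, 2, 3], 0
    yield [4, 5], 1
    yield [6, 7, 8, 9], 0


ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.int32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
).padded_batch(2, padded_shapes=([None], []))

# Static view: what the pipeline promises to produce.
print(ds.element_spec)

# Dynamic view: what each batch actually looks like.
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in ds]
for x_shape, y_shape in batch_shapes:
    print("inputs:", x_shape, "labels:", y_shape)
```

Comparing the printed shapes with the model's input spec and the shape the loss expects for labels usually locates a mismatch before model.fit is called.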

Common Pitfalls

  • Feeding variable-length lists directly to layers that require dense fixed-rank tensors.
  • Padding inputs but forgetting masking, causing the model to learn padding artifacts.
  • Mixing ragged and non-ragged layers without explicit conversion.
  • Setting overly rigid TensorSpec shapes in generator-based pipelines.
  • Ignoring batch-level shape inspection before calling model.fit.
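
The first pitfall is easy to reproduce. The sketch below, using a made-up two-row list, shows the dense conversion failing and a ragged tensor accepting the same data.

```python
import tensorflow as tf

sequences = [[1, 2, 3], [4, 5]]  # rows of unequal length

# A dense tensor needs rectangular data, so this conversion is rejected.
try:
    tf.constant(sequences)
    converted = True
except ValueError as err:
    converted = False
    print("dense conversion failed:", err)

# A ragged tensor holds the same rows without complaint.
rt = tf.ragged.constant(sequences)
print(rt.row_lengths())
```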

Summary

  • Variable-size inputs in TensorFlow are handled with padding, ragged tensors, and masking.
  • padded_batch is the most broadly compatible workflow.
  • Ragged tensors are powerful when downstream layers support them.
  • Shape-flexible preprocessing is essential for mixed-size image and sequence data.
  • Validate shapes and masks early to prevent hard-to-debug training errors.
