ImageDataGenerator for semantic segmentation

Introduction

Semantic segmentation needs every training image and its mask to stay perfectly aligned after augmentation. That requirement makes the task slightly different from ordinary image classification, where only the image is transformed and the label stays unchanged.

Why Segmentation Needs a Paired Generator

ImageDataGenerator was designed primarily for image classification, but it can still be used for segmentation if you drive the image and mask generators in lockstep. The main rule is simple: every random transform applied to the image must be applied to the mask with the same random seed and the same geometric parameters.

If the image rotates but the mask does not, the labels stop matching the pixels and training quickly becomes meaningless.

There is a second rule that matters just as much: masks are categorical data, not photographs. That means interpolation choices that look fine for images can corrupt the mask by inventing class values that never existed.

Basic Pattern With ImageDataGenerator

A practical legacy setup is to create two generators with identical geometric augmentation settings and the same seed. One reads images, the other reads masks. Then you zip them together so that each batch contains aligned arrays.

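```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

seed = 42
batch_size = 4

# Geometric settings shared by both generators. Keeping them identical makes
# both flows draw the same random transform parameters in the same order.
geometric_args = dict(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
)

# Images: default bilinear interpolation (interpolation_order=1), rescaled to [0, 1].
image_datagen = ImageDataGenerator(**geometric_args, rescale=1.0 / 255.0)

# Masks: no rescaling, and nearest-neighbor interpolation (interpolation_order=0)
# so rotations, shifts, and zooms cannot create fractional class values.
mask_datagen = ImageDataGenerator(**geometric_args, interpolation_order=0)

# Toy data: 16 RGB images and 16 single-channel masks with classes 0, 1, 2.
images = np.random.randint(0, 256, size=(16, 128, 128, 3), dtype=np.uint8)
masks = np.random.randint(0, 3, size=(16, 128, 128, 1), dtype=np.uint8)

# The same seed gives both flows the same shuffling order and random transforms.
image_flow = image_datagen.flow(images, batch_size=batch_size, seed=seed)
mask_flow = mask_datagen.flow(masks, batch_size=batch_size, seed=seed)

train_flow = zip(image_flow, mask_flow)

x_batch, y_batch = next(train_flow)
print(x_batch.shape, y_batch.shape)  # (4, 128, 128, 3) (4, 128, 128, 1)
```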

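This works because both flows shuffle the data and draw random transform parameters from the same seed in the same order. Settings that do not consume random numbers, such as rescale, can differ between the two generators; if you change a random augmentation setting in one generator and not the other, the synchronization breaks.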
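A variant that does not rely on two flows consuming random numbers in lockstep is to draw the transform parameters once and apply them to both arrays. The sketch below assumes the Keras 2 style ImageDataGenerator methods get_random_transform and apply_transform (which need SciPy installed); the helper name augment_pair_once is hypothetical. Two generators with the same geometry are used only so that the image and the mask can have different interpolation orders.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

geometric_args = dict(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
)
# Same geometry, different interpolation: bilinear for images, nearest for masks.
image_gen = ImageDataGenerator(**geometric_args, interpolation_order=1)
mask_gen = ImageDataGenerator(**geometric_args, interpolation_order=0)


def augment_pair_once(image, mask, seed=None):
    """Hypothetical helper: draw ONE set of random parameters, apply it to both arrays."""
    params = image_gen.get_random_transform(image.shape, seed=seed)
    aug_image = image_gen.apply_transform(image, params)
    aug_mask = mask_gen.apply_transform(mask, params)
    return aug_image, aug_mask


image = np.random.randint(0, 256, size=(128, 128, 3)).astype("float32")
mask = np.random.randint(0, 3, size=(128, 128, 1)).astype("float32")

# A fixed seed only makes this demo repeatable; use seed=None when training.
aug_image, aug_mask = augment_pair_once(image, mask, seed=7)
print(aug_image.shape, aug_mask.shape, np.unique(aug_mask))
```

With this approach you batch the per-sample results yourself (for example with np.stack), and rescaling the image to [0, 1] can happen afterwards.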

Handling Masks Correctly

The code above demonstrates the pairing pattern, but mask handling needs extra care.

For images, rescaling by dividing by 255 is normal. For masks, rescaling is usually wrong because class values such as 0, 1, and 2 must remain discrete labels.

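Similarly, brightness shifts, channel shifts, and featurewise normalization should be applied to images only, never to masks. Masks should generally receive only geometric transforms such as flips, translations, or rotations. With paired ImageDataGenerator flows, though, avoid enabling brightness_range or channel_shift_range on the image generator alone: those options draw an extra random number for each sample, which shifts the image generator's random sequence relative to the mask generator's and can misalign the remaining samples in each batch. Applying photometric changes to the image batch after the flows have been zipped avoids that problem.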
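One way to keep photometric changes on the image side only is to apply them after pairing, in a small wrapper around the zipped batches. This is a sketch under assumptions: the function name paired_batches and the brightness range are hypothetical, images are assumed to be already rescaled to [0, 1], and the jitter uses its own NumPy Generator rather than the global np.random functions.

```python
import numpy as np


def paired_batches(pairs, brightness_delta=0.1, seed=0):
    """Hypothetical wrapper: image-only brightness jitter, integer-valued masks.

    `pairs` is any iterable of (image_batch, mask_batch) NumPy arrays, for example
    zip(image_flow, mask_flow), with images already rescaled to [0, 1].
    """
    rng = np.random.default_rng(seed)  # separate from the global np.random state
    for x, y in pairs:
        # One random brightness offset per image, applied to the images only.
        delta = rng.uniform(-brightness_delta, brightness_delta, size=(x.shape[0], 1, 1, 1))
        x = np.clip(x + delta, 0.0, 1.0).astype(np.float32)
        # Masks come out of the Keras flows as floats: round and cast to class indices.
        y = np.rint(y).astype(np.int32)
        yield x, y


# Usage with two paired Keras flows:
# train_flow = paired_batches(zip(image_flow, mask_flow))
```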

When your masks are stored as color images, convert them into integer class maps before training. A segmentation network expects each pixel to represent a class index or a one-hot vector, not arbitrary RGB values.
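Converting a color-coded mask into a class map can be sketched with NumPy by comparing every pixel against a palette of known class colors. The palette below is a hypothetical example; your dataset defines its own colors. The result is an integer class map, which can then be one-hot encoded if the loss requires it.

```python
import numpy as np

# Hypothetical palette: row i is the RGB color used for class i in the mask files.
PALETTE = np.array(
    [
        [0, 0, 0],      # class 0: background
        [255, 0, 0],    # class 1
        [0, 255, 0],    # class 2
    ],
    dtype=np.uint8,
)


def rgb_mask_to_class_map(rgb_mask, palette=PALETTE):
    """Map an (H, W, 3) color mask to an (H, W) array of class indices."""
    # Compare every pixel with every palette color: shape (H, W, num_classes).
    matches = np.all(rgb_mask[..., None, :] == palette[None, None, :, :], axis=-1)
    if not matches.any(axis=-1).all():
        raise ValueError("mask contains colors that are not in the palette")
    return matches.argmax(axis=-1).astype(np.int32)


rgb = np.array(
    [[[0, 0, 0], [255, 0, 0]],
     [[0, 255, 0], [255, 0, 0]]],
    dtype=np.uint8,
)
print(rgb_mask_to_class_map(rgb))
# [[0 1]
#  [2 1]]
```

Exact color matching assumes lossless mask files such as PNG; JPEG compression artifacts would produce colors outside the palette and trigger the ValueError.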

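```python
import tensorflow as tf

# A tiny 2x2 integer mask with a trailing channel axis: shape (2, 2, 1).
mask = tf.constant(
    [[[0], [1]],
     [[2], [1]]],
    dtype=tf.uint8,
)

# Drop the channel axis, then expand each class index into a one-hot vector.
one_hot = tf.one_hot(tf.squeeze(mask, axis=-1), depth=3)
print(one_hot.shape)  # (2, 2, 3)
```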

That conversion is especially important when your loss function assumes categorical targets.
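In Keras the choice usually comes down to the loss: SparseCategoricalCrossentropy takes the integer class map directly, while CategoricalCrossentropy expects the one-hot version. The small check below uses made-up predictions to show that, on the same mask, the two losses agree up to floating-point rounding.

```python
import tensorflow as tf

num_classes = 3
# Integer mask of shape (batch, H, W) holding class indices.
mask = tf.constant([[[0, 1], [2, 1]]], dtype=tf.int32)
# Made-up per-pixel class probabilities of shape (batch, H, W, num_classes).
logits = tf.random.normal((1, 2, 2, num_classes), seed=0)
probs = tf.nn.softmax(logits, axis=-1)

# Sparse loss: integer targets.
sparse_loss = tf.keras.losses.SparseCategoricalCrossentropy()(mask, probs)
# Dense loss: one-hot targets built from the same mask.
dense_loss = tf.keras.losses.CategoricalCrossentropy()(tf.one_hot(mask, num_classes), probs)
print(float(sparse_loss), float(dense_loss))
```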

Limits of ImageDataGenerator

ImageDataGenerator can be useful for older projects, but it is not ideal for every segmentation pipeline. It has no built-in notion of an image and mask pair, so alignment depends on matching seeds and settings; its interpolation control is limited to a single interpolation_order for the affine transforms; and it lacks more complex augmentations such as elastic transforms. The TensorFlow documentation also marks it as not recommended for new code, pointing instead to tf.data pipelines and Keras preprocessing layers.

For new code, many teams prefer tf.data because it lets you load files, decode masks, apply exactly the same random transform to both tensors, and keep the whole pipeline explicit.

Here is a small example using TensorFlow ops directly:

```python
import tensorflow as tf


def augment(image, mask):
    # Draw ONE random decision and apply it to both tensors.
    flip = tf.random.uniform(()) > 0.5
    if flip:  # AutoGraph converts this to tf.cond inside tf.data or tf.function
        image = tf.image.flip_left_right(image)
        mask = tf.image.flip_left_right(mask)
    # Rescaling is applied to the image only; the mask keeps its integer labels.
    image = tf.cast(image, tf.float32) / 255.0
    return image, mask


image = tf.random.uniform((128, 128, 3), maxval=256, dtype=tf.int32)
mask = tf.random.uniform((128, 128, 1), maxval=3, dtype=tf.int32)

image, mask = augment(image, mask)
print(image.shape, mask.shape)  # (128, 128, 3) (128, 128, 1)
```

The explicit version is longer, but it makes the synchronization rule impossible to forget.
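Another common tf.data pattern is to stack the image and mask along the channel axis so that a single random op moves both. This sketch assumes 3-channel images, single-channel integer masks, and transforms limited to flips and crops, which only rearrange or select pixels, so the mask stays integer-valued. The function name augment_stacked and the 112x112 crop size are illustrative choices; the image-only brightness change is applied after splitting the stack.

```python
import tensorflow as tf

CROP = 112  # illustrative crop size


def augment_stacked(image, mask):
    image = tf.cast(image, tf.float32) / 255.0
    mask = tf.cast(mask, tf.float32)
    # One tensor with 3 image channels + 1 mask channel, so every random op
    # below applies the same flip or crop to both.
    stacked = tf.concat([image, mask], axis=-1)
    stacked = tf.image.random_flip_left_right(stacked)
    stacked = tf.image.random_flip_up_down(stacked)
    stacked = tf.image.random_crop(stacked, size=(CROP, CROP, 4))
    image, mask = stacked[..., :3], stacked[..., 3:]
    # Photometric change on the image only.
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image, tf.cast(mask, tf.int32)


images = tf.random.uniform((16, 128, 128, 3), maxval=256, dtype=tf.int32)
masks = tf.random.uniform((16, 128, 128, 1), maxval=3, dtype=tf.int32)

dataset = (
    tf.data.Dataset.from_tensor_slices((images, masks))
    .map(augment_stacked, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
)
x_batch, y_batch = next(iter(dataset))
print(x_batch.shape, y_batch.shape)  # (4, 112, 112, 3) (4, 112, 112, 1)
```

Rotations or zooms do not fit this trick as neatly, because a single interpolation mode would then apply to both the image and the mask channels.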

Common Pitfalls

The most common mistake is augmenting images and masks independently. Even small differences in seed or generator order can scramble supervision.

Another frequent issue is normalizing masks like images. If you divide mask labels by 255, class identities are destroyed.
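A cheap guard against this mistake is to check a batch of masks before training. The helper validate_mask below is hypothetical; it rejects masks that contain fractional values (a sign of rescaling or smooth interpolation) or indices outside the expected class range.

```python
import numpy as np


def validate_mask(mask, num_classes):
    """Hypothetical check: raise unless `mask` holds integer class indices."""
    mask = np.asarray(mask)
    if not np.all(np.mod(mask, 1) == 0):
        raise ValueError("mask has fractional values (rescaled or interpolated?)")
    if mask.min() < 0 or mask.max() >= num_classes:
        raise ValueError(f"mask values must lie in [0, {num_classes - 1}]")
    return mask.astype(np.int32)


ok = validate_mask(np.array([[0, 1], [2, 1]], dtype=np.uint8), num_classes=3)

try:
    # Dividing labels by 255 turns 1 and 2 into small fractions.
    validate_mask(np.array([[0, 1], [2, 1]]) / 255.0, num_classes=3)
except ValueError as err:
    print("rejected:", err)
```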

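Interpolation is another hidden source of bugs. Rotating, zooming, or resizing a mask with bilinear or higher-order interpolation blends neighboring labels, so a pixel on the border between classes 0 and 2 can come out as 1 or as a fractional value. For masks, use nearest-neighbor interpolation whenever possible, for example interpolation_order=0 in ImageDataGenerator or method="nearest" in tf.image.resize.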
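The difference is easy to see on a tiny mask. This sketch uses tf.image.resize; the helper name resize_pair is hypothetical. It resizes the image bilinearly and the mask with nearest neighbor, and compares that against a bilinear resize of the same mask.

```python
import tensorflow as tf


def resize_pair(image, mask, size):
    """Hypothetical helper: bilinear for the image, nearest neighbor for the mask."""
    image = tf.image.resize(image, size, method="bilinear")
    mask = tf.image.resize(mask, size, method="nearest")
    return image, mask


# A 2x2 mask whose left column is class 0 and right column is class 2.
mask = tf.constant([[[0], [2]], [[0], [2]]], dtype=tf.int32)
image = tf.cast(mask, tf.float32) * 100.0

_, nearest_mask = resize_pair(image, mask, (4, 4))
bilinear_mask = tf.image.resize(tf.cast(mask, tf.float32), (4, 4), method="bilinear")

print(sorted(set(nearest_mask.numpy().ravel().tolist())))   # [0, 2]
print(sorted(set(bilinear_mask.numpy().ravel().tolist())))  # also contains values strictly between 0 and 2
```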

Finally, avoid augmentations that are semantically invalid for the problem. Horizontal flips may help for road scenes, but not necessarily for medical images where left and right have clinical meaning.

Summary

For segmentation, images and masks must receive the same geometric transforms in the same order.
ImageDataGenerator can work for legacy code if paired flows share the same seed and the same random augmentation settings.
Apply photometric changes to images, not to masks.
Keep masks as discrete class labels and watch interpolation behavior carefully.
Prefer tf.data or explicit TensorFlow augmentations for new pipelines that need more control.