Keras Image data generator throwing no files found error?

Introduction

The "no files found" message from Keras image generators, typically printed as "Found 0 images belonging to 0 classes.", usually means the directory layout or file-discovery rules do not match what Keras expects. The model code can be correct and training still fails before the first batch. Fixing it is mostly a matter of systematically validating paths, folder structure, and file extensions.

How `flow_from_directory` Discovers Images

ImageDataGenerator.flow_from_directory expects one folder per class under a root directory. It does not treat the root as a class itself. If class folders are missing, the generator reports zero files.

python

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Scale pixel values from [0, 255] to [0, 1].
train_gen = ImageDataGenerator(rescale=1.0 / 255.0)

# Each subfolder of data/train is treated as one class.
train_data = train_gen.flow_from_directory(
    "data/train",
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
)

print(train_data.samples)        # total number of images found
print(train_data.class_indices)  # mapping of class name to label index

Expected layout:

data/train/cats/*.jpg
data/train/dogs/*.jpg

If images are directly inside data/train, sample count is zero because no class directories exist.
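
To see roughly what the generator will find before importing TensorFlow at all, the discovery rule described above can be approximated with the standard library. This is a minimal sketch, not the Keras implementation: discover_classes and IMAGE_EXTENSIONS are hypothetical names, and the extension set is an assumption based on the legacy directory iterator, so check it against your installed version.

```python
from pathlib import Path

# Assumption: extensions accepted by the legacy Keras directory iterator.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".ppm", ".tif", ".tiff"}


def discover_classes(root):
    """Return {class_name: image_count} for each direct subfolder of root.

    Files sitting directly in root are ignored, mirroring the rule that
    the root itself is not treated as a class.
    """
    root = Path(root)
    found = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Count image files anywhere below the class folder.
        found[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return found
```

An empty dictionary from discover_classes("data/train") corresponds to the zero-class case: the images are probably sitting directly in the root folder.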

Verify Paths Early

Relative paths often fail when scripts run from a different working directory. Use absolute paths and quick checks before creating the generator.

python

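from pathlib import Path

# Resolve against the current working directory so the real location is visible.
root = Path("data/train").resolve()
print("train path:", root)
print("exists:", root.exists())
# Guard the listing: iterdir() raises if the path is not a directory.
if root.is_dir():
    print("class dirs:", sorted(p.name for p in root.iterdir() if p.is_dir()))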

This short preflight prevents long debug sessions later.
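
The same preflight can be turned into a fail-fast helper that stops with a clear message instead of printing. The sketch below is one possible shape; require_class_dirs is a hypothetical name, not part of Keras.

```python
from pathlib import Path


def require_class_dirs(path):
    """Resolve path and return (root, class folder names), or raise early."""
    root = Path(path).expanduser().resolve()
    if not root.is_dir():
        # Include the working directory, since that is the usual culprit.
        raise FileNotFoundError(
            f"dataset root not found: {root} (cwd: {Path.cwd()})"
        )
    class_dirs = sorted(p.name for p in root.iterdir() if p.is_dir())
    if not class_dirs:
        raise ValueError(f"no class subfolders under {root}")
    return root, class_dirs
```

To make a script independent of the working directory, one option is to build the argument from the script location, for example Path(__file__).resolve().parent / "data" / "train", and pass the returned root to the generator.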

Confirm Supported Extensions and Corrupt Files

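Keras discovers images by file extension and only loads known formats. Files with unsupported extensions are skipped, so the sample count can be lower than expected. Corrupt files that carry a valid extension are still counted and only fail later, when a batch is loaded. Validate files once during dataset preparation.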

python

from pathlib import Path
from PIL import Image

# Extensions accepted by the legacy Keras directory iterators; adjust if your version differs.
allowed = {".jpg", ".jpeg", ".png", ".bmp", ".ppm", ".tif", ".tiff"}
base = Path("data/train")

bad = []
for f in base.rglob("*"):
    if f.is_file():
        if f.suffix.lower() not in allowed:
            bad.append((str(f), "unsupported extension"))
            continue
        try:
            # verify() checks file integrity without decoding the full image.
            with Image.open(f) as img:
                img.verify()
        except Exception:
            bad.append((str(f), "corrupt image"))

print("issues:", len(bad))
for row in bad[:10]:
    print(row)

Cleaning bad files avoids silent dataset quality problems.
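
A quick tally of extensions often shows the problem at a glance, for example a dataset exported as .webp or files with no extension at all. This is a small standard-library sketch; count_extensions is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def count_extensions(root):
    """Count files under root by lower-cased extension ('' means none)."""
    return Counter(
        f.suffix.lower() for f in Path(root).rglob("*") if f.is_file()
    )
```

Calling count_extensions("data/train").most_common() lists the dominant extensions first, which makes unexpected formats easy to spot.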

Use `flow_from_dataframe` for Custom Label Sources

If labels come from CSV or database rows, directory-based loading can be awkward. flow_from_dataframe provides explicit file and label mapping.

python

import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

df = pd.DataFrame(
    {
        "filename": ["img1.jpg", "img2.jpg"],
        "label": ["cat", "dog"],
    }
)

gen = ImageDataGenerator(rescale=1.0 / 255.0)
iter_df = gen.flow_from_dataframe(
    dataframe=df,
    directory="data/train_flat",
    x_col="filename",
    y_col="label",
    class_mode="categorical",
    target_size=(224, 224),
)

print(iter_df.samples)

This reduces dependence on folder naming conventions.
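
With a dataframe, the zero-sample case usually comes from filenames that do not resolve under the given directory. As far as I know, flow_from_dataframe validates filenames by default and drops the rows it cannot find with a warning, so it can help to list them explicitly first. The sketch below assumes filenames are relative to the directory; missing_files is a hypothetical helper.

```python
from pathlib import Path


def missing_files(filenames, directory):
    """Return the filenames that do not exist as files under directory."""
    base = Path(directory)
    return [name for name in filenames if not (base / name).is_file()]
```

For the example above this would be called as missing_files(df["filename"], "data/train_flat"); a non-empty result points at the rows to fix before building the iterator.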

Practical Debug Script

A short standalone script can verify dataset health before any model code runs.

python

from pathlib import Path

root = Path("data/train")
counts = {}
for class_dir in root.iterdir():
    if class_dir.is_dir():
        counts[class_dir.name] = len([p for p in class_dir.iterdir() if p.is_file()])

print("class file counts:", counts)
print("total files:", sum(counts.values()))

Use this in CI or pre-training checks so invalid datasets are caught early.
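
For CI, the counts are more useful when they turn into a pass or fail signal. The following is a sketch of one way to do that, under the assumption that every class needs at least one file; dataset_problems, main, and the min_files_per_class threshold are illustrative names, not an established convention.

```python
from pathlib import Path


def dataset_problems(root, min_files_per_class=1):
    """Return a list of human-readable problems; an empty list means OK."""
    root = Path(root)
    if not root.is_dir():
        return [f"missing dataset root: {root}"]
    problems = []
    class_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    if not class_dirs:
        problems.append(f"no class subfolders under {root}")
    for class_dir in class_dirs:
        n = sum(1 for p in class_dir.iterdir() if p.is_file())
        if n < min_files_per_class:
            problems.append(f"class '{class_dir.name}' has {n} file(s)")
    return problems


def main(argv):
    """Print each problem and return a process exit code (0 = healthy)."""
    root = argv[1] if len(argv) > 1 else "data/train"
    problems = dataset_problems(root)
    for line in problems:
        print("ERROR:", line)
    return 1 if problems else 0


# At the bottom of a standalone script (requires `import sys`):
#     sys.exit(main(sys.argv))
```

A non-zero exit code makes the CI job fail before any training step starts.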

Debug Checklist Before Training

Run this checklist when the generator reports zero images:

Print resolved directory path.
List class subfolders.
Count files by extension.
Verify image readability.
Print samples and class_indices.

With these checks, the root cause is usually visible within minutes.

Batch Sanity Check Before `model.fit`

A fast dry run helps validate generator behavior before expensive training. Pull one batch and inspect shape, label distribution, and value ranges.

python

images, labels = next(train_data)
print(images.shape)
print(labels.shape)
print(images.min(), images.max())

If shapes or class counts look wrong, fix data loading first. Training with bad batches can waste hours and produce misleading metrics.
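
The label distribution mentioned above can be checked across the whole dataset rather than one batch. The sketch below assumes the iterator exposes a classes attribute (one integer label per sample) alongside class_indices, as the directory iterator does; class_distribution is a hypothetical helper.

```python
from collections import Counter


def class_distribution(classes, class_indices):
    """Map each class name to the number of samples labeled with it."""
    index_to_name = {index: name for name, index in class_indices.items()}
    # int() normalizes NumPy integer labels to plain Python ints.
    counts = Counter(int(c) for c in classes)
    return {index_to_name[i]: counts.get(i, 0) for i in sorted(index_to_name)}
```

A call such as class_distribution(train_data.classes, train_data.class_indices) makes a missing or badly imbalanced class visible before training begins.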

Common Pitfalls

Keeping all images in one directory without class subfolders when using flow_from_directory.
Passing relative paths while running code from another working directory.
Assuming every image file is valid and readable.
Mixing hidden system files with real images in dataset folders.
Forgetting that wrong class_mode can break training even after files are found.
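
The hidden-file pitfall is easy to check for directly. Some of these files, such as macOS "._name.jpg" companions, carry an image extension without being readable images. This is a minimal sketch; find_system_files and the SYSTEM_FILE_NAMES set are illustrative assumptions and not an exhaustive list.

```python
from pathlib import Path

# Assumption: a small, non-exhaustive set of common OS metadata files.
SYSTEM_FILE_NAMES = {"thumbs.db", "desktop.ini"}


def find_system_files(root):
    """Return dotfiles and known OS metadata files found under root."""
    return sorted(
        f
        for f in Path(root).rglob("*")
        if f.is_file()
        and (f.name.startswith(".") or f.name.lower() in SYSTEM_FILE_NAMES)
    )
```

Review the returned paths before deleting anything, since a dataset may contain intentional dotfiles.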

Summary

A "no files found" error is typically a dataset discovery issue, not a model architecture issue.
flow_from_directory requires class subfolders under the root path.
Absolute path checks and directory prints catch most mistakes quickly.
Validate extensions and image integrity before training.
Use flow_from_dataframe when labels are not encoded in folder names.
