Keras Image data generator throwing no files found error?

Introduction

The no files found message from Keras image generators usually means the directory layout or file discovery rules do not match what Keras expects. The model code can be correct, yet training fails before the first batch. Fixing this issue is mostly about validating paths, folder structure, and extensions in a systematic way.

How flow_from_directory Discovers Images

ImageDataGenerator.flow_from_directory expects one folder per class under a root directory. It does not treat the root as a class itself. If class folders are missing, the generator reports zero files.

python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1.0 / 255.0)

train_data = train_gen.flow_from_directory(
    "data/train",
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
)

print(train_data.samples)
print(train_data.class_indices)

Expected layout:

  • data/train/cats/*.jpg
  • data/train/dogs/*.jpg

If images are directly inside data/train, sample count is zero because no class directories exist.
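If a dataset arrives flat, one option is to move the files into class folders before creating the generator. The sketch below assumes the label can be derived from the filename (by default the text before the first dot, as in cat.001.jpg). That naming scheme and the helper name sort_into_class_dirs are illustrative assumptions, not something Keras provides.

```python
import shutil
from pathlib import Path


def sort_into_class_dirs(root, label_of=lambda p: p.name.split(".")[0]):
    """Move files sitting directly in `root` into root/<label>/ folders.

    `label_of` maps a file path to its class name. The default assumes
    names like "cat.001.jpg" (a hypothetical naming scheme).
    Returns the number of files moved.
    """
    root = Path(root)
    moved = 0
    # sorted() snapshots the listing, since the loop modifies the directory.
    for f in sorted(root.iterdir()):
        # Skip subfolders, hidden files, and names without an extension.
        if not f.is_file() or f.name.startswith(".") or not f.suffix:
            continue
        class_dir = root / label_of(f)
        class_dir.mkdir(exist_ok=True)
        shutil.move(str(f), str(class_dir / f.name))
        moved += 1
    return moved
```

Calling sort_into_class_dirs("data/train") on a flat folder would leave one subfolder per derived label. Pass a different label_of callable if your filenames encode the class differently.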

Verify Paths Early

Relative paths often fail when scripts run from a different working directory. Use absolute paths and quick checks before creating the generator.

python
from pathlib import Path

root = Path("data/train").resolve()
print("train path:", root)
print("exists:", root.exists())
# Only list subfolders when the root is a directory; iterdir() raises otherwise.
if root.is_dir():
    print("class dirs:", sorted(p.name for p in root.iterdir() if p.is_dir()))

This short preflight prevents long debug sessions later.
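One way to make the dataset path independent of the working directory is to anchor it to the script's own location. This is a minimal sketch, assuming the data folder sits next to the script. The helper names path_next_to and require_dir are made up for illustration.

```python
from pathlib import Path


def path_next_to(anchor_file, *parts):
    """Build an absolute path relative to the folder containing `anchor_file`."""
    return Path(anchor_file).resolve().parent.joinpath(*parts)


def require_dir(path):
    """Return `path` if it is an existing directory, else fail with a clear message."""
    path = Path(path)
    if not path.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {path}")
    return path
```

In a training script this could be called as require_dir(path_next_to(__file__, "data", "train")), with the result converted via str(...) before it is passed to flow_from_directory.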

Confirm Supported Extensions and Corrupt Files

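flow_from_directory selects files by extension only. Files with an extension it does not recognize are skipped silently, so sample counts come out lower than expected, while corrupt files with a valid extension are still counted and fail only when a batch is loaded. Validate files once during dataset preparation.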

python
from pathlib import Path
from PIL import Image

# Extensions flow_from_directory matches; confirm against your Keras version.
allowed = {".png", ".jpg", ".jpeg", ".bmp", ".ppm", ".tif", ".tiff"}
base = Path("data/train")

bad = []
for f in base.rglob("*"):
    if f.is_file():
        if f.suffix.lower() not in allowed:
            bad.append((str(f), "unsupported extension"))
            continue
        try:
            # verify() checks file integrity without decoding the full image.
            with Image.open(f) as img:
                img.verify()
        except Exception:
            bad.append((str(f), "corrupt image"))

print("issues:", len(bad))
for row in bad[:10]:
    print(row)

Cleaning bad files avoids silent dataset quality problems.

Use flow_from_dataframe for Custom Label Sources

If labels come from CSV or database rows, directory-based loading can be awkward. flow_from_dataframe provides explicit file and label mapping.

python
import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

df = pd.DataFrame(
    {
        "filename": ["img1.jpg", "img2.jpg"],
        "label": ["cat", "dog"],
    }
)

gen = ImageDataGenerator(rescale=1.0 / 255.0)
iter_df = gen.flow_from_dataframe(
    dataframe=df,
    directory="data/train_flat",
    x_col="filename",
    y_col="label",
    class_mode="categorical",
    target_size=(224, 224),
)

print(iter_df.samples)

This reduces dependence on folder naming conventions.
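The iterator can only load rows whose file actually exists under the given directory, so it can help to compare the filename column against the disk before building it. The following sketch uses only the standard library and accepts any iterable of names, such as df["filename"]. The helper name missing_files is an assumption for illustration.

```python
from pathlib import Path


def missing_files(directory, filenames):
    """Return the names from `filenames` that are not regular files under `directory`."""
    base = Path(directory)
    return [name for name in filenames if not (base / name).is_file()]
```

For the DataFrame above, missing_files("data/train_flat", df["filename"]) would list every row that points at a file not present on disk. An empty list means all rows resolve.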

Practical Debug Script

A short standalone script can verify dataset health before any model code runs.

python
from pathlib import Path

root = Path("data/train")
counts = {}
for class_dir in root.iterdir():
    if class_dir.is_dir():
        counts[class_dir.name] = len([p for p in class_dir.iterdir() if p.is_file()])

print("class file counts:", counts)
print("total files:", sum(counts.values()))

Use this in CI or pre-training checks so invalid datasets are caught early.

Debug Checklist Before Training

Run this checklist when the generator reports zero images:

  1. Print resolved directory path.
  2. List class subfolders.
  3. Count files by extension.
  4. Verify image readability.
  5. Print samples and class_indices.

With these checks, root cause is usually visible within minutes.
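Step 3 of the checklist, counting files by extension, can be sketched as follows. The helper name count_by_extension is hypothetical, and files without an extension are grouped under a "<none>" key chosen for this example.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root):
    """Count files under `root` (recursively) by lowercased extension."""
    counts = Counter()
    for f in Path(root).rglob("*"):
        if f.is_file():
            # Files with no extension are grouped under "<none>".
            counts[f.suffix.lower() or "<none>"] += 1
    return counts
```

A result dominated by an unexpected extension, or by "<none>", is a strong hint that the generator is skipping most of the dataset.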

Batch Sanity Check Before model.fit

A fast dry run helps validate generator behavior before expensive training. Pull one batch and inspect shape, label distribution, and value ranges.

python
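images, labels = next(train_data)
print(images.shape)                # (up to batch_size, 224, 224, 3) for the generator above
print(labels.shape)                # (up to batch_size, num_classes) with class_mode="categorical"
print(labels.sum(axis=0))          # per-class counts in this batch (labels are one-hot)
print(images.min(), images.max())  # should fall within [0, 1] after rescale=1/255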

If shapes or class counts look wrong, fix data loading first. Training with bad batches can waste hours and produce misleading metrics.

Common Pitfalls

  • Keeping all images in one directory without class subfolders when using flow_from_directory.
  • Passing relative paths while running code from another working directory.
  • Assuming every image file is valid and readable.
  • Mixing hidden system files with real images in dataset folders.
  • Forgetting that a class_mode that does not match the model's loss (for example, one-hot "categorical" labels fed to a sparse loss) can break training even after files are found.
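Hidden entries deserve a dedicated check because flow_from_directory treats every subdirectory of the root as a class, so a hidden folder such as .ipynb_checkpoints can appear as an extra, empty class. The sketch below lists hidden files and folders plus a few well-known system files. The CLUTTER_NAMES set and the helper name find_clutter are assumptions to adapt to your environment.

```python
import os
from pathlib import Path

# Well-known OS clutter; an assumed starting list, extend as needed.
CLUTTER_NAMES = {"Thumbs.db", "desktop.ini"}


def find_clutter(root):
    """Return hidden or known system entries (files and folders) under `root`."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name.startswith(".") or name in CLUTTER_NAMES:
                hits.append(Path(dirpath) / name)
    return sorted(hits)
```

Review the returned paths before deleting anything; the function only reports and never removes files.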

Summary

  • no files found is typically a dataset discovery issue, not a model architecture issue.
  • flow_from_directory requires class subfolders under the root path.
  • Absolute path checks and directory prints catch most mistakes quickly.
  • Validate extensions and image integrity before training.
  • Use flow_from_dataframe when labels are not encoded in folder names.
