Keras not training on entire dataset

Introduction

If Keras appears not to train on the full dataset, the problem is usually not that model.fit() randomly ignores samples. The usual causes are configuration choices around steps_per_epoch, generators, repeated datasets, partial batches, or input pipelines that do not actually expose every example.

Start with How the Data Enters fit()

Keras behaves differently depending on the input type:

  • NumPy arrays
  • Python generators
  • tf.data.Dataset
  • sequence objects such as keras.utils.Sequence

With plain NumPy arrays, fit() typically knows the dataset length and will iterate through all samples each epoch. Problems are more common when you use generators or datasets, because then Keras depends on the pipeline configuration to know how much data constitutes one epoch.
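
As a quick sanity check for array inputs, the expected number of steps per epoch can be worked out by hand. The sketch below assumes the default batch_size of 32 that fit() uses for NumPy arrays; expected_steps is a hypothetical helper for illustration, not a Keras function.

```python
import math


def expected_steps(num_samples, batch_size=32):
    """Batches per epoch when the final partial batch is kept."""
    return math.ceil(num_samples / batch_size)


# 60,000 samples at the default batch size give 1,875 steps per epoch.
steps = expected_steps(60_000)
```

In recent TensorFlow releases the progress bar counts batches rather than samples, so a figure such as 1875 for 60,000 samples usually means the whole array was used.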

steps_per_epoch Is the First Thing to Check

If you pass a generator or dataset and set steps_per_epoch too small, training stops the epoch early even though more samples exist.

python
history = model.fit(
    train_dataset,
    epochs=5,
    steps_per_epoch=100,
)

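That means exactly 100 batches are consumed per epoch. If the dataset contains 120 batches, the last 20 are not seen in that epoch. Depending on the Keras version, the next epoch may continue from where the previous one stopped rather than restarting, so a finite dataset can also run out of data before all epochs finish.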

The fix is to set steps_per_epoch correctly or let Keras infer it when possible.
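
To make the effect concrete, the arithmetic for the 120-batch case can be sketched in plain Python. epoch_coverage is a hypothetical helper, and it assumes a finite, non-repeating pipeline that keeps the final partial batch.

```python
def epoch_coverage(num_samples, batch_size, steps_per_epoch):
    """Return (seen, skipped) sample counts for a single epoch.

    Assumes a finite pipeline that is read from its start and that
    keeps the final partial batch.
    """
    seen = min(num_samples, steps_per_epoch * batch_size)
    return seen, num_samples - seen


# 120 batches of 32 samples, but only 100 steps per epoch.
seen, skipped = epoch_coverage(
    num_samples=120 * 32, batch_size=32, steps_per_epoch=100
)
# seen == 3200, skipped == 640
```

Setting steps_per_epoch to 120 in this example (or omitting it for a finite dataset) brings the skipped count to zero.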

tf.data.Dataset Can Repeat Forever

Another common issue is a dataset pipeline that includes .repeat() without careful epoch boundaries:

python
train_dataset = raw_dataset.shuffle(1000).batch(32).repeat()

This dataset is infinite. Keras now needs steps_per_epoch because there is no natural end. If the number is wrong, your notion of "one epoch over the full dataset" no longer matches what the training loop is doing.

A safer finite pipeline looks like this:

python
train_dataset = raw_dataset.shuffle(1000).batch(32)
model.fit(train_dataset, epochs=5)

If you do need .repeat(), calculate the step count deliberately.
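
One way to reason about the step count is to simulate the pipeline without TensorFlow. The sketch below uses itertools as a stand-in for .batch(32).repeat() (it is not tf.data itself, and shuffling is left out) and checks that ceil(num_samples / batch_size) steps cover every sample exactly once. The sizes are illustrative assumptions.

```python
import itertools
import math

num_samples, batch_size = 100, 32
samples = list(range(num_samples))

# Stand-in for .batch(32): consecutive chunks, the last one partial.
batches = [samples[i:i + batch_size] for i in range(0, num_samples, batch_size)]

# Stand-in for .repeat(): an endless stream of those batches.
repeated = itertools.cycle(batches)

# One full pass over the data needs ceil(100 / 32) = 4 steps.
steps_per_epoch = math.ceil(num_samples / batch_size)

# Take exactly one epoch's worth of batches and flatten them.
one_epoch = list(itertools.islice(repeated, steps_per_epoch))
seen = [s for batch in one_epoch for s in batch]
```

With the real pipeline, the same value would be passed as steps_per_epoch to model.fit(). The calculation holds when batching comes before .repeat(), as in the pipeline above.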

Partial Batches and drop_remainder

Some pipelines discard the last incomplete batch. In tf.data, that can happen if drop_remainder=True is enabled:

python
train_dataset = raw_dataset.batch(32, drop_remainder=True)

If the dataset size is not divisible by 32, the remaining examples are dropped each epoch. That is not necessarily wrong, but it does mean the entire dataset is not being used.

When you want every sample, keep drop_remainder=False, which is the default.
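
The number of examples lost to drop_remainder is simply the remainder of the division. A minimal sketch, with dropped_per_epoch as a hypothetical helper and the sizes chosen only for illustration:

```python
def dropped_per_epoch(num_samples, batch_size, drop_remainder):
    """Samples left out of each epoch by the batching step."""
    return num_samples % batch_size if drop_remainder else 0


# 1,000 samples in batches of 32: 31 full batches (992 samples), 8 left over.
dropped = dropped_per_epoch(1000, 32, drop_remainder=True)
kept_all = dropped_per_epoch(1000, 32, drop_remainder=False)
```

If the data is shuffled before batching, the dropped examples will generally differ from epoch to epoch, so over many epochs most samples are still seen.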

Generators Must Report Length Correctly

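If you use a custom Sequence, make sure its __len__() matches the real number of batches. A plain Python generator has no length at all, so Keras relies on steps_per_epoch to know where an epoch ends.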

python
import math

from tensorflow import keras


class MySequence(keras.utils.Sequence):
    def __init__(self, x, y, batch_size):
        super().__init__()  # recent Keras versions warn if this call is missing
        self.x = x
        self.y = y
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch; ceil keeps the final partial batch.
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        # Return batch number idx; the last slice may be shorter than batch_size.
        start = idx * self.batch_size
        end = start + self.batch_size
        return self.x[start:end], self.y[start:end]

If __len__() underreports the number of batches, Keras never asks for the rest.
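
A simple check is to walk every index the sequence advertises and count the samples that come back. The sketch below uses ListSequence, a hypothetical plain-Python stand-in with the same __len__/__getitem__ contract, so it runs without TensorFlow; the round_up flag exists only to contrast a correct length with a floor-division bug.

```python
import math


class ListSequence:
    """Stand-in exposing the same __len__/__getitem__ contract as a Sequence."""

    def __init__(self, x, batch_size, round_up=True):
        self.x = x
        self.batch_size = batch_size
        self.round_up = round_up

    def __len__(self):
        n = len(self.x) / self.batch_size
        # floor() silently drops the final partial batch; ceil() keeps it.
        return math.ceil(n) if self.round_up else math.floor(n)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        return self.x[start:start + self.batch_size]


def samples_covered(seq):
    """Count the samples reachable through indices 0 .. len(seq) - 1."""
    return sum(len(seq[i]) for i in range(len(seq)))


data = list(range(100))
full = samples_covered(ListSequence(data, 32, round_up=True))    # 100
short = samples_covered(ListSequence(data, 32, round_up=False))  # 96
```

For a sequence that returns (x, y) tuples, such as MySequence, the same check would count len(seq[i][0]) instead.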

Validation Splits and Shuffling Can Confuse Diagnosis

Sometimes the model is training on the full training set, but part of the original data has been reserved for validation:

python
model.fit(x, y, validation_split=0.2, epochs=10)

Now only 80 percent of the data is used for training. That is expected behavior, but it can look like missing samples if you forgot the split exists.

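Shuffling can also make it harder to notice which samples were seen: with array inputs, fit() shuffles the training data each epoch by default (shuffle=True), so the order of samples changes between epochs even when every sample is used.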
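
The split also changes the step count you should expect to see. The sketch below approximates the sizes; Keras takes the validation samples from the end of the arrays before shuffling, but the exact rounding used here is an assumption, and split_sizes is a hypothetical helper.

```python
import math


def split_sizes(num_samples, validation_split):
    """Approximate (train, validation) sizes for fit(..., validation_split=...)."""
    train = int(num_samples * (1.0 - validation_split))
    return train, num_samples - train


train_n, val_n = split_sizes(1000, 0.2)  # 800 training, 200 validation
train_steps = math.ceil(train_n / 32)    # 25 steps per epoch at batch size 32
```

Seeing 25 steps rather than 32 for a 1,000-sample array is therefore consistent with a 20 percent validation split, not with skipped data.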

Common Pitfalls

  • Setting steps_per_epoch too small is one of the most common reasons not all batches are used.
  • Using .repeat() without understanding that the dataset becomes infinite makes epoch semantics easy to misread.
  • Enabling drop_remainder=True discards the final partial batch every epoch.
  • Returning the wrong value from __len__() in a custom sequence causes Keras to stop early.
  • Forgetting about validation_split can make it seem as though part of the dataset vanished from training.

Summary

  • Keras usually trains on the full dataset unless the input pipeline tells it otherwise.
  • Check steps_per_epoch first when using generators or tf.data.Dataset.
  • Be careful with .repeat() and with partial-batch dropping.
  • Custom generators must report their length accurately.
  • If training still looks incomplete, inspect the data pipeline rather than assuming model.fit() is skipping samples on its own.
