Keras EarlyStopping: Which min_delta and patience to use?

Introduction

There is no universal magic pair of min_delta and patience values for Keras EarlyStopping. The right settings depend on how noisy your validation metric is, how quickly your model learns, and how expensive each epoch is. A good choice is therefore not a fixed recipe but a calibration problem: decide what counts as meaningful improvement and how many flat epochs you are willing to tolerate before stopping.

What min_delta and patience Actually Mean

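min_delta is the smallest improvement that Keras treats as real progress. Keras compares each epoch against the best value seen so far, not against the previous epoch. If the monitored metric does not beat that best value by more than min_delta, Keras counts the epoch as "no improvement."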

patience is the number of consecutive non-improving epochs Keras will tolerate before it stops training. Any real improvement resets the count to zero.

Typical usage:

```python
import tensorflow as tf

callback = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    min_delta=0.001,
    patience=5,
    restore_best_weights=True
)
```

This means:

  • monitor validation loss
  • require val_loss to drop more than 0.001 below its best value so far to count as progress
  • stop after 5 consecutive epochs without that level of improvement

The important point is that both settings are measured relative to the scale and noise of the metric you are monitoring.
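To see how the two settings interact, it can help to replay a recorded metric curve through the stopping rule by hand. The sketch below is a simplified pure-Python model of the rule as Keras documents it. It compares each epoch against the best value so far and resets the counter on every real improvement. It is not Keras source code, the function name `replay_early_stopping` is hypothetical, and it ignores other EarlyStopping options such as `baseline`.

```python
def replay_early_stopping(values, min_delta=0.0, patience=0, mode="min"):
    """Replay a recorded metric curve under a simplified early-stopping rule.

    Hypothetical helper, not part of Keras. Returns (stop_epoch, best_epoch),
    both 0-indexed. If the rule never fires, stop_epoch is the last epoch.
    """
    best = float("inf") if mode == "min" else float("-inf")
    best_epoch = 0
    wait = 0  # consecutive epochs without a meaningful improvement
    for epoch, value in enumerate(values):
        # Compare against the best value so far, not the previous epoch.
        if mode == "min":
            improved = value < best - min_delta
        else:
            improved = value > best + min_delta
        if improved:
            best, best_epoch, wait = value, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(values) - 1, best_epoch


# Hypothetical val_loss curve: fast early progress, then tiny gains.
val_loss = [1.0, 0.8, 0.7, 0.6995, 0.6992, 0.6991, 0.69905]

print(replay_early_stopping(val_loss, min_delta=0.001, patience=3))  # (5, 2)
print(replay_early_stopping(val_loss, min_delta=0.0, patience=3))    # (6, 6)
```

With min_delta=0.001, the gains after epoch 2 are all smaller than the threshold. The counter therefore reaches 3 at epoch 5, and epoch 2 remains the best. With min_delta=0, every tiny gain resets the counter, so the rule never fires on this curve.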

Choose min_delta Based on Metric Noise

If your validation loss naturally wiggles by tiny amounts each epoch, setting min_delta=0 can make training continue for a long time because every microscopic improvement resets patience.

On the other hand, if min_delta is too large, Keras may ignore real but modest progress and stop too early.

A useful rule is:

  • for val_loss, start with a small value such as 1e-4 or 1e-3
  • for accuracy-like metrics, use a threshold that matches a meaningful change; Keras reports accuracy as a fraction between 0 and 1, so min_delta=0.001 means 0.1 percentage points

For example, if validation accuracy jumps around by about 0.002 from noise alone, min_delta=0.00001 is far too sensitive. A value closer to that noise level, around 0.002, is a more reasonable starting point.

The easiest way to choose is to look at a training curve from an unrestricted run and ask: what variation looks like noise, and what variation looks like genuine learning?
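One rough way to put a number on "what looks like noise" is to measure the typical epoch-to-epoch change near the end of an unrestricted run, after genuine learning has mostly slowed. The sketch below assumes you already have the recorded values, for example from `history.history["val_accuracy"]` after `model.fit`. The helper `noise_scale` and the choice of a median over the last few epochs are illustrative assumptions, not a Keras feature.

```python
from statistics import median


def noise_scale(values, tail=10):
    """Hypothetical helper: median absolute epoch-to-epoch change
    over the last `tail` recorded epochs, as a rough noise estimate."""
    recent = list(values)[-tail:]
    if len(recent) < 2:
        raise ValueError("need at least two epochs to estimate noise")
    steps = [abs(b - a) for a, b in zip(recent, recent[1:])]
    return median(steps)


# Hypothetical val_accuracy curve that has flattened out and now wiggles.
val_accuracy = [0.80, 0.84, 0.86, 0.870, 0.868, 0.871, 0.869, 0.872, 0.870]

print(round(noise_scale(val_accuracy, tail=6), 4))  # 0.002
```

The median keeps a single genuine jump from inflating the estimate. A min_delta around the result, here roughly 0.002, would treat those wiggles as noise. The estimate is only meaningful if the tail window sits on the flat part of the curve, because a window that is still trending will count real learning as noise.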

Choose patience Based on Learning Shape

patience should reflect how long your model's plateaus typically last before it starts improving again.

If training is smooth and fast, low patience may be enough:

  • patience=3 to 5 for quick, stable experiments

If training is noisy or improvements arrive in bursts, use more patience:

  • patience=8 to 15 for noisier validation curves

For example:

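```python
import tensorflow as tf

callback = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy",
    min_delta=0.002,   # 0.2 percentage points of accuracy
    patience=10,       # tolerate long, irregular plateaus
    mode="max",        # higher accuracy is better
    restore_best_weights=True
)
```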

This is a reasonable starting point when accuracy improves irregularly and you do not want to stop after just a few flat epochs.

Patience also depends on epoch cost. If each epoch is cheap, a slightly larger patience is often fine. If each epoch is expensive, you may prefer a more aggressive stop rule.
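To choose patience from an actual curve rather than by feel, you can measure how long plateaus lasted in an unrestricted run before real progress resumed. The helper `longest_recovered_plateau` below is hypothetical and not part of Keras. It uses the simplified rule of comparing each epoch to the best value so far. It counts only plateaus that were followed by a meaningful improvement, because the final plateau at the end of training is the one you want early stopping to cut short.

```python
def longest_recovered_plateau(values, min_delta=0.0, mode="min"):
    """Hypothetical helper: length of the longest run of non-improving epochs
    that was later followed by an improvement beating the best value so far
    by more than min_delta. The trailing plateau at the end is ignored."""
    best = None
    run = 0      # current run of non-improving epochs
    longest = 0  # longest run that ended in real progress
    for value in values:
        if best is None:
            improved = True
        elif mode == "min":
            improved = value < best - min_delta
        else:
            improved = value > best + min_delta
        if improved:
            longest = max(longest, run)
            best, run = value, 0
        else:
            run += 1
    return longest


# Hypothetical val_loss curve with a 4-epoch plateau before a late drop.
val_loss = [1.0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.8]

plateau = longest_recovered_plateau(val_loss, min_delta=0.001)
print(plateau)  # 4
# Stopping fires once `patience` non-improving epochs accumulate,
# so surviving this plateau needs patience of at least plateau + 1.
print("minimum patience:", plateau + 1)  # minimum patience: 5
```

Treat the result as a floor. On noisier curves, a few extra epochs of margin help, at the cost of training longer after the real best epoch.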

Always Pair It with restore_best_weights

In many real training runs, the best model appears a few epochs before the stop condition triggers. That is why restore_best_weights=True is usually the safer default.

```python
import tensorflow as tf

callback = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    min_delta=0.001,
    patience=7,
    restore_best_weights=True  # roll back to the best epoch when stopping
)
```

Without this setting, training stops at the same point, but you are left with the weights from the final epoch rather than the best observed ones.

That can make early stopping look worse than it really is.

A Practical Starting Heuristic

If you want a reasonable first try:

  • monitor val_loss for most training jobs
  • start min_delta around 1e-3 for loss
  • start patience around 5 to 10
  • enable restore_best_weights

Then inspect the learning curves and adjust:

  • if it stops too early, increase patience or lower min_delta
  • if it runs forever on tiny metric noise, increase min_delta

This iterative tuning is normal. Early stopping settings are part of your training strategy, not constants copied blindly from tutorials.

Common Pitfalls

The biggest mistake is treating min_delta=0 as always safe. On noisy validation curves, that can keep training alive on meaningless improvements.

Another issue is using a min_delta that is too large relative to the monitored metric. That can make Keras ignore real progress and stop prematurely.

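Developers also often forget to set the correct mode. If you monitor val_accuracy, you usually want mode="max". If you monitor val_loss, mode="min" is the natural direction. The default mode="auto" tries to infer the direction from the metric name, but setting it explicitly avoids surprises, especially with custom metrics.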

Finally, do not pick patience without looking at actual curves. A patience of 3 might be perfect for one problem and terrible for another.

Summary

  • min_delta defines what improvement is meaningful; patience defines how long to wait for it.
  • Choose min_delta based on the noise scale of the monitored metric.
  • Choose patience based on how bursty or smooth training improvements are.
  • restore_best_weights=True is usually the safer default.
  • Start with sensible defaults, inspect the learning curves, and tune from actual training behavior.
