Keras EarlyStopping: Which min_delta and patience to use?
Introduction
There is no universal magic pair of min_delta and patience values for Keras EarlyStopping. The right settings depend on how noisy your validation metric is, how quickly your model learns, and how expensive each epoch is. A good choice is therefore not a fixed recipe but a calibration problem: decide what counts as meaningful improvement and how many flat epochs you are willing to tolerate before stopping.
What min_delta and patience Actually Mean
min_delta is the smallest improvement that Keras should treat as real progress. If the monitored metric improves on the best value seen so far by less than that amount, Keras counts the epoch as "no improvement."
patience is how many such non-improving epochs Keras will allow before it stops training.
Typical usage:
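A minimal sketch of such a configuration, assuming the tensorflow.keras import path and the values described in the list that follows:

```python
from tensorflow import keras

# Stop when val_loss has not improved by at least 0.001 for 5 epochs in a row.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",   # metric to watch
    min_delta=0.001,      # smallest change that counts as improvement
    patience=5,           # non-improving epochs tolerated before stopping
)

# Passed to training through the callbacks argument, for example:
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[early_stop])
```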
This means:
- monitor validation loss
- require an improvement of at least 0.001 to count as progress
- stop after 5 consecutive epochs without that level of improvement
The important point is that both settings are measured relative to the scale and noise of the metric you are monitoring.
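To make the interaction between the two settings concrete, here is a simplified pure-Python model of the rule described above. It is a sketch for building intuition, not the Keras source: the function name simulate_early_stopping and the sample loss values are made up for illustration.

```python
import math

def simulate_early_stopping(values, min_delta=0.0, patience=0, mode="min"):
    """Return (stop_epoch, best_epoch) for a list of per-epoch metric values.

    Epochs are 0-indexed. stop_epoch is None if the rule never triggers.
    """
    sign = 1.0 if mode == "min" else -1.0  # treat "max" as minimising the negated metric
    best = math.inf
    best_epoch = None
    wait = 0
    for epoch, value in enumerate(values):
        current = sign * value
        if current < best - min_delta:
            # Beat the best value by more than min_delta: real progress.
            best, best_epoch, wait = current, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Illustrative curve: fast early progress, then tiny 0.0001 improvements.
val_loss = [0.90, 0.70, 0.60, 0.5998, 0.5997, 0.5996, 0.5995, 0.5994, 0.5993]

strict = simulate_early_stopping(val_loss, min_delta=0.001, patience=5)
loose = simulate_early_stopping(val_loss, min_delta=0.0, patience=5)
print(strict)  # (7, 2): stops at epoch 7, best epoch was 2
print(loose)   # (None, 8): every tiny gain resets the counter
```

With min_delta=0.001 the tiny gains after epoch 2 are ignored and training stops five epochs later; with min_delta=0 the same curve never triggers the stop.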
Choose min_delta Based on Metric Noise
If your validation loss naturally wiggles by tiny amounts each epoch, setting min_delta=0 can make training continue for a long time because every microscopic improvement resets patience.
On the other hand, if min_delta is too large, Keras may ignore real but modest progress and stop too early.
A useful rule is:
- for val_loss, start with a small value such as 1e-4 or 1e-3
- for accuracy-like metrics, use a threshold that matches meaningful percentage change
For example, if validation accuracy jumps around by about 0.002 from noise, setting min_delta=0.00001 is too sensitive. A value around 0.001 or 0.002 may be more reasonable.
The easiest way to choose is to look at a training curve from an unrestricted run and ask: what variation looks like noise, and what variation looks like genuine learning?
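One rough way to put a number on "what looks like noise" is to measure the typical epoch-to-epoch change near the end of an unrestricted run, where the curve has mostly flattened. The helper below is a hypothetical heuristic, not a Keras feature; it assumes the last few epochs sit on a plateau, and the sample history values are invented. In practice the list would come from the history attribute of the object returned by model.fit, for example history.history["val_accuracy"].

```python
import statistics

def estimate_metric_noise(history, tail=10):
    """Median absolute epoch-to-epoch change over the last `tail` epochs."""
    tail_values = history[-tail:]
    diffs = [abs(b - a) for a, b in zip(tail_values, tail_values[1:])]
    return statistics.median(diffs)

# Illustrative validation accuracy: quick gains, then jitter of about 0.002.
val_accuracy = [0.50, 0.70, 0.80, 0.850, 0.852, 0.850, 0.852, 0.851, 0.853, 0.851]

noise = estimate_metric_noise(val_accuracy, tail=6)
print(f"estimated noise: {noise:.4f}")
```

A min_delta at or slightly above this estimate is a reasonable first guess; a value far below it will let noise reset patience.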
Choose patience Based on Learning Shape
patience should reflect how often your model improves after short plateaus.
If training is smooth and fast, low patience may be enough:
- patience=3 to 5 for quick, stable experiments
If training is noisy or improvements arrive in bursts, use more patience:
- patience=8 to 15 for noisier validation curves
For example:
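A sketch of a configuration along these lines, assuming val_accuracy as the monitored metric and illustrative values of patience=10 and min_delta=0.001:

```python
from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_accuracy",  # requires metrics=["accuracy"] in model.compile
    mode="max",              # higher accuracy is better
    min_delta=0.001,         # illustrative; match it to the observed noise
    patience=10,             # tolerate irregular, bursty improvements
    restore_best_weights=True,
)
```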
This is a reasonable starting point when accuracy improves irregularly and you do not want to stop after just a few flat epochs.
Patience also depends on epoch cost. If each epoch is cheap, a slightly larger patience is often fine. If each epoch is expensive, you may prefer a more aggressive stop rule.
Always Pair It with restore_best_weights
In many real training runs, the best model appears a few epochs before the stop condition triggers. That is why setting restore_best_weights=True is usually the safer choice; Keras leaves it off by default.
Without this setting, training stops at the same epoch, but the model is left with the weights from the final epoch rather than the best observed ones.
That can make early stopping look worse than it really is.
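A small worked example shows the size of the gap. The loss values are invented, and the arithmetic assumes min_delta=0 and that no epoch after the best one sets a new best:

```python
val_loss = [0.80, 0.62, 0.55, 0.53, 0.54, 0.56, 0.57, 0.58, 0.60]
patience = 5

# Epoch (0-indexed) with the lowest validation loss.
best_epoch = min(range(len(val_loss)), key=lambda e: val_loss[e])
# The stop triggers `patience` epochs after the last new best.
stop_epoch = best_epoch + patience

print(f"best epoch {best_epoch}: val_loss={val_loss[best_epoch]}")
print(f"stop epoch {stop_epoch}: val_loss={val_loss[stop_epoch]}")
```

Here the weights at the stop epoch correspond to a validation loss of 0.60, while the best epoch reached 0.53. Restoring the best weights recovers that difference.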
A Practical Starting Heuristic
If you want a reasonable first try:
- monitor val_loss for most training jobs
- start min_delta around 1e-3 for loss
- start patience around 5 to 10
- enable restore_best_weights
Then inspect the learning curves and adjust:
- if it stops too early, increase patience or lower min_delta
- if it runs forever on tiny metric noise, increase min_delta
This iterative tuning is normal. Early stopping is part of training strategy, not a constant copied blindly from tutorials.
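As an end-to-end sketch of the starting heuristic, the example below trains a tiny model on synthetic data with the suggested first-try settings. The data, model architecture, and epoch budget are placeholders chosen only so the snippet runs quickly; the point is how the callback plugs into model.fit and how to read back what happened.

```python
import numpy as np
from tensorflow import keras

# Synthetic binary classification data (placeholder for a real dataset).
rng = np.random.default_rng(0)
x = rng.normal(size=(512, 8)).astype("float32")
y = (x.sum(axis=1, keepdims=True) > 0).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    mode="min",
    min_delta=1e-3,
    patience=5,
    restore_best_weights=True,
)

history = model.fit(
    x, y,
    validation_split=0.2,   # hold out 20% of the data to compute val_loss
    epochs=100,             # upper bound; early stopping may end sooner
    callbacks=[early_stop],
    verbose=0,
)

epochs_run = len(history.history["val_loss"])
print(f"ran {epochs_run} epochs, best val_loss={min(history.history['val_loss']):.4f}")
```

Plotting history.history["val_loss"] from a run like this is the quickest way to judge whether the stop came too early or too late before adjusting either setting.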
Common Pitfalls
The biggest mistake is treating min_delta=0 as always safe. On noisy validation curves, that can keep training alive on meaningless improvements.
Another issue is using a min_delta that is too large relative to the monitored metric. That can make Keras ignore real progress and stop prematurely.
Developers also often forget to set the correct mode. If you monitor val_accuracy, you usually want mode="max". If you monitor val_loss, mode="min" is the natural direction.
Finally, do not pick patience without looking at actual curves. A patience of 3 might be perfect for one problem and terrible for another.
Summary
- min_delta defines what improvement is meaningful; patience defines how long to wait for it.
- Choose min_delta based on the noise scale of the monitored metric.
- Choose patience based on how bursty or smooth training improvements are.
- restore_best_weights=True is usually the safer default.
- Start with sensible defaults, inspect the learning curves, and tune from actual training behavior.

