What is the relation between validation_data and validation_split in Keras' fit function?
Introduction
Both validation_data and validation_split tell Keras to evaluate the model on held-out data during training, but they do it in different ways. validation_split slices a fraction out of the training arrays you pass to fit, while validation_data gives Keras an explicit separate dataset, and the explicit dataset takes priority when both are supplied.
What validation_split Does
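validation_split is a convenience option for in-memory array data. If you pass validation_split=0.2 to fit, Keras reserves the last 20 percent of the provided training arrays for validation and trains on the remaining 80 percent. The slice is taken before any shuffling, so the held-out rows are always the final rows of the arrays you supplied.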
That means validation_split only works cleanly when:
- the input data is indexable, such as NumPy arrays or tensors
- training inputs and labels are aligned row by row
- you are comfortable letting Keras create the held-out subset for you
It is quick, but it is also less explicit.
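To make the slicing concrete, here is a minimal NumPy sketch of the behavior described above. The helper name tail_split is hypothetical and is not part of Keras; it only mimics the documented "last fraction, before shuffling" rule so you can see which rows would be held out.

```python
import numpy as np

def tail_split(x, y, validation_split):
    """Hypothetical helper (not a Keras API): hold out the last fraction
    of rows, taken in their original order, as validation data."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = np.arange(20, dtype="float32").reshape(10, 2)  # 10 rows, 2 features
y = np.arange(10)                                  # label i belongs to row i

(x_train, y_train), (x_val, y_val) = tail_split(x, y, 0.2)
print(y_train)  # rows 0 through 7 are used for training
print(y_val)    # rows 8 and 9 are held out for validation

# With a compiled Keras model, the equivalent request would be:
# model.fit(x, y, validation_split=0.2, epochs=10)
```

Because the held-out rows are simply the tail of the arrays, their composition depends entirely on how the arrays were ordered before the call.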
What validation_data Does
validation_data gives Keras the validation set directly, typically as a tuple of arrays such as (x_val, y_val) or as a dataset object.
This is the more flexible option because your validation set can come from a separate preprocessing step, a manual split, or even a dataset object.
It is the right choice when:
- you already created a train and validation split yourself
- you need reproducible control over the exact validation rows
- your validation set is generated differently from training data
- you are using dataset pipelines rather than simple arrays
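As an illustration of a validation set that goes through its own preprocessing step, the sketch below standardizes both sets using statistics computed on the training rows only, then passes the result through validation_data. The function names are hypothetical, and model is assumed to be an already compiled Keras model supplied by the caller.

```python
import numpy as np

def standardize_with_train_stats(x_train, x_val):
    """Scale both sets with the mean and std of the training rows only."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0) + 1e-8  # small constant avoids division by zero
    return (x_train - mean) / std, (x_val - mean) / std

def fit_with_explicit_validation(model, x_train, y_train, x_val, y_val, epochs=10):
    """Assumes `model` is a compiled Keras model; returns whatever fit returns."""
    x_train_s, x_val_s = standardize_with_train_stats(x_train, x_val)
    return model.fit(
        x_train_s, y_train,
        validation_data=(x_val_s, y_val),  # explicit held-out set
        epochs=epochs,
    )

# Small demonstration of the preprocessing step on toy data.
x_train = np.array([[0.0, 10.0], [2.0, 30.0]])
x_val = np.array([[1.0, 20.0]])
x_train_s, x_val_s = standardize_with_train_stats(x_train, x_val)
print(x_val_s)
```

Keras never sees the raw validation rows here, which is the kind of control validation_split cannot offer.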
The Relation Between Them
Conceptually, both options feed validation metrics at the end of each epoch, such as val_loss and, when the model was compiled with an accuracy metric, val_accuracy. The only difference is how the validation data is sourced.
A practical way to think about them is:
- validation_split means "please carve validation rows out of the training arrays for me"
- validation_data means "use this validation dataset that I prepared myself"
If both are passed, the explicit validation_data wins and the split argument is effectively ignored.
Example With A Manual Split
A manual split is often clearer, especially when you want stratification or custom preprocessing.
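The original example is not preserved here, so what follows is a minimal sketch of one way to do it, using only NumPy. The helper stratified_split is hypothetical: it holds out the same fraction of each class using a seeded random generator. If scikit-learn is available, train_test_split with its stratify argument is a common alternative.

```python
import numpy as np

def stratified_split(x, y, val_fraction=0.2, seed=0):
    """Hypothetical helper: hold out `val_fraction` of each class."""
    rng = np.random.default_rng(seed)  # fixed seed keeps the split repeatable
    val_mask = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_val = int(round(len(idx) * val_fraction))
        val_mask[idx[:n_val]] = True
    return (x[~val_mask], y[~val_mask]), (x[val_mask], y[val_mask])

# Toy imbalanced data: 80 rows of class 0 followed by 20 rows of class 1.
x = np.random.default_rng(1).normal(size=(100, 4)).astype("float32")
y = np.array([0] * 80 + [1] * 20)

(x_train, y_train), (x_val, y_val) = stratified_split(x, y)
print(len(y_train), len(y_val), int((y_val == 1).sum()))  # 80 20 4

# With a compiled Keras model, the split would then be passed explicitly:
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)
```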
This approach makes the split explicit and reproducible.
When validation_split Is Fine
For small experiments with arrays already in memory, validation_split is perfectly reasonable.
It is especially handy for quick prototypes, but it should not be confused with a carefully designed validation strategy.
Important Behavior Details
A few details matter in practice:
- validation_split applies only to the arrays passed to fit
- it is not designed for generators or arbitrary streaming inputs
- it uses a deterministic slice of the provided arrays rather than a complex stratified sampling step
- validation_data is usually clearer when data order matters or preprocessing differs
That last point matters for time series and grouped data, where a naive slice may produce misleading validation results.
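For grouped data, one option is to split by group rather than by row, so that no group contributes rows to both sides. The sketch below is a NumPy-only illustration; group_split_masks is a hypothetical helper, and the groups array (one group id per row) is an assumption about how your data is labeled.

```python
import numpy as np

def group_split_masks(groups, val_fraction=0.2, seed=0):
    """Hypothetical helper: return boolean row masks (train, validation)
    such that every group lands entirely on one side of the split."""
    rng = np.random.default_rng(seed)
    unique_groups = rng.permutation(np.unique(groups))
    n_val = max(1, int(len(unique_groups) * val_fraction))
    val_mask = np.isin(groups, unique_groups[:n_val])
    return ~val_mask, val_mask

groups = np.repeat(np.arange(10), 5)  # 10 groups with 5 rows each
train_mask, val_mask = group_split_masks(groups)

# No group id should appear on both sides of the split.
overlap = np.intersect1d(groups[train_mask], groups[val_mask])
print(len(overlap))

# The masks can then index the real arrays before calling fit, for example:
# model.fit(x[train_mask], y[train_mask],
#           validation_data=(x[val_mask], y[val_mask]))
```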
Common Pitfalls
The most common mistake is assuming validation_split performs a sophisticated train-validation split. It does not. It is just a convenience slice on the arrays you supplied.
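A quick way to see why this matters is to apply the same tail slice to labels that happen to be sorted by class. This is a NumPy-only sketch of the slicing rule, not a call into Keras, and the toy labels are made up for illustration.

```python
import numpy as np

# 30 labels sorted by class: ten 0s, then ten 1s, then ten 2s.
y = np.repeat([0, 1, 2], 10)

# Same rule as a 20 percent tail split: the final rows are held out.
split_at = int(len(y) * (1.0 - 0.2))
y_train_part, y_val_part = y[:split_at], y[split_at:]

val_classes = np.unique(y_val_part)
print(val_classes)  # the held-out tail contains only class 2
```

Shuffling the arrays yourself before calling fit, or building the validation set explicitly, avoids this kind of skew.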
Another mistake is passing both validation_split and validation_data and expecting Keras to combine them. It will not. The explicit validation dataset takes precedence.
A third issue is using validation_split on ordered data such as time series without thinking about leakage or sampling bias.
Summary
- Both options provide validation metrics during fit.
- validation_split carves validation rows out of the training arrays automatically.
- validation_data uses an explicit separate validation dataset.
- If both are supplied, validation_data takes priority.
- Use validation_split for quick experiments and validation_data when you need full control.

