Getting aligned val_loss and train_loss plots for each epoch using WandB rather than separate plots
Introduction
If train_loss and val_loss appear on separate WandB charts, the problem is usually not the chart itself. The issue is that the two metrics were logged on different steps, so WandB treats them as separate time series instead of two values on the same epoch axis.
Log both metrics against the same epoch
The most reliable fix is to compute one training loss value per epoch, compute one validation loss value per epoch, and log both in the same wandb.log() call.
Because the values are logged together, WandB stores them under the same history step. In the UI, you can add both metrics to one line chart and compare them directly.
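The single-call pattern described above can be sketched as follows. This is a minimal illustration, not a complete training script: `fit`, `train_epoch`, and `validate` are hypothetical names, and `run` is assumed to be the object returned by `wandb.init()` (any object with a compatible `log(dict)` method would work).

```python
from typing import Callable


def fit(
    run,
    num_epochs: int,
    train_epoch: Callable[[int], float],
    validate: Callable[[int], float],
) -> None:
    """Log one train_loss and one val_loss per epoch in a single log() call.

    `run` is assumed to be what wandb.init() returns. `train_epoch` and
    `validate` are hypothetical callables that each return one scalar loss
    for the given epoch.
    """
    for epoch in range(num_epochs):
        train_loss = train_epoch(epoch)  # one training loss value per epoch
        val_loss = validate(epoch)       # one validation loss value per epoch

        # Both metrics go into the same dict, so they share one history step.
        run.log({"epoch": epoch, "train_loss": train_loss, "val_loss": val_loss})
```

With the real library this might be called as `fit(wandb.init(project="demo"), 10, train_epoch, validate)`, where the project name is a placeholder.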
Use a custom x-axis when epoch is what matters
WandB uses its internal step counter by default, and each log() call advances that counter. If your code logs training metrics per batch and validation metrics per epoch, the series drift apart. In that case, define epoch as the step metric for both losses.
Declaring epoch as the step metric tells WandB to plot both metrics against the same epoch axis even if your run logs other batch-level metrics elsewhere.
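The step-metric setup described above could look like the sketch below. `define_metric(name, step_metric=...)` is the WandB call the article refers to; the helper names `declare_epoch_axis` and `log_at_epoch` are hypothetical, and `run` is again assumed to be the object returned by `wandb.init()`.

```python
from typing import Sequence


def declare_epoch_axis(
    run, metrics: Sequence[str] = ("train_loss", "val_loss")
) -> None:
    """Tell WandB to plot the given metrics against a logged "epoch" value."""
    run.define_metric("epoch")
    for name in metrics:
        run.define_metric(name, step_metric="epoch")


def log_at_epoch(run, epoch: int, **metrics: float) -> None:
    """Log metrics together with the epoch they belong to.

    The "epoch" key has to be in the same dict as the metric it indexes;
    otherwise there is no x-value to plot that metric against.
    """
    run.log({"epoch": epoch, **metrics})
```

Call `declare_epoch_axis(run)` once, right after the run is created. After that, `log_at_epoch(run, epoch, train_loss=...)` and `log_at_epoch(run, epoch, val_loss=...)` can be issued from different places in the code, because each record carries its own epoch value.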
If you must log in two separate calls
Sometimes the training loop and validation loop live in different functions. You can still align the plots, but you must keep the step value identical and prevent WandB from incrementing the step between calls.
Passing commit=False on the first call keeps the step open, so both values end up in the same history entry.
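A minimal sketch of the two-call pattern described above, assuming `run` is the object returned by `wandb.init()`. The function names `log_train` and `log_val` are hypothetical stand-ins for code that lives in separate training and validation routines.

```python
def log_train(run, epoch: int, train_loss: float) -> None:
    """Called from the training routine at the end of an epoch."""
    # commit=False stages the value without closing the history step.
    run.log({"train_loss": train_loss}, step=epoch, commit=False)


def log_val(run, epoch: int, val_loss: float) -> None:
    """Called from the validation routine for the same epoch."""
    # Same step value as log_train, so the value joins the entry opened there.
    run.log({"val_loss": val_loss}, step=epoch)
```

This sketch assumes nothing else in the run calls `log()` without an explicit step. WandB expects step values to never decrease, so batch-level logging that advances the internal counter past the epoch number would likely cause these calls to be dropped with a warning. In that situation the custom epoch axis is the safer choice.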
Common Pitfalls
The most common mistake is mixing batch-level train_loss with epoch-level val_loss. If one metric is logged hundreds of times per epoch and the other only once, the chart will not line up the way you expect.
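One way to avoid this frequency mismatch is to reduce the batch losses to a single number before logging. The helper below is an illustrative sketch rather than part of WandB: `epoch_mean_loss` is a hypothetical name, and it assumes each batch reports its mean loss together with its batch size, so that a smaller final batch is weighted correctly.

```python
from typing import Iterable, Tuple


def epoch_mean_loss(batches: Iterable[Tuple[float, int]]) -> float:
    """Return the sample-weighted mean of per-batch mean losses.

    Each item is (mean_loss_of_batch, batch_size).
    """
    total_loss = 0.0
    total_samples = 0
    for batch_loss, batch_size in batches:
        total_loss += batch_loss * batch_size
        total_samples += batch_size
    if total_samples == 0:
        raise ValueError("no batches were provided")
    return total_loss / total_samples
```

The returned value is what would be logged as train_loss once per epoch, at the same frequency as val_loss.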
Another issue is calling wandb.log() multiple times without a shared step or custom step metric. Each call advances WandB's internal step counter unless you control it explicitly.
Be careful with metric names too. Small naming differences such as val/loss versus val_loss create separate series and can make the dashboard look inconsistent.
Finally, if you are using a framework integration such as Keras, PyTorch Lightning, or TensorBoard sync, check whether that integration is already choosing a step metric for you. Manual logging and auto-logging can conflict if they track the same concept differently.
It is also worth verifying the chart settings in the WandB workspace. If the panel's x-axis is still set to the default step instead of epoch, the logged data may be correct while the visualization remains misleading.
Summary
- Log train_loss and val_loss with the same epoch value if you want them on one plot.
- The simplest approach is one wandb.log() call per epoch containing both metrics.
- Use run.define_metric(..., step_metric="epoch") when epoch should be the chart axis.
- If separate log calls are unavoidable, keep the step identical and use commit=False on the first call.
- Most alignment problems come from inconsistent logging frequency, not from the WandB dashboard itself.

