Custom weighted loss function in Keras for weighing each element
Introduction
When people say they want a weighted loss in Keras, they often mean one of three different things: weighting whole samples, weighting classes, or weighting each output element individually. Keras already supports sample weights and class weights in many training flows, so a custom loss is mainly needed when the weight varies inside the tensor itself. The key implementation detail is that your loss must keep the per-element shape long enough for the weights to be applied correctly.
Sample Weights Are Not the Same as Element Weights
If each training example has one overall importance value, Keras can usually handle that with sample_weight. If you need a separate weight per output class, per pixel, or per sequence position, a custom loss is often the cleaner solution.
For example, in segmentation you may want border pixels to matter more than background pixels. In sequence models you may want certain time steps to count more heavily. That is not the same as giving the whole sample a single scalar weight.
A Custom Element-Wise Weighted Loss
The pattern is:
- compute the unreduced per-element loss
- multiply it by a weight tensor of the same shape
- reduce to a scalar
Here is a binary cross-entropy example in TensorFlow Keras:
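A minimal sketch of that pattern, not a definitive implementation. The function name weighted_bce, the pos_weight parameter, and the rule "positive elements count pos_weight times as much" are illustrative assumptions. The cross-entropy is written out by hand so that nothing is reduced before the weights are applied; this stands in for whichever binary_crossentropy helper you would normally call.

```python
import tensorflow as tf

def weighted_bce(pos_weight=3.0, eps=1e-7):
    """Build a loss that up-weights positive elements (hypothetical rule)."""
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        # Clip probabilities so log() never sees exactly 0 or 1.
        p = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        # Step 1: unreduced cross-entropy, same shape as y_pred.
        bce = -(y_true * tf.math.log(p) + (1.0 - y_true) * tf.math.log(1.0 - p))
        # Step 2: weight tensor of the same shape
        # (pos_weight where y_true == 1, otherwise 1).
        weights = 1.0 + (pos_weight - 1.0) * y_true
        # Step 3: reduce to a scalar only after weighting.
        return tf.reduce_mean(bce * weights)
    return loss

# Tiny batch for a quick sanity check.
y_true = tf.constant([[1.0, 0.0], [0.0, 1.0]])
y_pred = tf.constant([[0.9, 0.2], [0.4, 0.6]])
value = weighted_bce(pos_weight=3.0)(y_true, y_pred)
```

The returned closure can be passed to model.compile(loss=weighted_bce(3.0)) like any other Keras loss function.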
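The important part is that the cross-entropy is still element-wise when the weights are applied. Be careful which helper you call: tf.keras.losses.binary_crossentropy already averages over the last axis, while tf.keras.backend.binary_crossentropy returns one value per element. If you reduce too early, the weights can no longer be applied per element.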
Passing Explicit Weight Tensors
Sometimes the weights are not derivable from y_true alone. In that case, include them in the targets or build a custom training step.
A common trick is to pack both labels and weights into y_true:
This approach works, but it should be documented clearly because the target tensor now carries more than labels.
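One possible layout for the packing trick, sketched under assumptions: labels and weights are concatenated along the last axis, so y_true has twice as many columns as y_pred. The name packed_weighted_bce and this layout are hypothetical choices for illustration, not a Keras convention.

```python
import tensorflow as tf

def packed_weighted_bce(y_true_packed, y_pred, eps=1e-7):
    # Assumed target format: [labels | weights] along the last axis,
    # so y_true_packed.shape[-1] == 2 * y_pred.shape[-1].
    n = tf.shape(y_pred)[-1]
    labels = tf.cast(y_true_packed[..., :n], y_pred.dtype)
    weights = tf.cast(y_true_packed[..., n:], y_pred.dtype)
    p = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    # Unreduced cross-entropy, weighted per element, then reduced.
    bce = -(labels * tf.math.log(p) + (1.0 - labels) * tf.math.log(1.0 - p))
    return tf.reduce_mean(bce * weights)

labels = tf.constant([[1.0, 0.0, 1.0]])
weights = tf.constant([[2.0, 1.0, 0.0]])          # a zero weight ignores that element
y_packed = tf.concat([labels, weights], axis=-1)  # shape (1, 6)
y_pred = tf.constant([[0.8, 0.3, 0.1]])
value = packed_weighted_bce(y_packed, y_pred)
```

Because the mean is taken over all elements, zero-weighted positions still count in the denominator; divide by tf.reduce_sum(weights) instead if you want a weighted average.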
Shape Compatibility Matters
Most weighting bugs are shape bugs. The weight tensor must broadcast the way you intend.
Examples:
- sample-level weights often have shape (batch,)
- element-wise weights may need shape (batch, features)
- segmentation weights may need shape (batch, height, width, channels) or a compatible broadcast pattern
If the shapes broadcast incorrectly, the code may run without raising an error while still weighting the wrong elements.
Use tf.shape, print small batches, and test on toy data before starting a long training run.
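A toy illustration of a silent broadcasting bug, assuming a per-element loss of shape (batch, features) where batch happens to equal features. The tensor names are made up for this example.

```python
import tensorflow as tf

per_element_loss = tf.ones((3, 3))        # (batch, features)
sample_w = tf.constant([1.0, 2.0, 3.0])   # (batch,): one weight per sample

# Bug: a (3,) tensor aligns with the LAST axis, so the weights are
# applied across features, not across samples. No error is raised.
wrong = per_element_loss * sample_w

# Intended: reshape to (batch, 1) so each row gets its own weight.
right = per_element_loss * sample_w[:, None]

print(wrong.numpy())   # every row is [1, 2, 3]
print(right.numpy())   # rows are [1, 1, 1], [2, 2, 2], [3, 3, 3]
```

Both products have shape (3, 3), which is why printing a small batch catches what a shape check alone would miss.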
When Built-In Weighting Is Better
Before writing a custom loss, ask whether built-in features already solve the problem:
- class_weight for class imbalance in supported classification setups
- sample_weight for per-example weighting
- masking for padded sequence positions
A custom loss is best when the weighting logic truly belongs inside the loss tensor itself.
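For comparison, the first two built-in options listed above can be sketched as follows. The data, the single-layer model, and the weight values are placeholders chosen only to make the example run.

```python
import numpy as np
import tensorflow as tf

# Toy imbalanced binary data: six negatives, two positives.
x = np.random.rand(8, 4).astype("float32")
y = np.array([[0], [0], [0], [0], [0], [0], [1], [1]], dtype="float32")

model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="sgd", loss="binary_crossentropy")

# Option 1: one weight per class, passed as a dict to fit().
model.fit(x, y, class_weight={0: 1.0, 1: 3.0}, epochs=1, verbose=0)

# Option 2: one weight per example, shape (batch,).
sample_weight = np.where(y[:, 0] == 1, 3.0, 1.0)
history = model.fit(x, y, sample_weight=sample_weight, epochs=1, verbose=0)
```

Neither call needs a custom loss; the weighting is applied by Keras around the standard binary cross-entropy.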
Common Pitfalls
- Reducing the loss to a scalar before applying element weights.
- Confusing class weights, sample weights, and element-wise weights.
- Letting tensor broadcasting silently apply weights in the wrong shape.
- Packing labels and weights together without documenting the target format.
- Debugging on full training runs instead of validating the weighted loss on a tiny batch first.
Summary
- Use a custom Keras loss when you need weights at the element level, not just per sample.
- Compute the unreduced loss first, then multiply by weights, then reduce.
- Make sure the weight tensor shape matches or broadcasts exactly as intended.
- Prefer built-in sample_weight or class_weight when they already match the use case.
- Test the loss on small tensors before trusting it in a full training job.

