Custom weighted loss function in Keras for weighing each element

Introduction

When people say they want a weighted loss in Keras, they often mean one of three different things: weighting whole samples, weighting classes, or weighting each output element individually. Keras already supports sample weights and class weights in many training flows, so a custom loss is mainly needed when the weight varies inside the tensor itself. The key implementation detail is that your loss must keep the per-element shape long enough for the weights to be applied correctly.

Sample Weights Are Not the Same as Element Weights

If each training example has one overall importance value, Keras can usually handle that with sample_weight. If you need a separate weight for each output element, such as each pixel or each sequence position, a custom loss is often the cleaner solution.

For example, in segmentation you may want border pixels to matter more than background pixels. In sequence models you may want certain time steps to count more heavily. That is not the same as giving the whole sample a single scalar weight.
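To make the distinction concrete, here is a small NumPy sketch with made-up loss values. The array names and numbers are illustrative assumptions, and the arithmetic is a simplification, not Keras's internal reduction logic.

```python
import numpy as np

# Hypothetical per-element losses: 2 samples, 3 outputs each.
element_loss = np.array([[0.2, 0.4, 0.6],
                         [1.0, 1.0, 1.0]])

# Sample weighting: one scalar per example, shape (batch,).
sample_weights = np.array([1.0, 3.0])
per_sample_loss = element_loss.mean(axis=1)  # shape (2,)
sample_weighted = (per_sample_loss * sample_weights).mean()

# Element weighting: one weight per output value, shape (batch, outputs).
element_weights = np.array([[1.0, 1.0, 5.0],
                            [1.0, 0.0, 1.0]])
element_weighted = (element_loss * element_weights).mean()

print(sample_weighted, element_weighted)
```

In the first case every output of the second sample is scaled by the same factor. In the second case individual outputs can be boosted or zeroed out, which a single per-sample scalar cannot express.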

A Custom Element-Wise Weighted Loss

The pattern is:

compute the unreduced per-element loss
multiply it by a weight tensor of the same shape
reduce to a scalar

Here is a binary cross-entropy example in TensorFlow Keras:

```python
import tensorflow as tf


def weighted_binary_crossentropy(y_true, y_pred):
    # Labels may arrive as integers; cast so the comparison and loss use floats.
    y_true = tf.cast(y_true, y_pred.dtype)

    # Example weights: positives count 4x more than negatives.
    weights = tf.where(tf.equal(y_true, 1.0), 4.0, 1.0)

    # Unreduced loss: one value per element, same shape as y_pred.
    element_loss = tf.keras.backend.binary_crossentropy(y_true, y_pred)
    weighted_loss = element_loss * weights

    # Reduce to a scalar only after the weights have been applied.
    return tf.reduce_mean(weighted_loss)


model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss=weighted_binary_crossentropy)
```

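The important part is that tf.keras.backend.binary_crossentropy returns one loss value per element, with the same shape as y_pred, so no reduction has happened yet. By contrast, tf.keras.losses.binary_crossentropy already averages over the last axis. If you reduce too early, the weights can no longer be applied per element.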
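A quick way to see the difference is to compare output shapes on a toy batch. The sketch below uses made-up values and assumes a TensorFlow version in which tf.keras.backend.binary_crossentropy is available, as in the example above.

```python
import tensorflow as tf

# Toy batch: 2 samples, 3 binary outputs each (values are illustrative).
y_true = tf.constant([[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
y_pred = tf.constant([[0.9, 0.2, 0.6], [0.1, 0.7, 0.8]])

# Backend function: keeps one loss value per element.
element_loss = tf.keras.backend.binary_crossentropy(y_true, y_pred)
# Loss function: already averaged over the last axis.
row_loss = tf.keras.losses.binary_crossentropy(y_true, y_pred)

element_shape = tuple(element_loss.shape)  # (2, 3)
row_shape = tuple(row_loss.shape)          # (2,)
print(element_shape, row_shape)

# Per-element weights can only be applied to the unreduced tensor.
weights = tf.where(tf.equal(y_true, 1.0), 4.0, 1.0)
weighted = tf.reduce_mean(element_loss * weights)
print(float(weighted))
```

Once the loss has shape (2,), the information about which of the three outputs produced which error is gone, so a (2, 3) weight tensor no longer has anything to line up with.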

Passing Explicit Weight Tensors

Sometimes the weights are not derivable from y_true alone. In that case, include them in the targets or build a custom training step.

A common trick is to pack both labels and weights into y_true:

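```python
import tensorflow as tf


def packed_weighted_mse(y_true_and_weight, y_pred):
    # Target layout (by convention): column 0 = label, column 1 = weight.
    y_true_and_weight = tf.cast(y_true_and_weight, y_pred.dtype)
    y_true = y_true_and_weight[:, :1]
    weights = y_true_and_weight[:, 1:2]

    # Both slices keep shape (batch, 1), matching a single-output y_pred.
    squared_error = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_error * weights)
```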

This approach works, but it should be documented clearly because the target tensor now carries more than labels.
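As a rough usage sketch, the packed target can be built with NumPy and checked by hand before it goes anywhere near model.fit. The labels, weights, and predictions below are made-up values, and the column layout (label first, weight second) is the convention assumed by the loss, not something Keras enforces.

```python
import numpy as np
import tensorflow as tf


def packed_weighted_mse(y_true_and_weight, y_pred):
    # Column 0 holds the label, column 1 holds its weight.
    y_true_and_weight = tf.cast(y_true_and_weight, y_pred.dtype)
    y_true = y_true_and_weight[:, :1]
    weights = y_true_and_weight[:, 1:2]
    return tf.reduce_mean(tf.square(y_true - y_pred) * weights)


labels = np.array([1.0, 2.0, 3.0, 4.0], dtype="float32")
weights = np.array([1.0, 1.0, 0.0, 2.0], dtype="float32")

# Pack into one target array of shape (batch, 2).
y_packed = np.stack([labels, weights], axis=1)

# Hypothetical model outputs of shape (batch, 1).
y_pred = tf.constant([[1.5], [2.0], [10.0], [3.0]])

loss = packed_weighted_mse(tf.constant(y_packed), y_pred)
print(float(loss))  # (0.25*1 + 0*1 + 49*0 + 1*2) / 4 = 0.5625
```

The third example has a large error but a weight of zero, so it contributes nothing. Any metric attached to the model would also receive the packed target, so it would need the same unpacking.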

Shape Compatibility Matters

Most weighting bugs are shape bugs. The weight tensor must broadcast the way you intend.

Examples:

sample-level weights often have shape (batch,)
element-wise weights may need shape (batch, features)
segmentation weights may need shape (batch, height, width, channels) or a compatible broadcast pattern

If the shapes broadcast incorrectly, the code may run without raising an error while still weighting the wrong elements.

Use tf.shape, print small batches, and test on toy data before starting a long training run.
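The silent failure is easy to reproduce on a toy array. This sketch uses NumPy, whose broadcasting rules TensorFlow follows for this case; the values are made up for illustration.

```python
import numpy as np

# Per-element loss for 3 samples with one output each: shape (3, 1).
per_element_loss = np.array([[1.0], [2.0], [3.0]])
# Weights stored as a flat vector: shape (3,).
weights_flat = np.array([1.0, 0.0, 2.0])

# Wrong: (3, 1) * (3,) broadcasts to (3, 3), pairing every loss with every weight.
wrong = per_element_loss * weights_flat
# Right: reshape the weights to (3, 1) so each loss meets only its own weight.
right = per_element_loss * weights_flat[:, None]

print(wrong.shape, wrong.mean())   # (3, 3) 2.0
print(right.shape, right.mean())   # (3, 1) 2.333...

# A cheap guard: the weighted loss should keep the shape of the unweighted loss.
assert right.shape == per_element_loss.shape
```

Both versions run without error and both return a plausible-looking scalar, which is why an explicit shape assertion is worth the extra line.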

When Built-In Weighting Is Better

Before writing a custom loss, ask whether built-in features already solve the problem:

class_weight for class imbalance in supported classification setups
sample_weight for per-example weighting
masking for padded sequence positions

A custom loss is best when the weighting logic truly belongs inside the loss tensor itself.
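For comparison, here is a minimal sketch of the built-in route using the sample_weight and class_weight arguments of model.fit. The data is random and the build_model helper is a hypothetical name introduced only for this example.

```python
import numpy as np
import tensorflow as tf

# Toy binary classification data (random, purely illustrative).
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4)).astype("float32")
y = rng.integers(0, 2, size=(32, 1)).astype("float32")


def build_model():
    # Hypothetical helper: a tiny model with a standard, unweighted loss.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(4,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model


# Per-example weighting: one weight per row of x, shape (batch,).
per_example = np.where(y[:, 0] == 1.0, 4.0, 1.0).astype("float32")
history_sw = build_model().fit(
    x, y, sample_weight=per_example, epochs=1, verbose=0
)

# Per-class weighting: a dict mapping class index to weight.
history_cw = build_model().fit(
    x, y, class_weight={0: 1.0, 1: 4.0}, epochs=1, verbose=0
)
```

With this particular mapping the two calls express the same idea, positives counting four times as much as negatives, without any custom loss code. They are kept as separate calls because some Keras versions may reject passing both arguments at once.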

Common Pitfalls

Reducing the loss to a scalar before applying element weights.
Confusing class weights, sample weights, and element-wise weights.
Letting tensor broadcasting silently apply weights in the wrong shape.
Packing labels and weights together without documenting the target format.
Debugging on full training runs instead of validating the weighted loss on a tiny batch first.

Summary

Use a custom Keras loss when you need weights at the element level, not just per sample.
Compute the unreduced loss first, then multiply by weights, then reduce.
Make sure the weight tensor shape matches or broadcasts exactly as intended.
Prefer built-in sample_weight or class_weight when they already match the use case.
Test the loss on small tensors before trusting it in a full training job.