Custom loss function with weights in Keras

Introduction

Weighting a loss in Keras is a common way to tell the model that some mistakes matter more than others. The first question, though, should be whether you really need a custom loss; many weighting problems are already handled by class_weight or sample_weight, and using the built-in path is usually simpler and less error-prone.

When a Custom Weighted Loss Is Necessary

Use Keras built-in weighting when the rule is straightforward:

class_weight for class-level weighting in standard classification
sample_weight when each training example has its own weight
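As a rough sketch of the sample_weight path, the snippet below trains a tiny model with one weight per example. The toy data, the names X_demo, y_demo, and demo_model, and the rule "positives get weight 3.0" are assumptions made for illustration only.

```python
import numpy as np
import tensorflow as tf

# Toy data, assumed for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(64, 3)).astype("float32")
y_demo = (X_demo[:, 0] > 0.0).astype("float32")

# One weight per training example; this particular rule is hypothetical.
sample_weights = np.where(y_demo == 1.0, 3.0, 1.0).astype("float32")

demo_model = tf.keras.Sequential(
    [
        tf.keras.layers.Input(shape=(3,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
)
demo_model.compile(optimizer="adam", loss="binary_crossentropy")

# The built-in loss is scaled per example by sample_weight; no custom loss is needed.
history = demo_model.fit(
    X_demo, y_demo, sample_weight=sample_weights, epochs=1, batch_size=16, verbose=0
)
```

Because the weights are an ordinary NumPy array computed before training, any rule that can be evaluated ahead of time fits this path.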

Use a custom loss when the weighting logic depends on tensor values inside the batch, on multiple outputs, or on a weighting rule that the built-in arguments cannot express cleanly.

The implementation rule is important: a Keras loss must use TensorFlow operations. If you use NumPy inside the loss, gradient computation will break because NumPy executes outside the TensorFlow graph.
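A small eager-mode experiment can make this concrete. The sketch below computes the same value twice, once with TensorFlow ops and once through NumPy, and asks a tf.GradientTape for the gradient of each. The variable p and both loss names are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

# Stand-in for predicted probabilities (illustrative values).
p = tf.Variable([0.2, 0.7])

with tf.GradientTape(persistent=True) as tape:
    # Pure TensorFlow: the tape records every operation.
    tf_loss = -tf.reduce_mean(tf.math.log(p))
    # Detour through NumPy: same value, but the tape never sees these operations.
    np_loss = tf.constant(-np.mean(np.log(p.numpy())))

grad_tf = tape.gradient(tf_loss, p)  # a tensor with one gradient per element of p
grad_np = tape.gradient(np_loss, p)  # None, because there is no recorded path back to p
del tape
```

Inside model.fit the loss is usually traced into a graph, where calling NumPy on a symbolic tensor tends to raise an error rather than silently return no gradient.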

Example: Weighted Binary Crossentropy

The following example increases the penalty for false negatives by multiplying the positive-class term by a weight greater than 1.

python

import numpy as np
import tensorflow as tf

# Multiplier for the positive-class term; values above 1 penalize false negatives more.
positive_weight = 3.0

@tf.keras.utils.register_keras_serializable()
def weighted_binary_crossentropy(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32)
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)

    positive_term = -y_true * tf.math.log(y_pred) * positive_weight
    negative_term = -(1.0 - y_true) * tf.math.log(1.0 - y_pred)
    return tf.reduce_mean(positive_term + negative_term)

# Synthetic data for a quick smoke test.
X = np.random.randn(200, 3).astype("float32")
y = (X[:, 0] + 0.3 * X[:, 1] > 0.5).astype("float32")

model = tf.keras.Sequential(
    [
        tf.keras.layers.Input(shape=(3,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
)
model.compile(optimizer="adam", loss=weighted_binary_crossentropy)
model.fit(X, y, epochs=3, batch_size=16, verbose=0)

This example is fully differentiable because every operation is a TensorFlow op.

Per-Class Weighting for One-Hot Targets

For multi-class classification, a common pattern is to multiply the per-class crossentropy by a weight derived from the true class.

python

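import tensorflow as tf

# One weight per class, indexed by class id.
class_weights = tf.constant([1.0, 2.5, 4.0], dtype=tf.float32)

def weighted_categorical_crossentropy(y_true, y_pred):
    # Cast so integer-typed one-hot labels do not cause a dtype mismatch.
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
    # Only the true-class entry is non-zero because y_true is one-hot.
    per_class_loss = -y_true * tf.math.log(y_pred)
    weighted = per_class_loss * class_weights
    # Sum over classes, then average over the batch.
    return tf.reduce_mean(tf.reduce_sum(weighted, axis=-1))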

Here, the true class selects the relevant weight because y_true is one-hot encoded: every other entry of the product is zero. If your labels are integer class ids instead, look the weight up by id, for example with tf.gather(class_weights, ids).
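For integer labels, one possible variant is sketched below. The function name weighted_sparse_categorical_crossentropy is hypothetical, and the sketch assumes y_pred already contains softmax probabilities and that the ids index into class_weights.

```python
import tensorflow as tf

# Same illustrative weights as in the one-hot example.
class_weights = tf.constant([1.0, 2.5, 4.0], dtype=tf.float32)
num_classes = class_weights.shape[0]

def weighted_sparse_categorical_crossentropy(y_true, y_pred):
    # Integer ids may arrive as shape (batch,) or (batch, 1); flatten to (batch,).
    ids = tf.cast(tf.reshape(y_true, [-1]), tf.int32)
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
    # Pick out the probability assigned to each example's true class.
    one_hot = tf.one_hot(ids, depth=num_classes, dtype=y_pred.dtype)
    true_prob = tf.reduce_sum(one_hot * y_pred, axis=-1)
    # Look up each example's weight by its class id.
    weights = tf.gather(class_weights, ids)
    return tf.reduce_mean(-tf.math.log(true_prob) * weights)

# Tiny hand-checkable batch: true classes 0 and 2.
labels = tf.constant([0, 2])
probs = tf.constant([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
loss_value = weighted_sparse_categorical_crossentropy(labels, probs)
```

On this batch the result should equal the mean of 1.0 * -log(0.7) and 4.0 * -log(0.6), which is easy to confirm by hand.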

Serialization Matters

If the model needs to be saved and loaded later, register the custom loss or pass it in custom_objects when loading. Forgetting that step is one of the most common reasons weighted-loss models fail to reload.

Using @tf.keras.utils.register_keras_serializable() is a good habit because it makes the loss easier to serialize in modern Keras workflows.
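A minimal save-and-reload round trip might look like the sketch below. It assumes a recent TensorFlow release that supports the .keras file format; the package name "demo", the function name weighted_bce, and the temporary file path are all illustrative choices.

```python
import os
import tempfile

import tensorflow as tf

@tf.keras.utils.register_keras_serializable(package="demo")
def weighted_bce(y_true, y_pred):
    # Positive-class weight fixed at 3.0 for this sketch.
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
    positive_term = -3.0 * y_true * tf.math.log(y_pred)
    negative_term = -(1.0 - y_true) * tf.math.log(1.0 - y_pred)
    return tf.reduce_mean(positive_term + negative_term)

saved_model = tf.keras.Sequential(
    [
        tf.keras.layers.Input(shape=(3,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
)
saved_model.compile(optimizer="adam", loss=weighted_bce)

path = os.path.join(tempfile.mkdtemp(), "weighted_model.keras")
saved_model.save(path)

# Registration lets Keras resolve the loss by name while loading.
# Without registration, the alternative is to pass
# custom_objects={"weighted_bce": weighted_bce} to load_model.
reloaded = tf.keras.models.load_model(path)
```

The loss function still has to be defined (and its decorator executed) in the process that loads the model; registration only spares you from passing it explicitly.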

Prefer Simpler Built-In Weighting When It Fits

A lot of custom loss code is unnecessary. If your only goal is “class 1 should count three times more”, this may be enough:

python

model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, class_weight={0: 1.0, 1: 3.0}, epochs=3, verbose=0)

That version is shorter, easier to read, and avoids mistakes in the manual loss formula.

Common Pitfalls

The biggest pitfall is using NumPy inside the loss function. Loss code must stay in TensorFlow so gradients can flow correctly.

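Another mistake is returning a Python scalar or reducing the wrong dimension. A Keras loss should return a tensor, either a scalar or one value per example. Returning per-example values leaves the batch reduction to Keras, so any sample_weight is applied to each example individually.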
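One way to keep the reduction predictable is to average over the output axis only and return one loss value per example. The sketch below assumes predictions of shape (batch, 1); the function name per_example_weighted_bce and its positive_weight default are hypothetical.

```python
import tensorflow as tf

def per_example_weighted_bce(y_true, y_pred, positive_weight=3.0):
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
    elementwise = -(
        positive_weight * y_true * tf.math.log(y_pred)
        + (1.0 - y_true) * tf.math.log(1.0 - y_pred)
    )
    # Reduce over the output axis only: shape (batch, 1) becomes (batch,).
    return tf.reduce_mean(elementwise, axis=-1)

y_true_demo = tf.constant([[1.0], [0.0], [1.0]])
y_pred_demo = tf.constant([[0.9], [0.2], [0.4]])
losses = per_example_weighted_bce(y_true_demo, y_pred_demo)  # one value per example
```

Keras calls the loss with only y_true and y_pred, so positive_weight keeps its default when the function is passed to model.compile.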

A third issue is writing a custom loss for a problem already solved by class_weight or sample_weight. That adds maintenance cost with no real gain.

Summary

Use class_weight or sample_weight first when they match the problem.
Write a custom loss only when the weighting rule depends on tensor logic that built-in weighting cannot express.
Keep the loss differentiable by using TensorFlow operations, not NumPy.
Register custom losses if the model needs to be saved and loaded later.
Validate the formula on a small example before trusting a long training run.
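
A quick sanity check along these lines is sketched below: with a positive weight of 1.0 the weighted formula should match the built-in binary crossentropy, and a larger weight should raise the loss whenever positives are present. The helper weighted_bce_with and the four-example batch are assumptions for illustration.

```python
import tensorflow as tf

def weighted_bce_with(y_true, y_pred, positive_weight):
    # Same formula as the weighted binary crossentropy, with the weight as an argument.
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
    positive_term = -y_true * tf.math.log(y_pred) * positive_weight
    negative_term = -(1.0 - y_true) * tf.math.log(1.0 - y_pred)
    return tf.reduce_mean(positive_term + negative_term)

y_true_check = tf.constant([[1.0], [0.0], [1.0], [0.0]])
y_pred_check = tf.constant([[0.9], [0.2], [0.4], [0.7]])

# With weight 1.0 the formula should reduce to ordinary binary crossentropy.
unweighted = weighted_bce_with(y_true_check, y_pred_check, positive_weight=1.0)
reference = tf.reduce_mean(
    tf.keras.losses.binary_crossentropy(y_true_check, y_pred_check)
)
tf.debugging.assert_near(unweighted, reference, atol=1e-5)

# Raising the positive weight should increase the loss on a batch that contains positives.
weighted = weighted_bce_with(y_true_check, y_pred_check, positive_weight=3.0)
assert float(weighted) > float(unweighted)
```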