Keras custom loss function for YOLO

Introduction

YOLO models do not use a single off-the-shelf loss such as plain cross-entropy. They combine localization, objectness, and classification terms, so in Keras you often need a custom loss to match the exact tensor layout and weighting used by your version of YOLO.

Why YOLO Needs a Custom Loss

A detector predicts more than a class label. For each anchor or grid cell, the model usually outputs:

  • box coordinates
  • an objectness score
  • class probabilities

Those pieces should not be penalized equally. Most cells contain no object, so if background confidence is weighted as heavily as bounding box regression, the model learns to predict "no object" everywhere.

In older YOLO-style formulations, the loss is conceptually split into three parts:

  • coordinate loss for x, y, w, and h
  • objectness loss for presence or absence of an object
  • classification loss for the class distribution

In modern TensorFlow code, the exact box parameterization may differ, but the principle is the same: slice the prediction tensor into meaningful parts and compute each term separately.

A Simple Keras Implementation

The following example uses a simplified output shape where each prediction row is:

[x, y, w, h, objectness, class_0, class_1, ...]

This is not a full production YOLO implementation, but it is runnable and shows the structure clearly.

python
import tensorflow as tf


class SimpleYoloLoss(tf.keras.losses.Loss):
    def __init__(self, lambda_box=5.0, lambda_noobj=0.5, name="simple_yolo_loss"):
        super().__init__(name=name)
        self.lambda_box = lambda_box      # weight on box regression
        self.lambda_noobj = lambda_noobj  # weight on background confidence

    def call(self, y_true, y_pred):
        # Slice both tensors the same way: [x, y, w, h, objectness, classes...]
        true_box = y_true[..., 0:4]
        pred_box = y_pred[..., 0:4]

        # Keep the last axis (4:5, not 4) so the mask broadcasts over box values.
        true_obj = y_true[..., 4:5]
        pred_obj = y_pred[..., 4:5]

        true_cls = y_true[..., 5:]
        pred_cls = y_pred[..., 5:]

        # Box error counts only where an object is present.
        box_loss = tf.reduce_sum(tf.square(true_box - pred_box) * true_obj)

        # Confidence error, split into object cells and background cells.
        obj_loss = tf.reduce_sum(tf.square(true_obj - pred_obj) * true_obj)
        noobj_loss = tf.reduce_sum(tf.square(true_obj - pred_obj) * (1.0 - true_obj))

        # Class error is also masked by object presence.
        cls_loss = tf.reduce_sum(
            tf.keras.losses.categorical_crossentropy(true_cls, pred_cls, from_logits=False)
            * tf.squeeze(true_obj, axis=-1)
        )

        # Every term is summed over the whole batch, so the value scales with batch size.
        return self.lambda_box * box_loss + obj_loss + self.lambda_noobj * noobj_loss + cls_loss

The mask true_obj ensures that box and class losses are applied only where an object exists. Without it, background cells, which usually far outnumber object cells, dominate training and the detector becomes unstable or useless.
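Written out, the loss computed by the class above has the following form. The notation is introduced here for illustration only: i runs over every prediction row in the batch, o_i is the true objectness (0 or 1), b_i and c_i are the true box and class vectors, hats mark predictions, and CE is categorical cross-entropy.

```latex
\mathcal{L} =
  \lambda_{\text{box}} \sum_i o_i \,\lVert \mathbf{b}_i - \hat{\mathbf{b}}_i \rVert_2^2
  + \sum_i o_i \,(o_i - \hat{o}_i)^2
  + \lambda_{\text{noobj}} \sum_i (1 - o_i)\,(o_i - \hat{o}_i)^2
  + \sum_i o_i \,\mathrm{CE}(\mathbf{c}_i, \hat{\mathbf{c}}_i)
```

The factor o_i in the first, second, and fourth sums is the mask: rows with no object contribute only through the down-weighted background term.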

Compiling and Testing the Loss

Here is a tiny runnable example:

python
import tensorflow as tf

loss_fn = SimpleYoloLoss()

# One sample with two prediction rows: an object of class 0, then a background row.
y_true = tf.constant([
    [
        [0.5, 0.5, 0.2, 0.2, 1.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    ]
], dtype=tf.float32)

y_pred = tf.constant([
    [
        [0.45, 0.55, 0.25, 0.18, 0.9, 0.8, 0.2],
        [0.1, 0.1, 0.1, 0.1, 0.3, 0.4, 0.6],
    ]
], dtype=tf.float32)

print(float(loss_fn(y_true, y_pred)))  # prints approximately 0.3176

You can use the same loss in a model:

python
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    # 14 outputs = 2 prediction rows x 7 values (4 box + 1 objectness + 2 classes)
    tf.keras.layers.Dense(14, activation="sigmoid"),
    tf.keras.layers.Reshape((2, 7)),
])

model.compile(optimizer="adam", loss=SimpleYoloLoss())

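This toy model is not a good detector, but its (2, 7) output matches the layout the loss expects. Compiling only attaches the loss to the model; running a training step on a small dummy batch is what confirms that it participates in backpropagation.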
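One way to check that gradients actually flow is to take a single gradient step by hand. The sketch below is an illustration, not part of the original tutorial: compact_yolo_loss is a hypothetical function-style restatement of the loss above, included only so the snippet runs on its own, and the inputs and targets are dummy data.

```python
import tensorflow as tf


def compact_yolo_loss(y_true, y_pred, lambda_box=5.0, lambda_noobj=0.5):
    # Hypothetical compact restatement of SimpleYoloLoss (same terms, same masks).
    obj = y_true[..., 4:5]
    box = tf.reduce_sum(tf.square(y_true[..., 0:4] - y_pred[..., 0:4]) * obj)
    conf = tf.square(obj - y_pred[..., 4:5])
    obj_term = tf.reduce_sum(conf * obj)
    noobj_term = tf.reduce_sum(conf * (1.0 - obj))
    cls = tf.reduce_sum(
        tf.keras.losses.categorical_crossentropy(y_true[..., 5:], y_pred[..., 5:])
        * tf.squeeze(obj, axis=-1)
    )
    return lambda_box * box + obj_term + lambda_noobj * noobj_term + cls


model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(14, activation="sigmoid"),
    tf.keras.layers.Reshape((2, 7)),
])

# Dummy batch of 4: random inputs, each target has one object row and one background row.
x = tf.random.normal((4, 32))
y = tf.tile(
    tf.constant([[[0.5, 0.5, 0.2, 0.2, 1.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]], dtype=tf.float32),
    [4, 1, 1],
)

with tf.GradientTape() as tape:
    loss = compact_yolo_loss(y, model(x, training=True))
grads = tape.gradient(loss, model.trainable_variables)

# Every trainable variable should receive a finite gradient.
grads_ok = all(
    g is not None and bool(tf.reduce_all(tf.math.is_finite(g))) for g in grads
)
print("loss:", float(loss), "gradients ok:", grads_ok)
```

If any gradient comes back as None, some part of the loss is disconnected from the model output, which usually points to a slicing or masking mistake.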

Design Choices That Matter

A custom YOLO loss usually depends on a few high-impact decisions.

First, decide whether the network emits raw logits or activated probabilities. If class outputs are logits, either pass from_logits=True or apply softmax yourself, but not both. Applying the activation twice, or not at all, still runs but distorts the loss and its gradients.
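The difference is easy to see on a single row. This small sketch uses made-up logits and compares the two consistent options against the double-activation mistake; the variable names are illustrative.

```python
import tensorflow as tf

logits = tf.constant([[2.0, -1.0, 0.5]])   # made-up raw class scores
one_hot = tf.constant([[1.0, 0.0, 0.0]])   # true class is index 0

# Option A: pass raw logits and let the loss apply softmax internally.
loss_from_logits = tf.keras.losses.categorical_crossentropy(
    one_hot, logits, from_logits=True
)

# Option B: apply softmax yourself, then tell the loss it receives probabilities.
probs = tf.nn.softmax(logits, axis=-1)
loss_from_probs = tf.keras.losses.categorical_crossentropy(
    one_hot, probs, from_logits=False
)

# Mistake: softmax applied by hand AND from_logits=True, so it is applied twice.
loss_double = tf.keras.losses.categorical_crossentropy(
    one_hot, probs, from_logits=True
)

print("A:", float(loss_from_logits[0]))
print("B:", float(loss_from_probs[0]))
print("double softmax:", float(loss_double[0]))
```

Options A and B agree up to floating-point error, while the double-softmax value is different and raises no error, which is why this mistake tends to go unnoticed.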

Second, be precise about coordinate encoding. Some models predict normalized center coordinates and log-space widths. Others decode anchors first and compute loss in image-relative coordinates. The loss must match the model head exactly.
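As an illustration of one such encoding, here is a sketch of the YOLOv2/v3-style decoding, where a sigmoid offset is added to the grid cell index and each width and height is an anchor size scaled by an exponential. The function name decode_boxes, its argument names, and the assumption that anchors are expressed as fractions of the image size are all illustrative; your model head may differ.

```python
import tensorflow as tf


def decode_boxes(raw, cell_xy, anchor_wh, grid_size):
    """Turn raw (tx, ty, tw, th) into image-relative (x, y, w, h).

    Assumes cell_xy holds the column/row index of each prediction's grid cell
    and anchor_wh holds anchor sizes as fractions of the image.
    """
    # Center: sigmoid keeps the offset inside the cell, then normalize by grid size.
    xy = (tf.sigmoid(raw[..., 0:2]) + cell_xy) / grid_size
    # Size: the anchor is scaled by exp of the raw output (log-space encoding).
    wh = anchor_wh * tf.exp(raw[..., 2:4])
    return tf.concat([xy, wh], axis=-1)


raw = tf.constant([[0.0, 0.0, 0.0, 0.0]])   # all-zero raw outputs
cell_xy = tf.constant([[3.0, 4.0]])          # grid cell at column 3, row 4
anchor_wh = tf.constant([[0.2, 0.3]])        # hypothetical anchor

decoded = decode_boxes(raw, cell_xy, anchor_wh, grid_size=13.0)
print(decoded.numpy())
```

With all-zero raw outputs the box sits at the center of its cell and has exactly the anchor's size. Whichever space you choose, y_true must be encoded the same way: compare raw outputs against raw targets, or decoded boxes against decoded targets, never a mix.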

Third, weight the terms deliberately. A common tuning pattern is to make localization expensive, background confidence cheap, and classification conditional on object presence.

Finally, keep the first version small. It is much easier to debug a simplified loss that works on one anchor format than a giant function that tries to reproduce every feature from a research paper at once.

Common Pitfalls

The biggest mistake is misaligned tensor shapes. If y_true and y_pred do not slice the same way, the code may still run while learning nonsense.
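A cheap guard is to check the layout explicitly before slicing. The helper below is a hypothetical sketch (check_yolo_shapes is not a Keras API) that assumes the [x, y, w, h, objectness, classes...] layout used in this article.

```python
import tensorflow as tf


def check_yolo_shapes(y_true, y_pred, num_classes):
    """Raise ValueError if the tensors cannot be sliced the same way."""
    if not y_true.shape.is_compatible_with(y_pred.shape):
        raise ValueError(
            f"y_true shape {y_true.shape} does not match y_pred shape {y_pred.shape}"
        )
    expected_last = 5 + num_classes  # 4 box values + 1 objectness + classes
    if y_pred.shape[-1] != expected_last:
        raise ValueError(
            f"last axis is {y_pred.shape[-1]}, expected {expected_last}"
        )


# Matching layout with two classes: passes silently.
check_yolo_shapes(tf.zeros((1, 2, 7)), tf.zeros((1, 2, 7)), num_classes=2)

# A prediction tensor that is one column short is caught immediately.
try:
    check_yolo_shapes(tf.zeros((1, 2, 7)), tf.zeros((1, 2, 6)), num_classes=2)
    mismatch_caught = False
except ValueError as err:
    mismatch_caught = True
    print("caught:", err)
```

A check like this only verifies the shape. It cannot tell whether the columns are in the right order, so a hand-computed test case is still worth keeping.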

Another common issue is applying class loss to background cells. Classification should usually be masked by object presence, otherwise the model is punished for not classifying empty space.

Numerical instability can also appear when width and height are transformed carelessly. If you use square roots or logarithms, clamp values to valid ranges before computing the loss.
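For example, a square-root width and height term can be guarded as follows. This is a minimal sketch: sqrt_wh_loss and the floor EPS are illustrative choices, not values taken from any particular YOLO implementation.

```python
import tensorflow as tf

EPS = 1e-6  # illustrative floor; pick a value that suits your coordinate scale


def sqrt_wh_loss(true_wh, pred_wh, eps=EPS):
    """Squared error between square roots of width and height, per row."""
    # Clamp first so sqrt never sees zero or negative inputs.
    true_wh = tf.maximum(true_wh, eps)
    pred_wh = tf.maximum(pred_wh, eps)
    return tf.reduce_sum(tf.square(tf.sqrt(true_wh) - tf.sqrt(pred_wh)), axis=-1)


true_wh = tf.constant([[0.2, 0.3]])
pred_wh = tf.Variable([[0.0, -0.1]])  # degenerate predictions: zero and negative

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(sqrt_wh_loss(true_wh, pred_wh))
grad = tape.gradient(loss, pred_wh)

print("loss finite:", bool(tf.math.is_finite(loss)))
print("gradient finite:", bool(tf.reduce_all(tf.math.is_finite(grad))))
```

Without the clamp, the square root of a negative value is NaN and its derivative at zero is unbounded. Note that the clamp also zeroes the gradient for values below the floor, so it is a safety net rather than a replacement for an output activation that keeps widths and heights positive.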

Developers also mix up logits and probabilities. If the last layer uses sigmoid or softmax, the loss should expect probabilities. If the last layer has no activation, the loss should be configured for logits.

Summary

  • YOLO needs a custom loss because it predicts boxes, objectness, and classes together.
  • A practical Keras loss slices the output tensor and computes separate terms.
  • Masking by object presence is essential for box and class loss.
  • The loss must match the model's output encoding exactly.
  • Start with a small, testable implementation before adding anchors, IoU terms, or advanced weighting.
