Custom loss function in Keras to penalize false negatives

Introduction

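In many classification problems, false negatives are more expensive than false positives. Examples include fraud detection, medical screening, and safety monitoring. Standard binary cross-entropy penalizes both error types symmetrically: a missed positive and a false alarm that are equally confident and equally wrong incur the same loss. When positives are rare, the majority class also tends to dominate the gradient, which pushes the model toward missing positives. To penalize false negatives more heavily, you can use weighted losses or other asymmetric formulations.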

A safe first option in Keras is weighted binary cross-entropy, in which errors on the positive class are multiplied by a factor `pos_weight`. This directly increases the cost of predicting low probabilities for samples that are actually positive.
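Written out per sample, with $y \in \{0, 1\}$ the label, $p$ the predicted probability, and $w$ standing for `pos_weight` (this notation is ours, not from a library), the weighted loss in the class below is:

$$
\mathcal{L}(y, p) = -\left[\, w \, y \log p + (1 - y) \log(1 - p) \,\right], \qquad w > 1
$$

For a positive sample the loss reduces to $-w \log p$, so a confident miss costs $w$ times what plain binary cross-entropy would charge; for a negative sample it stays at the unweighted $-\log(1 - p)$.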

Core Sections

1. Weighted binary cross-entropy

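```python
import tensorflow as tf

class WeightedBCELoss(tf.keras.losses.Loss):
    """Binary cross-entropy whose positive-class term is scaled by pos_weight.

    Expects y_pred to be probabilities (a model ending in a sigmoid).
    """

    def __init__(self, pos_weight=3.0, name="weighted_bce", **kwargs):
        super().__init__(name=name, **kwargs)
        self.pos_weight = pos_weight

    def call(self, y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        # Clip so log() never sees exactly 0 or 1.
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)

        # A missed positive (y_true=1, small y_pred) costs pos_weight times more.
        loss_pos = -self.pos_weight * y_true * tf.math.log(y_pred)
        loss_neg = -(1.0 - y_true) * tf.math.log(1.0 - y_pred)
        # Return per-sample values; Keras applies the batch reduction and sample_weight.
        return tf.reduce_mean(loss_pos + loss_neg, axis=-1)

    def get_config(self):
        # Lets the loss be saved and reloaded together with the model.
        return {**super().get_config(), "pos_weight": self.pos_weight}
```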

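Compile the model (this loss expects probabilities, so the last layer should use a sigmoid activation):

```python
model.compile(optimizer="adam", loss=WeightedBCELoss(pos_weight=4.0),
              metrics=[tf.keras.metrics.Recall(name="recall"), tf.keras.metrics.Precision(name="precision")])
```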

2. Use built-in weighted logits loss

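If the model outputs logits (no final sigmoid), use the numerically stable TensorFlow op `tf.nn.weighted_cross_entropy_with_logits`:

```python
import tensorflow as tf

def make_weighted_bce_from_logits(pos_weight=4.0):
    # Returns a loss for models whose final layer outputs raw logits.
    def weighted_bce_from_logits(y_true, logits):
        y_true = tf.cast(y_true, logits.dtype)
        # Numerically stable op; the positive term is scaled by pos_weight.
        per_elem = tf.nn.weighted_cross_entropy_with_logits(
            labels=y_true, logits=logits, pos_weight=pos_weight)
        return tf.reduce_mean(per_elem, axis=-1)  # per-sample loss
    return weighted_bce_from_logits

# model.compile(optimizer="adam", loss=make_weighted_bce_from_logits(4.0))
```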
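To see where the stability comes from, here is a NumPy sketch of the rearrangement described in the TensorFlow documentation for `tf.nn.weighted_cross_entropy_with_logits`: with $x$ the logit, $z$ the label, $q$ the positive weight and $l = 1 + (q - 1)z$, the loss is computed as $(1 - z)x + l\,(\log(1 + e^{-|x|}) + \max(-x, 0))$, which never exponentiates a large positive number. Both function names are hypothetical; they are for checking intuition, not for training.

```python
import numpy as np

def weighted_bce_logits_reference(y_true, logits, pos_weight):
    """Stable NumPy reference for weighted BCE on raw logits (hypothetical helper)."""
    z = np.asarray(y_true, dtype=np.float64)
    x = np.asarray(logits, dtype=np.float64)
    l = 1.0 + (pos_weight - 1.0) * z
    # log(1 + exp(-x)) rewritten so exp() only ever sees a non-positive argument.
    log1p_exp_neg_x = np.log1p(np.exp(-np.abs(x))) + np.maximum(-x, 0.0)
    return (1.0 - z) * x + l * log1p_exp_neg_x

def naive_weighted_bce(y_true, logits, pos_weight):
    """Direct formula through the sigmoid; breaks down for large |logits|."""
    z = np.asarray(y_true, dtype=np.float64)
    p = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    return -(pos_weight * z * np.log(p) + (1.0 - z) * np.log(1.0 - p))

y = np.array([1.0, 0.0, 1.0])
x = np.array([-2.0, 1.5, 0.3])
print(np.allclose(weighted_bce_logits_reference(y, x, 4.0), naive_weighted_bce(y, x, 4.0)))  # True
print(weighted_bce_logits_reference([1.0], [-800.0], 4.0))  # [3200.]
```

Both versions agree for moderate logits, but only the rearranged one stays finite when a logit such as -800 saturates the sigmoid, which is why the logits path is preferred when the model has no final sigmoid.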

3. Tune decision threshold separately

Loss weighting changes the training gradients, but the decision threshold applied at inference still determines how many samples are flagged as positive.

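```python
proba = model.predict(X_val).ravel()  # probabilities; apply tf.sigmoid first if the model outputs logits
pred = (proba >= 0.3).astype(int)  # a threshold below 0.5 flags more positives, reducing FN
```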

Evaluate the recall/precision trade-off against the cost function of your domain.
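A minimal sketch of that evaluation, assuming you can put a rough cost on each missed positive (`cost_fn`) and each false alarm (`cost_fp`): sweep candidate thresholds over validation predictions and keep the one with the lowest total cost. `pick_threshold_by_cost` is a hypothetical helper, and the toy labels, probabilities, and 10:1 cost ratio are made up for illustration.

```python
import numpy as np

def pick_threshold_by_cost(y_true, proba, cost_fn, cost_fp, thresholds=None):
    """Return (threshold, total_cost) minimizing cost_fn*FN + cost_fp*FP."""
    y_true = np.asarray(y_true).astype(int).ravel()
    proba = np.asarray(proba, dtype=float).ravel()
    if thresholds is None:
        thresholds = np.linspace(0.05, 0.95, 19)  # 0.05, 0.10, ..., 0.95
    best_t, best_cost = None, np.inf
    for t in thresholds:
        pred = (proba >= t).astype(int)
        fn = np.sum((y_true == 1) & (pred == 0))
        fp = np.sum((y_true == 0) & (pred == 1))
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:  # strict "<" keeps the lowest threshold on ties
            best_t, best_cost = float(t), float(cost)
    return best_t, best_cost

# Toy validation data: missing a positive is assumed to cost 10x a false alarm.
y_val = np.array([1, 1, 0, 0, 0])
proba_val = np.array([0.33, 0.82, 0.22, 0.41, 0.12])
t, cost = pick_threshold_by_cost(y_val, proba_val, cost_fn=10.0, cost_fp=1.0)
print(round(t, 2), cost)  # 0.25 1.0
```

In practice the sweep runs on held-out predictions, never on training data, so the chosen threshold reflects how the model behaves on unseen samples.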

4. Monitor task-relevant metrics

Track recall (also called sensitivity), precision, PR-AUC, and the confusion matrix, not only accuracy.
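During training, Keras can track these through `tf.keras.metrics.Recall`, `tf.keras.metrics.Precision`, and `tf.keras.metrics.AUC(curve="PR")`. For a post-hoc check on thresholded predictions, a NumPy sketch like the following (the helper name is hypothetical) shows why accuracy alone misleads: on this toy data an always-negative model reaches 95% accuracy with zero recall.

```python
import numpy as np

def binary_report(y_true, y_pred):
    """Confusion-matrix counts plus recall and precision for 0/1 labels."""
    y_true = np.asarray(y_true).astype(int).ravel()
    y_pred = np.asarray(y_pred).astype(int).ravel()
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    # Convention: report 0.0 when a ratio is undefined (no positives / no predictions).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn,
            "recall": recall, "precision": precision}

# 95 negatives and 5 positives; the "model" always predicts the negative class.
y_true = np.array([0] * 95 + [1] * 5)
always_negative = np.zeros_like(y_true)
print(binary_report(y_true, always_negative))
```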

5. Calibrate pos_weight

Start from the class imbalance ratio (negatives divided by positives in the training labels), then adjust based on validation cost outcomes.
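A sketch of that starting point, assuming 0/1 training labels (`initial_pos_weight` is a hypothetical helper). With $w = n_{neg} / n_{pos}$, the positive class receives the same total weight as the negative class, since $n_{pos} \cdot w = n_{neg}$.

```python
import numpy as np

def initial_pos_weight(y_train):
    """Starting pos_weight = (#negatives / #positives) in the training labels."""
    y = np.asarray(y_train).astype(int).ravel()
    n_pos = int(np.sum(y == 1))
    n_neg = int(np.sum(y == 0))
    if n_pos == 0:
        raise ValueError("training labels contain no positives")
    return n_neg / n_pos

y_train = np.array([0] * 900 + [1] * 100)
print(initial_pos_weight(y_train))  # 9.0
```

From there, a few multiples of this value (for example half and double) can be trained and compared on validation cost rather than on loss alone.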

Common Pitfalls

  • Increasing the positive-class weight without checking whether precision collapses.
  • Applying a weighted loss but keeping the default 0.5 threshold without validating it.
  • Feeding sigmoid outputs into a logits-based loss, or logits into a probability-based loss.
  • Using accuracy as the primary metric in highly imbalanced settings.
  • Setting extremely high weights, which can destabilize optimization.

Summary

To penalize false negatives in Keras, use asymmetric loss weighting, typically weighted binary cross-entropy, and pair it with threshold tuning. Ensure numerical stability, choose metrics aligned with domain risk, and calibrate the positive weight empirically. This approach improves sensitivity where missed positives are costly while keeping model behavior measurable and controllable.

A practical way to make this guidance durable is to convert it into a small runbook that includes prerequisites, expected environment versions, and a short verification sequence. Even strong teams lose time when troubleshooting steps live only in memory or chat history. A runbook should explicitly answer three questions: what to check first, what output confirms healthy behavior, and what output indicates a known failure mode. This level of clarity helps both experienced maintainers and newer contributors, and it reduces repeated investigation during incidents.

It is also valuable to create a tiny reproducible fixture for this topic. The fixture can be a minimal script, test case, sample request, or small dataset that demonstrates the correct behavior in isolation. When regressions appear after dependency upgrades, infrastructure changes, or framework migrations, that fixture becomes the fastest way to isolate whether the issue is environmental or logic-related. Keeping a focused fixture in source control gives you a stable benchmark across branches and release cycles.
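For this topic, such a fixture could be a pair of deterministic tests on a NumPy mirror of the weighted binary cross-entropy formula from section 1. `weighted_bce_reference` and the test names are hypothetical; in a real project you would also pass the same inputs through the Keras loss and compare the results. A test runner such as pytest collects `test_*` functions automatically, and the calls at the bottom let the file run as a plain script.

```python
import numpy as np

def weighted_bce_reference(y_true, y_pred, pos_weight, eps=1e-7):
    """Framework-free mirror of the weighted BCE formula (per sample)."""
    y = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    return -(pos_weight * y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def test_false_negative_costs_pos_weight_times_false_positive():
    # Mirror-image mistakes: a positive scored 0.1 vs a negative scored 0.9.
    fn_loss = weighted_bce_reference([1.0], [0.1], pos_weight=4.0)[0]
    fp_loss = weighted_bce_reference([0.0], [0.9], pos_weight=4.0)[0]
    assert np.isclose(fn_loss, 4.0 * fp_loss)
    assert np.isclose(fp_loss, -np.log(0.1))

def test_pos_weight_one_matches_plain_bce():
    y = np.array([1.0, 0.0, 1.0, 0.0])
    p = np.array([0.7, 0.2, 0.4, 0.6])
    plain = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    assert np.allclose(weighted_bce_reference(y, p, pos_weight=1.0), plain)

test_false_negative_costs_pos_weight_times_false_positive()
test_pos_weight_one_matches_plain_bce()
```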

For long-term reliability, pair documentation with one automated guardrail in CI. The guardrail should be narrow and fast: an import check, schema validation, endpoint contract test, deterministic unit test, or lightweight performance threshold. Avoid broad flaky checks that hide real signals. The goal is early, actionable feedback before code reaches production. If the same category of issue appears repeatedly, promote the manual troubleshooting step into automation so the system catches it first. Over time, this shifts effort from reactive debugging to preventive quality control and keeps the knowledge article relevant in real engineering workflows.
