Bug in TensorFlow reduce_max for negative infinity?

Introduction

tf.reduce_max can look buggy when applied to tensors containing negative infinity (-inf) or to empty tensors, but most of the surprising results are by design. For an empty tensor, tf.reduce_max returns -inf (the identity element for max), which is mathematically consistent but often surprises developers who expect an error. For a tensor whose values are all -inf, the result is -inf as expected. When NaN is mixed in, however, NaN propagates and the result is NaN instead of -inf. Understanding these edge cases is critical for masked attention in transformers, log-probability computations, and any model that uses -inf for padding or masking.

The Expected Behavior

```python
import tensorflow as tf

# Normal case: works correctly
t = tf.constant([1.0, 3.0, 2.0])
print(tf.reduce_max(t).numpy())  # 3.0

# All negative values: works correctly
t = tf.constant([-5.0, -1.0, -3.0])
print(tf.reduce_max(t).numpy())  # -1.0

# With negative infinity: works correctly
t = tf.constant([-float('inf'), 1.0, -float('inf')])
print(tf.reduce_max(t).numpy())  # 1.0
```

The Surprising Edge Cases

```python
import tensorflow as tf

# Edge case 1: ALL values are -inf
t = tf.constant([-float('inf'), -float('inf'), -float('inf')])
print(tf.reduce_max(t).numpy())  # -inf (correct, but can cause issues downstream)

# Edge case 2: Empty tensor
t = tf.constant([], dtype=tf.float32)
print(tf.reduce_max(t).numpy())  # -inf (identity element for max)

# Edge case 3: NaN mixed with -inf
t = tf.constant([float('nan'), -float('inf'), 1.0])
print(tf.reduce_max(t).numpy())  # nan (NaN propagates!)

# Edge case 4: reduce_max along axis with all -inf rows
t = tf.constant([[-float('inf'), -float('inf')],
                 [1.0, 2.0]])
print(tf.reduce_max(t, axis=1).numpy())  # [-inf, 2.0]
```

The primary "bug" is actually expected behavior: -inf is the identity element for max, so the max of an empty set is conventionally -inf, and a NaN input propagates to the result just as it does with NumPy's np.max. But these results can cause numerical instability in softmax, log-sum-exp, and attention mechanisms.
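The reasoning above can be checked without TensorFlow. The following is a minimal plain-Python sketch, not TensorFlow's actual kernel: `fold_max` and `fold_max_propagating_nan` are hypothetical helpers that model a max reduction as a left fold seeded with -inf. It shows why an empty input yields -inf, why every ordered comparison with NaN is False, and why a reduction has to handle NaN explicitly if it wants NaN to propagate.

```python
import math
from functools import reduce

NEG_INF = float("-inf")
NAN = float("nan")


def fold_max(values):
    """Max as a left fold seeded with -inf, the identity element for max."""
    return reduce(lambda acc, v: v if v > acc else acc, values, NEG_INF)


def fold_max_propagating_nan(values):
    """Same fold, but a NaN input replaces the accumulator and then sticks."""
    return reduce(
        lambda acc, v: v if (math.isnan(v) or v > acc) else acc, values, NEG_INF
    )


print(fold_max([]))                               # -inf (the seed is never replaced)
print(fold_max([NEG_INF, 1.0, NEG_INF]))          # 1.0
print(NAN > 1.0, NAN < 1.0, NAN == NAN)           # False False False
print(fold_max([NAN, 1.0, 2.0]))                  # 2.0 (naive fold skips NaN)
print(fold_max_propagating_nan([NAN, 1.0, 2.0]))  # nan
print(NEG_INF - NEG_INF)                          # nan
```

The last line is the arithmetic that breaks the max-subtraction trick discussed in the next section.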

Why This Matters: Softmax with Masking

```python
import tensorflow as tf

# Attention masks often fill masked positions with a large negative number
logits = tf.constant([[1.0, 2.0, 3.0, -1e9, -1e9]])  # Last two positions are masked

# Softmax subtracts max for numerical stability
max_logits = tf.reduce_max(logits, axis=-1, keepdims=True)
stable_logits = logits - max_logits
softmax = tf.nn.softmax(stable_logits)
print(softmax.numpy())  # [[0.09, 0.24, 0.67, 0.0, 0.0]] (correct)

# But if ALL positions are masked:
all_masked = tf.constant([[-1e9, -1e9, -1e9]])
max_val = tf.reduce_max(all_masked, axis=-1, keepdims=True)
# max_val = -1e9, stable = [0, 0, 0], softmax = [0.33, 0.33, 0.33]
# This gives UNIFORM attention over masked tokens, which is wrong!

# With actual -inf:
all_inf = tf.constant([[-float('inf'), -float('inf'), -float('inf')]])
max_val = tf.reduce_max(all_inf, axis=-1, keepdims=True)  # -inf
stable = all_inf - max_val  # -inf - (-inf) = nan!
print(tf.nn.softmax(stable).numpy())  # [[nan, nan, nan]]
```

Workaround 1: Replace -inf Before reduce_max

```python
import tensorflow as tf

def safe_reduce_max(tensor, axis=None, keepdims=False):
    """Replace -inf with a large negative number before reduce_max."""
    safe_tensor = tf.where(
        tf.math.is_inf(tensor) & (tensor < 0),
        tf.constant(-1e9, dtype=tensor.dtype),
        tensor
    )
    return tf.reduce_max(safe_tensor, axis=axis, keepdims=keepdims)

t = tf.constant([-float('inf'), -float('inf'), -float('inf')])
print(safe_reduce_max(t).numpy())  # -1e9 (no -inf propagation)
```

Workaround 2: Check for All-Masked Rows

```python
import tensorflow as tf

def masked_softmax(logits, mask):
    """Softmax that handles fully masked rows gracefully."""
    # mask: True for valid positions, False for masked
    logits = tf.where(mask, logits, tf.constant(-1e9, dtype=logits.dtype))

    # Check for rows where ALL positions are masked
    any_valid = tf.reduce_any(mask, axis=-1, keepdims=True)

    softmax = tf.nn.softmax(logits, axis=-1)

    # Zero out fully masked rows instead of returning uniform distribution
    softmax = tf.where(any_valid, softmax, tf.zeros_like(softmax))
    return softmax

# First row is partially masked, second row is fully masked
logits = tf.constant([[1.0, 2.0, 3.0],
                      [1.0, 2.0, 3.0]])
mask = tf.constant([[True, True, False],
                    [False, False, False]])
print(masked_softmax(logits, mask).numpy())
# Approximately [[0.27, 0.73, 0.0],
#                [0.0,  0.0,  0.0]]
```
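Workaround 2 masks with -1e9. If you would rather keep true -inf in the masked positions, the guards have to be explicit. The following is a sketch of one way to do that, assuming float logits and a boolean mask; `masked_softmax_neg_inf` is a hypothetical name, and only the forward pass is considered here. The idea is to substitute 0 for a non-finite row maximum so the subtraction never evaluates -inf - (-inf), and to use tf.math.divide_no_nan so a zero denominator yields 0 instead of NaN.

```python
import tensorflow as tf


def masked_softmax_neg_inf(logits, mask):
    """Softmax over the last axis with -inf masking (hypothetical helper)."""
    neg_inf = tf.constant(float("-inf"), dtype=logits.dtype)
    masked = tf.where(mask, logits, neg_inf)

    # A fully masked row has a row maximum of -inf.
    row_max = tf.reduce_max(masked, axis=-1, keepdims=True)
    # Guard 1: subtract 0 instead of -inf so the shift cannot produce NaN.
    safe_max = tf.where(
        tf.math.is_finite(row_max), row_max, tf.zeros_like(row_max)
    )

    exp = tf.exp(masked - safe_max)  # exp(-inf) is 0 for masked positions
    denom = tf.reduce_sum(exp, axis=-1, keepdims=True)
    # Guard 2: a fully masked row has denom == 0; return 0 rather than NaN.
    return tf.math.divide_no_nan(exp, denom)


logits = tf.constant([[1.0, 2.0, 3.0],
                      [1.0, 2.0, 3.0]])
mask = tf.constant([[True, True, False],
                    [False, False, False]])
probs = masked_softmax_neg_inf(logits, mask)
print(probs.numpy())
```

The partially masked row sums to 1 over its valid positions, and the fully masked row comes out as all zeros rather than uniform or NaN.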

Workaround 3: Clamp the Result of reduce_max

```python
import numpy as np
import tensorflow as tf

# TensorFlow 2.x doesn't have an initial parameter for reduce_max
# But you can clamp the result
def clamped_reduce_max(tensor, axis=None, min_value=-1e9):
    result = tf.reduce_max(tensor, axis=axis)
    return tf.maximum(result, min_value)

t = tf.constant([-float('inf'), -float('inf')])
print(clamped_reduce_max(t).numpy())  # -1e9

# NumPy comparison: np.max has an initial parameter
print(np.max([-np.inf, -np.inf]))          # -inf
print(np.max([], initial=-1e9))             # -1e9
```

Log-Sum-Exp Stability

```python
import tensorflow as tf

def safe_logsumexp(logits, axis=None):
    """Numerically stable log-sum-exp that handles -inf."""
    max_val = tf.reduce_max(logits, axis=axis, keepdims=True)
    # Replace -inf max with 0 to avoid nan in subtraction
    max_val = tf.where(tf.math.is_inf(max_val), tf.zeros_like(max_val), max_val)
    return tf.squeeze(max_val, axis=axis) + tf.math.log(
        tf.reduce_sum(tf.exp(logits - max_val), axis=axis)
    )

logits = tf.constant([-float('inf'), 1.0, 2.0])
print(safe_logsumexp(logits).numpy())  # 2.31

# All -inf input stays -inf instead of turning into nan
print(safe_logsumexp(tf.constant([-float('inf')] * 3)).numpy())  # -inf

# Standard tf.math.reduce_logsumexp handles this correctly in TF 2.x
print(tf.math.reduce_logsumexp(logits).numpy())  # 2.31 (correct)
```

Common Pitfalls

Subtracting -inf from -inf: The expression -inf - (-inf) produces NaN, not 0. This breaks the standard numerical stability trick of softmax(x) = softmax(x - max(x)) when all elements are -inf. Always check for all-masked rows before applying the max subtraction.
Using -inf instead of a large negative number for masking: While -inf is mathematically correct for masking, using -1e9 avoids NaN propagation in edge cases. Most models use -1e9 or float('-inf') with explicit guards.
Empty tensor reduce_max returning -inf: tf.reduce_max(tf.constant([])) returns -inf, not an error. If you expect the tensor to always have elements, add an assertion: tf.debugging.assert_greater(tf.size(tensor), 0).
NaN silently propagating through max: tf.reduce_max([nan, 1.0, 2.0]) returns NaN, not 2.0, so a single NaN poisons the whole reduction. To ignore NaN entries, replace them with -inf before computing the max, for example tf.where(tf.math.is_nan(tensor), float('-inf'), tensor).
GPU vs CPU behavior differences: Some GPU kernels handle -inf and NaN differently than CPU. A model that works on CPU may produce different results on GPU for tensors containing special float values. Always test edge cases on the target hardware.
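The two input checks named in the pitfalls above (rejecting empty tensors and replacing NaN) can be combined into one helper. This is a minimal sketch assuming eager execution and a float tensor; `checked_reduce_max` is a hypothetical name, not a TensorFlow API. Note that silently replacing NaN can hide an upstream bug, so it only makes sense when NaN entries are genuinely meant to be ignored.

```python
import tensorflow as tf


def checked_reduce_max(tensor, axis=None, keepdims=False):
    """reduce_max for float tensors that rejects empty input and ignores NaN.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    tensor = tf.convert_to_tensor(tensor)
    # Fail loudly instead of silently returning -inf for an empty tensor.
    tf.debugging.assert_greater(
        tf.size(tensor), 0, message="reduce_max called on an empty tensor"
    )
    # Replace NaN with -inf so it can never win the max.
    neg_inf = tf.constant(float("-inf"), dtype=tensor.dtype)
    cleaned = tf.where(tf.math.is_nan(tensor), neg_inf, tensor)
    return tf.reduce_max(cleaned, axis=axis, keepdims=keepdims)


values = tf.constant([float("nan"), 1.0, 2.0])
result = checked_reduce_max(values)
print(result.numpy())  # 2.0

try:
    checked_reduce_max(tf.constant([], dtype=tf.float32))
    rejected_empty = False
except tf.errors.InvalidArgumentError:
    rejected_empty = True
print(rejected_empty)
```

If every entry is NaN, this helper returns -inf, so downstream code still needs the all-masked guards described earlier.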

Summary

tf.reduce_max returns -inf for empty tensors and for tensors whose values are all -inf (-inf is the identity element for max, so this is expected behavior, not a bug)
-inf - (-inf) produces NaN, breaking softmax numerical stability tricks
NaN propagates through reduce_max — always sanitize inputs first
Use -1e9 instead of -inf for masking to avoid NaN in edge cases
For masked attention, check for fully masked rows and zero out their softmax output
tf.math.reduce_logsumexp handles -inf correctly in TensorFlow 2.x