What is the meaning of the word logits in TensorFlow?

Introduction

In TensorFlow, logits are the raw output scores produced by a model before applying a final normalization such as sigmoid or softmax. They are not probabilities yet, but they contain the information needed to compute probabilities and losses in a numerically stable way.

Logits Are Raw Scores

Consider a classifier that predicts one of three classes. The last dense layer may output values such as 2.1, -0.3, and 0.7. Those numbers are logits. They can be positive or negative, and they do not need to add up to 1.

Here is a simple TensorFlow example:

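```python
import tensorflow as tf

# A single Dense layer with no activation: its outputs are raw scores (logits).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(3)
])

x = tf.constant([[1.0, 2.0, 3.0, 4.0]])
logits = model(x)  # shape (1, 3); values depend on the randomly initialized weights
print(logits)
```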

The final layer has no activation, so the output is a tensor of logits.
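Under the hood, a Dense layer with no activation computes x·W + b. Nothing in that formula forces the results to be positive or to sum to 1. The framework-free sketch below shows how such raw scores arise. It assumes a hypothetical helper `dense_logits` and made-up weights; a real layer learns its weights during training.

```python
# Framework-free sketch of what Dense(3) with no activation computes: x @ W + b.
# dense_logits, W and b are hypothetical illustration values, not learned weights.

def dense_logits(x, weights, bias):
    """Return one raw score (logit) per output unit: sum_i x[i] * W[i][j] + b[j]."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]

x = [1.0, 2.0, 3.0, 4.0]
W = [  # 4 inputs x 3 output units
    [0.5, -0.2, 0.1],
    [0.3, 0.1, -0.4],
    [-0.1, 0.2, 0.3],
    [0.2, -0.3, 0.0],
]
b = [0.0, 0.1, -0.1]

logits = dense_logits(x, W, b)
print(logits)       # roughly [1.6, -0.5, 0.1]: one score is negative
print(sum(logits))  # roughly 1.2: the scores do not sum to 1
```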

Convert Logits to Probabilities

To interpret logits as probabilities, apply an activation function.

For multi-class classification, use softmax:

```python
import tensorflow as tf

logits = tf.constant([[2.1, -0.3, 0.7]])
probs = tf.nn.softmax(logits)  # softmax over the last axis (the classes)

print(probs)
print(tf.reduce_sum(probs))    # 1.0, up to floating-point rounding
```

The resulting probabilities are all between 0 and 1, and they sum to 1 across the class dimension.
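Softmax is defined as softmax(z)_i = exp(z_i) / sum_j exp(z_j), taken along the class axis. The plain-Python sketch below implements that formula. It first subtracts the largest logit, a standard trick that leaves the result unchanged (the shift cancels between numerator and denominator) while keeping `exp` from overflowing. The helper name `softmax` is illustrative, and this is not TensorFlow's actual implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating.

    Shifting every logit by the same constant does not change the result,
    but it keeps math.exp from overflowing on large inputs.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.1, -0.3, 0.7])
print(probs)
print(sum(probs))

# A naive exp(1000.0) would raise OverflowError; the shifted version is fine.
print(softmax([1000.0, 999.0]))
```

For [2.1, -0.3, 0.7] this gives roughly [0.748, 0.068, 0.184]. That should match what `tf.nn.softmax` returns for the example above, up to float32 rounding.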

For binary classification, where the model typically emits a single logit, the corresponding transformation is sigmoid:

```python
import tensorflow as tf

logit = tf.constant([1.7])
prob = tf.nn.sigmoid(logit)  # probability of the positive class

print(prob)
```

So the mental model is simple: logits are raw scores, while sigmoid or softmax turns those scores into probabilities.
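The word itself comes from the logit function, logit(p) = log(p / (1 - p)), which is the inverse of sigmoid. A binary logit is therefore the log-odds of the positive class. Multi-class logits play the same role as log-probabilities up to an additive constant, which softmax cancels out. The plain-Python sketch below uses illustrative helpers `sigmoid` and `logit`. It checks the round trip and shows that sigmoid equals two-class softmax with the negative class's logit fixed at 0.

```python
import math

def sigmoid(z):
    """Map one logit to a probability in (0, 1), branching so exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(p):
    """Inverse of sigmoid: the log-odds log(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1.0 - p))

z = 1.7
p = sigmoid(z)
print(p)         # about 0.8455, the probability of the positive class
print(logit(p))  # about 1.7: the log-odds recover the original score

# Two-class softmax with the negative class pinned at logit 0 gives the same probability.
p_softmax = math.exp(z) / (math.exp(0.0) + math.exp(z))
print(p_softmax)
```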

Why TensorFlow Losses Often Want Logits

TensorFlow loss functions need to know whether the predictions you pass them are logits or already-normalized probabilities. This is why many of them expose a from_logits argument.

For example:

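```python
import tensorflow as tf

# from_logits=True: y_pred holds raw scores, and the loss applies softmax internally.
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

y_true = tf.constant([[0.0, 0.0, 1.0]])  # one-hot label: the true class is index 2
y_pred = tf.constant([[2.1, -0.3, 0.7]])  # logits, not probabilities

loss = loss_fn(y_true, y_pred)
print(loss)
```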

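With from_logits=True, the loss function applies softmax internally and combines it with the logarithm in a numerically stable way. This is preferred over running softmax yourself and passing the resulting probabilities to the loss with from_logits=False. Passing probabilities to a loss that expects logits is simply incorrect.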
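Cross-entropy needs log(softmax(z)). That quantity can be computed directly as z_i - logsumexp(z), without ever producing a probability that might round to 0. The plain-Python sketch below illustrates the general log-sum-exp technique rather than TensorFlow's actual kernel, and the helper names are hypothetical.

```python
import math

def log_softmax(logits):
    """log(softmax(z)) as z_i - logsumexp(z), with the max shifted out for stability."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def cce_from_logits(y_true, logits):
    """Categorical cross-entropy -sum_i y_i * log(softmax(z)_i), working in log space."""
    return -sum(y * lp for y, lp in zip(y_true, log_softmax(logits)))

def cce_from_probs(y_true, probs):
    """The same loss, computed by taking log of already-normalized probabilities."""
    return -sum(y * math.log(p) for y, p in zip(y_true, probs) if y > 0)

y_true = [0.0, 0.0, 1.0]
print(cce_from_logits(y_true, [2.1, -0.3, 0.7]))  # about 1.69

# Confidently wrong logits: the true-class probability underflows to 0.0 in float64.
extreme = [800.0, 0.0, -5.0]
print(cce_from_logits(y_true, extreme))  # 805.0, still finite

probs = [math.exp(lp) for lp in log_softmax(extreme)]
print(probs)  # [1.0, 0.0, 0.0]
try:
    cce_from_probs(y_true, probs)
except ValueError as err:  # math.log(0.0) raises "math domain error"
    print("probability path failed:", err)
```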

The same pattern exists for binary classification:

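```python
import tensorflow as tf

# from_logits=True: y_pred is a raw score, and the loss applies sigmoid internally.
loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)

y_true = tf.constant([[1.0]])  # positive class
y_pred = tf.constant([[1.7]])  # a single logit

print(loss_fn(y_true, y_pred))
```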
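For a single logit z and label y, the binary case has a closed form that never takes the log of a rounded probability: max(z, 0) - z·y + log(1 + exp(-|z|)). This is the formulation documented for `tf.nn.sigmoid_cross_entropy_with_logits`. The plain-Python helpers below are a hypothetical sketch of it, compared against the naive probability-based formula.

```python
import math

def bce_from_logits(y, z):
    """Binary cross-entropy from one logit z and label y, in the stable form
    max(z, 0) - z * y + log(1 + exp(-|z|)).
    Algebraically equal to -(y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))).
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def bce_naive(y, z):
    """The same loss computed from the probability sigmoid(z)."""
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

print(bce_from_logits(1.0, 1.7))  # about 0.168, the loss from the example above
print(bce_naive(1.0, 1.7))        # agrees for a moderate logit

# Confidently wrong: label 0, logit 40. sigmoid(40) rounds to exactly 1.0 in float64,
# so the naive form hits log(0.0), while the logit form returns about 40.
print(bce_from_logits(0.0, 40.0))
try:
    bce_naive(0.0, 40.0)
except ValueError as err:
    print("naive form failed:", err)
```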

Model Design Implications

Whether your final layer should emit logits or probabilities depends on how you train and serve the model. A common training pattern is:

  • final dense layer without activation
  • loss configured with from_logits=True

For inference, you then apply sigmoid or softmax only when you need human-readable probabilities.
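If you only need the predicted class, you can skip softmax entirely. A larger logit always maps to a larger probability, because exp is increasing and every class shares the same denominator. In TensorFlow this would be `tf.argmax(logits, axis=-1)`. The plain-Python sketch below uses illustrative helpers to check the ordering claim.

```python
import math

def softmax(logits):
    """Stable softmax: shift by the max logit, exponentiate, normalize."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

logits = [2.1, -0.3, 0.7]
probs = softmax(logits)

# Same predicted class either way, since softmax preserves the ranking of the scores.
print(argmax(logits), argmax(probs))  # 0 0

# Apply softmax only when you need human-readable probabilities.
print([round(p, 3) for p in probs])
```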

Example training model:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3)  # no activation: the model outputs logits
])

model.compile(
    optimizer="adam",
    # Must agree with the output layer: logits in, so from_logits=True.
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True)
)
```

This setup avoids a common source of numerical instability and keeps the meaning of the final layer explicit.

Common Pitfalls

  • Applying softmax in the model and also using a loss with from_logits=True effectively transforms the output twice and gives incorrect training behavior.
  • Treating logits as probabilities leads to wrong interpretation because logits do not sum to 1 and can be negative.
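  • Forgetting that binary and multi-class problems use different output conventions can cause shape and loss mismatches. Binary classifiers usually emit one logit per example and pair it with sigmoid and BinaryCrossentropy. Multi-class classifiers emit one logit per class and pair them with softmax and CategoricalCrossentropy.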
  • Comparing logits from different models as if they were calibrated probabilities is misleading. Raw scores are model-dependent.
  • Adding an output activation without updating the loss configuration is a common training bug. The final layer and loss settings must agree.
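The first pitfall is easy to see numerically. Probabilities lie in [0, 1], so after a second softmax no class can be more than e ≈ 2.72 times as likely as another. As a result, the loss can never drive the true-class probability toward 1. A plain-Python sketch with an illustrative `softmax` helper:

```python
import math

def softmax(logits):
    """Stable softmax: shift by the max logit, exponentiate, normalize."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.1, -0.3, 0.7]
once = softmax(logits)  # what a from_logits=True loss computes from raw logits
twice = softmax(once)   # what it computes if the model already applied softmax

print([round(p, 3) for p in once])
print([round(p, 3) for p in twice])  # squashed toward uniform

# Even a perfectly confident one-hot output cannot escape the flattening:
# with 3 classes the second softmax caps any class at e / (e + 2).
best_case = softmax([1.0, 0.0, 0.0])
print(round(best_case[0], 3))  # 0.576
```

With three classes the cross-entropy can therefore never fall below -log(0.576) ≈ 0.55, even for a perfect model.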

Summary

  • Logits are raw model scores before sigmoid or softmax.
  • Use softmax for multi-class probabilities and sigmoid for binary probabilities.
  • Many TensorFlow losses accept logits directly (via from_logits=True) and handle the conversion internally in a more numerically stable way.
  • Match your final layer design to the loss configuration, especially the from_logits setting.
  • Read logits as internal scores, not as final probabilities.
