Keras Binary Classification - Sigmoid activation function

Introduction

For binary classification in Keras, the standard output design is one output unit with a sigmoid activation. Sigmoid maps a raw score into the 0 to 1 range, which makes it natural for yes-or-no tasks such as spam detection, churn prediction, or fraud classification.

Why Sigmoid Fits Binary Classification

Binary classification asks whether each example belongs to class 0 or class 1. A single scalar output is enough, and sigmoid turns that scalar into a probability-like score.

Example:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
)
```

In this setup, outputs near 0 suggest class 0, and outputs near 1 suggest class 1.
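To see how a raw score becomes that 0-to-1 value, here is a minimal NumPy sketch of the sigmoid formula, sigma(z) = 1 / (1 + exp(-z)), which is the function the Dense(1, activation="sigmoid") layer applies. The helper name sigmoid and the sample scores are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    """Map raw scores (logits) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative raw scores from the last Dense layer, before activation.
raw_scores = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(raw_scores)

print(np.round(probs, 3))  # approximately 0.018, 0.269, 0.5, 0.731, 0.982
```

A score of exactly 0 lands on 0.5, large positive scores approach 1, and large negative scores approach 0. The curve is also symmetric: sigmoid(-z) equals 1 - sigmoid(z).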

Keep Labels, Output, and Loss Consistent

Most bugs in sigmoid-based classifiers are not caused by the sigmoid itself. They come from labels, output shape, or loss configuration that do not match one another.

For a one-unit sigmoid output, each label should be a single 0 or 1, stored with shape (n,) or (n, 1):

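```python
import numpy as np

# Random toy data: 100 examples, 20 features, 0/1 labels of shape (100, 1).
x = np.random.randn(100, 20).astype("float32")
y = np.random.randint(0, 2, size=(100, 1)).astype("float32")

model.fit(x, y, epochs=3, batch_size=16, verbose=0)  # model from the example above
```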

If your targets are two-column one-hot vectors, that usually points toward a two-unit softmax output instead.

The main rule is that label shape, output shape, and loss semantics must agree.
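If you already have two-column one-hot targets but want to keep the one-unit sigmoid head, one option is to keep only the class-1 column so labels have shape (n, 1). The sketch below does that in NumPy and adds a hypothetical check_binary_labels helper (not a Keras API) that fails early when labels do not fit a one-unit output.

```python
import numpy as np

def one_hot_to_binary(y_onehot):
    """Convert (n, 2) one-hot targets to (n, 1) float labels for a sigmoid output."""
    y_onehot = np.asarray(y_onehot)
    if y_onehot.ndim != 2 or y_onehot.shape[1] != 2:
        raise ValueError(f"expected shape (n, 2), got {y_onehot.shape}")
    # Column 1 is 1 exactly when the example belongs to class 1.
    return y_onehot[:, 1:2].astype("float32")

def check_binary_labels(y):
    """Hypothetical sanity check: 0/1 labels shaped (n,) or (n, 1)."""
    y = np.asarray(y)
    if y.ndim > 2 or (y.ndim == 2 and y.shape[1] != 1):
        raise ValueError(f"expected labels of shape (n,) or (n, 1), got {y.shape}")
    if not np.isin(y, [0, 1]).all():
        raise ValueError("labels must be 0 or 1")
    return True

y_onehot = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
y = one_hot_to_binary(y_onehot)
print(y.ravel())  # [0. 1. 1. 0.]
check_binary_labels(y)
```

Calling check_binary_labels(y_onehot) on the original two-column array raises a ValueError, which is exactly the label/output mismatch described above.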

Probabilities Versus Logits

There are two common valid setups:

  1. apply sigmoid in the model and use a loss that expects probabilities
  2. output raw logits and let the loss apply the sigmoid internally

Probability setup:

```python
import tensorflow as tf

# Sigmoid inside the model: outputs are probabilities.
prob_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

prob_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
)
```

Logit setup:

```python
import tensorflow as tf

# No activation on the last layer: outputs are raw logits.
logit_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(1),
])

logit_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
)
```

Both are correct. The important part is to choose one path and configure the loss consistently.
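The two setups compute the same loss. The NumPy sketch below writes out the textbook binary cross-entropy on probabilities and the numerically stable logit form, max(z, 0) - z*y + log(1 + exp(-|z|)), and checks that they agree. These hand-written functions are for illustration, not Keras source code, and the logits are made-up values.

```python
import numpy as np

def bce_from_probs(y, p):
    """Binary cross-entropy when the model already applied sigmoid."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_from_logits(y, z):
    """Same loss computed directly from raw logits z, in a stable form."""
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))

y = np.array([0.0, 1.0, 1.0, 0.0])
z = np.array([-2.0, 3.0, -0.5, 1.5])   # illustrative raw logits
p = 1.0 / (1.0 + np.exp(-z))           # sigmoid probabilities

print(bce_from_probs(y, p).mean())
print(bce_from_logits(y, z).mean())    # same value up to rounding

# An extreme logit: the logit form stays finite without computing sigmoid first.
print(bce_from_logits(np.array([1.0]), np.array([-1000.0])))  # [1000.]
```

The stability for extreme scores is the usual reason to prefer the logit setup. Keep in mind that logit_model.predict returns raw logits, so apply tf.sigmoid to them before treating them as probabilities.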

Prediction Threshold Is a Separate Decision

Training with sigmoid gives you scores between 0 and 1, but classification still requires a threshold. The default 0.5 threshold is common, but it is not automatically optimal.

```python
scores = model.predict(x, verbose=0).ravel()
predictions = (scores >= 0.5).astype("int32")
```

In imbalanced or cost-sensitive problems, a different threshold may give better results. Choose it on validation data rather than defaulting to 0.5, and revisit it when the data distribution changes.
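One simple way to pick the threshold on validation data is to sweep candidate values and keep the one that scores best on a metric you care about. This sketch assumes F1 as that metric and uses hypothetical helpers f1_at_threshold and best_threshold; y_val and val_scores are made-up stand-ins for your validation labels and model.predict(x_val).ravel().

```python
import numpy as np

def f1_at_threshold(y_true, scores, threshold):
    """F1 score of the hard predictions produced by one threshold."""
    pred = scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def best_threshold(y_true, scores, candidates=None):
    """Return the candidate threshold with the highest F1 (first one wins ties)."""
    if candidates is None:
        candidates = np.linspace(0.05, 0.95, 19)
    f1s = [f1_at_threshold(y_true, scores, t) for t in candidates]
    return float(candidates[int(np.argmax(f1s))])

# Illustrative validation data where most positives score below 0.5.
y_val = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
val_scores = np.array([0.02, 0.08, 0.13, 0.18, 0.22, 0.47, 0.32, 0.37, 0.42, 0.61])

t = best_threshold(y_val, val_scores)
print(t, f1_at_threshold(y_val, val_scores, 0.5), f1_at_threshold(y_val, val_scores, t))
# threshold about 0.25; F1 rises from 0.4 to about 0.89
```

Swap F1 for whatever your problem actually optimizes, such as precision at a required recall or expected cost. Tune on validation data only, then apply the chosen threshold unchanged to test data.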

So remember:

  • sigmoid gives a score
  • thresholding turns the score into a hard class decision

Those are related, but they are not the same step.
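Because sigmoid is strictly increasing, a probability threshold t corresponds to a raw-score threshold log(t / (1 - t)). At t = 0.5 that is simply z >= 0, which is handy with the logit setup, where predict returns raw scores. A NumPy sketch with illustrative logits and a hypothetical logit_threshold helper:

```python
import numpy as np

def logit_threshold(t):
    """Raw-score cutoff equivalent to probability cutoff t (inverse sigmoid)."""
    return np.log(t / (1.0 - t))

logits = np.array([-3.0, -0.2, 0.0, 0.4, 2.5])  # stand-in for logit_model.predict(x).ravel()
probs = 1.0 / (1.0 + np.exp(-logits))

for t in (0.5, 0.3, 0.8):
    from_probs = probs >= t
    from_logits = logits >= logit_threshold(t)
    # The two decision rules agree for every example.
    print(t, from_probs.astype(int), from_logits.astype(int))
```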

Imbalanced Data Needs Better Metrics

Sigmoid is still the right output for many imbalanced binary problems, but accuracy alone can be misleading.

Useful metrics include:

  • AUC
  • precision
  • recall
  • precision-recall (PR) curves
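In Keras you would normally track these with tf.keras.metrics.Precision, tf.keras.metrics.Recall, and tf.keras.metrics.AUC(curve="PR"). The NumPy sketch below only makes the arithmetic visible: on data with 95 negatives and 5 positives, a model that always predicts class 0 reaches 95% accuracy while finding none of the positives. The binary_metrics helper is a hypothetical name, and undefined precision is treated as 0.0 by convention.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for 0/1 label and prediction arrays."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return accuracy, precision, recall

# 95 negatives, 5 positives; a "classifier" that always predicts class 0.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros_like(y_true)

print(binary_metrics(y_true, y_pred))  # accuracy 0.95, precision 0.0, recall 0.0
```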

You may also want class weighting:

```python
class_weight = {0: 1.0, 1: 3.0}
model.fit(x, y, epochs=5, class_weight=class_weight, verbose=0)
```
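The weights {0: 1.0, 1: 3.0} above are hand-picked. A common heuristic, the one scikit-learn calls "balanced", sets each class weight to n_samples / (n_classes * n_class_samples), so the rarer class counts proportionally more. A NumPy sketch, with balanced_class_weight as a hypothetical helper name:

```python
import numpy as np

def balanced_class_weight(y):
    """Map each class to n_samples / (n_classes * count) for Keras' class_weight."""
    y = np.asarray(y).astype(int).ravel()
    counts = np.bincount(y, minlength=2)
    n_samples, n_classes = len(y), len(counts)
    return {c: float(n_samples / (n_classes * counts[c]))
            for c in range(n_classes) if counts[c] > 0}

y_train = np.array([0] * 90 + [1] * 10)
weights = balanced_class_weight(y_train)
print(weights)  # class 0 about 0.556, class 1 exactly 5.0
```

With these weights each class contributes the same total weight (90 x 0.556 = 10 x 5.0 = 50), and the dictionary can be passed directly as class_weight to model.fit.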

The activation function does not change, but the way you train and evaluate the classifier often should.

A Practical Rule of Thumb

For ordinary binary classification in Keras:

  • one output unit
  • sigmoid activation
  • binary labels
  • binary cross-entropy loss

That combination is simple, standard, and usually correct.

Only move away from it when the problem shape changes, such as:

  • multi-class classification
  • multi-label classification
  • ranking problems
  • heavily custom output semantics
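To make the multi-class versus multi-label contrast concrete: softmax couples its units so they sum to 1 (one class per example), while independent sigmoids score each unit on its own (any number of labels per example). A two-unit softmax is also mathematically the same binary model as one sigmoid on the difference of its two logits, which is why a single sigmoid unit is enough for the binary case. A NumPy sketch with illustrative logits:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Row-wise softmax; subtracting the max keeps exp() from overflowing."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, -1.0, 0.5]])   # one example, three output units

print(softmax(logits).sum())            # multi-class: sums to 1
print(sigmoid(logits))                  # multi-label: each unit scored independently

# Binary case: softmax over two logits equals sigmoid of their difference.
z0, z1 = 0.3, 1.7
two_unit = softmax(np.array([z0, z1]))[1]
one_unit = sigmoid(z1 - z0)
print(two_unit, one_unit)               # equal up to floating-point rounding
```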

Common Pitfalls

The most common mistake is pairing a one-unit sigmoid output with two-column one-hot labels. The labels then have shape (n, 2) while the predictions have shape (n, 1), so labels, output, and loss no longer agree.

Another issue is applying sigmoid in the model while also setting from_logits=True in the loss. The loss then applies sigmoid a second time, so it computes the loss on values the model never meant as logits.
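A quick way to see the damage: if the last layer already applies sigmoid and the loss applies it again, every prediction is squeezed into roughly (0.5, 0.731), because sigmoid of any value in (0, 1) can only land there. This NumPy sketch shows the squeeze for a few illustrative raw scores.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

raw = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
correct = sigmoid(raw)           # what the loss should treat as probabilities
double = sigmoid(sigmoid(raw))   # what it effectively uses with sigmoid + from_logits=True

print(np.round(correct, 3))
print(np.round(double, 3))       # every value stuck between 0.5 and about 0.731
```

As a consequence, the loss for a confidently correct positive example can never drop below about -log(0.731), roughly 0.31, so training no longer rewards confident correct predictions the way it should.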

People also often assume 0.5 is always the right decision threshold, which is not true for many real datasets.

Finally, evaluating a binary classifier only by accuracy can hide serious problems on imbalanced data.

Summary

  • A single sigmoid output is the standard Keras pattern for binary classification.
  • Labels, output shape, and loss configuration must agree.
  • You can train with probabilities or logits, but the from_logits setting must match the model output.
  • Sigmoid output still needs a classification threshold to produce hard class predictions.
  • For imbalanced datasets, evaluate more than accuracy alone.
