Keras Binary Classification - Sigmoid activation function
Introduction
For binary classification in Keras, the standard output design is one output unit with a sigmoid activation. Sigmoid maps a raw score into the 0 to 1 range, which makes it natural for yes-or-no tasks such as spam detection, churn prediction, or fraud classification.
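For reference, the sigmoid function itself is 1 / (1 + exp(-z)). The sketch below is plain Python rather than Keras code, and the helper name sigmoid is only illustrative; it shows how raw scores of either sign are squashed into the 0 to 1 range.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, 1 / (1 + exp(-z)), written to avoid overflow for very negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z is negative, so exp(z) cannot overflow
    return ez / (1.0 + ez)


# A raw score of 0 sits exactly on the midpoint of the output range.
print(sigmoid(0.0))  # 0.5

# Negative scores land below 0.5 and positive scores above it.
for z in (-4.0, -1.0, 1.0, 4.0):
    print(z, sigmoid(z))
```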
Why Sigmoid Fits Binary Classification
Binary classification asks whether each example belongs to class 0 or class 1. A single scalar output is enough, and sigmoid turns that scalar into a probability-like score.
Example:
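A minimal sketch of this design, assuming a hypothetical input of 20 numeric features and a small hidden layer chosen only for illustration:

```python
from tensorflow import keras

# Hypothetical input width; replace with the number of features in your data.
NUM_FEATURES = 20

model = keras.Sequential([
    keras.Input(shape=(NUM_FEATURES,)),
    keras.layers.Dense(16, activation="relu"),
    # One unit with sigmoid: each example gets a single score between 0 and 1.
    keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

print(model.output_shape)  # (None, 1)
```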
In this setup, outputs near 0 suggest class 0, and outputs near 1 suggest class 1.
Keep Labels, Output, and Loss Consistent
Most sigmoid mistakes are not about sigmoid itself. They come from mismatched labels or loss configuration.
For a one-unit sigmoid output, labels should normally be binary scalars such as 0 and 1:
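As an illustration, the arrays below are made-up labels for five examples. The sketch also shows one possible way to recover scalar labels from two-column one-hot targets, assuming column 1 is the indicator for class 1.

```python
import numpy as np

# Binary scalar labels: one 0/1 value per example, shape (n_samples,).
y = np.array([0, 1, 1, 0, 1], dtype="float32")
print(y.shape)  # (5,)

# Hypothetical two-column one-hot targets for the same five examples.
y_one_hot = np.array(
    [[1, 0], [0, 1], [0, 1], [1, 0], [0, 1]],
    dtype="float32",
)

# Assumption: column 1 marks class 1, so it matches the scalar labels directly.
y_from_one_hot = y_one_hot[:, 1]
print(np.array_equal(y, y_from_one_hot))  # True
```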
If your targets are two-column one-hot vectors, that usually points toward a two-unit softmax output instead.
The main rule is that label shape, output shape, and loss semantics must agree.
Probabilities Versus Logits
There are two common valid setups:
- apply sigmoid in the model and use a loss that expects probabilities
- output raw logits and let the loss apply the sigmoid internally
Probability setup:
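A sketch of the probability path, assuming a hypothetical 20-feature input. The sigmoid is part of the model, and the loss is told (via from_logits=False, which is also the default) that it receives probabilities.

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(20,)),  # hypothetical feature count
    keras.layers.Dense(16, activation="relu"),
    # Sigmoid lives in the model, so the output is already a probability.
    keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    # from_logits=False tells the loss to expect probabilities.
    loss=keras.losses.BinaryCrossentropy(from_logits=False),
)

# Placeholder batch, only to show that predictions already lie in [0, 1].
x_demo = np.zeros((3, 20), dtype="float32")
probs = model.predict(x_demo, verbose=0)
print(probs.shape)  # (3, 1)
```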
Logit setup:
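A sketch of the logit path under the same assumptions (hypothetical 20-feature input). The final layer has no activation, the loss is configured with from_logits=True, and sigmoid is applied by hand only when probabilities are needed at inference time.

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(20,)),  # hypothetical feature count
    keras.layers.Dense(16, activation="relu"),
    # No activation: the model outputs raw, unbounded logits.
    keras.layers.Dense(1),
])

model.compile(
    optimizer="adam",
    # from_logits=True makes the loss apply sigmoid internally.
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
)

# Placeholder batch, only to show the inference-time conversion.
x_demo = np.zeros((3, 20), dtype="float32")
logits = model.predict(x_demo, verbose=0)

# Convert logits to probabilities manually when you need scores in (0, 1).
probs = 1.0 / (1.0 + np.exp(-logits))
print(logits.shape, probs.shape)  # (3, 1) (3, 1)
```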
Both are correct. The important part is to choose one path and configure the loss consistently.
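To illustrate why the two paths must not be mixed, here is a small pure-Python sketch (no Keras involved). The helper names are illustrative, and the logit-based formula is the usual numerically stable form of binary cross-entropy. Both consistent paths give the same loss, while feeding a probability into the logit formula effectively applies sigmoid twice.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_probability(y: float, p: float) -> float:
    """Binary cross-entropy when the model output p is already a probability."""
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def bce_from_logit(y: float, z: float) -> float:
    """The same loss computed from a raw logit z, with sigmoid folded into the formula."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


y_true = 1.0
logit = 3.0            # a fairly confident, correct raw score
prob = sigmoid(logit)  # the same score expressed as a probability

loss_prob_path = bce_from_probability(y_true, prob)  # sigmoid in model, loss expects probabilities
loss_logit_path = bce_from_logit(y_true, logit)      # logits out, loss applies sigmoid
loss_mismatched = bce_from_logit(y_true, prob)       # probability wrongly treated as a logit

print(loss_prob_path, loss_logit_path, loss_mismatched)
```

The first two values agree up to floating-point error, and the mismatched value is noticeably larger even though the prediction is confident and correct.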
Prediction Threshold Is a Separate Decision
Training with sigmoid gives you scores between 0 and 1, but classification still requires a threshold. The default 0.5 threshold is common, but it is not automatically optimal.
In imbalanced or cost-sensitive problems, a different threshold may give better results. That threshold should usually be chosen from validation data rather than fixed in advance.
So remember:
- sigmoid gives a score
- thresholding turns the score into a hard class decision
Those are related, but they are not the same step.
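One possible way to pick a threshold from validation data is sketched below in plain Python. It assumes F1 is the metric you care about, and the validation labels and scores are made up for illustration; in a cost-sensitive problem you would substitute your own objective.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 score of the hard predictions obtained by thresholding the scores."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, scores, candidates):
    """Return the candidate threshold with the highest F1 (first one wins on ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))


# Hypothetical validation labels and sigmoid scores (3 positives out of 10).
y_val = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
val_scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45, 0.55, 0.80]

candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
chosen = best_threshold(y_val, val_scores, candidates)

print(chosen)
print(f1_at_threshold(y_val, val_scores, 0.5), f1_at_threshold(y_val, val_scores, chosen))
```

On this toy data the chosen threshold is below 0.5, because two of the three positives receive scores under 0.5.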
Imbalanced Data Needs Better Metrics
Sigmoid is still the right output for many imbalanced binary problems, but accuracy alone can be misleading.
Useful metrics include:
- AUC
- precision
- recall
- PR curves
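These metrics can be tracked directly in Keras. The sketch below assumes a hypothetical 20-feature model and a made-up, untrained evaluation run; it is only meant to show how the metric objects are attached at compile time.

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(20,)),  # hypothetical feature count
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[
        keras.metrics.AUC(name="auc"),                 # area under the ROC curve
        keras.metrics.AUC(curve="PR", name="pr_auc"),  # area under the precision-recall curve
        keras.metrics.Precision(name="precision"),     # uses the default 0.5 threshold
        keras.metrics.Recall(name="recall"),           # uses the default 0.5 threshold
    ],
)

# Hypothetical imbalanced evaluation set: 10 positives out of 100 examples.
x_val = np.random.default_rng(0).normal(size=(100, 20)).astype("float32")
y_val = np.zeros(100, dtype="float32")
y_val[:10] = 1.0

results = model.evaluate(x_val, y_val, verbose=0, return_dict=True)
print(sorted(results))
```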
You may also want class weighting:
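A sketch of class weighting through the class_weight argument of model.fit. The data is synthetic, and the inverse-frequency formula used here is one common heuristic rather than the only option.

```python
import numpy as np
from tensorflow import keras

# Synthetic imbalanced training set: 20 positives out of 200 examples.
x_train = np.random.default_rng(0).normal(size=(200, 20)).astype("float32")
y_train = np.zeros(200, dtype="float32")
y_train[:20] = 1.0

# Inverse-frequency weights: n_samples / (n_classes * class_count).
n_total = len(y_train)
n_pos = int(y_train.sum())
n_neg = n_total - n_pos
class_weight = {
    0: n_total / (2.0 * n_neg),
    1: n_total / (2.0 * n_pos),
}

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # the output layer is unchanged
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Errors on the rare class now contribute more to the loss.
history = model.fit(
    x_train,
    y_train,
    epochs=1,
    batch_size=32,
    class_weight=class_weight,
    verbose=0,
)
```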
The activation function does not change, but the way you train and evaluate the classifier often should.
A Practical Rule of Thumb
For ordinary binary classification in Keras:
- one output unit
- sigmoid activation
- binary labels
- binary cross-entropy loss
That combination is simple, standard, and usually correct.
Only move away from it when the problem shape changes, such as:
- multi-class classification
- multi-label classification
- ranking problems
- heavily custom output semantics
Common Pitfalls
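The biggest mistake is using a one-unit sigmoid output with two-column one-hot labels. The output then has shape (batch, 1) while the targets have shape (batch, 2), so the loss no longer compares each prediction with a single label.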
Another issue is applying sigmoid in the model while also telling the loss to expect logits. The loss then applies sigmoid a second time to values that are already probabilities, which distorts the loss and its gradients.
People also often assume 0.5 is always the right decision threshold, which is not true for many real datasets.
Finally, evaluating a binary classifier only by accuracy can hide serious problems on imbalanced data.
Summary
- A single sigmoid output is the standard Keras pattern for binary classification.
- Labels, output shape, and loss configuration must agree.
- You can train with probabilities or logits, but the from_logits setting must match the model output.
- Sigmoid output still needs a classification threshold to produce hard class predictions.
- For imbalanced datasets, evaluate more than accuracy alone.

