How to Do Multiclass Classification with Keras

Introduction

Multiclass classification means predicting one label from three or more mutually exclusive classes. In Keras, the core setup is simple once three pieces line up correctly: the label encoding, the output layer, and the loss function. Most training problems come from mismatching one of those pieces rather than from the network architecture itself.

Match Labels, Output Layer, and Loss

Keras supports two common label formats for multiclass work:

  • integer class ids such as 0, 1, 2
  • one-hot vectors such as [0, 1, 0]

The final layer should have one unit per class and usually use softmax:

python
import tensorflow as tf
from tensorflow import keras

num_features = 20
num_classes = 4

model = keras.Sequential([
    keras.layers.Input(shape=(num_features,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(num_classes, activation="softmax"),
])

If your labels are integers, use sparse categorical cross-entropy (SparseCategoricalCrossentropy). If they are one-hot vectors, use categorical cross-entropy (CategoricalCrossentropy). That pairing is the foundation of the whole workflow.
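
To see why the pairing matters, the following NumPy-only sketch computes cross-entropy by hand for both label encodings. It illustrates the arithmetic rather than Keras's actual implementation; the helper names `sparse_cce` and `cce` and the sample probabilities are invented for this example.

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples and 4 classes (each row sums to 1).
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.50, 0.20, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])

# The same targets in both encodings.
y_int = np.array([0, 1, 3])      # integer class ids
y_onehot = np.eye(4)[y_int]      # one-hot rows

def sparse_cce(y_true_int, y_prob):
    """Mean cross-entropy for integer labels: index the true-class probability."""
    picked = y_prob[np.arange(len(y_true_int)), y_true_int]
    return float(-np.log(picked).mean())

def cce(y_true_onehot, y_prob):
    """Mean cross-entropy for one-hot labels: the one-hot row masks other classes."""
    return float(-(y_true_onehot * np.log(y_prob)).sum(axis=1).mean())

loss_sparse = sparse_cce(y_int, probs)
loss_onehot = cce(y_onehot, probs)
print(loss_sparse, loss_onehot)
```

Both functions produce the same number because they compute the same quantity; they differ only in the target shape they accept, which is why the loss has to match the label format.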

Example with Integer Labels

This is the cleanest starting point for many tabular problems because you do not have to one-hot encode the targets yourself.

python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 1000 samples, 20 features, integer labels in {0, 1, 2, 3}.
X = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 4, size=(1000,))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# `model` and `keras` come from the previous snippet.
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(),
    metrics=["accuracy"],
)

# validation_split holds out the last 20% of the training data for validation.
history = model.fit(
    X_train,
    y_train,
    validation_split=0.2,
    epochs=15,
    batch_size=32,
    verbose=1,
)

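Because y_train contains integer class ids, SparseCategoricalCrossentropy is the correct choice. The softmax layer outputs a probability distribution across the four classes. Note that X and y here are random, so validation accuracy should stay near chance (about 25% for four roughly balanced classes); the example demonstrates the wiring, not model quality.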
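
For comparison, here is a minimal sketch of the one-hot variant, assuming the same feature and class counts as above. The names `X_demo`, `onehot_model`, and `onehot_history` are introduced only for this illustration, and the smaller network and two epochs are arbitrary choices to keep it quick.

```python
import numpy as np
from tensorflow import keras

num_features, num_classes = 20, 4

# Synthetic data (random, so there is nothing real to learn).
rng = np.random.default_rng(0)
X_demo = rng.random((200, num_features), dtype=np.float32)
y_int = rng.integers(0, num_classes, size=200)

# Convert integer ids to one-hot rows of length num_classes.
y_onehot = keras.utils.to_categorical(y_int, num_classes=num_classes)

onehot_model = keras.Sequential([
    keras.layers.Input(shape=(num_features,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(num_classes, activation="softmax"),
])

# One-hot targets pair with CategoricalCrossentropy, not the sparse variant.
onehot_model.compile(
    optimizer="adam",
    loss=keras.losses.CategoricalCrossentropy(),
    metrics=["accuracy"],
)

onehot_history = onehot_model.fit(
    X_demo, y_onehot, epochs=2, batch_size=32, verbose=0
)
```

The model definition is unchanged; only the target array and the loss differ.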

Predict and Convert Probabilities to Class IDs

For multiclass prediction, the model returns one probability per class. The highest probability becomes the predicted class.

python
probs = model.predict(X_test[:5], verbose=0)
predicted_classes = probs.argmax(axis=1)

print(probs)
print(predicted_classes)

This argmax step is standard for single-label multiclass classification. If your task allows multiple labels per sample, that is a different problem and usually requires sigmoid outputs instead of softmax.
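
The following sketch shows the argmax step on a small hand-written probability array, standing in for the output of model.predict. The values are hypothetical. It also extracts the winning probability and the two most likely classes, which can help when reviewing borderline predictions.

```python
import numpy as np

# Hypothetical model.predict output: 3 samples, 4 classes.
probs = np.array([
    [0.10, 0.60, 0.20, 0.10],
    [0.40, 0.05, 0.45, 0.10],
    [0.15, 0.25, 0.20, 0.40],
])

predicted_classes = probs.argmax(axis=1)   # index of the largest probability per row
confidence = probs.max(axis=1)             # probability assigned to that class

# Indices of the two most probable classes per row, most probable first.
top2 = np.argsort(probs, axis=1)[:, ::-1][:, :2]

print(predicted_classes)  # [1 2 3]
print(confidence)
print(top2)
```

The second row is a close call (0.45 versus 0.40), which argmax alone would hide.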

Evaluate Beyond Accuracy

Accuracy is useful, but it can hide important failures when classes are imbalanced. A model that does well on the majority class can report high accuracy while still performing poorly on the minority classes.

python
from sklearn.metrics import classification_report, confusion_matrix

y_pred = model.predict(X_test, verbose=0).argmax(axis=1)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

The confusion matrix shows which classes are being mixed up. That is often more actionable than a single accuracy number.
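
To make the matrix easier to read, this NumPy-only sketch builds one by hand for a small set of hypothetical labels and derives per-class recall and precision from it. It follows the scikit-learn convention of rows as true classes and columns as predicted classes.

```python
import numpy as np

# Hypothetical true and predicted class ids for 10 samples, 3 classes.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 0, 2])
num_classes = 3

# Rows are true classes, columns are predicted classes.
cm = np.zeros((num_classes, num_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)

# Recall per class: diagonal (correct predictions) divided by the row total.
recall = cm.diagonal() / cm.sum(axis=1)
# Precision per class: diagonal divided by the column total.
precision = cm.diagonal() / cm.sum(axis=0)

print(cm)
print(recall)
print(precision)
```

An off-diagonal entry at row i, column j counts samples of class i that were predicted as class j, so large off-diagonal values point directly at the confused pair.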

If classes are imbalanced, you can pass class_weight during training:

python
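# Illustrative weights: a larger value makes errors on that class cost more.
class_weight = {0: 1.0, 1: 2.0, 2: 1.5, 3: 3.0}

# Calling fit again continues training the existing model.
model.fit(
    X_train,
    y_train,
    epochs=15,
    batch_size=32,
    class_weight=class_weight,
)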

That tells Keras to scale each sample's loss by the weight of its class, so mistakes on underrepresented classes are penalized more heavily.
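
Rather than choosing weights by hand, one common heuristic derives them from label frequencies. The sketch below uses the "balanced" formula, n_samples / (n_classes * class_count), which is the same formula scikit-learn uses for class_weight="balanced". The label array `y_train_demo` is hypothetical.

```python
import numpy as np

# Hypothetical imbalanced integer labels: class 0 dominates.
y_train_demo = np.array([0] * 70 + [1] * 20 + [2] * 10)

counts = np.bincount(y_train_demo)   # samples per class: [70 20 10]
num_classes = len(counts)

# "Balanced" heuristic: n_samples / (n_classes * count_for_class).
weights = len(y_train_demo) / (num_classes * counts)

# Keras expects a dict mapping class id -> weight.
class_weight = {int(i): float(w) for i, w in enumerate(weights)}
print(class_weight)
```

Rarer classes receive proportionally larger weights. Whether this heuristic helps depends on the dataset, so it is worth comparing per-class metrics with and without it.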

The Same Pattern Applies to Other Architectures

The examples above use dense layers because the data is tabular. For images, you would replace the hidden layers with convolutional layers. For sequences, you might use an LSTM or Transformer block. The multiclass part still stays the same: one output per class, softmax, and a loss function that matches the label encoding.

That is why multiclass classification in Keras is mostly a configuration problem rather than a special model type.
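
As a sketch of the image case, the model below swaps the dense hidden layers for a small convolutional stack while keeping the same multiclass head. The 32x32 RGB input shape, the filter counts, and the name `cnn_model` are assumptions made for illustration, not a recommended architecture.

```python
from tensorflow import keras

num_classes = 4

# Hypothetical input: 32x32 RGB images. Only the feature extractor changes.
cnn_model = keras.Sequential([
    keras.layers.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(16, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.GlobalAveragePooling2D(),
    # Same multiclass head as the dense model: one softmax unit per class.
    keras.layers.Dense(num_classes, activation="softmax"),
])

# With integer labels, the sparse loss is still the matching choice.
cnn_model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(),
    metrics=["accuracy"],
)
```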

Common Pitfalls

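One common mistake is using categorical_crossentropy with integer labels. That loss expects one-hot targets with the same shape as the model output, so integer labels typically trigger a shape-mismatch error; if the shapes happen to line up, training runs against meaningless targets.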

Another mistake is setting the last dense layer to the wrong number of classes. If there are five classes, the output layer needs five units.
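
One way to avoid a mismatched unit count is to derive it from the labels instead of hard-coding it. The sketch below also remaps arbitrary label values to contiguous ids, since the sparse loss expects integer ids in the range 0 to num_classes - 1. The `raw_labels` array is hypothetical.

```python
import numpy as np

# Hypothetical raw labels that are not contiguous 0..K-1 ids.
raw_labels = np.array([10, 20, 20, 40, 10, 40, 30])

# np.unique returns the sorted distinct values and, with return_inverse=True,
# each label's index into that sorted array, which is a valid 0..K-1 class id.
classes, y_ids = np.unique(raw_labels, return_inverse=True)
num_classes = len(classes)

print(classes)      # [10 20 30 40]
print(y_ids)        # [0 1 1 3 0 3 2]
print(num_classes)  # 4
```

Here `num_classes` would feed the final Dense layer, and `classes[predicted_id]` maps a prediction back to the original label value. Derive the mapping from the full label set (or a known class list) so that classes missing from one split are still counted.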

Developers also sometimes use sigmoid for a normal multiclass problem. A sigmoid output scores each class independently, which suits multi-label tasks, not tasks where exactly one label is chosen from a fixed set.
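
The difference is easy to see numerically. This NumPy sketch applies both activations to the same hypothetical logits; the `softmax` and `sigmoid` helpers are written out for illustration and are not Keras's implementations.

```python
import numpy as np

def softmax(logits):
    """Normalize logits into probabilities that sum to 1 across classes."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sigmoid(logits):
    """Squash each logit independently into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical raw scores for one sample over 4 classes.
logits = np.array([2.0, 1.0, 0.5, -1.0])

p_softmax = softmax(logits)
p_sigmoid = sigmoid(logits)

print(p_softmax, p_softmax.sum())   # sums to 1: classes compete
print(p_sigmoid, p_sigmoid.sum())   # no such constraint: each class scored alone
```

With softmax, raising one class's probability lowers the others. With sigmoid, several classes can all score above 0.5 at once, which is the desired behavior for multi-label tasks but not for single-label ones.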

Finally, avoid judging the model only by accuracy. Per-class precision, recall, and confusion matrices are often where the real story appears.

Summary

  • Multiclass Keras models need the label format, output layer, and loss function to agree.
  • Use one softmax unit per class for standard single-label multiclass classification.
  • Use sparse categorical cross-entropy for integer labels and categorical cross-entropy for one-hot labels.
  • Convert predicted probabilities to class ids with argmax.
  • Evaluate per-class behavior, especially when the dataset is imbalanced.
