How to output per-class accuracy in Keras?
Introduction
Overall accuracy can hide important failure modes, especially on imbalanced datasets. Per-class accuracy shows how well the model performs for each label individually, which makes it much easier to see whether one class is consistently being ignored or confused.
Start with Predictions and a Confusion Matrix
The standard Keras training logs report aggregate metrics rather than per-class accuracy, so the common approach is to run predictions on a validation or test set and compute class-wise metrics afterward.
The confusion matrix gives you the raw counts needed to compute per-class accuracy.
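A minimal sketch of this step, assuming a trained softmax classifier and integer-encoded validation labels. The `probs` array below is a hand-written stand-in for `model.predict(x_val)` (`model` and `x_val` are hypothetical names), so the block runs on its own.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Stand-in for `probs = model.predict(x_val)`: one row of class probabilities per sample.
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.3, 0.4, 0.3],
    [0.1, 0.1, 0.8],
])
# Integer-encoded true labels, in the same order as the rows of `probs`.
y_true = np.array([0, 1, 1, 2, 1, 2])

# Predicted class = index of the highest probability in each row.
y_pred = np.argmax(probs, axis=1)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)
```

Passing `labels` explicitly keeps the matrix shape stable even if a class happens to be missing from the evaluation set.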
Compute Per-Class Accuracy Explicitly
For class i, per-class accuracy is the number of correctly predicted samples of that class divided by the number of true samples of that class.
If class 0 has 40 correct predictions out of 45 true examples, its per-class accuracy is 40 / 45, or about 0.89.
This number is often more informative than overall accuracy because it isolates class-specific behavior.
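The definition above can be sketched with NumPy as the diagonal of the confusion matrix divided by its row sums. The matrix values here are made up for illustration; only the class 0 row mirrors the 40 / 45 example.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class.
cm = np.array([
    [40,  3,  2],   # class 0: 40 of 45 true samples predicted correctly
    [ 5, 30,  5],
    [ 1,  4, 10],
])

correct = np.diag(cm)       # correctly predicted samples per class
support = cm.sum(axis=1)    # number of true samples per class

# Leave the result at 0 for classes with no true samples instead of dividing by zero.
per_class_acc = np.divide(
    correct, support, out=np.zeros(len(cm)), where=support > 0
)
print(per_class_acc)
```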
Print a Readable Report
A simple loop makes the output easier to interpret.
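One possible version of such a loop, assuming a confusion matrix with true classes in rows. The class names and counts are placeholders.

```python
import numpy as np

# Hypothetical class names and confusion matrix (rows = true, columns = predicted).
class_names = ["cat", "dog", "bird"]
cm = np.array([
    [40,  3,  2],
    [ 5, 30,  5],
    [ 1,  4, 10],
])

report_lines = []
for i, name in enumerate(class_names):
    correct = cm[i, i]       # diagonal entry: correct predictions for this class
    total = cm[i].sum()      # row sum: true samples of this class
    acc = correct / total if total > 0 else float("nan")
    report_lines.append(f"{name:<6} {acc:6.2%}  ({correct}/{total})")

print("\n".join(report_lines))
```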
This is usually enough for offline evaluation scripts and experiment notebooks.
Use classification_report for a Broader View
Per-class accuracy is useful, but it is often best interpreted alongside precision, recall, and F1 score.
In multi-class classification, the recall of each class is the same quantity as per-class accuracy as defined above (correct predictions divided by true samples of that class), because it measures how many true members of that class were recovered correctly.
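A short sketch using scikit-learn, with made-up labels and class names. `classification_report` prints precision, recall, and F1 per class, and `recall_score(..., average=None)` returns the per-class recall as an array.

```python
import numpy as np
from sklearn.metrics import classification_report, recall_score

# Hypothetical integer-encoded labels and predictions for three classes.
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 0])
class_names = ["cat", "dog", "bird"]

# Text table with precision, recall, F1, and support for each class.
report = classification_report(
    y_true, y_pred, target_names=class_names, zero_division=0
)
print(report)

# Per-class recall as an array, one value per class.
recall = recall_score(y_true, y_pred, average=None)
print(recall)
```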
Add It at the End of Each Epoch with a Callback
If you want per-class results while training, use a custom callback that runs on validation data at the end of each epoch.
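A minimal sketch of such a callback, assuming integer-encoded validation labels and a model whose `predict` returns one probability row per sample. The class name `PerClassAccuracy` and its constructor arguments are illustrative, not part of Keras.

```python
import numpy as np
from tensorflow import keras


class PerClassAccuracy(keras.callbacks.Callback):
    """Prints per-class accuracy on a held-out set after every epoch."""

    def __init__(self, x_val, y_val, num_classes):
        super().__init__()
        self.x_val = x_val
        self.y_val = np.asarray(y_val)   # assumed integer-encoded
        self.num_classes = num_classes
        self.history = []                # one array of per-class accuracies per epoch

    def on_epoch_end(self, epoch, logs=None):
        probs = self.model.predict(self.x_val, verbose=0)
        y_pred = np.argmax(probs, axis=1)

        # NaN marks classes that have no samples in the validation set.
        acc = np.full(self.num_classes, np.nan)
        for c in range(self.num_classes):
            mask = self.y_val == c
            if mask.any():
                acc[c] = np.mean(y_pred[mask] == c)

        self.history.append(acc)
        summary = ", ".join(f"class {c}: {a:.3f}" for c, a in enumerate(acc))
        print(f"epoch {epoch + 1} | {summary}")
```

It would be passed to training as `model.fit(..., callbacks=[PerClassAccuracy(x_val, y_val, num_classes=3)])`, where `x_val`, `y_val`, and the class count are whatever your own setup uses.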
This is helpful during experimentation, though it can slow training if the validation set is large.
Be Careful with Label Encoding
The metric logic depends on how your labels are represented.
- if labels are one-hot encoded, use argmax
- if labels are integer-encoded already, use them directly
For example:
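A small sketch of handling both encodings in one place. The helper `to_class_indices` is a hypothetical name introduced here, and it assumes that any 2-D array with more than one column is one-hot.

```python
import numpy as np


def to_class_indices(labels):
    """Return integer class indices for one-hot or integer-encoded labels."""
    labels = np.asarray(labels)
    if labels.ndim == 2 and labels.shape[1] > 1:
        return np.argmax(labels, axis=1)     # one-hot rows -> class index
    return labels.reshape(-1).astype(int)    # already integer-encoded


y_onehot = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]])
y_int = np.array([0, 2, 1])

print(to_class_indices(y_onehot))
print(to_class_indices(y_int))
```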
Mixing one-hot and integer assumptions is one of the easiest ways to compute the wrong metric silently.
Common Pitfalls
A common mistake is calling precision or recall "per-class accuracy" without being clear about the definition. Be explicit about what you are reporting.
Another is relying on overall accuracy for an imbalanced dataset and assuming the model is healthy when one minority class has almost zero recall.
Developers also sometimes compute the confusion matrix from shuffled labels and predictions that no longer align, which makes every downstream metric meaningless.
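One way to keep inputs and labels aligned when shuffling by hand, sketched with NumPy: draw a single permutation and index both arrays with it. The arrays are toy data, and this is only one of several ways to avoid the problem (another is to disable shuffling for the evaluation set).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the first feature of row i is 2 * i, so each row can be traced back.
x = np.arange(10).reshape(5, 2)
y = np.array([0, 1, 2, 1, 0])

# Apply the same permutation to inputs and labels so pairs stay together.
perm = rng.permutation(len(y))
x_shuf, y_shuf = x[perm], y[perm]

# Recover each shuffled row's original index and check its label still matches.
original_index = x_shuf[:, 0] // 2
still_aligned = bool(np.all(y[original_index] == y_shuf))
print(still_aligned)
```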
Summary
- Keras usually does not print per-class accuracy automatically in standard logs.
- The common solution is to compute it from predictions and a confusion matrix.
- Per-class accuracy is the diagonal count divided by the number of true samples for each class.
- classification_report is a useful companion because it adds precision, recall, and F1 score.
- A custom callback can print per-class results after each epoch when needed.

