How to know scikit-learn confusion matrix's label order and change it
Introduction
A confusion matrix is only useful if you know which row and column corresponds to which class. Misreading the label order can lead you to report precision and recall for the wrong class, or to swap false positives with false negatives, and nothing in the output will warn you. Scikit-learn's confusion_matrix function uses a default ordering that may not match your expectations, so understanding how to inspect and control it is essential.
Default Label Order
By default, sklearn.metrics.confusion_matrix sorts the unique labels found in y_true and y_pred in ascending order. For numeric labels this means numerical sorting; for string labels it means alphabetical sorting.
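A minimal sketch of the default behavior, using made-up binary labels. unique_labels from sklearn.utils.multiclass returns the sorted union of the classes in both arrays, which should match the order confusion_matrix uses when labels is not given, so it is a handy way to preview the layout:

```python
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels

# Made-up binary labels for illustration
y_true = [0, 1, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 1]

# Sorted union of the classes found in y_true and y_pred
print(unique_labels(y_true, y_pred))  # [0 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 1]
#  [1 3]]
```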
Here the rows and columns are ordered [0, 1] because sorted([0, 1]) gives [0, 1]. Row 0 corresponds to true label 0, and row 1 corresponds to true label 1.
For string labels the same principle applies:
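A sketch with made-up string labels. Regardless of the order in which the classes first appear, the rows and columns come out alphabetically:

```python
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels

# Made-up labels; "dog" appears first, but sorting puts "bird" first
y_true = ["dog", "cat", "dog", "bird", "cat", "dog"]
y_pred = ["dog", "dog", "dog", "bird", "cat", "cat"]

print(unique_labels(y_true, y_pred))  # ['bird' 'cat' 'dog']

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 0 0]
#  [0 1 1]
#  [0 1 2]]
```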
Reading the Matrix: Rows vs Columns
The convention in scikit-learn is that rows represent true labels and columns represent predicted labels. So cm[i][j] is the count of samples whose true label is the i-th class in the label order and whose predicted label is the j-th class.
This means that if you swap the label order, True Positives and True Negatives trade places (as do False Positives and False Negatives), which changes how you read every derived metric.
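The sketch below, with made-up binary labels, reads the four cells through cm.ravel(), which flattens the matrix row by row. The tn, fp, fn, tp names are only correct because the order is [0, 1] with 1 treated as the positive class; passing labels=[1, 0] moves the true positives to the top-left:

```python
from sklearn.metrics import confusion_matrix

# Made-up binary labels; 1 is assumed to be the positive class
y_true = [0, 1, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 1]

# Rows = true labels, columns = predicted labels, both in order [0, 1]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 2 1 1 3

# Reversing the order rearranges the cells: true positives are now at [0][0]
cm_rev = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm_rev)
# [[3 1]
#  [1 2]]
```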
Changing the Label Order with the labels Parameter
The labels parameter lets you explicitly specify which classes to include and in what order. This is the key to controlling the layout of your confusion matrix.
For binary classification, a common convention is to place the positive class last so that the bottom-right cell is True Positives:
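A sketch with made-up fraud-detection labels. Alphabetical sorting puts "fraud" (the positive class here) first, so labels=["legit", "fraud"] is used to move it to the bottom-right cell. The tn, fp, fn, tp unpacking only holds for this negative-first, positive-last order:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels; "fraud" is treated as the positive class
y_true = ["legit", "fraud", "legit", "legit", "fraud", "fraud"]
y_pred = ["legit", "fraud", "legit", "legit", "legit", "fraud"]

# Default alphabetical order puts the positive class first
cm_default = confusion_matrix(y_true, y_pred)
print(cm_default)
# [[2 1]
#  [0 3]]

# Explicit order: negative class first, positive class last
cm = confusion_matrix(y_true, y_pred, labels=["legit", "fraud"])
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 3 0 1 2
```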
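You can also use labels to exclude certain classes by leaving them out of the list. Any sample whose true or predicted label is not in the list is dropped from the matrix entirely, so the counts no longer add up to the total number of samples.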
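A sketch with made-up three-class labels where "bird" is left out of labels. The result is a 2x2 matrix, and its total is smaller than the number of samples because every pair involving "bird" on either side is discarded:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration
y_true = ["cat", "dog", "bird", "dog", "cat", "bird"]
y_pred = ["cat", "dog", "cat", "bird", "cat", "bird"]

# Only "cat" and "dog" are kept; pairs such as (bird, cat) or (dog, bird) are dropped
cm = confusion_matrix(y_true, y_pred, labels=["cat", "dog"])
print(cm)
# [[2 0]
#  [0 1]]
print(cm.sum(), "of", len(y_true), "samples counted")  # 3 of 6 samples counted
```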
Visualizing with ConfusionMatrixDisplay
Scikit-learn provides ConfusionMatrixDisplay to render the matrix with proper axis labels, removing any ambiguity about which row or column belongs to which class.
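A minimal sketch that builds the display from an already computed matrix, reusing made-up fraud-detection labels. It assumes Matplotlib is installed, since plot() draws with it; the Agg backend line is only there so the sketch runs without a screen. The same list is passed as labels and display_labels so the tick labels cannot drift out of sync with the rows and columns:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display

from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Made-up labels; "fraud" is treated as the positive class
y_true = ["legit", "fraud", "legit", "legit", "fraud", "fraud"]
y_pred = ["legit", "fraud", "legit", "legit", "legit", "fraud"]

labels = ["legit", "fraud"]
cm = confusion_matrix(y_true, y_pred, labels=labels)

# display_labels must follow the same order as the labels used above
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
disp.plot(cmap="Blues")
# disp.figure_ is a regular Matplotlib Figure; save or show it as usual
```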
Alternatively, you can generate the display directly from predictions:
Both approaches annotate each cell with its count and label the axes, so you never have to guess which class is which.
Multi-Class Example with Specific Ordering
When working with multi-class problems, controlling order becomes even more important for readability. You might want to group related classes together:
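A sketch with made-up vehicle labels in which mistakes only happen inside the two-wheeler group and inside the four-wheeler group. Grouping those classes with labels turns the errors into two tidy blocks along the diagonal. The last lines check that this is the same as permuting both the rows and the columns of the default matrix with NumPy's np.ix_:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration
y_true = ["car", "bus", "bicycle", "motorcycle", "car", "bicycle", "bus", "motorcycle"]
y_pred = ["car", "car", "motorcycle", "motorcycle", "car", "bicycle", "bus", "bicycle"]

# Default alphabetical order: bicycle, bus, car, motorcycle
cm_default = confusion_matrix(y_true, y_pred)

# Group two-wheelers first, then four-wheelers
grouped = ["bicycle", "motorcycle", "car", "bus"]
cm = confusion_matrix(y_true, y_pred, labels=grouped)
print(cm)
# [[1 1 0 0]
#  [1 1 0 0]
#  [0 0 2 0]
#  [0 0 1 1]]

# Reordering rows and columns of the default matrix gives the same result
default_order = sorted(set(y_true) | set(y_pred))
perm = [default_order.index(c) for c in grouped]
print(np.array_equal(cm_default[np.ix_(perm, perm)], cm))  # True
```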
The labels parameter gives you full control over the arrangement, which is particularly useful when presenting results to stakeholders who expect a specific class ordering.
Common Pitfalls
- Assuming label order matches your training data order: The default is always sorted, not insertion-ordered. Always verify with sorted(set(y_true) | set(y_pred)).
- Mixing up rows and columns: Rows are true labels, columns are predicted labels. Reading it backwards swaps false positives with false negatives.
- Forgetting to pass labels when a class is missing from an evaluation set: The default labels are the union of the classes in y_true and y_pred, so a class that appears in only one of them is still included, but a class that appears in neither (for example, in a small test fold) is silently dropped and the matrix shrinks. Specifying labels guarantees a consistent shape.
- Using display_labels that do not match the labels order: If you pass one order to confusion_matrix and a different order to ConfusionMatrixDisplay, the visualization will be wrong without any error.
- Not normalizing for imbalanced datasets: A raw count matrix can be misleading when class sizes differ dramatically. Use normalize='true' in confusion_matrix or ConfusionMatrixDisplay.from_predictions to see per-class recall rates.
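Two of these pitfalls are easy to check in a few lines. The sketch below uses a made-up evaluation batch in which "bird" never occurs: without labels the matrix shrinks to 2x2, while passing the full (hypothetical) class list keeps it 3x3. It then uses normalize="true", which divides each row by its total so the diagonal holds per-class recall:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical full class list and a small batch in which "bird" never occurs
all_classes = ["bird", "cat", "dog"]
y_true = ["cat", "dog", "dog", "cat"]
y_pred = ["cat", "dog", "cat", "cat"]

# Without labels the matrix silently shrinks to the classes present
print(confusion_matrix(y_true, y_pred).shape)  # (2, 2)

# With labels the shape is fixed; the "bird" row and column are all zeros
cm_full = confusion_matrix(y_true, y_pred, labels=all_classes)
print(cm_full.shape)  # (3, 3)

# Each row divided by its total: recall is 1.0 for "cat" and 0.5 for "dog"
cm_norm = confusion_matrix(y_true, y_pred, labels=["cat", "dog"], normalize="true")
print(cm_norm)
```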
Summary
- Scikit-learn sorts labels in ascending order by default. Check with sorted(set(y_true) | set(y_pred)) to confirm the order.
- Use the labels parameter in confusion_matrix() to explicitly set which classes appear and in what order.
- Rows represent true labels and columns represent predicted labels, so cm[i][j] counts samples that are truly class i but predicted as class j.
- Use ConfusionMatrixDisplay or ConfusionMatrixDisplay.from_predictions to render a labeled heatmap that removes all ambiguity.
- Always pass the same labels list to both the matrix computation and the display to keep them consistent.

