Scikit-learn: How to Obtain True Positive, True Negative, False Positive, and False Negative

Introduction

Scikit-learn is an open-source Python library that provides simple and efficient tools for predictive data analysis and machine learning. It is built on NumPy, SciPy, and Matplotlib. Evaluating a classification model starts with four counts: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). This article shows how to extract and interpret these counts using Scikit-learn.

Confusion Matrix

The confusion matrix is a fundamental tool for measuring the performance of a classification algorithm. It tabulates the actual outcomes against the outcomes predicted by the model. For a binary classifier, its four cells hold the following counts:

  • True Positives (TP): Cases where the model correctly predicts the positive class.
  • True Negatives (TN): Cases where the model correctly predicts the negative class.
  • False Positives (FP): Cases where the model predicts the positive class but the actual class is negative.
  • False Negatives (FN): Cases where the model predicts the negative class but the actual class is positive.
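
The four definitions above can be sketched directly in plain Python by comparing each actual label with its prediction. This is only a minimal illustration, assuming binary labels where 1 is the positive class; the helper name count_outcomes is hypothetical and is not part of Scikit-learn.

```python
def count_outcomes(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for binary labels (hypothetical helper, not a Scikit-learn API)."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted != positive and actual != positive:
            tn += 1  # predicted negative, actually negative
        elif predicted == positive and actual != positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return tp, tn, fp, fn


# Illustrative labels only: one case of each outcome
y_true = [1, 0, 1, 0]
y_pred = [1, 1, 0, 0]
tp, tn, fp, fn = count_outcomes(y_true, y_pred)
print(tp, tn, fp, fn)  # 1 1 1 1
```

In practice the confusion matrix gives the same counts without a manual loop; the loop is shown only to make the definitions concrete.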

Confusion Matrix in Scikit-learn

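Scikit-learn provides the confusion_matrix function in sklearn.metrics to compute the confusion matrix. In the returned array, rows correspond to the actual classes and columns to the predicted classes, with labels in sorted order. The example below obtains TP, TN, FP, and FN for binary labels 0 (negative) and 1 (positive).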

python
from sklearn.metrics import confusion_matrix

# y_true holds the actual labels; y_pred holds the model's predictions
y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]

# Compute the confusion matrix (rows: actual class, columns: predicted class)
cm = confusion_matrix(y_true, y_pred)

# Extract the four counts, treating label 1 as the positive class
TP = cm[1, 1]  # True Positive
FN = cm[1, 0]  # False Negative
FP = cm[0, 1]  # False Positive
TN = cm[0, 0]  # True Negative

The confusion matrix is typically structured as follows:

| | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | TN | FP |
| Actual Positive | FN | TP |
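
Because NumPy flattens a 2×2 array row by row, the layout above means the four counts can also be unpacked in one line with ravel(). The sketch below assumes binary labels 0 and 1. Passing labels=[0, 1] is an added precaution, not something the original example does: it keeps the matrix 2×2 even if one class happens to be missing from the data.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]

# labels=[0, 1] fixes the row/column order: index 0 is negative, index 1 is positive
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# Row-major flattening yields TN, FP, FN, TP in that order
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```

This unpacking only works for a binary problem; with more than two classes the matrix is larger and the per-class counts have to be derived from its rows and columns.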

Metrics Explained

Once you have the values of TP, TN, FP, and FN, you can compute several performance metrics to evaluate the effectiveness of your model:

Accuracy

Accuracy measures the ratio of correct predictions to the total predictions.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision

Precision indicates the ratio of correct positive predictions to the total predicted positives.

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall (Sensitivity)

Recall, also known as Sensitivity, reflects the ratio of correctly predicted positives to all actual positives.

$$\text{Recall (Sensitivity)} = \frac{TP}{TP + FN}$$

Specificity

Specificity measures the proportion of actual negatives that the model correctly identifies as negative.

$$\text{Specificity} = \frac{TN}{TN + FP}$$
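
Specificity is the recall of the negative class, so one way to obtain it from Scikit-learn is to call recall_score with the negative label treated as the label of interest. A minimal sketch, assuming the negative class is labeled 0 and reusing the example labels from earlier:

```python
from sklearn.metrics import recall_score

y_true = [0, 1, 0, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]

# Treat class 0 as the label of interest so recall becomes TN / (TN + FP)
specificity = recall_score(y_true, y_pred, pos_label=0)
print(specificity)  # 0.75
```

Three of the four actual negatives are predicted as negative, which matches TN / (TN + FP) = 3 / 4.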

F1 Score

The F1 Score is the harmonic mean of precision and recall, offering a balance between the two.

$$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
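
To tie the formulas together, the sketch below computes all five metrics from raw counts in plain Python. The helper binary_metrics is hypothetical, not a Scikit-learn function, and returning 0.0 when a denominator is zero is an assumption made here for illustration. The counts used are those produced by the example labels earlier in the article (TP = 3, TN = 3, FP = 1, FN = 1).

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, specificity, and F1 from raw counts.

    Hypothetical helper for illustration; not part of Scikit-learn.
    """
    def safe_div(num, den):
        # Assumption: report 0.0 when a denominator is zero
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


metrics = binary_metrics(tp=3, tn=3, fp=1, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these counts every metric comes out to 0.75, because the example has the same number of errors in each direction.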

Visualization

Visualizing the confusion matrix using a heat map can offer intuitive insights. You can employ Matplotlib and Seaborn for visualization:

python
import matplotlib.pyplot as plt
import seaborn as sns

# cm is the confusion matrix computed earlier with confusion_matrix(y_true, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

This heat map provides an immediate graphical interpretation of the correctly and incorrectly classified instances.

Summary

In summary, understanding and utilizing True Positive, True Negative, False Positive, and False Negative values is fundamental in evaluating the performance of classification models. Scikit-learn offers a comprehensive framework to compute and analyze these metrics, facilitating effective model assessment. Here's a brief summary:

| Metric | Formula | Interpretation |
|---|---|---|
| Accuracy | (TP + TN) / (TP + TN + FP + FN) | Overall correctness of the model |
| Precision | TP / (TP + FP) | Correctness of positive predictions |
| Recall | TP / (TP + FN) | Ability to identify actual positives |
| Specificity | TN / (TN + FP) | Ability to identify actual negatives |
| F1 Score | 2 × (Precision × Recall) / (Precision + Recall) | Balance between precision and recall |

Knowing how these metrics are computed, and how to obtain them with Scikit-learn, makes it easier to see where a classifier succeeds and where it fails, rather than relying on a single accuracy figure.

