
What is a Confusion Matrix in Machine Learning?



Within machine learning (ML), a confusion matrix is a table that evaluates the performance of a classification algorithm. It is particularly useful when you want to understand how well the algorithm is classifying instances into different categories.

Below is a basic representation of a confusion matrix.

[Figure: confusion matrix diagram]

The rows represent the predicted classes, while the columns represent the actual classes. Each cell in the matrix holds the number of instances that fall into a particular combination of predicted and actual class. Let's break down the four fundamental values (a short code sketch follows the list):

  • True Positives (TP): The prediction is positive and the actual value is positive. The model correctly predicts the positive class.
  • True Negatives (TN): The prediction is negative and the actual value is negative. The model correctly predicts the negative class.
  • False Positives (FP): The prediction is positive but the actual value is negative. The model incorrectly predicts the positive class.
  • False Negatives (FN): The prediction is negative but the actual value is positive. The model incorrectly predicts the negative class.
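
As a concrete illustration, here is a minimal Python sketch that counts these four values for a binary classifier. The labels are made-up example data, not output from any real model:

    # Hypothetical ground-truth and predicted labels (1 = positive, 0 = negative)
    actual    = [1, 0, 1, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 0, 1, 1, 0]

    # Count each cell of the 2x2 confusion matrix
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negatives

    print(tp, tn, fp, fn)  # 3 3 1 1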

With these values, you can calculate various performance metrics such as accuracy, precision, recall, and F1-score. These calculations provide insight into how well the model is performing for each class and overall.

Performance Metrics

Let’s briefly define these four performance metrics.

[Figure: the four performance metrics (accuracy, precision, recall, F1-score)]

Accuracy

Accuracy measures the proportion of correctly classified instances among all instances. It is calculated as the ratio of the number of correct predictions (true positives and true negatives) to the total number of predictions.
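
Continuing the hypothetical sketch above (so tp, tn, fp, and fn are the counts computed there), accuracy looks like this:

    # Accuracy = (TP + TN) / (TP + TN + FP + FN)
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # (3 + 3) / 8 = 0.75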

Precision

Precision measures the proportion of correctly classified positive instances among all instances that were predicted as positive. It is calculated as the ratio of true positives to the sum of true positives and false positives.
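
Using the same example counts:

    # Precision = TP / (TP + FP)
    precision = tp / (tp + fp)  # 3 / (3 + 1) = 0.75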

Recall (Sensitivity or True Positive Rate)

Recall measures the proportion of correctly classified positive instances among all actual positive instances. It is calculated as the ratio of true positives to the sum of true positives and false negatives.
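
Again with the same example counts:

    # Recall = TP / (TP + FN)
    recall = tp / (tp + fn)  # 3 / (3 + 1) = 0.75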

F1-score

F1-score is the harmonic mean of precision and recall. It provides a balance between precision and recall, making it a useful metric when you want to consider both false positives and false negatives.
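
Expressed in the same sketch, using the precision and recall values computed above:

    # F1-score = 2 * (precision * recall) / (precision + recall)
    f1 = 2 * (precision * recall) / (precision + recall)  # 0.75 for this example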