Recall Formula: An Important Evaluation Metric in Machine Learning to Know Now

Updated on Sep 13, 2023 10:10 IST

Recall is an important metric because it measures the ability of a model to identify all of the positive instances in a dataset. It is calculated as the number of true positives divided by the total number of positives, including both true positives and false negatives.

Precision and Recall are two important evaluation metrics in Machine Learning that assess the performance of classification models, in particular binary classifiers. Together, they indicate how many of the positive predictions are correct and how many of the actual positive instances are identified.

But this article will discuss only one metric, i.e., Recall.
So, let’s get started.

Definition of Recall

Recall is an evaluation metric for classification problems that quantifies the ability of a model to correctly identify all the relevant (positive) instances in a dataset. It is also known as Sensitivity or True Positive Rate (TPR).

In simple terms, it answers the question: “Out of all the actual positives, how many were correctly predicted by the model?”

Mathematically, it is defined as the number of True Positives divided by the sum of the number of True Positives and the number of False Negatives.

Recall Formula

Recall = True Positive (TP) / (True Positive (TP) + False Negative (FN))

where,

True Positive (TP) = Represents the number of positive instances correctly identified by the model.

  • i.e., Cases where both the Actual and Predicted Classes are Positive.

False Negative (FN) = Represents the number of positive instances that the model incorrectly identifies as Negative.

  • i.e., cases where the actual class is positive, but the model predicted negative (wrong predictions).
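
The definitions above can be turned into a few lines of code. The snippet below is a minimal sketch, not a reference implementation: the function name `recall_from_labels` and the toy label lists are made up for illustration, and the positive class is assumed to be labelled 1.

```python
def recall_from_labels(y_true, y_pred, positive=1):
    """Return recall = TP / (TP + FN), or None if there are no actual positives."""
    # True positives: actual class is positive and the model predicted positive.
    tp = sum(1 for a, p in zip(y_true, y_pred) if a == positive and p == positive)
    # False negatives: actual class is positive but the model predicted something else.
    fn = sum(1 for a, p in zip(y_true, y_pred) if a == positive and p != positive)
    if tp + fn == 0:
        return None  # recall is undefined when the data has no positive instances
    return tp / (tp + fn)


# Hypothetical labels: 5 actual positives, of which the model finds 3.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

toy_recall = recall_from_labels(y_true, y_pred)
print(toy_recall)  # 0.6  (TP = 3, FN = 2)
```

Note that the false positive in the toy data (an actual 0 predicted as 1) does not affect the result, because recall only looks at the actual positives.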

Note:

  • High Recall Value: It means that the model identifies most of the actual positive instances (few false negatives).
  • Low Recall Value: It means that the model misses many of the actual positive instances (many false negatives).
  • A high recall value is generally desirable, especially when missing a positive instance is costly.

Let’s take an example to understand why a high Recall value is important.

Assume you are a doctor developing a new diagnostic test for a rare disease. The disease is very serious but very rare, which means that many more people do not have the disease than do.

[Image: example of the Recall formula]

Now, let’s calculate the Recall value using the Recall Formula, with TP = 560 and FN = 50.

Recall = 560 / (560 + 50) = 560 / 610 ≈ 0.9180

=> Recall ≈ 91.80%

Here, the recall value is very high (approx. 92%). This means that the test identifies most of the people who actually have the disease. Recall says nothing about false positives, so the test may also flag some people who do not have the disease.
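
The arithmetic above can be checked with a short snippet. This is only a sketch using the example's values (TP = 560, FN = 50); the variable names are chosen for illustration. It also shows why the parentheses in the denominator matter.

```python
tp = 560  # true positives from the example
fn = 50   # false negatives from the example

# Correct: the sum TP + FN is the denominator.
recall = tp / (tp + fn)
print(f"Recall = {recall:.4f} ({recall:.2%})")  # Recall = 0.9180 (91.80%)

# Without parentheses, division happens first: (560 / 560) + 50.
without_parentheses = tp / tp + fn
print(without_parentheses)  # 51.0, which is not a valid recall value
```

Recall always lies between 0 and 1 (or 0% and 100%), so a result such as 51.0 is a quick sign that the formula was written without the parentheses.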

But what if we get a low recall value?

In this case, a low recall value would indicate that some people who actually have the disease are not identified by the test. This may lead to serious consequences, as the disease may spread further and become more difficult to treat.

When to Use Recall Formula?

  • When it is more important to identify all of the positive instances than it is to avoid false positives.
    • In a medical setting, a diagnostic test with a high recall value would be more likely to identify all the patients who have a disease.
  • When the cost of a false negative is high.
    • Fraud Detection System: A false negative could result in a fraudulent transaction being approved, which could cause a financial loss.
  • If the data is imbalanced (i.e., Negative Instances > Positive Instances)
    • The positive instances are few, so it is important to identify as many of them as possible.
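
The point about imbalanced data can be illustrated with a small sketch. The numbers below are hypothetical (10 positives and 990 negatives), and the "model" is a deliberately naive one that predicts negative for everything. It is compared against plain accuracy, which is not covered in this article, only to show what recall reveals.

```python
# Hypothetical imbalanced dataset: 10 positives, 990 negatives (illustrative numbers only).
y_true = [1] * 10 + [0] * 990
# A naive model that predicts "negative" for every instance.
y_pred = [0] * len(y_true)

correct = sum(1 for a, p in zip(y_true, y_pred) if a == p)
tp = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 1)
fn = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 0)

accuracy = correct / len(y_true)
recall = tp / (tp + fn)

print(f"Accuracy = {accuracy:.2%}")  # Accuracy = 99.00%
print(f"Recall   = {recall:.2%}")    # Recall   = 0.00%
```

The model looks excellent if only the share of correct predictions is considered, yet it finds none of the positive instances, and recall makes that visible.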

When NOT to Use Recall Formula?

  • When it is more important to avoid false positives than it is to identify all of the positive instances. 
    • Spam filter: A false positive could result in a legitimate email being sent to the spam folder, which could inconvenience the user.
  • When the data is balanced (i.e., there are an equal number of positive and negative instances).
    • In this case, precision is a more important metric than recall.
  • When the cost of a false negative is low.
    • Content Moderation System: A false negative could result in a spam post staying visible, which could annoy users, but it is not as serious as a fraudulent transaction being approved.
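
The trade-off between avoiding false positives and finding all positives can be sketched in code. Everything below is an assumption made for illustration: the scores, the labels, the two thresholds, and the helper name `precision_recall` are hypothetical, and precision is computed as TP / (TP + FP).

```python
def precision_recall(y_true, scores, threshold):
    """Return (precision, recall) when scores >= threshold are predicted positive."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else None  # undefined with no positive predictions
    recall = tp / (tp + fn) if (tp + fn) else None     # undefined with no actual positives
    return precision, recall


# Hypothetical spam scores (higher = more likely spam) and true labels (1 = spam).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
y_true = [1, 1, 0, 1, 0, 1, 0, 0]

strict = precision_recall(y_true, scores, threshold=0.85)
lenient = precision_recall(y_true, scores, threshold=0.35)

print(f"threshold 0.85: precision={strict[0]:.2f} recall={strict[1]:.2f}")
# threshold 0.85: precision=1.00 recall=0.50
print(f"threshold 0.35: precision={lenient[0]:.2f} recall={lenient[1]:.2f}")
# threshold 0.35: precision=0.67 recall=1.00
```

In this toy data, the lenient threshold catches every spam message but also flags legitimate ones, which is why a spam filter may prefer the stricter setting even though its recall is lower.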

Conclusion

Recall is an important evaluation metric that answers the question: out of all the actual positives, how many were correctly predicted by the model? This blog focused on the Recall formula and how to calculate it.

We hope this blog helps you understand the Recall formula and gives you a better sense of when to use it and when not to.

Keep Learning!!

Keep Sharing!!
