Precision vs. recall - explanation
Why couldn’t I remember the difference between precision and recall?
First of all, I had a problem with confusion matrix. The fact that the order of cells is different in textbooks and in the output produced by the tools we use was particularly problematic.
What is a column? Is it the actual value or the predicted value? Is true positive in the upper left cell or the lower right? I knew that both precision and recall are just rations between the confusion matrix cells, but which ones? It was kind of confusing ;)
Fortunately, it is easy to understand precision and recall. Note I wrote: “understand.” I care about the intuition and understanding, not the calculation. You can always google the equation or just use your favourite tool to calculate that.
Imagine that a radar is a classifier. A classifier that classifies a point in space and returns one of two verdicts: “aircraft” or “empty.”
Let’s assume that we work for the army and we are supposed to build a radar (a classifier) to detect those airplanes.
If we build a classifier which finds all airplanes and does not classify an empty point as an aircraft we have the following output.
It is a perfect classifier. Precision = 1, recall = 1 We have found all airplane and we have no false positives.
On the other hand, if we have an output which looks like this:
We have perfect precision once again. All points reported as an airplane are in fact airplanes. The only problem is a terrible recall. We have not found all airplanes. If those are enemy’s aircraft, we have a huge problem.
There is one more mistake we can make. Is a classifier good enough if it finds all airplanes, but also reports a lot of empty spots as airplanes?
We detected all enemy aircraft. The only problem is, we waste a lot of jet fuel because we scramble our fighter aircraft way too often and they chase a non-existing enemy.
What is more important? Precision or recall?
There are two correct answers to that question:
In the case of the radar, I assume we need to detect all airplanes, and we can accept some false positives. It is better to chase after a non-existing enemy than allow one of their aircraft to be not intercepted.
To prove that the tradeoff between precision and recall is, in fact, a business decision, let’s look at an example of a product that needs both precision and recall to be average.
What if we were developing a dating website? What if after registration your clients filled out a survey that you use to train a classifier. The classifier predicts whether another person is a good or bad match for that user. (Yes, I know. It is not the way such websites work. It doesn’t matter. I need an example ;) )
We don’t want a perfect precision. If we had such classifier, the user would see only a few results (the ideal matches), found Mr. or Ms. Right immediately and never went back to your page. We don’t want that.
We can’t have too good recall either because it is always a tradeoff. Having the perfect recall would most likely require allowing many false positives (terrible precision). If you display many irrelevant results, the user will be disappointed and will run away to your competition.
What we want is precision/recall that gives the user some hope, so they return to your page every day and use it for as long as possible. Now you also know why dating websites are scam ;)