False Positive ‘Reports’ != FP Measurement

John Graham-Cumming writes an excellent monthly newsletter on anti-spam, concentrating on technical aspects of detecting and filtering spam. Me, I have a habit of sending follow-up emails in response ;)

This month, it was this comment, from a techie at another software company making anti-spam products:

When I look at the stats produced on our spam traps, which get millions of messages per day from 11 countries all over the world, I see our spam catch rate being consistently over 98% and over 99% most of the time. We also don’t get more than 1 or 2 false positive reports from our customers per week, which can give an impression of our FP rate, considering the number of mailboxes we protect.

My response:

‘Worth noting that a “false positive report from our customer” is NOT the same thing as a “false positive” (although in fairness, [the sender] does note only that it will “give an impression” of their FP rate).

This is something I’ve seen increasingly in the commercial anti-spam world: attempting to measure false-positive rates from what gets reported “upstream” via the support channels.

In reality, the false positives are still happening; it’s just that there are obstacles between the end-user noticing one and the FP report arriving on a developer’s desk. Changes to the organisational structure, surly tech-support staff, or simply a user too busy to send the report will all affect whether an FP gets counted.

Many FPs will go uncounted along the way. As a result, IMO it is not a valid approach to measurement.’
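To put some numbers on why report counts tell you so little: here’s a toy back-of-the-envelope sketch. Every figure in it is an assumption I’ve picked for illustration, not a measurement from anyone’s product; the point is just that a small reporting probability lets thousands of weekly FPs hide behind “1 or 2 reports per week”.

```python
# Toy model: how reporting friction hides false positives.
# All numbers below are illustrative assumptions, not real measurements.

mailboxes = 1_000_000        # mailboxes protected (assumed)
legit_msgs_per_week = 100    # legit messages per mailbox per week (assumed)
fp_rate = 0.0001             # 0.01% of legit mail wrongly filtered (assumed)
report_prob = 0.0002         # 1 in 5,000 FPs noticed AND reported upstream (assumed)

# FPs that actually happen, vs. FPs that reach a developer's desk
actual_fps = mailboxes * legit_msgs_per_week * fp_rate
reported_fps = actual_fps * report_prob

print(f"actual FPs per week:   {actual_fps:,.0f}")    # 10,000
print(f"reported FPs per week: {reported_fps:,.1f}")  # 2.0
```

With these (made-up) inputs, ten thousand real false positives a week surface as roughly two support reports: exactly the figure quoted above, while the true FP rate stays invisible.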

I’ve been saying th