On the effects of lowering your SpamAssassin threshold

So I was chatting to Danny O’Brien a few days ago. He noted that he’d reduced his SpamAssassin “this is spam” threshold from the default 5.0 points to 3.7, and was wondering what that meant:

I know what it means in raw technical terms — spamassassin now marks anything >3.7 as spam, as opposed to the default of five. But given the genetic algorithm way that SA calculates the rule scoring, what does lowering the score mean? That I’m more confident that stuff marked ham is stuff marked ham than the average person? That my Bayesian scoring is now really good?

Do people usually do this without harmful side-effects? What does it mean about them if they do it?

Does it make me a good person? Will I smell of ham? These are the things that keep me awake at night.

It’s a good question! Here’s what I responded with — it occurs to me that this is probably quite widely speculated about, so let’s blog it here, too.

Moving the threshold makes the filter more or less aggressive: lower it and more mail gets marked as spam; raise it and less does.

By default, we target a false positive rate below 0.1%, i.e. fewer than 1 FP (a ham incorrectly marked as spam) per 1000 ham messages. Last time the scores were generated, we ran our usual accuracy estimation tests and got a false positive rate of 0.06% (1 in 1667 hams) and a false negative rate of 1.49% (1 in 67 spams) at the default threshold of 5.0 points. That assumes you’re using network tests (you should be) and have a trained Bayes database (generally the case after running for a few weeks with autolearning on).
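Here’s a minimal Python sketch of what those two rates measure. It assumes each test message has a total score and a hand-checked spam/ham label, and that a message is flagged once its score reaches the threshold. The function name `error_rates` and the toy scores are made up for illustration; they aren’t SpamAssassin’s real test harness or corpus.

```python
from typing import Sequence, Tuple


def error_rates(scores: Sequence[float], is_spam: Sequence[bool],
                threshold: float) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) at `threshold`.

    Assumes a message is flagged as spam when score >= threshold, and that
    the data contains at least one ham and one spam (else ZeroDivisionError).
    """
    fp = fn = hams = spams = 0
    for score, spam in zip(scores, is_spam):
        flagged = score >= threshold
        if spam:
            spams += 1
            if not flagged:
                fn += 1  # spam that got past the filter
        else:
            hams += 1
            if flagged:
                fp += 1  # ham wrongly marked as spam
    return fp / hams, fn / spams


# Toy data, invented for illustration: four hams followed by four spams.
scores = [0.5, 1.2, 3.9, 5.1, 4.2, 6.0, 11.3, 18.7]
labels = [False, False, False, False, True, True, True, True]

for t in (3.0, 5.0):
    fp_rate, fn_rate = error_rates(scores, labels, t)
    print(f"threshold {t}: FP {fp_rate:.0%}, FN {fn_rate:.0%}")
# threshold 3.0: FP 50%, FN 0%
# threshold 5.0: FP 25%, FN 25%
```

Lowering the threshold can only flag more messages, never fewer, so the FP rate can only go up and the FN rate can only go down.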

Lowering the threshold, then, trades false negatives for false positives: fewer spams get past, but more hams get caught. Here are the figures from those tests for some other thresholds:

    Threshold   False positives   False negatives
    3.0          290 (0.43%)        313 (0.26%)
    4.0          104 (0.15%)       1084 (0.91%)
    4.5           68 (0.10%)       1345 (1.13%)

So you can see FPs rise quite quickly as the threshold drops. At 4.0 points, the nearest tested value to 3.7, roughly 1 in 667 ham messages (0.15%) will be marked incorrectly as spam. That’s 2.5 times the FP rate at the default setting (0.06%). On the other hand, only about 1 in 110 spams (0.91%) will be mis-filed, compared with 1 in 67 at the default.
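The “1 in N” figures are just 100 divided by the percentage, rounded to a whole number. A quick Python check of that arithmetic, using the rates quoted in this post (the helper name `one_in` is hypothetical):

```python
# Percentage rates quoted above for the default threshold (5.0) and for 4.0.
default_fp_pct, default_fn_pct = 0.06, 1.49
t4_fp_pct, t4_fn_pct = 0.15, 0.91


def one_in(pct: float) -> int:
    """Convert a percentage rate into an approximate '1 in N' figure."""
    return round(100 / pct)


print("default FPs: 1 in", one_in(default_fp_pct), "hams")    # 1667
print("default FNs: 1 in", one_in(default_fn_pct), "spams")   # 67
print("4.0 FPs: 1 in", one_in(t4_fp_pct), "hams")             # 667
print("4.0 FNs: 1 in", one_in(t4_fn_pct), "spams")            # 110
print("FP rate vs default:", round(t4_fp_pct / default_fp_pct, 1), "x")  # 2.5
```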

Here are the reports from the last release, with figures for all the different thresholds; they should be useful for figuring out the likelihoods!

In fact, let’s get some graphs from that report. Here is a graph of false positives (in orange) vs false negatives (in blue) as the threshold changes…

and, to illustrate the details a little better, zoom in to the area between 0% and 1%…

You can see that the default threshold of 5.0 isn’t where the FP and FN rates meet; instead, its FP rate is much lower than its FN rate. This is because we consider FPs to be much more dangerous than missed spams, so we try harder to avoid them.
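One way to read the threshold figures with that policy in mind is to keep only the thresholds that meet the FP target, then take the one that misses the fewest spams. This is a sketch of reading the test results, not how SpamAssassin actually generates its scores; `pick_threshold` and the looser 0.2% target are hypothetical, and only the four thresholds quoted in this post are included.

```python
from typing import List, Optional, Tuple

# (threshold, FP %, FN %) from the figures quoted in this post; 5.0 is the default.
ROWS: List[Tuple[float, float, float]] = [
    (3.0, 0.43, 0.26),
    (4.0, 0.15, 0.91),
    (4.5, 0.10, 1.13),
    (5.0, 0.06, 1.49),
]


def pick_threshold(rows: List[Tuple[float, float, float]],
                   max_fp_pct: float) -> Optional[float]:
    """Return the threshold with the lowest FN% among rows whose FP% is
    strictly below max_fp_pct, or None if no row meets the target."""
    acceptable = [row for row in rows if row[1] < max_fp_pct]
    if not acceptable:
        return None
    return min(acceptable, key=lambda row: row[2])[0]


print(pick_threshold(ROWS, 0.1))   # 5.0 (strict "below 0.1%" target)
print(pick_threshold(ROWS, 0.2))   # 4.0 (a looser, hypothetical target)
```

With the strict “below 0.1%” target only the default 5.0 qualifies, since 4.5 sits at exactly 0.10%.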

An alternative, more standardized way to display this information is a Receiver Operating Characteristic (ROC) curve, which plots the true positive rate against the false positive rate, both on a scale from 0 to 1.
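Each row of the threshold figures gives one point on that curve: the false positive rate is the FP% as a fraction, and the true positive rate is 1 minus the FN% as a fraction. A small Python sketch using the four thresholds quoted in this post (the function name `roc_points` is hypothetical):

```python
from typing import List, Tuple

# (threshold, FP %, FN %) from the figures quoted in this post.
ROWS = [(3.0, 0.43, 0.26), (4.0, 0.15, 0.91), (4.5, 0.10, 1.13), (5.0, 0.06, 1.49)]


def roc_points(rows) -> List[Tuple[float, float]]:
    """Convert (threshold, FP%, FN%) rows into (FPR, TPR) pairs on a 0..1 scale."""
    points = []
    for _threshold, fp_pct, fn_pct in rows:
        fpr = fp_pct / 100        # fraction of hams flagged as spam
        tpr = 1 - fn_pct / 100    # fraction of spams caught
        points.append((fpr, tpr))
    return points


for fpr, tpr in roc_points(ROWS):
    print(f"FPR {fpr:.4f}  TPR {tpr:.4f}")
```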

Here’s the SpamAssassin ROC curve:

More usefully, here’s the ROC curve zoomed in nearer the “perfect accuracy” top-left corner:

Unfortunately, this type of graph isn’t much use for picking a SpamAssassin threshold. GNUplot doesn’t allow individual points to be labelled with the value from a given column; if it did, this would be much more useful, since we’d be able to tell which threshold value corresponds to each point. C’est la vie!

Update: this is possible with GNUplot 4.2 onwards, it seems. Great news! Hat tip to Philipp K Janert for the advice. Here are updated graphs using this feature:

(GNUplot commands to render these graphs are here.)
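For a labelled plot like that, the data only needs a third column holding each point’s threshold. A minimal Python sketch that prints such data for the four thresholds quoted in this post; the ROC coordinates are derived from those figures, and the file name `roc.dat` in the comment is hypothetical, not the file behind the graphs above.

```python
# (threshold, FPR, TPR) derived from the FP%/FN% figures quoted in this post.
POINTS = [(3.0, 0.0043, 0.9974), (4.0, 0.0015, 0.9909),
          (4.5, 0.0010, 0.9887), (5.0, 0.0006, 0.9851)]


def labelled_data(points) -> str:
    """Format whitespace-separated columns: FPR, TPR, then the threshold label."""
    lines = [f"{fpr} {tpr} {threshold}" for threshold, fpr, tpr in points]
    return "\n".join(lines) + "\n"


print(labelled_data(POINTS), end="")

# Saved as, e.g., roc.dat, gnuplot 4.2+ can then draw column 3 as a text label:
#   plot 'roc.dat' using 1:2:3 with labels
```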

Update again: much better interactive Flash graphs here.

This entry was posted in Uncategorized.

13 Comments

  1. Anonymous
    Posted February 29, 2008 at 16:38

    FYI, all your image are belong to “Please don’t hotlink to taint.org”. But I am reading this on a pure taint.org URI.

  2. Anonymous
    Posted February 29, 2008 at 16:40

    Scratch that, I was a Ctrl-refresh away from getting them right.

    Nice read, thanks for the post.

  3. Terry Zink
    Posted February 29, 2008 at 18:05

    This is a good post. I’ve long known that false positives vs false negatives is not a linear trade-off. One of my arguments has been that getting more aggressive on spam marginally increases spam effectiveness but greatly increases false positives. Now I have data to confirm this.

  4. Posted February 29, 2008 at 18:14

    hi Terry! happy to oblige ;)

  5. Emmanuel Lécharny
    Posted March 1, 2008 at 09:04

    Hi,

    Interesting post! It makes me think that it would be a good idea to pass mail through SpamAssassin twice: pass 1 uses a low threshold (3.0 or lower), pass 2 uses a higher one (5 or 6), and the potential spams caught in pass 1 but not in pass 2 are then marked as potential FPs. That would make it easier to catch real mail flagged as a potential FP, instead of simply considering everything marked as spam to be spam.

  6. Posted March 3, 2008 at 10:41

    hi Emmanuel —

    yes, this is a very good idea. In fact, you can do this without passing mail through SA twice, by examining the number of points the message scored via the “X-Spam-Level” header. For example, I use this in my mail filtering script:

    [ 'X-Spam-Level',         '..........',     'Spam2'],
    [ 'X-Spam-Level',         '.....',          'Spam'],
    

    The ‘Spam2’ folder is where mails with 10 or more points are sent, while the “low-scoring” spam, with 5 to just under 10 points, is sent to ‘Spam’. Since 88% of spam scores over 10 points, with an unmeasurably small number of FPs in those tests, this works very well…
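    A minimal Python sketch of the same first-match-wins idea. It assumes the patterns are regexes, that rules are tried in order, and that X-Spam-Level carries one ‘*’ per whole point; `RULES` mirrors the two lines above, while `pick_folder` is hypothetical and not the actual filtering script. Because ‘.’ in a regex matches any character, ‘..........’ matches any value with ten or more stars, which is why the Spam2 rule must come first.

```python
import re
from typing import Dict, Optional

# (header name, regex, folder), tried in order; the first match wins.
# '.' matches any character, so these test the *length* of the star string.
RULES = [
    ("X-Spam-Level", r"..........", "Spam2"),  # 10+ stars, i.e. 10+ points
    ("X-Spam-Level", r".....", "Spam"),        # 5-9 stars
]


def pick_folder(headers: Dict[str, str]) -> Optional[str]:
    """Return the folder of the first matching rule, or None (leave in inbox)."""
    for name, pattern, folder in RULES:
        # Strip whitespace so a leading space isn't counted as a star.
        value = headers.get(name, "").strip()
        if re.search(pattern, value):
            return folder
    return None


print(pick_folder({"X-Spam-Level": "*" * 12}))  # Spam2
print(pick_folder({"X-Spam-Level": "*" * 6}))   # Spam
print(pick_folder({"X-Spam-Level": "**"}))      # None
```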

  7. Posted March 20, 2008 at 03:28

    This is slightly off-topic, but…

    I don’t think it is correct that “gnuplot doesn’t allow individual points to be marked with the value from a certain column” as you said – gnuplot version 4.2+ includes a plot style “with labels”, which does exactly that. ;-)

    If you want to know more about gnuplot’s power features, you might want to check out my book on the topic: “Gnuplot in Action”. You can pre-order it directly from the publisher: Manning: Gnuplot in Action.

    If you want to learn more about the book and the author, check out my book page at Principal Value – Gnuplot in Action.

    Let me know if you are interested in a review copy.

  8. Posted March 20, 2008 at 11:25

    Philipp — thanks for the great tip! I’ll try that out.

    The book looks fantastic, btw. However, I really don’t use gnuplot enough to be able to provide a competent review, so I’d feel quite guilty about the idea of a review copy. But thanks for the kind offer anyway ;)

  9. Posted March 23, 2008 at 17:05

    I use Hostgator as my hosting provider. It has very robust tools to view just about anything for any of my domains.

    I specified a ‘Spam’ folder for SpamAssassin and cannot find this folder anywhere. I want to see what is being filtered out.

    I looked in the top level (above public_html) and everything under it.

    Hostgator uses Linux.

    Where can I find the Spam folder?

  10. Posted March 25, 2008 at 00:08

    hi Bill —

    I have bad news; that is entirely up to Hostgator, so I can’t provide you with any info that could help.

  11. Odi
    Posted March 25, 2008 at 23:08

    Why don’t you use a log scale for the ROC curve? It should make things much more readable.

  12. Posted March 26, 2008 at 21:11

    Odi: yes, I must try that.

  13. Posted September 16, 2009 at 01:20

    Hi Justin:

    RE: “using the “X-Spam-Level” header. For example I use this in my mail filtering script.

    [ 'X-Spam-Level', '..........', 'Spam2'], [ 'X-Spam-Level', '.....', 'Spam'], The ‘Spam2’ folder is where mails with 10 or more points are sent, while the “low-scoring” spam, with between 5 and 9 points, are sent to ‘Spam’.”

    Is there a way to set that up in Outlook Express? I’m not seeing a way to create a rule that will do that, but would LOVE TO.

    Thanks for any help you can offer.

    Best, kwc