Happy new year! Or maybe not. Doh.
Over a year ago, Lee Maguire noticed that a contributed SpamAssassin rule, FH_DATE_PAST_20XX, was naively written — simply to match any date in the year 2010 or later — and would start to false-positive on all mail in 14 months. We made the trivial fix to avoid this (for at least 10 years, by which point the rule would have obsoleted itself through normal means), and I committed it to SVN.
Problem solved, right? Nope. I’d committed to trunk, but in a moment of inattention had forgotten to backport the fix to the stable release branch, 3.2.x, as well. Nobody else noticed the mistake, and several months later, boom:
Annoyingly, the GA had assigned this rule 3.5 points in the 3.2.0 rescoring run. This meant that the effective default threshold had been lowered from 5.0 points to 1.5, which produced a 2% false positive rate during the first 13 hours of the new year.
After that point, the fix was pushed to the sa-update channel, and anyone who runs sa-update regularly (as they should!) was brought back to normal filtering behaviour.
The rule is superfluous anyway, since it overlaps with a better-written “eval” rule, DATE_IN_FUTURE_96_XX. Accordingly, most likely scenario is that it’ll be removed.
Personally, I see a few lessons from this:
Obviously, I need to pay more attention. This is easier said than done though, since SpamAssassin has nothing to do with my day job anymore; it’s a spare-time thing nowadays, and that’s a rare resource, unfortunately. :( But still, a chastening result, and I’m very sorry for my part in this screwup.
We need more active committers on Apache SpamAssassin. If we’d had more eyes, the fact that I’d forgotten to backport the fix might have been spotted. we’re definitely in a better situation now in this regard than we were 6 months ago, so that’s good.
IMO, this is a good demonstration of how too many simple rules are risky; without careful vetting and moderation, it’s easy for a bad one to slip past. Perhaps we need to move more towards a DNSBL/network-rule driven approach, although this has its downsides too. Still thinking about this.
It’d be good to fix the GA so that it wouldn’t assign such high points to simple rules like this, without some indication that a human has vetted them and believes them trustworthy.
Daryl posted a good comment on /.:
Clearly we dropped the ball on this one. As far as I know it’s our first big rule screw up in the project’s 10 years. If you’re going to screw up you might as well do it well.
+1 to that!
And to everyone who had to clean up the fallout and spend a holiday recovering lost mails from spam folders… sorry :(