The Life of a SpamAssassin Rule

Spam: during a recent discussion on the SpamAssassin dev list, the question came up as to how long a rule could expect to maintain its effectiveness once it was public — the rule secrecy issue.

In order to make a point — that certain types of very successful rules can indeed last a long time — I picked out one rule, MIME_BOUND_DD_DIGITS. Here’s a smartened-up copy of what I found out.

This rule matches a certain format of MIME boundary, one observed in 17.4637% of our spam collection and with 0 nonspam hits. Since we have a massive collection of mails, received between Jan 2004 to May 2005, and a rule with a known history, we can then graph its effectiveness over time.

The rule’s history was:

  • bug 3396: the initial contribution from Bob Menschel, May 15 2004
  • r10692: arrived in SVN: May 16 2004
  • r20178: promoted to ‘MIME_BOUND_DD_DIGITS’: May 20 2004 (funnily enough, with a note speculating about its lifetime from felicity!)
  • released in the SpamAssassin 3.0.0 release: mid-Sep 2004

So, we would expect to see a drop in its effectiveness against spam in late May 2004 and onwards, if the spammers were reacting to SVN changes; or post September 2004, if they react to what’s released.

By graphing the number of hits on mails within each 2-hour window, we can get a good idea of its effectiveness over time:

The red bars are total spam mails in each time period; green bars, the number of spam mails that hit the rule in each period. May 15 2004 and Sep 20 2004 are marked; Jan 2004 is at the left, and May 2005 is at the right-most extreme of the graph. (There’s a massive spike in spam volume at the right — I think this is Sober.Q output, which disappears after a week or so.)

It appears that the rule remains about even in effectiveness in the 4 months it’s in SVN, but unreleased; it declines a little more after it makes it into a SpamAssassin release. However, it trails off very slowly — even in May 2005, it’s still hitting a good portion of spam.

Given this, I suspect that most spammers are not changing structural aspects of their spam in response to SpamAssassin with any particular alacrity, or at least are not capable of doing so.

To speculate on the latter, I think many spammers are using pirated copies of the spamware apps, so cannot get their hands on updated versions through ‘legitimate’ channels.

Speculating on the former — in my opinion there’s a very good chance that SpamAssassin just isn’t a particular big target for them to evade, compared to the juicy pool of gullible targets behind AOL’s filters, for example. ;)

Tags: , , , , , , , , ,

Comments (3)

Open source v closed-source spam filtering

Spam: I’m quoted in
New Scientist! w00t!

SlashDot picked it up pretty quickly. One comment there misses the point, though:

This is interesting and promising technology. But like all antispam techniques, spammers will find a way around it. Once spammers get a copy of the software, they can create and test countermeasures in the comfort of their own sleazy lairs.

It’s worth talking about this. Newsflash: spammers have no difficulty testing their spam against closed-source spam filters, even when they can’t ‘get a copy’ and test them in ‘their sleazy lairs’.

How do they do it? Easy — just set up an account at a site that uses that filter (AOL, Yahoo!, Hotmail, and GMail, it’s pretty obvious how to do that; for other closed-source filters, find an ISP that uses it). Then send ‘test mails’ repeatedly to that account, and apply trial and error to see what gets past the filter and what doesn’t. Eventually, they figure out what works for that filter, and what doesn’t.

How did I figure this out? Well, I came across the manual for the Send-Safe ratware on-line. It noted that the ‘hashbuster’ randomisation technique, which we in the SpamAssassin team had long assumed was intended to block hash matches by DCC, Pyzor and Razor, was in fact intended to block AOL’s implementation of that system. The open source ones weren’t even mentioned.

Update: found it — from their FAQ:

Mime Encoded content

If you want to get into AOL… use it.

MIME encoders allow you to send documents written within a specific application through email without causing readability or formatting problems. For example, you can send a letter created in MSWord with and be certain that it arrives at its destination in the same format by encoding it with MIME first. The recipient then decodes it back into the original MSWord format.

That isn’t why we use it though.

We use it to cause ‘uniqueness’.

When you put a rotate tag at the beginning of a MIME encoded email, it causes everything after that point (including checksums) to be ‘different’ in every message.

Why is that that important?

Because it throws off filters that look for many copies of the same message to nuke.

Tags: , , , , , , , , ,

Comments

Jon Udell’s forged S/MIME signature, and spam

Spam: Jon Udell: How to forge an S/MIME signature, and Liudvikas Bukys’ take on the results: ‘Jon Udell tries his hand at S/MIME signature forgery, revealing that PKI is not a panacea. A digital signature proves something. The proof is strong but the something is weak (if it just demonstrates that you clicked a few things to get a persona certificate).’

He then suggests two ways to use this info in useful ways:

The first is ‘higher-class certificates (where certificate authorities demand more proof, and encode that fact in the certificate). But higher quality means harder to get and less actual deployment. And higher quality means more attractive target for theft of keys.’

In the anti-spam case, it also means that you trust the certificate provider to both (a) accept money from their customers to issue them certs, and (b) take away those certs from their own customers if they infringe by sending spam messages. This is the hard part. There’s an active financial disincentive for a company to do this; the people who benefit (the end-users) are not their paying customers, whereas the people who get hurt (the infringers) are. Economics dictates that they water down the requirements, in order to maximise their profits – making the system useless.

On the other hand we have: ‘reputation systems. Of course, building robust reputation systems is not easy. Users may wish to have multiple sources of reputation information to fit their own definitions of good and bad behavior and how fast those judgments are made. It replays the whole DNS blacklist deployment. Some reputation systems may seem arbitrary and capricious. Others may be too slow or too tolerant. They are all lawsuit targets. Will there be too many to choose from?’

‘zackly. An excellent illustration of how S/MIME or other PKI will not solve the spam problem, and we’ll still have the same DNSBL situation as we have now (although hopefully working a lot better).

S/MIME may solve the forged-email problem, like SPF does — however, like SPF it will still need to work with reputation systems to be usable as an anti-spam scheme.

Tags: , , , , , , , , , ,

Comments