The Life of a SpamAssassin Rule

Spam: during a recent discussion on the SpamAssassin dev list, the question came up as to how long a rule could expect to maintain its effectiveness once it was public — the rule secrecy issue.

In order to make a point — that certain types of very successful rules can indeed last a long time — I picked out one rule, MIME_BOUND_DD_DIGITS. Here’s a smartened-up copy of what I found out.

This rule matches a certain format of MIME boundary, one observed in 17.4637% of our spam collection and with 0 nonspam hits. Since we have a massive collection of mails, received between Jan 2004 and May 2005, and a rule with a known history, we can graph its effectiveness over time.

The rule’s history was:

  • bug 3396: the initial contribution from Bob Menschel, May 15 2004
  • r10692: arrived in SVN: May 16 2004
  • r20178: promoted to ‘MIME_BOUND_DD_DIGITS’: May 20 2004 (funnily enough, with a note speculating about its lifetime from felicity!)
  • released in the SpamAssassin 3.0.0 release: mid-Sep 2004

So, we would expect to see a drop in its effectiveness against spam in late May 2004 and onwards, if the spammers were reacting to SVN changes; or post September 2004, if they react to what’s released.

By graphing the number of hits on mails within each 2-hour window, we can get a good idea of its effectiveness over time:

The red bars are total spam mails in each time period; green bars, the number of spam mails that hit the rule in each period. May 15 2004 and Sep 20 2004 are marked; Jan 2004 is at the left, and May 2005 is at the right-most extreme of the graph. (There’s a massive spike in spam volume at the right — I think this is Sober.Q output, which disappears after a week or so.)
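For anyone who wants to reproduce this kind of graph, here is a minimal sketch of the bucketing step. It is not the script used for the graph above; it assumes you already have, for each spam mail, a received timestamp and a flag saying whether the rule hit. The function name `hit_rate_by_window` and the input format are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def hit_rate_by_window(messages, window=timedelta(hours=2)):
    """Count total mails and rule hits per fixed-size time window.

    messages: iterable of (received: datetime, hit: bool) pairs
              (a hypothetical input format, not a SpamAssassin one).
    Returns a list of (window_start, total, hits), ordered by time.
    Windows that contain no mail at all are simply omitted.
    """
    messages = list(messages)
    if not messages:
        return []
    start = min(ts for ts, _ in messages)
    totals, hits = Counter(), Counter()
    for ts, hit in messages:
        # timedelta // timedelta gives the integer index of the window
        bucket = (ts - start) // window
        totals[bucket] += 1
        if hit:
            hits[bucket] += 1
    return [(start + b * window, totals[b], hits[b]) for b in sorted(totals)]
```

Each returned tuple corresponds to one pair of bars: the total is the red bar, the hit count the green one.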

It appears that the rule remains about even in effectiveness in the 4 months it’s in SVN, but unreleased; it declines a little more after it makes it into a SpamAssassin release. However, it trails off very slowly — even in May 2005, it’s still hitting a good portion of spam.

Given this, I suspect that most spammers are not changing structural aspects of their spam in response to SpamAssassin with any particular alacrity, or at least are not capable of doing so.

To speculate on the latter, I think many spammers are using pirated copies of the spamware apps, so cannot get their hands on updated versions through ‘legitimate’ channels.

Speculating on the former: in my opinion there’s a very good chance that SpamAssassin just isn’t a particularly big target for them to evade, compared to the juicy pool of gullible targets behind AOL’s filters, for example. ;)


eWeek’s ‘Spammers Upending DNS’ article

Spam: eWeek recently published an article entitled ‘Spammers’ New Tactic Upends DNS’, which notes that:

One .. technique finding favor with spammers involves sending mass mailings in the middle of the night from a domain that has not yet been registered. After the mailings go out, the spammer registers the domain early the next morning.

By doing this, spammers hope to avoid stiff CAN-SPAM fines through minimal exposure and visibility with a given domain. The ruse, they hope, makes them more difficult to find and prosecute.

The scheme, however, has unintended consequences of its own. During the interval between mailing and registration, the SMTP servers on the recipients’ networks attempt Domain Name System look-ups on the nonexistent domain, causing delays and timeouts on the DNS servers and backups in SMTP message queues.

This had me stumped when I read it. An email from a nonexistent domain is a pretty reliable spamsign: it’s what the NO_DNS_FOR_FROM rule in SpamAssassin looks for, and that rule hits about 2% of spam and has been in the default ruleset for several years. On top of that, there’s no sign of the behaviour eWeek describes in our spam traps.
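To make the ‘nonexistent domain’ check concrete, here is a rough sketch of the idea. It is not SpamAssassin’s implementation of NO_DNS_FOR_FROM: the function names are invented, and the default lookup only asks the system resolver for address records, which is an assumption made for brevity (a real filter would also want to consider MX records).

```python
import socket
from email.utils import parseaddr


def address_lookup(domain):
    """Return True if the system resolver finds any address for domain.

    Simplification: this only covers address records, not MX.
    """
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False


def from_domain_missing(from_header, resolves=address_lookup):
    """Return True if the From: address uses a domain that does not resolve.

    resolves is a callable taking a domain name and returning a bool;
    it can be swapped out, for example to test without network access.
    """
    _, addr = parseaddr(from_header)
    if '@' not in addr:
        # No usable address, so nothing to look up; leave that to other checks.
        return False
    domain = addr.rsplit('@', 1)[1]
    return not resolves(domain)
```

Passing the resolver in as a parameter keeps the header parsing separate from the DNS lookup, which is the slow and failure-prone part.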

After some discussion, Suresh Ramasubramanian came up with this explanation of what’s really happening:

Verisign now allows immediate (well, within about 10 minutes) updates of .com/.net zones (also same for .biz) while whois data is still updated once or twice a day. That means if spammer registers (a) new domain he’ll be able to use it immediatly (sic) and it’ll not yet show up in whois (and so not be immediatly identifiable to spam reporting tools) - and spammers are in fact using this “feature” more and more!

That does sound a much more likely explanation, and matches what’s been seen in the traps.

So: WHOIS, not DNS.


Annoying Non-spam Tricks, pt. XVIII

Spam: OK, I just noticed that I have a few hits for the SpamAssassin rule HTTP_ENTITIES_HOST in my corpus. This searches for obfuscated hostnames in the URL links in mail messages, and is generally a very reliable sign of spam — because who would want to hide a hostname apart from spammers?

Well, Buy4Now.IE, for one, it seems. WTF? I have a mail here that uses this markup:

  <a href="''http://www&#46;buy4now&#46;ie/fbd''>

Totally and utterly nuts. If they really wanted a way to tickle malware detectors, mail filters, and anti-spam measures, they could hardly pick a better one. I have no idea why they did this.
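For illustration, here is a rough sketch of how that kind of obfuscation can be spotted. This is not the actual HTTP_ENTITIES_HOST rule; the regular expressions and the function name `hosts_with_entities` are my own simplifications, and real-world HTML would need a proper parser.

```python
import re

# href value starting with http:// or https://, tolerating stray quote characters
HREF_RE = re.compile(r'href\s*=\s*["\']*\s*(https?://[^"\'\s>]+)', re.IGNORECASE)
# numeric (&#46; or &#x2e;) or named (&period;) character references
ENTITY_RE = re.compile(r'&#(?:\d+|x[0-9a-f]+);|&[a-z]+;', re.IGNORECASE)


def hosts_with_entities(html):
    """Return the raw hostnames in href URLs that contain HTML entities."""
    found = []
    for match in HREF_RE.finditer(html):
        rest = match.group(1).split('://', 1)[1]
        # The host ends at the first '/' or '?'. We deliberately do not
        # split on '#', since '#' is part of numeric entities like &#46;
        host = re.split(r'[/?]', rest, maxsplit=1)[0]
        if ENTITY_RE.search(host):
            found.append(host)
    return found
```

Run against the Buy4Now link, this flags the host, because the dots in the hostname are written as `&#46;`; a plain `http://www.buy4now.ie/` link would pass untouched.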

grr….


classic Bayes poison

Spam: via NTK — a slightly over-literal interpretation of the SpamAssassin QUOTED_EMAIL_TEXT rule. Classic. (warning: NSFW spam content)


Soldiers in Iraq, and Vipul

The Killer Elite (Rolling Stone):

The twenty-two-year-old driver, Cpl. Joshua Ray Person, and the vehicle team leader, twenty-eight-year-old Sgt. Brad Colbert — both Afghan War veterans – have already reached a profound conclusion about this campaign: that the battlefield that is Iraq is filled with ‘fucking retards.’

Later on:

Captain America, the platoon commander who is almost universally disrespected by the enlisted men, seems to deal with the stress by rising to a state of jabbering incoherence. Up by the bridge there are four enemy dead scattered under the eucalyptus trees, along with piles of munitions — RPGs, AKs and hand grenades. Captain America runs back and forth, picking up their weapons, hurling them into the nearby canal and screaming at the top of his lungs. No one knows what he’s screaming about or why, but as another officer who came upon this scene later concluded, ‘Whatever he was doing, he was not being in command.’

Fantastic series of articles, well worth a read. (Found on stuff.) Similar to this, here’s an unauthorized weblog from a soldier on duty in Iraq — the inside story.

Spam: Good article by Vipul on spam filtering, at MIT Tech Review:

Here’s a list of three rules (created after the most important features of e-mail) that anti-spam software should strive to follow:
  • 1) Ability to send and receive e-mail from a stranger. (Whitelisting, payment systems, and challenge/response break this rule.)
  • 2) Ability to send and receive pseudo-anonymous e-mail. (Domain-based authentication breaks this rule.)
  • 3) E-mail should be free. (Payment systems break this rule.)

He said it. Killing off several useful legit uses of email, just to fix spam, is no good. Looks like he’s started writing his blog-like thing too, again, so I’ll be adding that to my ‘roll (assuming it stays updated! ;) No RSS yet though…


‘Shooting The Messenger’

Yoz does a great job rounding up some Plan For Spam links. First off, he links to a great essay, Shooting The Messenger, which nicely rebuts the idea that to deal with spam, we need an SMTPng. Recommended. (He goes a bit overboard with some hard-ass filtering recommendations at the end IMO, though…)

Secondly, Yoz links to a couple more posts. The first is a friendly-fire incident involving the SpamCop DNS blacklists, illustrating the dangers of peer-to-peer ‘this is spam’ reporting. There’s a related issue with the SpamCop DNSBL, in that it’s over-sensitive; one report can sometimes be enough to get a site BLed, which is not good. The problems with SpamCop’s hair-trigger thresholds are well-documented, and — hopefully — Julian will fix them soon.

The second is a mail from John Gilmore to Politech. He says ‘a simple rule for anti-spam measures that preserves non-spammers’ freedom to communicate is: No anti-spam measure should ever block a non-spam message. But there isn’t a single anti-spam organization that actually follows this rule.’

Wrong. That’s exactly the SpamAssassin angle. If the user says it’s not spam, it’s not spam — and we have to figure out a way to get our scoring system to return that result, if at all possible. And yes, it gets it wrong about 0.1% of the time — and that’s why we never tell users to block, bounce or delete spam if at all possible; just mark it ‘possible spam’ and divert to another folder, and always let a human take a look to verify that decision.

Given the nature of the spam problem, and the nuisance it poses to virtually everybody trying to use email, that’s the best that can be done at this point.

And yes, something has to be done. Spam is a massive problem. If it’s not dealt with somehow, and kept out of our day-to-day inboxes, people will stop using mail. Before spam filters became ubiquitous, I talked to many casual internet users who (a) closed down their email address every 6 months to escape the flood, or (b) gave up reading their mail because of it. (And why did spam filters become ubiquitous?)

It comes down to this: what’s better for the internet, a mislabelled email in your ‘spam bucket’ folder, or no email at all?


Iraq

Things are getting scary. Two stories of note:

Guardian: US plans military rule and occupation of Iraq.

The US has plans to establish an American-led military administration in Iraq, similar to the postwar occupation of Germany and Japan, which could last for several years after the fall of Saddam Hussein, it emerged yesterday.

The occupation of the country would need an estimated 75,000 troops, at an annual cost of up to $16bn, and would almost certainly include British and other allied soldiers. It would be run by a senior American officer, perhaps General Tommy Franks, who would lead the assault on Iraq, and whose role would be modelled on that of General Douglas MacArthur in postwar Japan. ….

The Iraqi project, outlined by Mr Bush’s senior adviser on the Middle East, Zalmay Khalilzad, would involve running the entire country until a democratic Iraqi government was deemed ready.

New Yorker:

The vision laid out in the Bush document is a vision of what used to be called, when we believed it to be the Soviet ambition, world domination. It’s a vision of a world in which it is American policy to prevent the emergence of any rival power, whatever it stands for — a world policed and controlled by American military might.

This goes much further than the notion of America as the policeman of the world. It’s the notion of America as both the policeman and the legislator of the world, and it’s where the Bush vision goes seriously, even chillingly, wrong. A police force had better be embedded in and guided by a structure of law and consent. There’s a name for the kind of regime in which the cops rule, answering only to themselves. It’s called a police state.

Worth quoting this snippet too:

For example, as a way of enhancing “national security,” it promises to press “other countries” to adopt “lower marginal tax rates” and “pro-growth legal and regulatory policies” — your doctor’s names for tax cuts for the rich and environmental laxity. And it exalts economic relationships as more fundamental than political and social ones (a mental habit that orthodox conservative ideologues share with their orthodox Marxist counterparts), as in this passage praising free trade as a “moral principle”: “If you can make something that others value, you should be able to sell it to them. If others make something that you value, you should be able to buy it. This is real freedom, the freedom for a person — or a nation — to make a living.” (As distinct, presumably, from the secondary, not quite real freedoms of thought, conscience, and expression.)
