Spam filter evasion self-defeating?

Donncha asks, is spam self-defeating?

has anyone else noticed that the new generation of gif based stock-trading spams are getting really hard to read? In the last one I had to squint and look really carefully to find out what stock was hot and a sure-buy today!

I’ve been wondering about this, too. We continually push spammers further and further from comprehensibility, since comprehensible spam is easily-filtered spam, but the spam flood doesn’t stop. In fact, spam volumes have shot up higher than ever.

My theory is that it’s a symptom of the spam side of things being a market in itself (and an inefficient, scam-heavy one at that).

IMO, the people providing the underlying products advertised in “high-end” spam – the pill-peddlers and stock pumpers — no longer control the technical details of how or where the spam is sent. Instead, they are the customers of professional spam gangs who do that, and take care of the obfuscation, filter-evasion, etc.

In other words, the pill-peddlers and scam operators are getting ripped off, too. They think their products or scams will be advertised in a comprehensible manner, in readable emails; but instead, odd, opaque 3-word messages with “cut and paste this” lines, hidden inside filter-evasion text and bits of Project Gutenberg, are what gets delivered to the victims.

I can’t imagine the clickthrough rates are exactly stellar on that. So I’d guess the spammers are responding by pushing up volumes to attempt to increase clickthrough/sales volumes. Wonder if it’s working or not?

Blog Spam, and a ‘nofollow’ Post-Mortem

An interesting article on blog-spam countermeasures — Google’s embarrassing mistake. Quote:

I think it’s time we all agreed that the ‘nofollow’ tag has been a complete failure.

For those of you new to the concept, nofollow is a tag that blogs can add to hyperlinks in blog comments. The tag tells Google not to use that link in calculating the PageRank for the linked site. [...]

Since its enthusiastic adoption a year and a half ago, by Google, Six Apart, Wordpress, and of course the eminent Dave Winer, I think we can all agree that nofollow has done — nothing. Comment spam? Thicker than ever. It’s had absolutely no effect on the volume of spam. That’s probably because comment spammers don’t give a crap, because the marginal cost of spamming is so low. Also, nofollow-tagged links are still links, which means that humans can still click on them — and if humans can click, there’s a chance somebody might visit the linked sites after all.
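For anyone who hasn't seen it in the wild: in practice nofollow is a `rel="nofollow"` attribute value on the link, rather than a separate tag. A minimal sketch of what blog software does to comment links might look like this. The function name is my own, and the regex is a crude stand-in for a real HTML parser, so treat it as an illustration only:

```python
import re

# Matches an opening <a ...> tag; the \b keeps it from matching <abbr> etc.
_A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Add rel="nofollow" to every link in a comment (hypothetical helper)."""

    def fix(match):
        attrs = match.group(1)
        # Leave links that already carry a rel attribute untouched.
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            return match.group(0)
        return f'<a{attrs} rel="nofollow">'

    return _A_TAG.sub(fix, comment_html)
```

Note that the link stays fully clickable for humans, which is exactly the point the quoted article makes.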

I agree. At the time, I pointed at this comment from Mark Pilgrim:

Spammers have it in their heads now that weblog comments are a vector to exploit. They don’t look at individual results and tweak their software to stop bothering individuals. They write generic software that works with millions of sites and goes after them en masse. So you would end up with just as much spam, it would just be displayed with unlinked URLs.

Spammers don’t read blogs; they just write to them.

I still think he was spot on.

However, one part of the ‘Google’s embarrassing mistake’ article is a red herring — I think the chilling effect on “nonspam links” is not to be worried about; as Jeremy Zawodny said, life’s too short to worry about dropping links purely in the hopes of giving yourself Page Rank. I don’t know if I really want links that people are leaving purely for that reason. ;)

In fact, I wouldn’t be surprised to hear that Google’s crawler starts treating “nofollow” links as mildly non-spammy in a future revision, due to their wide use in wikis, blogs etc.

To be honest, though — I don’t see the problem of blog-spam much anymore. As I said here:

[Weblog] comment spam should be a lot easier to deal with than SMTP spam. … With weblog comments, you control the protocol entirely, whereas with SMTP you’re stuck with an existing protocol and very little “wiggle room”.

On my WordPress weblog [ie. here] — which, admittedly, gets only about 1/4 of the traffic plasticbag.org does — I’ve instituted a very simple check stolen from Jeremy Zawodny. I simply include a form field which asks the comment poster for my first name, and if they fail to supply that, the comment is dropped. In addition, I’ve removed the form fields to post directly, requiring that all comments are previewed; this has the nice bonus of increasing comment quality, too.

Those are the only antispam measures I’m using there, and as a result of those two I get about 1 successful spam posted per week, which is a one-click moderation task in my email. That’s it.
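To make those two checks concrete, here is a rough sketch of the server-side logic. The field names and the `CommentForm` type are made up for illustration; the real WordPress changes were PHP template edits, not this code:

```python
from dataclasses import dataclass


@dataclass
class CommentForm:
    author: str
    body: str
    owner_first_name: str  # answer typed into the extra question field (hypothetical name)
    previewed: bool        # True only if the comment came through the preview step


def accept_comment(form: CommentForm, expected_name: str = "justin") -> bool:
    """Return True if the comment passes both site-specific checks."""
    # Check 1: the poster must supply the blog owner's first name.
    answer = form.owner_first_name.strip().lower()
    if answer != expected_name.lower():
        return False
    # Check 2: direct posts are refused; every comment must be previewed first.
    return form.previewed
```

The value of the check comes from it being different on every site, so the question itself matters less than the fact that generic spam software doesn't know to answer it.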

The key is to not use the same measures as everyone else — if every weblog has a different set of protocols, with different form fields asking different simple questions, the only spammers that can beat that are the ones that write custom code for your site — or use human operators sitting down to an IE window.

Trackbacks, however — turn that off. The protocol was designed poorly, with insufficient thought given to its abuse potential; there’s no point keeping it around, now that it’s a spam vector.

Finally, a “perfect” solution to blog spam, while allowing comments, is unachievable. There will always be one guy who’s going to sit down at a real web browser to hand-type a comment extolling the virtues of some product or another. The goal is to get it to a level where you get one of those per week, and it’s a one-click operation to discard them.

(Update: This story got Slashdotted! The poor server’s been up and down repeatedly; looks like it needs an upgrade. In the meantime, WP-Cache has proven worth its weight in gold; recommended…)

DearAOL and GoodMail

Things have really been heating up recently around the AOL/Goodmail “pay to send” CertifiedMail scheme — the EFF and a host of other groups have launched dearaol.com, stating:

This system would create a two-tiered Internet in which affluent mass emailers could pay AOL a fee that amounts to an “email tax” for every email sent, in return for a guarantee that such messages would bypass spam filters and go directly to AOL members’ inboxes. Those who did not pay the “email tax” would increasingly be left behind with unreliable service. Your customers expect that your first obligation is to deliver all of their wanted mail, and this plan is a step away from that obligation.

While I dislike this proposal, too, as far as I can tell, AOL actually have pretty reasonable intentions with this program — nowhere near as bad as the DearAOL.com site makes out.

However, they’re doing a really really crappy job of getting this information out there, or committing to reasonable limits on the program, such as announcing that they will use it only for transactional emails, as Yahoo! have done.

I’d strongly recommend reading Carl Hutzler’s posting on the subject. Carl was AOL’s head of anti-spam operations until last year, so he really knows what he’s talking about, and he lays it out clearly — a lot more clearly than any corporate statements from AOL do. His blog contains a fair bit more on the subject, too.

But seriously: why isn’t there a press release on the AOL site about this scheme? Some front-channel communication about now might be useful, I’d suggest, before things really get hairy. This crapstorm is coming about partly because AOL’s comments are all filtering out in dribs and drabs via third parties, and (AOLers say) are being misconstrued and misrepresented in the process. It’s a classic case of missing the cluetrain.

I’d also really encourage the EFF people to tone down the rhetoric; statements like “senders will have no guarantee that their emails will be delivered” are scare-mongering, given that SMTP email already provides no such guarantee.

Update: wow, MoveOn went really overboard — “threatening the Internet as we know it … The very existence of online civic participation and the free Internet as we know it are under attack.” OMG the sky is falling!

Side Issue: The Spam Definition

Also, another note to EFF: defining spam as “whatever you don’t want to read” is a terrible mistake to make. That confuses a good, clear, enforceable and automatable definition of spam – unsolicited bulk email – and makes it effectively unenforceable by law, unpoliceable by ISPs, impossible to detect automatically, and incompatible with existing, effective EU and Australian legislation.

Listen to your own Chairman of the Board; he’s right on this count.

PS: any luck fixing up the non-confirmed signups issue? Last time I checked I could still subscribe any address to the EFF Action Alerts without a cross-check, which is not a good thing.

DataMation Anti-Spam Product of the Year!

Hooray!

SpamAssassin has been voted DataMation Anti-Spam Product of the Year for 2006, earning three times as many votes as the next contender.

This is the second year in a row, which is fantastic — and our margin is increasing each year. ;)

TREC Spam Corpus

Some news from TREC’s Gordon Cormack:

The TREC 2005 Corpus (92,000 messages – 42,000 ham; 50,000 spam) is now available for self-serve download.

TREC Spam Evaluation is a NIST program to develop methods to measure spam filter accuracy and performance. More details here.

The corpus can be picked up at Gordon’s site. As far as I can tell, this should be a pretty solid corpus for spam researchers and developers.

E-Pending

Boing Boing has an interesting case today:

“I filled out a web form for a contest from Miller using a throwaway junk email address and then, months after I dumped the throwaway account, I got this to my main account! Not sure I like the idea of companies tracking me down like this.”

I sent a mail to follow up on this, but it’s worth blogging here too.

This is, unfortunately, common practice among the “legitimate” bulk mailer companies; it’s called “e-pending” (short for “email address appending”). Basically, the advertiser contacts one of the big data-mining companies, provides them with the data they have about the customer — name, postal address, etc., and gets them to match that against their database; the data-miner then provides any other email addresses they may have on file for that user, even if those email addrs were provided for bills, promotional use for other companies, etc.

The advertisers contend that permission was given by the person who’s being mailed; the recipients contend that permission was given to send to a specific address, not all of that person’s addresses in perpetuity.

Here are a few more examples of e-pending gone bad: two Jennifer Millers, Sony scraping ancient Internic contact addresses, Spamvertized.org comment on the practice, Joe St. Sauver comments.

It’s exclusively a US phenomenon, as far as I know; I think most cases of e-pending are rendered illegal under EU data protection law. Handy. ;)

Update: Brian at the Spam Kings weblog notes that ‘this spooky little spam was the work of Equifax, the big credit reporting agency that shut down its Boca Raton-based spam operation, Naviant, in 2003, due to the impending passage of CAN-SPAM.’

Spamhaus comment on the AOL/Goodmail deal

AOL and Yahoo! have been making a lot of headlines with their plans to reduce their whitelist-management workload — and make a little pay-to-send money on the side — with a deal with Goodmail.

Now Spamhaus have gone on the record against the plan:

On Monday, Richard Cox, chief information officer at antispam organization Spamhaus, said that “an e-mail charge will destroy the spirit of the Internet.”

“The Internet has become what it is because of freedom of communication. Open discussion is what gives it value. There should be no cost for particular services, and e-mail should be free and accessible to all. This will disenfranchise people.”

Happy Spam-Solved Day!

Happy BillG-Scheduled Spam Solved Day!

“Two years from now, spam will be solved,” Microsoft’s Bill Gates said [at the 2004 World Economic Forum in Switzerland].

So is it? Weeeeell…..

To “solve” the problem for consumers in the short run doesn’t require eliminating spam entirely, said Ryan Hamlin, the general manager who oversees [Microsoft]’s anti-spam programs. Rather, he said, the idea is to contain it to the point that its impact on in-boxes is minor.

In that way, Hamlin said, Gates’ prediction has come true for people using the right tactics and advanced filtering technology.

Ha. I am reminded of ‘weapons of mass destruction-related program activities’.

As one slashdotter says, ‘when you fail, try try again; or conversely, change the requirements and make it look like a success, which is exactly what BG has done.’

It’s not washing, though, unsurprisingly. The poll on the same page asks ‘do you agree with Microsoft’s contention that the spam problem has been “solved”?’ Right now, with 1169 votes, it has 7.2% (in other words, the MS employees) agreeing, and a whopping 92.8% not going for it.

‘Internet Stamps’: ‘Sender Pays’ Is Back From The Dead

Jeremy Zawodny mentions that Tim Bray has proposed something he calls ‘Internet Stamps’ to solve the blog-spam problem; here’s Tim’s description of how it works:

An Internet Stamp is an assertion, signed by a Post Office, that some chunk of text was issued by someone who paid for the stamp. At least one major Post Office will be required by government statute to sell stamps to anyone in the world for either US$0.01 or EUR 0.01, and no stamp-selling organization will be recognized which sells stamps for less than this amount. For this to work, the number of stamp-selling organizations needs to be small and the organizations stable; another reason why Post Offices are plausible candidates.

It works like this: if you want to buy stamps, you sign up for an account with your Post Office; it works like paper stamps, you buy a bunch at a time in advance, in small amounts like $20 or EUR 10. Then the Post Office offers a Web Service where you connect to a port, authenticate yourself and send along some text; the Post Office decrements your account and sends back the stamp. There are a variety of digesting/signing/PKI techniques that could be applied to implement the stamps; a standard is required but should be easy.
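To see what Tim is describing, here is a toy model of the Post Office side: a prepaid account, a per-stamp deduction, and a signature over the submitted text. It is only a sketch under loud assumptions. The class and method names are mine, authentication is left out, and I've swapped in an HMAC with a secret key where a real scheme would use the public-key signing he alludes to (with an HMAC, only the Post Office itself can verify a stamp):

```python
import hashlib
import hmac


class PostOffice:
    """Toy stamp issuer: prepaid accounts, one price unit deducted per stamp."""

    def __init__(self, secret_key: bytes, price_cents: int = 1):
        self._key = secret_key      # stand-in for a real signing key
        self._price = price_cents   # US$0.01 per stamp, as in the proposal
        self._balances = {}         # account name -> prepaid balance in cents

    def deposit(self, account: str, cents: int) -> None:
        """Buy a bunch of stamps in advance."""
        self._balances[account] = self._balances.get(account, 0) + cents

    def balance(self, account: str) -> int:
        return self._balances.get(account, 0)

    def buy_stamp(self, account: str, text: str) -> str:
        """Decrement the account and return a stamp for this chunk of text."""
        if self.balance(account) < self._price:
            raise ValueError("insufficient prepaid balance")
        self._balances[account] -= self._price
        return self._sign(text)

    def verify(self, text: str, stamp: str) -> bool:
        """Check that the stamp was issued for exactly this text."""
        return hmac.compare_digest(self._sign(text), stamp)

    def _sign(self, text: str) -> str:
        return hmac.new(self._key, text.encode("utf-8"), hashlib.sha256).hexdigest()
```

Even in toy form, it shows where the trouble starts: the balance table is a ledger somebody has to run, secure, and handle disputes for.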

Apparently himself and a few other guys chatted about it at the first Foo Camp, back in 2003. Funnily enough, in the anti-spam community, we were having our own chats about it, but it sounds like our paths didn’t cross for some reason…

We call this idea ‘sender pays’. Earlier in 2003, in June, John Levine published what I’d consider the canonical wrap-up of why it will not work, in ‘An Overview of e-Postage’.

That report demolishes the use of ‘sender pays’ for e-mail anti-spam, on three main counts:

  • Creating a transaction system large enough for e-postage would be prohibitively expensive. The nearest parallel is the credit card transaction system, which deals with 1% of the transaction volume per day, and with much larger profit margins to make it worth their while.

  • The true financial, administrative, and social costs of e-postage are completely unknown. What do you do when a ‘bad guy’ steals the e-postage stamps off Aunt Millie’s hard disk, without her knowledge? How much is the Fraud Handling Department going to cost? Is she just going to be out of luck when this happens? Will you need to use whitelisting and a content-based anti-spam filter as well, to filter out the messages sent using valid, but stolen, stamps?

  • Users hate micropayments. In short, see Andrew Odlyzko’s research.

Now, using it on weblog spam is a little more practical than e-mail spam, for one because it has a lower daily volume of transactions; but these objections still stand, in my opinion.

John Levine is one of the foremost authorities in anti-spam, and this report has been a mainstay of the anti-spam canon for two years. Anyone discussing a new anti-spam concept really ought to know this report backwards and forwards by this stage, and go into some detail as to how their proposal deals with the issues raised, if it’s to be taken seriously.

‘I Will Eat Your Dollars’

An excellent, eye-opening interview with Samuel, an ex-419 scammer.

There’s even a theme tune:

Their anthem, “I Go Chop Your Dollars,” hugely popular in Lagos, hit the airwaves a few months ago as a CD penned by an artist called Osofia:

“419 is just a game, you are the losers, we are the winners.
White people are greedy, I can say they are greedy
White men, I will eat your dollars, will take your money and disappear.
419 is just a game, we are the masters, you are the losers.”

Reportedly, Lagos inhabitants paint “This House Is Not For Sale” in big letters on their homes, in case someone posing as the owner tries to put it on the market.

Regarding the workings of the scam:

[Samuel] sent 500 e-mails a day and usually received about seven replies. Shepherd would then take over. “When you get a reply [to a 419 spam], it’s 70% sure that you’ll get the money,” Samuel said.

(via Nelson.)

DnsblAccuracy082005 – Spamassassin Wiki

Do you use anti-spam DNS blocklists? If so, you should probably go take a look at DnsblAccuracy082005 on the SpamAssassin wiki; I’ve collated the results from our recent mass-check rescoring runs for 3.1.0, to produce up-to-date measurements of the accuracy and hit-rates for most of the big DNS blocklists.

A few highlights:

We don’t have accurate figures for the new URIBL.COM lists, btw — only the rulesets that are distributed with SpamAssassin were measured.

Bogus Challenge-Response Bounces: I’ve Had Enough

I get quite a lot of spam. For one random day last month (Aug 21st), I got 48 low-scoring spam mails (between 5 and 10 points according to SpamAssassin), and 955 high-scorers (anything over 10). I don’t know how much malware I get, since my virus filter blocks them outright, instead of delivering to a folder.

That’s all well and good, because spam and viruses are now relatively easy to filter — and if I recall correctly, they were all correctly filed, no FPs or FNs (well, I’m not sure about the malware, but fingers crossed ;).

The hard part is now ‘bogus bounces’ — the bounces from ‘good’ mail systems, responding to the forged use of my addresses as the sender of malware/spam mails. There were 306 of those, that day.

Bogus bounces are hard to filter as spam, because they’re not spam — they’re ‘bad’ traffic originating from ‘good’, but misguided, email systems. They’re not malware, either. They’re a whole new category of abusive mail traffic.

I say ‘misguided’, because a well-designed mail system shouldn’t produce these. If you only ever reject mail with a 4xx or 5xx response as part of the SMTP transaction, while the TCP/IP connection is still open between the originator and the receiving MX MTA, you avoid most of the danger of ‘spamming’ a forged sender address. However, many mail systems were designed before spammers and malware writers started forging on a massive scale, and therefore haven’t fixed this yet.
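As a rough illustration of the difference, the decision can be made while the sending client is still connected, so the error goes back down the open connection instead of becoming a new message to whatever sender address was forged. The function and its inputs below are hypothetical; a real MTA makes this decision in its own configuration or policy hooks:

```python
def smtp_reply(recipient: str, valid_recipients: set, looks_like_spam: bool) -> str:
    """Pick the SMTP reply to give during the transaction (illustrative only).

    Because the reply travels back over the existing connection, no separate
    bounce message is ever generated to the (possibly forged) sender address.
    """
    if recipient.lower() not in valid_recipients:
        return "550 5.1.1 No such user"
    if looks_like_spam:
        return "550 5.7.1 Message refused"
    return "250 2.0.0 OK"
```

The misguided alternative is to answer 250 to everything and then mail a bounce later, which is where the bogus bounces come from.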

I’ve been filtering these for a while using this SpamAssassin ruleset; it works reasonably well at filtering bounces in general, catching almost all of the bounces. (There is a downside, though, which is that it catches more than just bogus bounces — it also catches real bounces, those in response to mails I sent. At this stage, though, I consider that to be functionality I’m willing to lose.)

The big remaining problem is challenge-response messages.

C-R is initially attractive. If you install it, your spam load will dwindle to zero (or virtually zero) immediately — it’ll appear to be working great. What you won’t see, however, is what’s happening behind the scenes:

  • your legitimate correspondents are getting challenges, will become annoyed (or confused), and may be unwilling or unable to get themselves whitelisted;

  • spam that fakes other, innocent third party addresses as the sender, will be causing C-R challenges to be sent to innocent, uninvolved parties.

The latter is the killer. In effect, you’re creating spam, as part of your attempts to reduce your own spam load. C-R shifts the cost of spam-filtering from the recipient and their systems, to pretty much everyone else, and generates spam in the process. I’m not alone in this opinion.

That’s all just background — just establishing that we already know that C-R is abusive. But now, it’s time for the next step for me — I’ve had enough.

I initially didn’t mind the bogus-bounce C-R challenges too much, but the levels have increased. Each day, I’m now getting a good 10 or so C-R challenges in response to mails I didn’t send. Worse, these are the ones that get past the SpamAssassin ruleset I’ve written to block them, since they don’t include an easy-to-filter signature signifying that they’re C-R messages, such as Earthlink’s ‘spamblocker-challenge’ SMTP sender address or UOL’s ‘AntiSpam UOL’ From address. There seem to be hundreds of half-assed homegrown C-R filters out there!
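For the easy cases, the signature check amounts to something like the following sketch. This is not the actual SpamAssassin ruleset (which is written as SpamAssassin rules, not Python); the function name and matching logic are assumptions, and only the two signatures named above are covered:

```python
from email import message_from_string


def is_known_cr_challenge(raw_message: str, envelope_sender: str) -> bool:
    """Spot challenge-response mails that carry an obvious signature."""
    msg = message_from_string(raw_message)
    from_header = msg.get("From", "")
    # Earthlink challenges use a 'spamblocker-challenge' SMTP sender address.
    if "spamblocker-challenge" in envelope_sender.lower():
        return True
    # UOL challenges carry 'AntiSpam UOL' in the From header.
    if "antispam uol" in from_header.lower():
        return True
    # Homegrown C-R systems with no fixed signature fall through here.
    return False
```

The last line is the problem: every homegrown system needs its own pattern, and there is no common marker to key on.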

So now, when I get challenge-response messages in response to spam which forges one of my addresses as the ‘From’ address, and it doesn’t get blocked by the ruleset, I’m going to jump through their hoops so the spam is delivered to the C-R-protected recipient. Consider it a form of protest; creating spam, in order to keep yourself spam-free, is simply not acceptable, and I’ve had enough.

And if you’re using one of these C-R filters — get a real spam filter. Sure they cost a bit of CPU time — but they work, without pestering innocent third parties in the process.

Emergent Chaos: I’m a Spamateur

Emergent Chaos: I’m a Spamateur:

In private email to Justin “SpamAssassin” Mason, I commented about blog spam and “how to fix it,” then realized that my comments were really dumb. In realizing my stupidity, I termed the word “spamateur,” which is henceforth defined as someone inexperienced enough to think that any simple solution has a hope of fixing the problem.

I think this is my new favourite spam neologism ;)

The Life of a SpamAssassin Rule

Spam: during a recent discussion on the SpamAssassin dev list, the question came up as to how long a rule could expect to maintain its effectiveness once it was public — the rule secrecy issue.

In order to make a point — that certain types of very successful rules can indeed last a long time — I picked out one rule, MIME_BOUND_DD_DIGITS. Here’s a smartened-up copy of what I found out.

This rule matches a certain format of MIME boundary, one observed in 17.4637% of our spam collection and with 0 nonspam hits. Since we have a massive collection of mails, received between Jan 2004 and May 2005, and a rule with a known history, we can then graph its effectiveness over time.

The rule’s history was:

  • bug 3396: the initial contribution from Bob Menschel, May 15 2004
  • r10692: arrived in SVN: May 16 2004
  • r20178: promoted to ‘MIME_BOUND_DD_DIGITS’: May 20 2004 (funnily enough, with a note speculating about its lifetime from felicity!)
  • released in the SpamAssassin 3.0.0 release: mid-Sep 2004

So, we would expect to see a drop in its effectiveness against spam in late May 2004 and onwards, if the spammers were reacting to SVN changes; or post September 2004, if they react to what’s released.

By graphing the number of hits on mails within each 2-hour window, we can get a good idea of its effectiveness over time:

The red bars are total spam mails in each time period; green bars, the number of spam mails that hit the rule in each period. May 15 2004 and Sep 20 2004 are marked; Jan 2004 is at the left, and May 2005 is at the right-most extreme of the graph. (There’s a massive spike in spam volume at the right — I think this is Sober.Q output, which disappears after a week or so.)
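The counting behind a graph like that is simple enough to sketch. This is not the script I actually used; the function name and input shape are assumptions, and it only produces the two per-window counts (total spam, and spam that hit the rule) that the red and green bars represent:

```python
from collections import Counter

WINDOW_SECONDS = 2 * 60 * 60  # the 2-hour window mentioned above


def hit_rate_by_window(messages):
    """Bucket spam mails into 2-hour windows.

    messages: iterable of (received, hit_rule) pairs, where received is a
    timezone-aware datetime and hit_rule says whether the rule matched.
    Returns {window_start_epoch: (total_spam, rule_hits)}, oldest first.
    """
    totals, hits = Counter(), Counter()
    for received, hit_rule in messages:
        # Round the timestamp down to the start of its window.
        bucket = int(received.timestamp()) // WINDOW_SECONDS * WINDOW_SECONDS
        totals[bucket] += 1
        if hit_rule:
            hits[bucket] += 1
    return {b: (totals[b], hits[b]) for b in sorted(totals)}
```

Dividing hits by total in each window gives the effectiveness curve, which is what you'd watch for a drop after the SVN and release dates.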

It appears that the rule remains about even in effectiveness in the 4 months it’s in SVN, but unreleased; it declines a little more after it makes it into a SpamAssassin release. However, it trails off very slowly — even in May 2005, it’s still hitting a good portion of spam.

Given this, I suspect that most spammers are not changing structural aspects of their spam in response to SpamAssassin with any particular alacrity, or at least are not capable of doing so.

To speculate on the latter, I think many spammers are using pirated copies of the spamware apps, so cannot get their hands on updated versions through ‘legitimate’ channels.

Speculating on the former: in my opinion there’s a very good chance that SpamAssassin just isn’t a particularly big target for them to evade, compared to the juicy pool of gullible targets behind AOL’s filters, for example. ;)

CEAS

Spam: back from CEAS. The schedule with links to full papers is up, so anyone can go along and check ‘em out, if you’re curious.

Overall, it was pretty good — not as good as last year’s, but still pretty worthwhile. I didn’t find any of the talks to be quite up to the standards of last year’s TCP damping or Chung-Kwei papers; but the ‘hallway track’ was unbeatable ;)

Here are my notes:

AOL’s introductory talk had some good figures; a Pew study reported that 41% of people check email first thing in morning, 40% have checked in the middle of the night, and 26% don’t go more than 2-3 days without checking mail. It also noted that URLs spimmed (spammed via IM) are not the same as URLs spammed — but the obfuscation techniques are the same; and they’re using 2 learning databases, per-user and global, and the ‘Report as Spam’ button feeds both.

Experiences with Greylisting: John Levine’s talk had some useful data. There are still senders that treat a 4xx SMTP response (temp fail) as 5xx (permanent fail), particularly after the end of the DATA phase of the transaction, such as an ‘old version of Lotus Notes’; and there are some legit senders, such as Kodak’s mail-out systems, which regenerate the body in full on each send, even after a temp fail, so the body will look different. He found that less than 4% of real mail from real MTAs is delayed, and overall, 17% of his mail traffic was temp-failed. For that delayed nonspam, the gap between first tempfail and eventual delivery peaked at 400 and 900 seconds.
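For anyone unfamiliar with greylisting, the usual form of the technique looks roughly like this: temp-fail the first delivery attempt from an unseen (client IP, sender, recipient) triplet, and accept a retry once a minimum delay has passed. The class below is a sketch of that general idea, not John's setup; the 300-second delay and the reply texts are my own assumptions:

```python
class Greylist:
    """Minimal greylisting sketch keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay_seconds: int = 300):
        self.min_delay = min_delay_seconds  # assumed value, tune to taste
        self.first_seen = {}                # triplet -> time of first attempt

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        key = (client_ip, sender.lower(), recipient.lower())
        # Remember when this triplet was first seen (no-op on later attempts).
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            # A real MTA retries after a 4xx; most ratware never does.
            return "451 4.7.1 Greylisted, please try again later"
        return "250 2.0.0 OK"
```

The senders John describes break this model in two ways: some treat the 451 as permanent and never retry, and a check keyed on message body would never match for senders that regenerate the body on each attempt.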

As usual, there were a variety of ‘antispam via social networks’ talks – there always are. Richard Clayton had a great point about all that: paraphrasing, I trust my friends and relatives on some things, and they are in my social networks — but I don’t trust their judgement of what is and is not spam. (If you’ve ever talked to your mother about how she always considers mails from Amazon to be spam, you’ll know what he means.)

Combating Spam through Legislation: A Comparative Analysis of US and European Approaches: the EU ‘opt-in’ directive is now transposed everywhere in the EU; when an EU citizen is spammed by a sender in another EU country, the report should be sent to the antispam authority in the sender’s country; and there’s something called ‘ECNSA’, an EU contact network of spam authorities, which sounds interesting (although ungoogleable).

Searching For John Doe: Finding Spammers and Phishers: MS’ antispam attorney, Aaron Kornblum, had a good talk discussing their recent court cases. Notably, he found one case where an Austrian domain owner had set up a redirector site which sounded like it was expressly set up for spam use, which was news to me (and worrying).

A Game Theoretic Model of Spam E-Mailing: Ion Androutsopoulos gave a very interesting talk on a game theoretic approach to anti-spam — it was a little too complex for the time allotted, but I’d say the paper is worth a read.

Understanding How Spammers Steal Your E-Mail Address: An Analysis of the First Six Months of Data from Project Honey Pot: Matthew Prince of Project Honeypot had some excellent data in this talk; recommended. He’s found that there’s an exponential relationship between Google PageRank and spam received at scraped addresses, which matches my theory of how scrapers work; and that only 3.2% of address-harvesting IPs are in proxy/zombie lists compared to 14% of spam SMTP delivery IPs. (BTW, my theory is that address scraping generally uses Google search results as a seed, which explains the former.)

Computers beat Humans at Single Character Recognition in Reading based Human Interaction Proofs (HIPs): this presented some great demonstrations of how a neural network can be used to solve HIPs (aka CAPTCHAs) automatically. However, I’m unsure how useful this data is, given that the NN required 90000 training characters to achieve the accuracy levels noted in the paper; unless the attacker has access to their own copy of the HIP implementation they can run themselves, they’d have to spend months performing HIPs to train it, before an attack is viable.

Throttling Outgoing SPAM for Webmail Services: cites Goodman in ACM E-Commerce 2004 as saying that ESP webmail services are a ‘substantial source of spam’, which was news to me! (less than 1% of spam corpora, I’d guess). It then discusses requiring the submitter of email via an ESP webmail system to perform a hashcash-style proof-of-work before their message is delivered. By using a Bayesian spam filter to classify submitted messages, the ESP can cause spammers to perform more work than non-spammers, thereby reducing their throughput. Didn’t strike me as particularly useful. Yahoo!’s Miles Libbey got right to the heart of the matter, asking if they’d considered a situation where spammers have access to more than one computer; they had not. A better paper for this situation would be Alan Judge’s USENIX LISA 2003 one, which discusses more industry-standard rate-limiting techniques.
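The mechanism is easier to follow in code. Below is a sketch of a hashcash-style puzzle whose difficulty scales with the filter's spam probability. It is not the paper's scheme: the bit counts are invented, the function names are mine, and I've used SHA-256 where the original hashcash uses SHA-1:

```python
import hashlib
from itertools import count


def required_bits(spam_probability: float, base_bits: int = 8, extra_bits: int = 8) -> int:
    """More spam-like messages must find more leading zero bits (assumed scale)."""
    return base_bits + round(extra_bits * spam_probability)


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(message: str, nonce: int, bits: int) -> bool:
    """Cheap check done by the webmail server."""
    digest = hashlib.sha256(f"{message}:{nonce}".encode("utf-8")).digest()
    return leading_zero_bits(digest) >= bits


def mint(message: str, bits: int) -> int:
    """Expensive search done by the submitter: about 2**bits hashes on average."""
    for nonce in count():
        if verify(message, nonce, bits):
            return nonce
```

Each extra bit doubles the expected work, so a message scored as certain spam costs roughly 256 times more than one scored as clean with these numbers. Miles Libbey's objection still applies: a spammer with many machines just runs `mint` in parallel.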

SMTP Path Analysis: IBM Research’s anti-spam team discuss something very similar to several techniques used in SpamAssassin; our versions have been around for a while, such as the auto-whitelist (which tracks the submitter’s IP address rounded to the nearest /16 boundary), since 2001 or 2002, and the Bayes tweaks we added from bug 2384, back in 2003.
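As a small illustration of the /16 rounding, the lookup key pairs the sender address with the network part of the submitting IP. This is a sketch, not SpamAssassin's code (which is Perl); the function name is hypothetical and only IPv4 is handled:

```python
import ipaddress


def awl_key(sender_address: str, client_ip: str) -> tuple:
    """Build an (address, /16 network) key for tracking a sender's history."""
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{client_ip}/16", strict=False)
    return (sender_address.lower(), str(network.network_address))
```

Rounding to a /16 means mail from the same sender via a neighbouring relay still shares history, while the same From address forged from somewhere else entirely does not.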

Naive Bayes Spam Filtering Using Word-Position-Based Attributes: an interesting tweak to Bayesian classification using a ‘distance from start’ metric for the tokens in a message. Worth trying out for Bayesian-style filters, I think.

Good Word Attacks on Statistical Spam Filters: not so exciting. A bit of a rehash of several other papers — jgc’s talk at the MIT conference on attacking a Bayesian-style spam filter, the previous year’s CEAS paper on using a selection of good words from the SpamBayes guys, and it entirely missed something we found in our own tech report — that effective attacks will result in poisoned training data, with a significant bias towards false positives. In my opinion, the latter is a big issue that needs more investigation.

Stopping Outgoing Spam by Examining Incoming Server Logs: Richard Clayton’s talk. Well worth a read. It’s an interesting technique for ISPs: detecting outgoing spam by monitoring your MX for hits from your own dialup pools that show known ratware patterns.
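In outline, the check could look like the sketch below. I'm guessing at the details: the log format, the choice of the HELO string as the thing to match, and the function name are all assumptions on my part, and the actual patterns would come from the paper or your own observations:

```python
import ipaddress


def suspicious_customers(log_entries, customer_pools, ratware_helo_patterns):
    """Flag customer IPs whose connections to the MX look like ratware.

    log_entries: iterable of (client_ip, helo_string) pairs from the MX logs.
    customer_pools: CIDR strings for the ISP's own dialup/customer ranges.
    ratware_helo_patterns: compiled regexes (hypothetical; supply your own).
    """
    pools = [ipaddress.ip_network(p) for p in customer_pools]
    flagged = set()
    for client_ip, helo in log_entries:
        ip = ipaddress.ip_address(client_ip)
        # Only the ISP's own customers are of interest here.
        if not any(ip in pool for pool in pools):
            continue
        if any(pattern.search(helo) for pattern in ratware_helo_patterns):
            flagged.add(client_ip)
    return flagged
```

The nice property is that the ISP only reads logs it already has for its incoming mail, with no need to inspect customers' outgoing traffic.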

CEAS coming up soon…

Spam: if you work in anti-spam, especially in filtering, or even just in working with email in general, it’s well worth going to CEAS 2005, the Conference on Email and Anti-Spam, on Thursday July 21st and Friday 22nd in Stanford:

The organizers of the Conference on Email and Anti-Spam invite you to participate in its second annual meeting. This forum brings together academic and industrial researchers to present new work in all aspects of email, messaging and spam — with papers this year covering fields as diverse as text classification, clustering and visualization of email, social network analysis applied to both email and spam, spam filtering methods including text classification and systems approaches, game theory, data analysis, Human Interactive Proofs, and legal studies, among others. The conference will feature 26 paper presentations, a banquet, and two invited speakers. See http://www.ceas.cc for details of the current program, as well as on-line registration.

Registration runs out on July 10th.

I went last year, and it was excellent — several very interesting papers were presented. I’m going this year, too, along with quite a few SpamAssassin committers, and I’m looking forward to it.

Bayesian learning animation

Spam: via John Graham-Cumming’s excellent anti-spam newsletter this month, comes a very cool animation of the dbacl Bayesian anti-spam filter being trained to classify a mail corpus. Here’s the animation:

And Laird’s explanation:

dbacl computes two scores for each document, a ham score and a spam score. Technically, each score is a kind of distance, and the best category for a document is the lowest scoring one. One way to define the spamminess is to take the numerical difference of these scores.

Each point in the picture is one document, with the ham score on the x-axis and the spam score on the y-axis. If a point falls on the diagonal y=x, then its scores are identical and both categories are equally likely. If the point is below the diagonal, then the classifier must mark it as spam, and above the diagonal it marks it as ham.

The points are colour coded. When a document is learned we draw a square (blue for ham, red for spam). The picture shows the current scores of both the training documents, and the as yet unknown documents in the SA corpus. The unknown documents are either cyan (we know it’s ham but the classifier doesn’t), magenta (spam), or black. Black means that at the current state of learning, the document would be misclassified, because it falls on the wrong side of the diagonal. We don’t distinguish the types of errors. Only we know the point is black, the classifier doesn’t.

At time zero, when nothing has been learned, all the points are on the diagonal, because the two categories are symmetric.

Over time, the points move because the classifier’s probabilities change a little every time training occurs, and the clouds of points give an overall picture of what dbacl thinks of the unknown points. Of course, the more documents are learned, the fewer unknown points are left.
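
Laird’s decision rule is simple enough to sketch in a few lines. This is only an illustration of the rule as described above, not dbacl’s code; the function names and the sign convention for spamminess (positive means ‘more spammy’) are my assumptions.

```python
def classify(ham_score, spam_score):
    """Pick the lowest-scoring category; equal scores sit on the diagonal y = x."""
    if spam_score < ham_score:
        return "spam"    # point is below the diagonal
    if ham_score < spam_score:
        return "ham"     # point is above the diagonal
    return "unsure"      # on the diagonal, e.g. before any training

def spamminess(ham_score, spam_score):
    """Numerical difference of the scores; positive means closer to spam."""
    return ham_score - spam_score
```

For example, a document with a ham score of 10.0 and a spam score of 7.5 lands below the diagonal, so `classify(10.0, 7.5)` returns `"spam"` and its spamminess is 2.5.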

This is an excellent visualisation of the process, and demonstrates nicely what happens when you train a Bayesian spam-filter. You can clearly see the ‘unsure’ classifications becoming more reliable as the training corpus size increases. Very nice work!

It’s interesting to note the effects of an unbalanced corpus early on; a lot of spam training and little ham training results in a noticeable bias towards the classifier returning a spam classification.

Lexis-Nexis hacked through spam

Spam: WashPost: Computers Seized in Data-Theft Probe:

According to an account provided by the teenaged member of the hacker group — and confirmed by the law enforcement source who insisted on anonymity — the LexisNexis break-in was set in motion by a blast of junk e-mail. Sometime in February a small group of hackers … sent out hundreds of e-mails with a message urging recipients to open an attached file to view pornographic child images. The attachments had nothing to do with child porn; rather, the files harbored a virus (sic) that allowed the group’s members to record anything a recipient typed on his or her computer keyboard.

According to the teenage source, a police officer in Florida was among those who opened the infected e-mail message. Not long after his computer was infected with the keystroke-capturing virus, the officer logged on to his police department’s account at Accurint, a LexisNexis service provided by Florida-based subsidiary Seisint Inc. …

The young hacker said the group members then created a series of sub-accounts using the police department’s name and billing information. Over several days, the hacker said the group looked up thousands of names in the database, including friends and celebrities. The law enforcement source said the group eventually began selling Social Security numbers and other sensitive consumer information to a ring of identity thieves in California.

UBE, not UCE

Spam: About this time last year, German neo-nazis launched a massive worldwide spam run with the aid of the Sober.H worm.

Well, it looks like they’re planning to make this a regular occurrence, because it’s on again: spam pushing nazi opinions and linking to stories on reputable news sites, as well as pages on less reputable right-wing sites. Joe Wein has posted some samples. I’ve already received nearly a thousand since last night.

The good news: here’s a SpamAssassin ruleset that catches these nicely. Thanks, Raymond!

Congressional Open URL Redirectors

Spam: Matthew Wilson at Boomer Consulting has been having a field day — it looks like some smart google hacking has thrown up some doozies of places that should have fixed this by now:

and my favourites:

Of course, all of these are immaterial to SpamAssassin — we catch spammers using them anyway. But still, a surprising number of these out there.

Spam and Broken Windows, and wecanstopspam.org

Spam: Spam Chongqing: Spamming Experiment:

Kasia at unix-girl.com decided to run a spamming experiment on her blog. She posted a couple spams to her own blog and waited to see what would happen. In less than 24 hours she received 356 more spams.

The chongqing guys confirm this, and I’ve noticed this as well (although just in passing, I’ve never tried testing it).

Interestingly, I’m pretty sure the same thing can happen with mailing lists, if the mailing list archives are allowed to contain the mailing list’s posting address, and the list allows open posting. It works like this:

  • spammer A posts a spam to the list
  • spam is archived
  • google finds archived spam
  • list-builders B, C, D google for search terms, find archive page for that mail message
  • B, C, D scrape the addresses from that page and pick up the list posting address
  • they then either sell on to spammers E, F, and G, who spam that address, or they spam the address themselves
  • and redo loop from the start.

One key factor is the search terms B, C, and D use. My theory is that they are intending to generate ‘targeted’ lists, and in spamming, most targeted lists are simply lists of addresses scraped from pages that show up in a google search for a specific keyword — ‘meds’, ‘viagra’, ‘degree’, etc.

Joe at chongqing surmises that it may be through the Broken Windows Theory: that spam appearing in a weblog’s comments, or in a wiki page, indicates that the administrator is asleep at the wheel and more spam can be posted with impunity. In my opinion, that’s probably more likely for google-spam and wiki-spam than for email spam, but it is undoubtedly a factor.

PS: wecanstopspam.org has been allowed to lapse and has been stolen by a spammer (http://chongq.blogspot.com/2005/04/another-spammer-owned-antispam-site.html). Oh dear.

A highlight (or low-light) from the world of spam bounces

Spam: recently, I’ve been getting a lot of spam bounces; that is, messages sent by people’s autoresponders, in response to forged spam claiming to come from my domain. (I have an SPF record, but these autoresponders naturally don’t bother to check that before replying.)

I have a SpamAssassin ruleset which catches these, and it gets rid of the vast majority, but the odd weird one gets past. This one caught my eye before I deleted it:

On October 5, 2004, I will be going to the Illinois Department of Corrections for approximately 18 months. If you wish to contact me, please snail mail me at: (address deleted)
Your letters will be forwarded to me and I will reply as soon as I receive them! Thanks…and please do write! Mail is vitally important! :-)

… ouch. Good luck to this guy, whoever he is…

Spamhaus article on ISPs hosting spam gangs

Spam: Should ISPs Be Profiting From Knowingly Hosting Spam Gangs? – a new article up on Spamhaus.org, well worth a read. Some snippets:

So where is this stealth proxy spamware sold and distributed from? For Send Safe the answer is, www.send-safe.com, hosted by MCI Worldcom.

… MCI executives have refused to stop providing service to these gangs, insisting that the sale and distribution of stealth spamming software is not against MCI’s policy.

… It’s no surprise therefore that MCI has consistently occupied first place in Spamhaus TOP 10 World Worst Spam Service ISPs chart, with over 200 spammers and spam gangs on the MCI network in full knowledge of the security managers and the General Counsel.

… MCI Worldcom’s official position on the issue is that MCI can’t stop their spam gangs selling proxy hijacking spamware from MCI’s network as that would be ‘censoring’ the distribution and sale of illegal proxy hijacking software.

‘Spam Kings’ review

Spam: Before xmas, I received a copy of Brian McWilliams‘ new book, Spam Kings.

It’s a great book — full of behind-the-scenes details on how the spammers operate, how they get away with it on the sending end, how they try to evade filters on the receiving end, and how they’re fundamentally running the usual simple scams that have been around since before email spam came into existence. Well worth reading.

In addition, Brian’s continuing to write about spam and spammers at the Spam Kings weblog, and will be giving a talk at this year’s MIT Spam Conference, tomorrow.

Anyway, pick up a copy if you’re interested in the spam problem — this is one of the best books I’ve read on the subject, and this kind of information is essential for an understanding of the people we’re up against.

Echo chamber goes crazy about ‘nofollow’

Blogs: Just to expand on a linkblog posting I made yesterday, Google’s search team have announced support for a new piece of Google functionality; they’ll fix their crawlers to ignore links with a rel="nofollow" attribute, for PageRank calculations, the idea being that spammers will stop blog-spamming once they can’t get PageRank out of it.

The blog world has been all aflutter:

BurningBird is right, to a degree. In fact, it’s been solved before.

Here’s a taint.org posting from November 2003 where I point out that by using a trivial Javascript URL one can link to another page without conferring PageRank. The format is:

javascript:document.location=target

The result looks like this, and works in any browser with a basic JS engine, from IE 3.02 and Netscape Navigator 2 onwards. I’ve been using it for my referrer logs, among other things, for over a year. I wrote a patch that implemented it for external links in the Moin Moin wiki software.
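
For comparison, here’s a minimal sketch of how blog or wiki software might emit either kind of link. The helper names are hypothetical, and this is not the Moin Moin patch; it just shows the javascript: URL form next to the rel="nofollow" form.

```python
from html import escape

def js_link(url, text):
    """Anchor whose href is a javascript: URL, so there is no plain link to follow."""
    # Escape backslashes and single quotes so the URL is a safe JS string literal.
    target = url.replace("\\", "\\\\").replace("'", "\\'")
    href = f"javascript:document.location='{target}'"
    return f'<a href="{escape(href, quote=True)}">{escape(text)}</a>'

def nofollow_link(url, text):
    """Ordinary anchor carrying the rel="nofollow" attribute."""
    return f'<a href="{escape(url, quote=True)}" rel="nofollow">{escape(text)}</a>'
```

The second form keeps working with JavaScript turned off, which is the main reason to prefer it.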

Amazingly, despite my plugging this idea at virtually every opportunity, it seems nobody noticed! At least, nobody among the people who (it would seem) should be looking into comment spam, thinking about how to deal with it, etc.

Disappointing — the echo chamber keeps talking to itself, once again. Maybe I’ll stick with dealing with email spam instead ;)

Ah, whatever. Anyway, this is a nicer fix; relying on JS isn’t a good thing. So nice work, Google.

(PS: worth noting that while this is a good plan, comment spam won’t be going away any time soon, as Mark Pilgrim noted. Still, here’s hoping it’ll help in the long term…)

Prescient tsunami spam

Spam: I was just looking back through the archives here on taint.org, and noticed this entry from December 2 last year:

A huge 300 ft. high ocean wave is moving towards your continent. Your and many other cities are in a real danger. Approximate wave moving speed is 700 km/h. cmoym eaaa yypbzz

Please read more about this catastrophe here: (link)

We are strongly urging you to evacuate yourself and your family as soon as possible, even though you may live far away from your city. The tsunami will reach the continent in approximately FOUR hours.

It appears that the spam was a phish attack — the site in question is full of Internet Exploder exploits. It was ‘targeted’, at least as well as such things ever are, at Australian readers. AUSCERT issued a warning about it at the time.

But how’s about that for timing? Spooky! What did those phishers know?

eWeek’s ‘Spammers Upending DNS’ article

Spam: eWeek recently published an article entitled ‘Spammers’ New Tactic Upends DNS’ , which notes that:

One .. technique finding favor with spammers involves sending mass mailings in the middle of the night from a domain that has not yet been registered. After the mailings go out, the spammer registers the domain early the next morning.

By doing this, spammers hope to avoid stiff CAN-SPAM fines through minimal exposure and visibility with a given domain. The ruse, they hope, makes them more difficult to find and prosecute.

The scheme, however, has unintended consequences of its own. During the interval between mailing and registration, the SMTP servers on the recipients’ networks attempt Domain Name System look-ups on the nonexistent domain, causing delays and timeouts on the DNS servers and backups in SMTP message queues.

This had me stumped when I read it, since an email from a nonexistent domain is a pretty reliable spamsign. SpamAssassin’s NO_DNS_FOR_FROM rule, for example, checks for it; that rule has been in the default ruleset for several years and hits about 2% of spam. And there’s no sign of the behaviour eWeek describes in our spam traps.
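
To show why a nonexistent sender domain is so easy to check for, here’s a minimal sketch of that kind of test. It is not how SpamAssassin implements NO_DNS_FOR_FROM; it only looks for address records via the standard library (an assumption made to keep it self-contained), and the function names are hypothetical.

```python
import socket

def sender_domain(from_address):
    """Extract the lower-cased domain part of a From address."""
    return from_address.rsplit("@", 1)[-1].strip().strip(">").lower()

def domain_resolves(domain, lookup=socket.getaddrinfo):
    """Return True if the domain resolves; lookup is injectable for testing."""
    try:
        lookup(domain, None)
        return True
    except socket.gaierror:
        return False
```

A filter would call `domain_resolves(sender_domain(from_header))` and score the message if it comes back false.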

After some discussion, Suresh Ramasubramanian came up with this explanation of what’s really happening:

Verisign now allows immediate (well, within about 10 minutes) updates of .com/.net zones (also same for .biz) while whois data is still updated once or twice a day. That means if spammer registers (a) new domain he’ll be able to use it immediatly (sic) and it’ll not yet show up in whois (and so not be immediatly identifiable to spam reporting tools) - and spammers are in fact using this “feature” more and more!

That does sound a much more likely explanation, and matches what’s been seen in the traps.

So: WHOIS, not DNS.

Back, in the flurry of a mini-tornado

Meta: Back. Not even ‘mini-tornados’ at Dublin Airport can keep me away — although it gave it a damn good try, with a 3 hour delay, a missed connection, and an overnight stay in Chicago. Arggh.

Mail: I generally leave the laptop at home when on vacation, to do some proper winding down. Not sure it was a great idea this time, since I was joe-jobbed by some pretty extensive spam runs recently, resulting in over 30,000 bounces sitting unread in my email when I got back.

Thankfully, Tim Jackson’s bogus-virus-warnings.cf SpamAssassin ruleset (with a few updates) got most of them, with only a few hundred getting past. I should really hack on making those more complete, but some of the bounces are really obscure; along the lines of ‘Hi from J Random Luser, Esq.! I no longer use this address because it gets too much spam! Please send to this new one instead: jrluser98@example.com!’, generally without any obvious identifying headers that indicate it’s an autoresponse.

Sigh — each of those messages is just utterly random, and I can’t see much recourse but to come up with some nasty phrase-based content filtering rules, which I was hoping to avoid. But 29,500 hits isn’t bad ;)

I’m not sure they’d be suitable yet for use as default SpamAssassin rules, since they now generally just match any kind of bounce message, not specifically joe-job or virus-forgery blowback. But that suits me just fine — I can live without bounces, as long as I don’t have to suffer the bounce blow-back.

Science: Good news from New Scientist: they’re opening up their archives! NS consistently has the best science journalism around, and I’ve been a subscriber for years. But until recently, they had a lousy approach to their website; most of the useful stuff, like the archives, was walled off as subscriber-only features, a classic case of missing the Clue Train. Well, here’s an archive search for ‘spam’. It’s pretty impressive, and most of the short articles are available in full, with only the full text for features and opinion pieces requiring a login.

In addition, they’ve added a massive batch of RSS feeds. Sadly, no full article text excerpts, however. But still — getting the clue, eventually — this way they may actually get links on the web, in place of the mangled and chinese-whispered versions of their articles republished in the UK newspapers…

Ireland: Due to monopolistic pricing of Irish GIS data, consumer GPS maps of Ireland’s road system are appalling, and this page collects a few great demos — for example, MS Autoroute quintuples the distance from Galway to Roundstone! That’s a major tourist route, BTW. I knew it was bad, but not that bad…

Anyway, I’m still waaay behind, but slowly catching up.

BSA’s Spam Statistics

Spam: The Business Software Alliance, a UK anti-piracy body representing many of the major software vendors, recently issued a spam-related press release which got a lot of attention in the UK press (they have great press contacts!).

To quote John Graham-Cumming’s newsletter on the subject:

1 in 5 British Consumers Buy Software from Spam: that’s according to a survey by the Business Software Alliance. I find that a pretty surprisingly high number and considering it comes from an advocacy group that tries to get people to buy legitimate copies of software I expect it’s not totally accurate. The one thing I find really surprising from the survey are these two statistics: 23% of spam is read by the person receiving it and 22% of people have bought software. Apparently, 11% of people surveyed like the idea of buying through spam because the software is cheaper.

It’s still an interesting figure, but the BSA has come up with some pretty suspect statistics in the past, so pinch of salt applies. As jgc points out, the BSA have a vested interest in making the problem sound worse than it may be in reality.

Still, the survey PDF can be read here, and is worth a look.

playing around with Google Suggest

Web: Google Suggest, a drop-down list of suggestions — with hitrates! The one letter hits are interesting, too.

“spam” hitrates, the top 3 (aside from “spam” itself):

  • “spam filter”: 6,400,000 results
  • “spamcop”: 1,570,000
  • “spamassassin”: 1,350,000

SpamAssassin is in the top 3. Getting there!

unfortunately, you have to get as far as “justin ma” before my name shows up, so not doing too great in that competition. ;)

Interesting/bizarre recent spam

Spam: some good crazy spam recently — firstly, some Seventh Day Adventist lunacy:

THE PAPACY IS THE ANTICHRIST THAT IS TRYING TO CHANGE THE LAW OF GOD. DANIEL 7:25

THIS IS THE LAST WARNING.
THE LAW OF GOD IS ETERNAL BECAUSE GOD IS ETERNAL 14:12. MT. 5:17 SATURDAY SEVENTH DAY IS THE TRUE LORD’S DAY. EXO. 20.8-11 SUNDAY IS A FALSE PAGAN DAY. IT IS NOT IN THE BIBLE. IT WAS USED TO WORSHIP SATAN

It runs on in that vein for quite a while. Interestingly, most of the text from there on in is ‘gappy’: the spammer has inserted spaces between the characters of each word, even inside link addresses. As a result, the links no longer work. Oops!
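
For the curious, ‘gappy’ text is easy enough to undo. Here’s a minimal sketch, assuming the gaps are single spaces between single characters; the regular expression and function name are mine, not a SpamAssassin rule.

```python
import re

# A run of three or more single non-space characters separated by single
# spaces, bounded by whitespace or the ends of the string.
_GAPPY_RUN = re.compile(r"(?<!\S)(?:\S ){2,}\S(?!\S)")

def degap(text):
    """Collapse gappy runs like 'c l i c k' back into 'click'."""
    return _GAPPY_RUN.sub(lambda m: m.group(0).replace(" ", ""), text)
```

One limitation worth noting: if two gappy words are separated by only a single space, this sketch will merge them into one.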

And a new one to me — natural-disaster spam (via Mark Pilkington):

THIS IS AN OFFICIAL WARNING!
fngva uvtt chloez

A huge 300 ft. high ocean wave is moving towards your continent. Your and many other cities are in a real danger.
Approximate wave moving speed is 700 km/h.
cmoym eaaa yypbzz

Please read more about this catastrophe here: (link)

We are strongly urging you to evacuate yourself and your family as soon as possible,
even though you may live far away from your city. The tsunami will reach the continent in approximately FOUR hours.

venbz nwvw exepmi
YOU HAVE BEEN WARNED!

I’ve removed the link, btw — the site it links to contains a bunch of nasty malware-installing IE-bug exploits. In case you were wondering: you can tell it’s genuine because it says IT’S AN OFFICIAL WARNING at the top.

(ObSpamComment: note that this is a good example of why spam is unsolicited bulk email, not unsolicited commercial email; neither message is selling anything. One’s religious craziness, the other’s trying to r00t your machine.)

EFF’s clueless spam filtering white paper

Spam: The EFF are a great organisation — damn, I even helped set up an organisation based on its goals in Ireland, back in the day! But this white paper is shockingly clueless.

(Note: this posting has been updated. Original left intact, but there’s an update below worth noting.)

For example:

Spam Assassin, a popular program that does ad hoc pattern matching, assigns ‘points’ to various features of an email to determine whether it is spam. … One of the major problems with this system is that messages from certain countries — like China, for example — can be blocked purely on the basis of where they come from and what language they’re in. The implications for free speech here are very troubling indeed: … thus anti-spam technology unintentionally works as a political censorship mechanism.

SpamAssassin does not give points for country of origin, or language the message arrives in, unless the user explicitly either (a) adds rules from an external source, or (b) modifies the ‘ok_languages’ setting in their configuration, from the default, to specify that they do not want to receive messages in particular languages. No country- or language-blocking happens by default. This is by design.

It’s a shame that the authors felt the need to outright fabricate a danger, here.

The white paper features more broad generalisations about ‘spam filters’, mostly using unsubstantiated friend-of-a-friend stories, without detailed data. And I do know that there have been cases of MoveOn.org, at least, being a source of UBE in the past, so it’s not valid to claim that this is all a ‘free speech’ issue; political UBE is still spam.

They need to realise there’s a lot of very smart, very reasonable anti-spammers out there, and most of us agree with the rest of their goals, except for their spam position. This is hurting them.

Still, it appears they’re finally getting a clue about requiring subscription requests be confirmed using closed-loop opt-in, so that’s good. More political newsletters, and political campaigns, need to get this clue — just because it’s political speech does not mean it’s not spam. (I have several thousand political spams in my spam folder — most from that German anti-immigration virus from earlier this year.)

Note that Rod is unsure if they’re practicing what they preach…

Update: Annalee Newitz has been in touch, and pointed out that the white paper in fact says ‘mails … can be blocked’, rather than ‘are blocked’, based on country of origin. In other words, the point is that this is possible, not that it is the default, and that it is administrators who apply these customisations.

In addition, she notes that the conclusions recommend that ISPs and administrators of spam blocking systems allow end users to control their own filtering settings, saying ‘If a user wants to block all mail from China, great. If a sysadmin does it for a bunch of users without permission, then that is a problem in our opinion.’

So I agree with that. Misdirected outrage hereby turned off ;)

(Mind you, I still think they need to work more with the reasonable anti-spammers… and fix that unconfirmed sign-up that Rod mentioned, if it’s really still unconfirmed!)

‘Stubberfield’ falls victim to first felony anti-spam conviction

Spam: 2 found guilty in first felony spam conviction: ‘LEESBURG, Va. – A brother and sister who sent unsolicited junk e-mail to millions of America Online customers were convicted Wednesday in the nation’s first felony prosecution of distributors of spam.’

Jeremy D. Jaynes, 30 (aka Gaven Stubberfield), and Jessica DeGroot, 28, were convicted, and face nine years in prison and a $7,500 fine respectively.

Nine years — wow, that’s a serious conviction for spamming… Virginia clearly takes this very seriously, as the home of AOL. Let’s see if this causes any of the remaining spammers to think twice.

Slides from Toorcon 2004

Spam: my slides from the presentation I gave at Toorcon 2004, ‘Spam Forensics: Reverse-Engineering Spammer Tactics’, are now up. Hope they prove enlightening ;)

SpamAssassin 3.0.0 Released!

Spam: SpamAssassin 3.0.0 is now released! w00t! Only 4 months late this time ;) Announcement, techie details, Slashdot. New logo too:

(Note: if you’re running SpamAssassin 2.x and plan to upgrade, this is a new major release cycle — so we’ve taken the chance to break some backwards compatibility. Be sure to read the UPGRADE doc!)

ToorCon

Conferences: Hey — I’m talking at ToorCon 2004 down in San Diego this weekend! Come along and check it out, if you can.

I’d better hurry up and file my presentation slides pronto ;) The topic is:

Spam Forensics: Reverse-Engineering Spammer Tactics

In this talk, I’ll discuss how the SpamAssassin project has identified reliable signatures indicating that a message is spam, by reverse-engineering spammer tactics from the spam mails themselves. I’ll also discuss several specific features that we have identified, how we found them, and why the spammers add them.

Open source v closed-source spam filtering

Spam: I’m quoted in New Scientist! w00t!

Slashdot picked it up pretty quickly. One comment there misses the point, though:

This is interesting and promising technology. But like all antispam techniques, spammers will find a way around it. Once spammers get a copy of the software, they can create and test countermeasures in the comfort of their own sleazy lairs.

It’s worth talking about this. Newsflash: spammers have no difficulty testing their spam against closed-source spam filters, even when they can’t ‘get a copy’ and test them in ‘their sleazy lairs’.

How do they do it? Easy: just set up an account at a site that uses that filter (for AOL, Yahoo!, Hotmail, and GMail, it’s pretty obvious how to do that; for other closed-source filters, find an ISP that uses it). Then send ‘test mails’ repeatedly to that account, and apply trial and error to see what gets past the filter and what doesn’t. Eventually, they figure out what works for that filter.

How did I figure this out? Well, I came across the manual for the Send-Safe ratware on-line. It noted that the ‘hashbuster’ randomisation technique, which we in the SpamAssassin team had long assumed was intended to block hash matches by DCC, Pyzor and Razor, was in fact intended to block AOL’s implementation of that system. The open source ones weren’t even mentioned.

Update: found it — from their FAQ:

Mime Encoded content

If you want to get into AOL… use it.

MIME encoders allow you to send documents written within a specific application through email without causing readability or formatting problems. For example, you can send a letter created in MSWord with and be certain that it arrives at its destination in the same format by encoding it with MIME first. The recipient then decodes it back into the original MSWord format.

That isn’t why we use it though.

We use it to cause ‘uniqueness’.

When you put a rotate tag at the beginning of a MIME encoded email, it causes everything after that point (including checksums) to be ‘different’ in every message.

Why is that that important?

Because it throws off filters that look for many copies of the same message to nuke.
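
The effect the FAQ is describing is easy to demonstrate. This is a minimal sketch, assuming an exact-match digest (SHA-256) as a stand-in for a bulk-detection checksum; real checksum-sharing filters use their own schemes, and the function names here are hypothetical.

```python
import hashlib
import random
import string

def checksum(message):
    """Exact-match digest, standing in for a 'many copies' detector."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

def add_hashbuster(body, rng):
    """Prefix a random token so every copy of the same body is unique."""
    token = "".join(rng.choice(string.ascii_lowercase) for _ in range(12))
    return f"{token}\n{body}"

rng = random.Random(2004)  # seeded only so the sketch is repeatable
body = "Identical sales pitch sent to every recipient."
copy_a = add_hashbuster(body, rng)
copy_b = add_hashbuster(body, rng)
```

Without the token, every copy of `body` has the same checksum; with it, `copy_a` and `copy_b` no longer match each other, so a filter counting identical digests sees two unrelated messages.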

A ‘Boulder Pledge scoreboard’ website

Spam: Ask Slashdot: How Powerful is the Turn-Off Power of Spam? The question is, ‘How often do you make the decision to NOT buy something form a company because you know they engage in spamming activities?’

This is an old idea; it goes back to a December 1996 column by Roger Ebert, of all people, who proposed the following pledge that all internet users should take:

Under no circumstances will I ever purchase anything offered to me as the result of an unsolicited e-mail message. Nor will I forward chain letters, petitions, mass mailings, or virus warnings to large numbers of others. This is my contribution to the survival of the online community.

8 years later, it’s more important than ever.

However, it’s complicated by one additional factor — not everyone knows which products and companies use spam to advertise. For example, did you know that Kraft routinely advertise their Gevalia coffee through spam?

My suggestion — a daring individual (that rules me out ;) should set up a website where samples of major-product-advertising spam are collected from (trusted) reporters. A quick scoreboard based on how many reports a particular company accumulates, and we have a Boulder Pledge reputation service.

Some simple rules should be applied:

  • Messages arriving at never-used spamtrap addresses, or at addresses scraped from USENET or the web, are the basis for a listing, especially if the message hits several of those addresses (indicating a high volume);
  • Failure to respect opt-outs, of course, would be a biggie;
  • Using a known spamhaus, or sending via open proxies in Shandong, would be a massive thumbs-down;
  • Failure to clean up its act after being made aware of the problem: oh dear.
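
The scoreboard part of this could be very simple. Here’s a minimal sketch, assuming reports arrive as `(reporter, company)` pairs from trusted reporters; the data shape, the `min_reports` threshold and the company names in the usage note are all hypothetical.

```python
from collections import Counter

def scoreboard(reports, min_reports=2):
    """Rank companies by how many distinct trusted reporters flagged them.

    reports: iterable of (reporter, company) pairs (assumed format).
    min_reports: listing threshold, so one report alone never lists anyone.
    """
    seen = set()
    counts = Counter()
    for reporter, company in reports:
        # Count each reporter once per company to blunt personal axe-grinding.
        if (reporter, company) not in seen:
            seen.add((reporter, company))
            counts[company] += 1
    return [(c, n) for c, n in counts.most_common() if n >= min_reports]
```

So if two different reporters flag ‘AcmeCo’ and only one flags ‘OtherCo’, only AcmeCo makes the board. Counting distinct reporters rather than raw reports is one way to keep a single zealous reporter from driving a listing.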

It’d be essential to take an extremely careful approach to this; any hint of personal axe-grinding, and the site would be useless, written off as just the work of ‘another anti-spam kook’.

Essentially, this’d be a Fortune-500-oriented version of spamvertized.org.

Reportedly, many of the large companies using spam to advertise are fully aware at a management level that they are responsible for spamming. (That line about open proxies in Shandong is no joke — at least one Fortune 500 company has hired a spamhaus that does this.)

Doubtless, some spamvertisers may be victim to an overzealous but clueless marketing department, on the other hand — but either way, a public ‘name and shame’ forum gives a great impetus for them to avoid this problem, at least once they’ve been bitten the first time.

In some cases, it’s dodgy ‘affiliates’ that use spam to advertise their products — but a company that operates affiliates really should post a policy that says that affiliates found to be spamming will be terminated and have their commissions forfeited; reportedly, that has been found in other programs to quickly cut off the problem.

Spamusement rocks!

Spam: oh man, Spamusement started off well, and has just been getting better and better; * HEATH WARNING * had me laughing out loud, and the idea of linking the entries since August 8 as a series is genius.

CEAS Roundup

Spam: So, CEAS was great fun, and very educational:

  • Got to meet up with various antispammers, including Daniel and Theo from the SpamAssassin dev team, Jeff Chan from SURBL, Dan Kohn from Habeas, Catherine Hampton from The SpamBouncer, Miles Libbey, John Levine, Neil Schwartzman — lots of good chats.
  • MS really know how to feed a conference! I hear rumours there was an extra-special tinned-meat-product-based dish at the banquet…
  • But their firewalling tendencies put a serious damper on keeping in touch with the outside world, at least until we set up an SSH tunnel on port 443 ;)
  • During a lull, Dan Kohn fired off a hands-up census — a good 75% of the attendees (roughly) admitted to using SpamAssassin!

My highlight papers:

  • IBM’s Chung-Kwei pattern-discovery system — the one which Mark dug up. Very interesting stuff; it turns out that bioinformatics is full of large corpora of data (genomes) which you then need to find patterns in. Funnily enough, so is SpamAssassin: s/genomes/spam/, s/patterns/regular expressions/. The more advanced pattern-discovery algorithms even allow complex patterns to contain alternative blocks, ‘don’t-cares’ and similar regular-expression-like features.

    The really good bit of Chung-Kwei is the Teiresias algorithm (more pages, online demo). Of course, being IBM research, it’s probably patented to the hilt, and may be tricky to license; but it’s certainly pointed us in a whole new interesting direction — anyone know any bioinformaticians?

    IBM is really gearing up on anti-spam research: four of the six papers listed on their website were presented at CEAS this year.

  • Another good paper was On Attacking Statistical Spam Filters, by Gregory L. Wittel and S. Felix Wu, which (similarly to Henry Stern’s submission, which I helped a little with) dealt with an attack on Bayesian filters.

    This is interesting stuff; we’re pretty sure SpamAssassin’s implementation isn’t as badly exposed as it could be, but it’s still a serious attack.

  • The Impact of Feature Selection on Signature-Driven Spam Detection was an interesting paper on AOL’s new signature schemes. (The conference was sponsored by Cloudmark, BTW, but those guys were nowhere to be seen — in which case they missed this presentation ;)
  • Reputation Network Analysis for Email Filtering was interesting, in that it mirrors to a degree the thinking behind web-o-trust.org, but in my opinion suffered due to a lack of thought about avoiding spoofing (by including IP address information in the FOAF file, it could do this now). However, once SPF becomes pervasive, this could be combined with that to generate personalised webs of trust usable for email whitelisting.
  • Resisting SPAM Delivery by TCP Damping was very nifty; plug a classifier into your MTA, and thereby detect connections from spam relays. Once you’ve found them, you then throttle down their connection as they attempt to deliver spam. Some other TCP-level tricks can do nifty stuff like massively increasing the bandwidth consumption of the spamming machines. Very very nice!
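
For the curious, here’s a rough Python sketch of the damping idea from that last paper: take the classifier’s spam probability for a connection and turn it into a per-chunk delay. It’s only an application-level toy, not the paper’s TCP-level mechanism, and the names (DampingPolicy, threshold, max_delay, estimated_hold_time) and the default numbers are assumptions made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DampingPolicy:
    """Toy policy mapping a spam probability to a per-chunk delay (hypothetical)."""

    threshold: float = 0.5  # assumed cutoff: below this, no throttling at all
    max_delay: float = 4.0  # assumed delay in seconds per chunk at probability 1.0

    def __post_init__(self) -> None:
        if not 0.0 <= self.threshold < 1.0:
            raise ValueError("threshold must be in [0, 1)")
        if self.max_delay < 0.0:
            raise ValueError("max_delay must be non-negative")

    def delay_for(self, spam_probability: float) -> float:
        """Return the seconds to wait before accepting each chunk from the sender."""
        if not 0.0 <= spam_probability <= 1.0:
            raise ValueError("spam_probability must be in [0, 1]")
        if spam_probability < self.threshold:
            return 0.0
        # Scale linearly from 0 at the threshold up to max_delay at 1.0.
        fraction = (spam_probability - self.threshold) / (1.0 - self.threshold)
        return self.max_delay * fraction


def estimated_hold_time(message_bytes: int, chunk_size: int, delay: float) -> float:
    """Estimate how long a sender is tied up delivering one message."""
    if message_bytes < 0 or chunk_size <= 0:
        raise ValueError("message_bytes must be >= 0 and chunk_size must be > 0")
    chunks = math.ceil(message_bytes / chunk_size)
    return chunks * delay
```

With these made-up defaults, a connection scored at 0.75 gets a 2-second delay per chunk, so a 10,000-byte message accepted in 1,000-byte chunks would tie up the sending machine for about 20 seconds, while anything scored below 0.5 goes through at full speed.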

I took copious notes on the SpamAssassin wiki, if anyone’s curious.
