Hack: reassassinate

A coworker today, returning from a couple of weeks’ holiday, bemoaned the quantities of spam he had to wade through. I mentioned a hack I often use in this situation: discard the spam, download the 2 weeks of supposed-nonspam as a huge mbox, and rescan it all with spamassassin. Since the intervening 2 weeks give plenty of time for the URLs to be blacklisted by URIBLs and the IPs to be listed by DNSBLs, this generally results in better spamfilter accuracy, at least in terms of reducing false negatives (the “missed spam”). In other words, it gets rid of most of the remaining spam nicely.

Chatting about this, it occurred to us that it’d be easy enough to generalize this hack into something more widely useful by hooking up the Mail::IMAPClient CPAN module with Mail::SpamAssassin, and in fact, it’d be pretty likely that someone else would already have done so.

Sure enough, a search threw up this node on perlmonks.org, containing a script which did pretty much all that. Here’s a minor freshening: download

reassassinate – run SpamAssassin on an IMAP mailbox, then reupload

Usage: ./reassassinate --user jmason --host mail.example.com --inbox INBOX --junkfolder INBOX.crap

Runs SpamAssassin over all mail messages in an IMAP mailbox, skipping ones it’s processed before. It then reuploads the rewritten messages to one of two locations, depending on whether they are spam or not: nonspam messages are simply re-saved to the original mailbox, and spam messages are sent to the mailbox specified in “--junkfolder”.

This is especially handy if some time has passed since the mails were originally delivered, allowing more of the contents of spam mails to be blacklisted by third-party DNSBLs and URIBLs in the meantime.

Prerequisites:

  • Mail::IMAPClient
  • Mail::SpamAssassin
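
For anyone who wants the gist without reading the Perl, here is a rough Python sketch of the skip-and-route step only. It is not the script itself: the IMAP fetch and re-upload are left out, the scanner is passed in as a plain callable, and the `X-Reassassinated` marker header is a name made up for this example. It assumes the scanner adds SpamAssassin's usual `X-Spam-Flag: YES` header to spam.

```python
from email import message_from_bytes

MARKER = "X-Reassassinated"  # hypothetical header used to skip already-processed mail


def route_messages(raw_messages, scan, inbox="INBOX", junkfolder="INBOX.crap"):
    """Return (folder, rewritten_bytes) pairs for messages not processed before.

    `scan` is any callable that takes raw message bytes and returns the
    rewritten message bytes, for example a wrapper around SpamAssassin.
    """
    routed = []
    for raw in raw_messages:
        if message_from_bytes(raw)[MARKER] is not None:
            continue  # processed on an earlier run, leave it alone
        rewritten = message_from_bytes(scan(raw))
        is_spam = rewritten.get("X-Spam-Flag", "").strip().upper() == "YES"
        rewritten[MARKER] = "1"  # remember that this message has been rescanned
        routed.append((junkfolder if is_spam else inbox, rewritten.as_bytes()))
    return routed
```

Each returned pair would then be appended to the named IMAP folder, and the original message deleted.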

Closed phish data costing $326mm per year

Richard Clayton posted a very interesting article over at Light Blue Touchpaper; he notes:

Tyler Moore and I are presenting another one of our academic phishing papers today at the Anti-Phishing Working Group’s Third eCrime Researchers Summit here in Atlanta, Georgia. The paper “The consequence of non-cooperation in the fight against phishing” (pre-proceedings version here) goes some way to explaining anomalies we found in our previous analysis of phishing website lifetimes. The “take-down” companies reckon to get phishing websites removed within a few hours, whereas our measurements show that the average lifetimes are a few days.

When we examined our data [...] we found that we were receiving “feeds” of phishing website URLs from several different sources — and the “take-down” companies that were passing the data to us were not passing the data to each other.

So it often occurs that take-down company A knows about a phishing website targeting a particular bank, but take-down company B is ignorant of its existence. If it is company B that has the contract for removing sites for that bank then, since they don’t know the website exists, they take no action and the site stays up.

Since we were receiving data feeds from both company A and company B, we knew the site existed and we measured its lifetime — which is much extended. In fact, it’s somewhat of a mystery why it is removed at all! Our best guess is that reports made directly to ISPs trigger removal.

They go on to estimate that ‘an extra $326 million per annum is currently being put at risk by the lack of data sharing.’

This is a classic example of how the proprietary mindset fails when it comes to dealing with abuse and criminal activity online. It would obviously be more useful for the public at large if the data were shared between organisations and published openly, but if you view your data feed as a key ingredient of your company’s proprietary “secret sauce” IP, you are not likely to publish and share it :(

The anti-phishing world appears to be full of this kind of stuff, disappointingly — probably because of the money-making opportunities available when providing services to big banks — but anti-spam isn’t free of it either.

Mark another one up for open source and open data…

(thanks to ryanr for the pic)

GoDaddy’s spam filter is broken

GoDaddy is rejecting mail containing URLs whose hosts’ IP addresses appear in the Spamhaus PBL. As this thread on the Amazon EC2 forum notes, this is creating false positives, causing nonspam mail to be rejected. Here’s what GoDaddy reportedly said about this policy:

Unfortunately, our system is set to reject mails sent from or including links listed in the SBL, PBL or XBL. Because the IP address associated to [REMOVED] is listed in the PBL, any emails containing a link to this site will be rejected. This includes plain-text emails including this information.

If this is true, it’s utterly broken.

Spamhaus explicitly warn that this is not to be done, on the PBL page:

Do not use PBL in filters that do any ‘deep parsing’ of Received headers, or for other than checking IP addresses that hand off to your mailservers.

And more explicitly in the Spamhaus PBL FAQ:

PBL should not be used for URI-based blocking! Consider the false positive potential: legitimate webservers hosted with services such as dyndns.com or ath.cx! Or consider that ISPs and other networks are encouraged to list any IP ranges which should not send mail, and that could include web servers! Use SBL or XBL (or sbl-xbl.spamhaus.org) for URI blocking as described in our Effective Spam Filtering section. Use PBL only for SMTP (mail).

Critically, the PBL now lists all Amazon EC2 space, since Spamhaus interpret Amazon’s policy as forbidding email to be delivered via direct SMTP from there. (Note — email, not HTTP.)

With this filter in place at GoDaddy, that now means that if you mail a URL of any page on any site hosted at EC2 to a user of GoDaddy, your mail won’t get through.

Note: this is much worse than blocks of SMTP traffic from EC2. In that case, an EC2 user can relay their legit SMTP traffic via an off-EC2 host. In this case, there is no similar option in HTTP that isn’t insufferably kludgy. :(
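
To make the distinction concrete, here is a small Python sketch of which zones a filter should consult for which kind of address, following the Spamhaus advice quoted above. The function names and the `source` labels are my own, the zone tuples are just one reading of that advice, and nothing here does a real DNS lookup; it only builds the query names.

```python
import ipaddress

# sbl-xbl.spamhaus.org is named in the FAQ quoted above; pbl.spamhaus.org is the PBL zone.
CONNECTION_ZONES = ("pbl.spamhaus.org", "sbl-xbl.spamhaus.org")
URI_ZONES = ("sbl-xbl.spamhaus.org",)  # PBL deliberately absent


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def lookups_for(ip, source):
    """source is 'connection' (the SMTP peer) or 'uri' (host IP of a URL in the body)."""
    zones = CONNECTION_ZONES if source == "connection" else URI_ZONES
    return [dnsbl_query_name(ip, zone) for zone in zones]
```

With this split, an address in EC2 space that only shows up as the host of a URL in the message body never gets checked against the PBL at all.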

How tightly linked are the top spam botnets?

I was away on holidays last week, and when I got back, I found my feed reader full of some good discussion as to whether today’s bigger spam botnets — Srizbi, Rustock, Mega-D, Cutwail/Pushdo — are sharing components, such as “landing” sites, exploits, customers, and even command and control networks. It started with this post on the FireEye Malware Intelligence Lab’s blog noting:

‘Some malware researchers have described Srizbi and Rustock as rival botnets, our data indicates that this apparent rivalry is a sibling rivalry at best. Srizbi and Rustock seem to be supported (controlled) by the same parent (bot herder).’

and in this followup:

‘We can clearly see that Srizbi, Pushdo and Rustock are using same ISP, and in many cases, IPs on the same subnet to host their Command and Control servers. It seems extremely unlikely to our research team that three previously “rival” Botnets would share nearly consecutive IP space, and be hosted in the same physical facility. Of all the data centers and IPs in the world, the fact that they are all on the same subnet is very intriguing. This fact makes the FireEye research team conclude that either the Botnets are operated by the same organization, or that the datacenter (McColo) is a shell corporation that leases out it’s IP space and bandwidth for nefarious actions.’ [...]

‘IPs at a typical datacenter are leased out in a /30 or more commonly, a /29 block. However, here we can see that in a given succession of IPs, the three Botnets have C&C servers dispersed throughout. This gives us an impression that same Bot herder leased out a larger range and then distributed it amongst its different Botnets.’
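
For scale, a /30 holds 4 addresses and a /29 holds 8. A quick Python sketch of the kind of check FireEye describe, bucketing a list of C&C addresses by the /29 they fall in, might look like this. The helper name is made up and the addresses in any example would be placeholders, not real C&C data.

```python
import ipaddress
from collections import defaultdict


def group_by_block(ips, prefix_len=29):
    """Group IPv4 addresses by the enclosing block of the given prefix length."""
    blocks = defaultdict(list)
    for ip in ips:
        # strict=False lets us pass a host address and get its enclosing network
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        blocks[str(net)].append(ip)
    return dict(blocks)
```

If servers attributed to different botnets keep landing in the same or adjacent blocks, that is the pattern the quote is pointing at.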

Marshal say: ‘at the very least, the major botnets have common customers.’

Dark Reading cover it like so:

Rustock, which recently edged Srizbi for the top slot as the biggest spammer mostly due to a wave of fake Olympics and CNN news spam, and Srizbi, known for fake video and DVD spam, have been using the same Trojan, Trojan.Exchanger, to download their bot malware updates, researchers say. “This is the first time” we had seen this connection between the two botnets, says Fengmin Gong, chief security content officer for anti-botnet software firm FireEye. “That’s why when we saw it, it was surprising. They definitely have a relationship,” he says. “There’s not the rivalry we used to think about.” [...]

Joe Stewart, director of security research for SecureWorks, says the Srizbi-Rustock connection is most likely due to a spammer using both zombie networks — not that the operators of the two botnets are actually collaborating. “What is confusing people is that you’re seeing Rustock bots sending out emails that essentially infect people with Srizbi, so they think it must be Srizbi that’s sending it, but it’s not,” he says. “Srizbi is not just one big model. It’s rented out to lots of different spammers.”

A major spammer may be trying to diversify by using the two botnets, he says. “It could be because they want to separate their malware-seeding operation from their spamming operation,” Stewart says. “Maybe their bots are getting blacklisted faster when they’re sending out URLs with fake video files because they’re easy to spot, so their spam doesn’t get through. So they send malware from this botnet, and spam from this one, to keep out of the blacklists longer.”

I agree that Joe’s scenario is very likely; the spammers aren’t always the same people who operate the botnets, and it only makes sense that some of them would spread their business among multiple nets, to minimize the risk that all of their output would be blocked if one ‘net runs into trouble (or indeed, good filtering ;). But seeing C&C servers sharing LANs also strikes me as unusual. One to watch.

Anyway, it’s good to see that the malware research blogs are now actively tracking and posting updates when the botnets change topics and format; this info is very valuable for us in anti-spam, as it allows us to map from the received spam mails back to the sending botnet, and determine which rules are good at detecting each botnet. Thanks, guys.

(image credit: cobalt123, used under CC license)

Links for 2008-08-01

TechCrunch UK campaigning for a “Digital Hub”: I have to say, the Digital Hub is actually a great place to work; it’s well worth duplicating, if such a thing is possible

419eater anti-scammers fool 419ers into performing the Dead Parrot sketch: “Possibly, he is pining for the fee-ords”

Google taking action against Nigerian/419 fraud spammers: Good news. About time, too ;)

Links for 2008-07-31

Del.icio.us 2.0 goes live: yay! I’ve been waiting for this for yonks

10 years of Boards.ie: massive ~50GB RDF/XML dump, for open crunching, to generate interesting “SIOC Semantic Web” apps

Postmaster.comcast.net: how to get mail delivered successfully to Comcast, the usual stuff

Why we’ll never replace SMTP: ‘The reason that e-mail is uniquely useful is that you can exchange mail with people you don’t already know. The reason that spam exists is that you can exchange mail with people you don’t already know.’ +1

“Bikes-for-Billboards” scheme exposes major planning flaws: ‘what was initially hailed as “free bikes” has become one of the biggest planning controversies to hit Dublin in years.’ No shit. 70% of sites are on the Northside, rather than the richer Southside; and each bike will cost over EUR300k in ad revenue!

Rob Enderle’s page on Wikipedia: detailing this analyst’s hilariously wrong pro-SCO, anti-Apple/Linux predictions over the years. John Gruber: ‘the only way it would be worthwhile for reporters to [quote him] would be if they were willing to describe him as “almost always utterly wrong”’

Amazon EC2’s spam and malware problems

Over the past few weeks, I’ve increasingly heard of spam and abuse problems originating in Amazon EC2.

This has culminated in a blog post yesterday by Brian Krebs at the Washington Post:

It took me by surprise this weekend to discover that mounds of porn spam and junk e-mail laced with computer viruses are actively being blasted from digital real estate leased to [Amazon].

He goes on to discuss how EC2 space is now actively blocked by Outblaze, and has been listed by Spamhaus in their PBL list. A spokesperson for Amazon said:

“We have a clear acceptable use policy and whenever we have received a complaint of spam or malware coming through Amazon EC2, we have moved swiftly to strictly enforce the use policy by network isolating (or even terminating) any offending instances,” Kinton said. She added that Amazon has since taken action against the EC2 systems hosting the [malware].

However, as Seth Breidbart noted in the comments, ‘note that Amazon will terminate the instance. That means that the spammer just creates another instance, which gets a new IP address, and continues spamming.’ True enough: as described, instance termination simply isn’t good enough.

My recommendations:

  • As John Levine noted, it’s likely that Amazon need to treat EC2-originated traffic similarly to how an ISP treats their DSL pools: filtering outbound traffic for nastiness, and in particular rate-limiting port 25/tcp connections on a per-customer basis, so that an instance run by (or infiltrated by) a spammer cannot produce massive quantities of spam before it is detected and cut off.

    However, I’m not talking about blocking port 25/tcp outbound entirely. That’s not appropriate — an EC2 instance is analogous to a leased colo box in a server farm, and not being able to send mail from our instances would really suck for EC2 users (like myself and my employers).

  • It would help if there were a way to look up customer IDs from the IP address of the EC2 nodes they’re using — either via WHOIS or through rDNS. Even an opaque customer ID string would allow anti-abuse teams to correlate a single customer’s activity as they cycle through EC2 instances. This would allow those teams to deal with the reputation of Amazon’s customers, instead of Amazon’s own rep, analogous to how “traditional” hosters use SWIP to publicize their reassignments of IPs between their customers.
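
As a rough illustration of the first recommendation, here is a token-bucket sketch in Python that limits outbound port-25 connections per customer rather than per IP, so cycling through instances doesn't reset the limit. This is only my own sketch, not anything Amazon runs; the class names, the rate and burst values, and the idea of passing `now` in explicitly (handy for testing) are all assumptions.

```python
import time


class TokenBucket:
    """Allow up to `burst` events at once, refilling at `rate` tokens per second."""

    def __init__(self, rate, burst, now=None):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class OutboundSmtpLimiter:
    """One bucket per customer ID (hypothetical), shared by all of their instances."""

    def __init__(self, rate=1.0, burst=5):
        self.rate = rate
        self.burst = burst
        self.buckets = {}

    def allow_connection(self, customer_id, now=None):
        bucket = self.buckets.get(customer_id)
        if bucket is None:
            bucket = self.buckets[customer_id] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)
```

A legit user sending a trickle of transactional mail would never notice a limit like this; a spammer trying to blast thousands of connections a minute would hit it almost immediately.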

There’s some more discussion buried in a load of knee-jerking on the NANOG thread. Here are a few good snippets:

Jon Lewis: ‘I got the impression the only thing Amazon considers abuse is use of their servers and not paying the bill. If you’re a paying customer, you can do whatever you like.’ (ouch.)

Ken Simpson: ‘IMHO, Amazon will eventually be forced to bifurcate their EC2 IP space into a section that is for “newbies” and a section for established customers. The newbie space will be widely black-listed, but will also have a lower rate of abuse complaint enforcement. The only scalable way to deal with a system like EC2 is to provide clear demarcations of where the crap is likely to originate from.’

Bill Herrin: ‘From an address-reputation perspective EC2 is no different than, say, China. Connections from China start life much closer to my filtering threshold than connections from Europe because a far lower percentage of the connections from China are legitimate. EC2 will get the same treatment.’

There’s also an earlier thread here.

Anyway, this issue is on fire — Amazon need to get the finger out and deal with it quickly and effectively, before EC2 does start to run into widespread blocks. I’m already planning migration of our mail-sending components off of EC2; we’re already seeing blocks of mail sent from it, and it’s looking likely that these will increase. :(

(It’s worth noting that a block of EC2’s netblocks today will produce a load of false positives, mainly on transactional mail, if you’re contemplating it. So I wouldn’t recommend it. But a lot of sites are willing to accept a few FPs, it seems.)

TypePad AntiSpam

TypePad AntiSpam looks pretty cool. I’ve been trying it out for the past week on taint.org and underseacommunity.com, with no false positives or false negatives so far (although mind you I don’t get much spam, anyway, on those blogs, fortunately). Both are WordPress blogs — I set up Akismet, got a TypePad API key, and edited 3 lines in “wp-content/plugins/akismet/akismet.php”, and I was off.

However, here’s the key bit, the bit I’m most excited about — /svn/antispam/trunk/, particularly the GPL v2 LICENSE file — a fully open source backend!

The backend is a perl app built on Gearman and memcached. It uses DSpam instead of SpamAssassin, but hey, you can’t have everything ;) Nice, clean-looking perl code, too. Here’s hoping I get some tuits RSN to get this installed locally…

More details on the “GMail forwarding hole”

Those INSERT guys who’ve been talking about a GMail security hole allowing spammers to relay spam have released more previously-redacted details here. (Thanks to the MailChannels blog for pointing that out.)

In essence, the attack works by allowing a spammer to set the “forward to” address in GMail to point at a target address, send a spam to the GMail account, then change the “forward to” address to the next target and repeat.

My response:

  1. it’d be trivial for Google to impose stringent rate limits on “forward to” address changes, and I’d be surprised if they haven’t already.

  2. ditto for limits on the rate of forwarded messages for each GMail account.

  3. as they say in the paper, if Google required up-front confirmation of the target address before forwarding any mail, that would also cut this out neatly.

  4. it’s worth noting that while GMail’s outbound servers may be whitelisted by some recipient sites, others are treating them negatively; word on the anti-spam “street” is that GMail is becoming a festering pit of 419 scammers these days.

MailChannels’ Traffic Control now free-as-in-beer

I’m on the technical advisory board for MailChannels, a company who make a commercial traffic-shaping antispam product, Traffic Control. Basically, you put it in front of your real MTA, and it applies “the easy stuff” — greet-pause, early-talker disconnection, lookup against front-line DNSBLs, etc. — in a massively scalable, event-driven fashion, handling thousands of SMTP connections in a single process. By taking care of 80% of the bad stuff upfront, it takes a massive load off of your backend — and, key point, off your SpamAssassin setup. ;)
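
For the curious, here is roughly what greet-pause with early-talker disconnection boils down to, as a Python sketch. To be clear, this is not how Traffic Control does it: it is a blocking, one-connection-at-a-time illustration rather than an event-driven one, and the function name, pause length and banner text are all placeholders.

```python
import select
import socket


def greet_with_pause(conn, pause_seconds=5.0, banner=b"220 mail.example.com ESMTP\r\n"):
    """Return True if the banner was sent, False if the client was an early talker."""
    # Wait before greeting; a well-behaved SMTP client stays silent until it sees the banner.
    readable, _, _ = select.select([conn], [], [], pause_seconds)
    if readable:
        # The client sent data (or hung up) before our greeting: drop it.
        conn.close()
        return False
    conn.sendall(banner)
    return True
```

Spamware tends to blurt out its HELO without waiting, which is why such a cheap check catches so much.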

Until recently, the product was for-pay and (relatively) hard to get your hands on, but as of today, they’re making it available as a download at http://mailchannels.com/download/. Apparently: “it’s free for low-volume use, but high volume users will need a license key.”

Anyway, take a look, if you’re interested. I think it’s pretty cool. (And I’m not just saying that because I’m on their tech advisory board. ;)

Backscatter rising

Recently, more and more people have been complaining about backscatter; its levels seem to have increased over the past few weeks.

If you’re unfamiliar with the terminology — backscatter is mail you didn’t ask to receive, generated by legitimate, non-spam-sending systems in response to spam. Here are some examples, courtesy of Al Iverson:

  • Misdirected bounces from spam runs, from mail servers who “accept then bounce” instead of rejecting mail during the SMTP transaction.
  • Misdirected virus/worm “OMG your mail was infected!” email notifications from virus scanners.
  • Misdirected “please confirm your subscription” requests from mailing lists that allow email-based signup requests.
  • Out of office or vacation autoreplies and autoresponders.
  • Challenge requests from “Challenge/Response” anti-spam software. Maybe C/R software works great for you, but it generates significant backscatter to people you don’t know.

It used to be OK to send some of these types of mail, but no longer. Nowadays, due to the rise in backscatter caused by spammer/malware abuse, it is no longer considered good practice to “accept then bounce” mail from an SMTP session, or in any other way to respond by mail to the unauthenticated sender address of a message.
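
The alternative to “accept then bounce” is to refuse during the SMTP transaction, while the sending client is still connected. A minimal Python sketch of that decision, with a made-up function name and a plain set standing in for whatever user database the server really has:

```python
def rcpt_reply(recipient, known_recipients):
    """Decide the reply to RCPT TO while the sending client is still connected.

    `known_recipients` is a set of lowercase addresses (a stand-in for a real lookup).
    """
    if recipient.lower() in known_recipients:
        return "250 2.1.5 OK"
    # Rejecting here means no bounce message is ever generated by this server,
    # so a forged sender address never receives backscatter from it.
    return "550 5.1.1 No such user"
```

If the mail really came from a legitimate MTA, that MTA tells its own user about the failure; if it came from a bot, nobody gets anything.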

Backscatter as spam delivery mechanism

I would hazard a guess that this rise is due to one of the major spam-sending botnets adopting the use of “real” sender addresses rather than randomly-generated fake ones, probably in order to evade broken-by-design Sender-Address Verification filters.

There’s an alternate theory that spammers use backscatter as a means of spam delivery — intending for the mails to bounce, in effect using the bounce as the spam delivery mechanism. Symantec’s most recent “State of Spam” report in particular highlights this.

I don’t buy it, however. Compare their own example message. Here’s the mail originally sent by the spammer to the bouncer, as rendered:

[image: the original spam message]

And here’s what it looks like once it passes through the bouncer’s mail system:

[image: the same message after bouncing]

That’s simply unreadable. There’s absolutely no way for a targeted end user to read the “payload” there…

Getting rid of it

I haven’t run into this recent spike in backscatter at all, myself, since I have a working setup that deals with it. This blog post describes it. If you’re using Postfix and SpamAssassin, it would be well worth taking a look; if you’re just using SpamAssassin and not Postfix, you should still try using the Virus Bounce Ruleset to rid yourself of various forms of unwanted bounce message.

Note that you need to set the ‘whitelist_bounce_relays’ setting to use the ruleset, otherwise its rules will not fire.

SPF

There’s a theory that setting SPF records (or other sender-auth mechanisms like DomainKeys or DKIM) on your domains will reduce the amount of backscatter sent to your domains. Again, I doubt it.

Backscatter is being sent by old, legacy mail systems, and those systems aren’t configured to take SPF into account. When they’re eventually updated, it’s likely they’ll be fixed to simply not send “accept then bounce” responses after the SMTP transaction has completed. It’s unlikely that a system will be fixed to take SPF into account, but not fixed to stop sending backscatter noise.

It’s good advice to use these records anyway, but don’t do it because you want to stop backscatter.

What about my own bounces?

You might be worried that the SpamAssassin VBounce ruleset will block bounces sent in response to your own mail. As long as the error conditions are flagged during the SMTP transaction (as they should be nowadays), and you’ve specified your own mailserver(s) in ‘whitelist_bounce_relays’, you’re fine.

CEAS needs your ham

CEAS 2008 is doing another Spam Challenge test of various spam-filters, and as part of this, they need samples of ham mail messages.

As part of the data collection effort, we have set up a website through which it is possible to donate non-sensitive legitimate email, to be used in the evaluation. Any kind of email that the recipient considers legitimate is welcome, including computer generated (non-spam) messages.

After the CEAS evaluation, the benchmark data will be made publicly available to facilitate future research and development in the field of spam prevention.

Here is the collection site; they accept UNIX mbox format, and tar.gz or zip files of same, with an 8MB upload limit.

Planet Antispam update

A brief update on Planet Antispam

I’ve just added MailChannels’ Anti-Spam Blog. Now — in the interests of disclosure — I’m a member of MailChannels’ Technical Advisory Board. However, that didn’t affect this — their blog has had consistently good, interesting posts dealing with anti-spam-related topics, and without too much plugging of their own products. ;)

Also added recently:

If you know of any other good email anti-spam-related blogs, drop a line in the comments here. (Note that I’m trying to keep it email-related, however, so we’re not covering web-spam.)

Spammers “giving up” according to Google

According to this Wired story, Google reckons spammers are giving up on spam:

a remarkable trend is underfoot, according to Brad Taylor, a staff software engineer at Google: The number of spam attempts — that is, the number of junk messages sent out by spammers — is flat, and may even be declining for the first time in years.

Actually, this is a wilful misunderstanding of what the Googler in question really said, which was that ‘attempts to spam Gmail users have been leveling off over the last year and more recently, even declining slightly’. In other words, they didn’t make an observation about the state of the spam problem on an internet-wide basis — just about the “local” situation as it pertains to Gmail. Bad reporting there, Wired.

But, in passing…

David Berlind at ZDNet recently blogged a rather grumpy response to InfoWorld coverage of CEAS 2007. He raised a very important point:

If I could say something to the author of that story, it would be that so long as any anti-spam solution is not deployed universally throughout the Internet’s e-mail system (in other words, so long as some anti-spam tech is not a standard), that anti-spam solution actually makes the spam problem worse. You read that right. Worse. Proprietary anti-spam solutions make the global spam problem worse. They are digging us deeper into the hole that the Internet is already in because everyone who makes those solutions is under the false belief that “s/he who is finally successful at filtering out all spam while allowing the legitimate mail in wins.”

Google’s blog post is a case in point: ‘we’re keeping more spam out of your inbox than ever before, so more and more, you can use Gmail for things you enjoy without even realizing that the spam filter is there most of the time.’

That’s great — but it doesn’t help anyone except Gmail. It’s a myopic view of the spam problem, and David’s point stands.

(I disagree with his later conclusion that the only way forward is for Google, MS, AOL and Yahoo! to get together and ‘commit to jointly supporting the same technical solutions’ — when the usual BigCos get together, they tend to focus on their own priorities. Take what happened back in 2005 with nofollow for blog-spam — while it helped the search giants with their own overriding priority, which was to tweak their algorithms to filter out the spam on the search results page, it did nothing to slow the spam flood itself, which has continued unabated.)

We need more open-source, and open-data, anti-spam work.

‘Blended threat’ = Storm

Commtouch have apparently released an 'Email Threats Trend Report' for the third quarter of 2007, which contains this factoid:

Blended threat messages — or spam messages with links to malicious URLs — accounted for up to 8% of all global email traffic during the peaks of various attacks during the quarter [...]

Spam with malware hyperlinks inside: One technique which reached a new high during the quarter was innocent-appearing spam messages that contained hyperlinks to malware-sites. This type of spam utilizes vast zombie botnets to launch ‘drive-by downloads’ and evade detection by most anti-virus engines. Several blended spam attacks of this type focused on leisure-time activities, such as sports and video games. Messages invited consumers to download “fun” software such as NFL game-tracking and video games from what appeared to be legitimate websites. Instead, consumers voluntarily downloaded malware onto their computers.

Those short messages that invited downloads of NFL game-tracking software (“Get Your Free NFL Game Tracker”, “Football Fan Essentials”, “Are you ready for football season?” etc.) and video games (“Wow, free games!”, “New game software, with over 1000 games - FREE”, “Holy cow, 1000 free games online” etc.) are all output from the Storm worm; I wouldn’t call it a new kind of “blended threat” per se. I’m surprised that Commtouch didn’t name it; maybe they don’t realise it’s Storm?

I’d say its output is higher than 8% of my incoming spam, although it has reduced its spam output quite a bit recently.

Scary Storm figure

This study of the Storm worm (via) contains this rather terrifying factoid:

Figure 12 illustrates a time-volume graph of TCP packets, SMTP packets, spam messages, and smtp servers. Our analysis of this graph reveals the following findings. First, we find that except for the first 5 minutes almost all the TCP communication is dominated by spam. Second, we measured that hosts generate on average of 100 successful spam messages per five minutes, which translates to 1200 spam messages per hour or 28,800 messages per day. If we multiply this by the estimated size for the Storm network (which we suspect varies between 1 million and 5 million), we derive that the total number of spam messages that could be generated by Storm is somewhere between 28 billion and 140 billion per day.

While such numbers might be mind-boggling they are inline with observed spam volumes in the Internet, e.g., overall volume of spam messages in the Internet per day in 2006 was estimated to be around 140 billion [2]; Spamhaus claims to have been blocking over 50 billion spam messages per day in October 2006 [10], and AOL was blocking 1.5 billion spam messages per day in its network in June 2006 [5]. These numbers suggest that Storm could be responsible for anywhere between 17% and 50% of all spam that is generated on the Internet.

28 to 140 billion messages per day. That is a lot of spam.
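
The arithmetic is easy to reproduce. Here it is in Python, using only the figures from the quoted passage; the paper rounds the results down to 28 and 140 billion.

```python
PER_FIVE_MINUTES = 100                      # successful spam messages per host, per the paper
per_hour = PER_FIVE_MINUTES * (60 // 5)     # 12 five-minute slots in an hour
per_day = per_hour * 24


def daily_volume(hosts):
    """Total messages per day for a botnet of the given size."""
    return hosts * per_day


low = daily_volume(1_000_000)    # lower estimate of Storm's size
high = daily_volume(5_000_000)   # upper estimate of Storm's size
print(per_hour, per_day, low, high)
```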

Minor nitpick with the paper — it notes that

Storm retrieves emails found in [certain] files and gathers information about possible hosts, users, and mailing lists that are referenced in these files. In particular, it looks for strings like “yahoo.com”, “gmail.com”, “rating@”, “f-secur”, “news”, “update”, “anyone@”, “bugs@”, “contract@”, “feste”, “gold-certs@”, “help@”, “info@”, “nobody@”, “noone@”, “kasp”, “admin”, “icrosoft”, “support”, “ntivi”, “unix”, “bsd”, “linux”, “listserv”, “certific”, “sopho”, “@foo”, “@iana”, “free-av”, “@messagelab”, “winzip”, “google”, “winrar”, “samples” , “abuse”, “panda”, “cafee”, “spam”, “pgp”, “@avp.” , “noreply” , “local”, “root@”, and “postmaster@”.

I would postulate that those strings are a stoplist — that in fact the worm avoids sending spam to addresses containing those strings. The presence of “abuse” and “postmaster” in particular would suggest that.
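
In other words, I suspect the logic is closer to the following Python sketch than to a target list. This is purely my guess at the behaviour, not something taken from the worm's code; the function name is made up and the tuple is just a handful of the strings quoted above.

```python
# A small subset of the strings listed in the paper.
STOP_STRINGS = ("abuse", "postmaster@", "spam", "root@", "noreply")


def is_avoided(address, stop_strings=STOP_STRINGS):
    """True if the address contains any stoplist substring and so would be skipped."""
    addr = address.lower()
    return any(s in addr for s in stop_strings)
```

Under that reading, harvested addresses for which `is_avoided` is true are dropped, and everything else goes on the mailing list.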

The Prime Time Group pump-and-dump

Spamnation.info links to an interesting article by Computerworld’s Gregg Keizer about the massive PRTH.PK spam run.

As usual, there is no shortage of suckers:

The spam blast did drive up Prime Time’s share price from Monday’s low of around 7 cents to Wednesday’s high of 11 cents, a 57% jump. Thursday morning, however, the bottom dropped out, and the stock fell to under 7 cents. Trading volumes peaked Wednesday as well, at around 1.7 million shares, substantially higher than any day in the month prior. “You can actually see the wave of activity in the stock and compare it with the volume of spam that we trapped,” said [Sophos analyst Ron] O’Brien.

But here’s an interesting new tactic by the good guys:

Last Wednesday afternoon, Prime Time announced that it was ordering a Non Objecting Beneficial Owners (NOBO) list to get a clearer picture of who owned its shares. “The NOBO list will be used to determine the naked short positions in Prime Time Group Inc.,” the company said in a statement. “The finding will then be reported to the [National Association of Securities Dealers] to take action against the violators of the naked short regulations.”

“Naked short” is a investment term that refers to selling short, essentially a bet that the price will drop, but with a twist: “naked” means that the investor sells short without first making sure he can borrow the shares from another investor holding a “long” position on the stock.

I hope this works; it’d be great to see the profit mechanism behind pump-and-dump spam killed off.

Spamnation notes:

Incidentally, the greeting card spam that built the botnet used to promote PRTH.PK and CYTV.OB also continues. It has iterated through another couple of generations: the current incarnation tells recipients to collect their custom Musical ecard or custom Movie-quality ecard or other variants on that theme. We’ve seen about 150 of these in the past three days, suggesting that the unknown senders are probably well on their way to building up another botnet for their next stock spam run.

Spreading trojans via greeting-card spam is a trademark of the gigantic Storm botnet, AFAIK: SecureWorks info, MessageLabs info, spam levels causing DDoS for Canadian networks, DDoS threat for EDU sector.


Rule Discovery Progress Update

Back in March, I wrote a post about a new rule discovery algorithm I’d come up with, based on the BLAST bioinformatics algorithm. I’m still hacking on that; it’s gradually meandering towards production status, as time permits, so here’s an update on that progress.

There have been various tweaks to improve memory efficiency; I won’t go into those here, since they’re all in SVN history anyway. But the results are that the algorithm can now extract rules from 3500 spam and 50000 ham messages without consuming more than 36 MB of RAM, or hitting disk. It can also now generate a SpamAssassin rules file directly, and apply a basic set of QA parameters (required hit rate, required length of pattern, etc.).

On top of this, I’ve come up with a workflow to automatically generate a usable batch of rules, on a daily basis, from a spam and ham corpus. This works as follows:

  • Take a sample of the past 4 days’ traffic from our spamtrap network. Today this was about 3000 messages.

  • Add the hand-vetted spam from my own accounts over the same period (this helps reduce bias, since spamtraps tend to collect a certain type of spam), about 3400 messages.

  • Discard spams that scored over 10 points (to concentrate on the stuff we’re missing).

  • Pass the remaining 3517 spams, and text strings from over 50000 nonspam messages, into the “seek-phrases-in-log” script, specifying a minimum pattern length of 30 characters, and a minimum hitrate of 1% (in today’s corpus, a rule would have to hit at least 34 messages to qualify).

  • That script gronks for a couple of minutes, then produces an output rules file, in this case containing 28 rules, for human vetting. (Since I’ve started this workflow, I’ve only had to remove a couple of rules at this step, and not for false positives; instead, they were leaking spamtrap addresses.)

  • Once I’ve vetted it, I check it into rulesrc/sandbox/jm/20_sought.cf for testing by the SpamAssassin rule QA system.
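The qualifying test in the workflow above can be sketched roughly like this. It is not the real “seek-phrases-in-log” script: the function name, the candidate tuples and the "no ham hits at all" requirement are assumptions made for illustration, and the sample numbers are invented. Only the 30-character minimum and the 1% hitrate come from the description above, and how the real script rounds the hit threshold is not something this sketch tries to reproduce.

```python
import math


def qualifying_patterns(candidates, num_spam, min_length=30, min_hitrate=0.01):
    """Return the candidate patterns that pass the basic QA parameters.

    candidates: iterable of (pattern, spam_hits, ham_hits) tuples.
    A pattern must be long enough, hit enough of the spam corpus and
    (an assumption of this sketch) hit no ham at all.
    """
    min_hits = math.ceil(num_spam * min_hitrate)
    kept = []
    for pattern, spam_hits, ham_hits in candidates:
        if len(pattern) < min_length:
            continue  # too short to be a safe body rule
        if spam_hits < min_hits:
            continue  # does not hit enough of the spam corpus
        if ham_hits > 0:
            continue  # would cause false positives
        kept.append(pattern)
    return kept


# Invented sample data: (pattern, spam hits, ham hits)
sample = [
    ("Home Networking For Dummies 3rd Edition $10", 40, 0),
    ("cheap meds", 500, 0),                                 # too short
    ("a sufficiently long phrase that is rare", 3, 0),      # too few hits
    ("a sufficiently long phrase seen in real mail", 50, 2),  # hits ham
]
print(qualifying_patterns(sample, num_spam=1000))
```

The human vetting step still matters: a filter like this cannot tell that a pattern leaks a spamtrap address.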

The QA results for the ruleset from yesterday (Aug 3) can be seen here, and give a pretty good idea of how these rules have been performing over the past week or two; out of the nearly 70000 messages hit by the rules, only 2 ham mails are hit — 0.0009%.

In fact, I measured the ruleset’s overall performance in the logs from the 4 mass-check contributors who provided up-to-date data in yesterday’s nightly mass-check, covering 5 corpora: bb-jm, jm, daf, dos, and theo (all SpamAssassin committers):

Contributor    Hits    Spams   Percent
bb-jm          4249    24996    17.00%
jm             3450    14994    23.00%
daf            1236    35563     3.48%
dos           32867   100223    32.79%
theo          28077   382562     7.34%

(bb-jm and jm are both me; they scan different subsets of my mail.)

The “Percent” column measures the percentage of their spam collection that is hit by at least one of these rules; it works out to an unweighted average of 16.72% across the five corpora. This is underestimating the true hitrate on “fresh” spam, too, since the mass-check corpora also include some really old spam collections (daf’s collection, for example, looks like it hasn’t been updated since the start of July).
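For anyone who wants to check the arithmetic, here is a small sketch that recomputes the “Percent” column and the average from the Hits and Spams figures in the table. The variable names are mine; the pooled figure at the end (all hits divided by all spams) is an extra I’ve added for comparison, and is lower because the biggest corpus has one of the lowest hit rates.

```python
# (corpus, spams hit by at least one rule, spams in corpus), from the table
results = [
    ("bb-jm", 4249, 24996),
    ("jm", 3450, 14994),
    ("daf", 1236, 35563),
    ("dos", 32867, 100223),
    ("theo", 28077, 382562),
]


def hit_rate(hits, spams):
    """Percentage of a spam collection hit by at least one rule."""
    return 100.0 * hits / spams


per_corpus = {name: hit_rate(hits, spams) for name, hits, spams in results}

# Average of the five percentages, each corpus counted equally.
unweighted_mean = sum(per_corpus.values()) / len(per_corpus)

# Alternative view: pool every corpus into one big collection.
pooled = hit_rate(sum(h for _, h, _ in results), sum(s for _, _, s in results))

for name, rate in per_corpus.items():
    print(f"{name:6s} {rate:6.2f}%")
print(f"unweighted mean {unweighted_mean:.2f}%, pooled {pooled:.2f}%")
```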

Even better, a look at the score-map for these rules shows that they are, indeed, hitting the low-scoring spam that other rules don’t hit.

That’s pretty good going for an entirely-automated ruleset!

The next step is to come up with scores, and publish these for end-user use. I haven’t figured out how this’ll work yet; possibly we could even put them into the default “sa-update” channel, although the automated nature of these rules may mean this isn’t a goer.

If you’re interested, the hits-over-time graph for one of the rules (body JM_SEEK_ICZPZW / Home Networking For Dummies 3rd Edition \$10 /) can be viewed here.


A fishy Challenge-Response press release

I have a Google News notification set up for mentions of “SpamAssassin”, which is how I came across this press release on PRNewsWire:

Study: Challenge-Response Surpasses Other Anti-Spam Technologies in Performance, User Satisfaction and Reliability; Worst Performing are Filter-based ISP Solutions

NORTHBOROUGH, Mass., July 17 /PRNewswire/ — Brockmann & Company, a research and consulting firm, today released findings from its independent, self-funded “Spam Index Report– Comparing Real-World Performance of Anti-Spam Technologies.”

The study evaluated eight anti-spam technologies from the three main technology classes — filters, real-time black list services and challenge- response servers. The technologies were evaluated using the Spam Index, a new method in anti-spam performance measurement that leverages users’ real-world experiences.

[...] The report finds that the best performing anti-spam technology is challenge-response, based on that technology’s lowest average Spam Index score of 160.

[...] Filter – Open Source software-(Spam Index: 388): This technology is frequently configured to work in conjunction with PC email client filters. The server adds * * SPAM * * to the subject line so that the client filter can move the message into the junk folder. This class of software includes projects such as ASSP, Mail Washer and SpamAssassin, among others.

The “Spam Index” is a proprietary measurement of spam filtering, created by Brockmann and Company. A lower “Spam Index” score is better, apparently, so C/R wins! (Funny that. The author, Peter Brockmann, seems to have some kind of relationship with C/R vendor Sendio, being quoted in Sendio press releases like this one and this one, and providing a testimonial on the Sendio.com front page.)

However, there’s a fundamental flaw with that “Spam Index” measurement: it’s designed to make C/R look good. Here’s how it’s supposed to work. Take these four measurements:

  • Average number of spam messages each day x 20 (to get approximate number per work-month)
  • Average minutes spent dealing with spam each day x 20 (to get approximate minutes per work-month)
  • Number of resend requests last month
  • Number of trapped messages last month

Then sum them, and that gives you a “Spam Index”.
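As a sketch of that sum, going only by the four measurements listed above (the function and parameter names are mine, and the sample inputs are invented, not taken from the report):

```python
def spam_index(spam_per_day, minutes_per_day, resend_requests, trapped_messages,
               workdays=20):
    """Sum the four measurements, scaling the two daily ones to a work-month."""
    return (spam_per_day * workdays
            + minutes_per_day * workdays
            + resend_requests
            + trapped_messages)


# Invented inputs: 10 spams/day, 3 minutes/day, 2 resend requests, 40 trapped.
print(spam_index(10, 3, 2, 40))  # 200 + 60 + 2 + 40 = 302
```

Note how every term carries the same weight: one resend request counts exactly as much as one spam in the inbox.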

First off, let’s translate that into conventional spam filter accuracy terms. The ‘minutes spent dealing with spam each day’ measures false negatives, since having to ‘deal with’ (ie delete) spam means that the spam got past the filter into the user’s inbox. The ‘number of trapped messages’ means, presumably, both true positives (spam marked correctly as spam) and false positives (nonspam marked incorrectly as spam). The ‘number of resend requests last month’ also measures false positives, although it will vastly underestimate them.

Now, here’s the first problem. The “Spam Index” therefore considers a false negative as about as important as a false positive. However, in real terms, if a user’s legit mail is lost by a spam filter, that’s a much bigger failure than letting some more spam through. When measuring filters, you have to consider false positives as much more serious! (In fact, when we test SpamAssassin, we consider FPs to be 50 times more costly than a false negative.)
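To make the difference concrete, here is a toy comparison. This is not SpamAssassin’s evaluation code; the function name and the two filters’ error counts are invented, and only the factor of 50 comes from the paragraph above.

```python
def weighted_cost(false_positives, false_negatives, fp_weight=50):
    """Total error cost when one false positive costs fp_weight false negatives."""
    return fp_weight * false_positives + false_negatives


# Two hypothetical filters over the same mail stream: (FPs, FNs)
filter_a = (1, 20)   # loses one legit mail, misses 20 spams
filter_b = (0, 60)   # loses nothing, misses 60 spams

for name, (fp, fn) in (("A", filter_a), ("B", filter_b)):
    print(name, "unweighted:", fp + fn, "weighted:", weighted_cost(fp, fn))
```

Counted equally, filter A looks nearly three times better (21 errors against 60); with false positives weighted, B comes out ahead (60 against 70).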

Here’s the second problem. Spam is sent using forged sender info, so if a spammer’s mail is challenged by a Challenge/Response filter, the challenge will be sent to one of:

  • (a) an address that doesn’t exist, and be discarded (this is fine); or
  • (b) to an invalid address on an innocent third-party system (wasting that system’s resources); or
  • (c) to an innocent third-party user on an innocent third-party system (wasting that system’s resources and, worst of all, the user’s time).

The “Spam Index” doesn’t measure the latter two failure cases in any way, so C/R isn’t penalised for that kind of abusive traffic it generates.

Also, if a good, nonspam mail is challenged, either

  • (a) the sender will receive the challenge and take the time to jump through the necessary hoops to get their mail delivered (“visit this web page, type in this CAPTCHA, click on this button” etc.); or
  • (b) they’ll receive the challenge, and not bother jumping through hoops (maybe they don’t consider the mail that important); or
  • (c) they’ll not be able to act on the challenge at all (for example, if an automated mail is challenged).

Again, the “Spam Index” doesn’t measure the latter two failure cases.

In other words, the situations where C/R fails are ignored. Is it any wonder C/R wins when the criteria are skewed to make that happen?


Stop with the fake phish data

An anonymous friend in the anti-phishing community writes:

For those of you who blog and/or have contacts in the general computer user ‘go fight ‘em’ community:

Is there any way you can get the word out that dropping a couple hundred fake logins on a phishing site is NOT appreciated??

It creates havoc for those monitoring the drop since it’s an unbelieveable waste of time and resources to clean up the file. Also, for those drop files that ‘recycle’ after every 10 entries, valid data is lost.

It also creates havoc for those who get these files and try to notify victims. They waste time, too .. pulling legit info from amongst the trash.

I know there are programs out there that create/dump this stuff onto sites and some who call themselves ‘phish phighters’ enjoy the harassment aspect. But it wastes the time/effort of those who are seriously working these things.


Lyris’ low SpamAssassin threshold

Via jgc’s newsletter, Lyris’ latest ISP Deliverability Report (Q1 2007) makes an interesting point about legitimate bulk mail and SpamAssassin:

Contrary to popular belief among marketers, message content is not a major cause of deliverability challenges for most email marketers. This finding is a result of testing the content of more than 1,705 unique emails, using [Lyris] EmailAdvisor’s content scoring tool. The content scoring function is based on the content scoring rules of the widely adopted Spam Assassin open source project. The emails tested had an average content point score of 1.04 well below the filter’s generally accepted spam identification level of 3.0 or higher.

Now, that’s broadly good advice — SpamAssassin hasn’t really given much strength to signatures found in message body text in the past couple of years, since the signatures from other sources (especially DNS blocklists and URI blocklists) are much more reliable.

However, note the bit I emphasised. Since when is 3.0 the ‘generally accepted spam identification level’? Only the most paranoid user would ever go that low, since at that level, they’d expect to find 2.22% of their nonspam mail going into the spam folder (according to our own tests). In reality, our recommended level has always been 5.0 points, and that’s what we optimise for. I’m mystified as to where they’re getting 3.0 from…


DSPAM acquired by Sensory Networks

Whoa, didn’t see that coming. Quoting Jonathan Zdziarski via jgc’s newsletter:

…The [DSPAM] project had grown to a point where it would take others – with enough free time – to bring DSPAM to the next level as a widely accepted enterprise-class solution, and [I] decided that it would be in the best interest of the project to entrust it to someone with the technical knowhow and dedication to reach these goals. Many of you are aware of my work in the past with Sensory Networks in developing a hardware-accelerated version of DSPAM (capable of supporting multi-megabit speeds in large carrier environments). I’ve spent a considerable amount of time with SN’s team over the past several years and when we initially discussed working together, they had shown to be very excited and motivated about the project.

After careful consideration and many discussions at length, I decided to allow Sensory Networks to acquire the rights to the project, and continue development on it with their own team. SN has displayed a strong commitment to the open source community and has been working closely with other leading projects such as Snort, Clam Antivirus, and SpamAssassin. They assured me that the project will remain open-source and available to all, and at the same time the project will receive exposure in commercial environments it has not seen before, as many of you have been asking for. We’ve now completed the acquisition for the project, and I’d like to encourage you to support them in helping them move forward as it grows into new areas.

More details at zdziarski.com.


Dealing with backscatter, revisited

Back in January, I wrote about how I deal with email backscatter nowadays. Since then, I’ve made a notable tweak.

The change is that I no longer reject “null-sender” traffic during the SMTP transaction. Rejecting it turned out to break Exim’s implementation of Sender Address Verification, which performs the SAV check using a MAIL FROM of <>, making it indistinguishable from a bounce at that point in the SMTP transaction.

Now, I’ve complained about SAV, but I have to be pragmatic anyway (Postel’s law and all that!) — so it was better to just allow other sites to perform SAV lookups against our server, and fix the anti-bounce stuff some other way.

The new method (below) does this: it accepts null-sender SMTP traffic, detects bounces in Postfix if they arrive via SMTP in RFC-3464 format, and deals with the bounces that slip past in a more CPU-intensive manner using the SpamAssassin “VBounce” ruleset (which is part of the now-released SpamAssassin 3.2.0, btw).

This increases the load, since some bounces cannot be rejected at MAIL FROM time now, and instead we have to wait ’til DATA — but CPU hasn’t been a problem recently, so this is ok.

Here are the updated instructions:

In Postfix

In my Postfix configuration, on the machine that acts as MX for my domains, edit ‘/etc/postfix/header_checks’ and add these lines:

/^Content-Type: multipart\/report; report-type=delivery-status\;/  REJECT no third-party DSNs
/^Content-Type: message\/delivery-status; /     REJECT no third-party DSNs

Edit ‘/etc/postfix/main.cf’, and ensure it contains:

header_checks = regexp:/etc/postfix/header_checks

Then run:

sudo /etc/init.d/postfix restart

This catches most of the bounces — RFC-3464-format Delivery-Status-Notification messages from other mail servers.
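If you want to see what those two patterns will and won’t catch before deploying them, something like the following sketch works as a rough offline check. It uses Python’s re module rather than Postfix itself, so it only approximates Postfix’s matching; the helper name and the sample header lines are made up for illustration.

```python
import re

# The two patterns from header_checks above. Postfix regexp tables match
# case-insensitively by default, which re.IGNORECASE mirrors here.
DSN_PATTERNS = [
    re.compile(r"^Content-Type: multipart\/report; report-type=delivery-status\;",
               re.IGNORECASE),
    re.compile(r"^Content-Type: message\/delivery-status; ", re.IGNORECASE),
]


def would_reject(header_line):
    """True if either header_checks pattern matches this header line."""
    return any(p.search(header_line) for p in DSN_PATTERNS)


# Made-up sample headers.
samples = [
    'Content-Type: multipart/report; report-type=delivery-status; boundary="abc"',
    'Content-Type: multipart/report; boundary="abc"; report-type=delivery-status',
    "Content-Type: text/plain; charset=us-ascii",
]
for line in samples:
    print(would_reject(line), line)
```

The second sample is a DSN with its Content-Type parameters in a different order; the first pattern expects report-type to come straight after the media type, so that one is not matched here and would be left for the SpamAssassin side to deal with.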

In SpamAssassin

As before, install the Virus-bounce ruleset and set it up. This will catch challenge-response mails, “out of office” noise, “virus scanner detected blah” crap, and bounce mails generated by really broken groupware MTAs — the stuff that gets past the Postfix front-line.


Moin Moin attachment spam

Here’s a new trick used by the web spammers — attachments on a Moin Moin wiki. The taint.org/wk RecentChanges list illustrates it well:

2007-05-07  set bookmark
[UPDATED]       UserPreferences         04:17   Info    ?StepStep [1-21]
#01 Upload of attachment 'big-cocks.html'. #02 Upload of attachment 'big-cock.html'. #03 Upload of attachment 'big-boobs.html'. #04 Upload of attachment 'big-ass.html'. #05 Upload of attachment 'bdsm.html'. #06 Upload of attachment 'bbw.html'. #07 Upload of attachment 'bang-bros.html'. #08 Upload of attachment 'bangbros.html'. #09 Upload of attachment 'baby.html'. #10 Upload of attachment 'asian-porn.html'. #11 Upload of attachment 'asian-girls.html'. #12 Upload of attachment 'anime-porn.html'. #13 Upload of attachment 'anime-girls.html'. #14 Upload of attachment 'angelina-jolie.html '. #15 Upload of attachment 'amature.html'. #16 Upload of attachment 'amatuer.html'. #17 Upload of attachment 'adult-videos.html'. #18 Upload of attachment 'adult-stories.html' . #19 Upload of attachment 'adult-games.html'. #20 Upload of attachment '69.html'. #21 Upload of attachment '3d.html'.

Great. Lots of spam. This first started appearing on Feb 27 2007, in a multi-upload attack on a single page (“FindPage”), from IP address 212.26.129.162; then reoccurred on Apr 27 and May 7 from the (insecure open proxy) proxy.drevlanka.ru.

Annoyingly my “subscribe to wiki changes” patch doesn’t catch this – these aren’t gatewayed through as “changes” via mail for review. I need to fix that in my copious free time. :(

Also, the RecentChanges RSS feed doesn’t list them, although the HTML form does.

So unfortunately, the only way I can see to block this is either to review by visiting the RecentChanges page in a web browser regularly (how retro!), and delete them retrospectively, or simply to turn off attachments entirely – which is what I’ve done, by editing “wikiconfig.py” and adding:

    actions_excluded = ['AttachFile']

It looks like quite a few other wikis around the web are running into the issue too :(


SpamAssassin 3.2.0!

W00t! SpamAssassin 3.2.0 has finally gone gold!

This release is a big one — it’s the first major release since 3.1.0, back in September 2005, just over a year and a half ago. Here is the release announcement mail, containing a list of major changes since version 3.1.8. There are a few major new features that I feel worth picking out in more detail and editorialising about:

sa-compile

This is a biggie. This new script takes the active SpamAssassin ruleset, and uses code contributed by Matt Sergeant to produce input for re2c. re2c in turn compiles the ruleset into a deterministic finite automaton, which can match multiple regular expressions in parallel. That’s not all, though; re2c then compiles that DFA into C code — which is then compiled into native object code. SpamAssassin will then load that object code and use it to replace the slower perl regexp tests, if it’s available at scan-time.

Now, it’s been a long time since SpamAssassin’s ruleset consisted mainly of rudimentary regular expressions matched against the body text. A good portion of SpamAssassin’s ruleset these days operates against headers, performs network lookups, analyzes URLs extracted from the body, uses the more advanced features supported by Perl’s NFA regexp engine, and so on. But even given that, the effects of ‘sa-compile’ seem to average between a 15% and 25% speedup, in my testing. That’s good ;)

Many of the commercial versions of SpamAssassin include their own body-rule speedups — but this is the first time anything similar has made it into the open source code.

Short-circuiting

Another good one for performance. There are some rules that you can reasonably assume will only ever hit nonspam mail, or only ever hit spam, in a well-configured setup. For example, a hit on “ALL_TRUSTED” should mean that the message never traversed an untrusted network, therefore it cannot be spam, so why bother applying the expensive tests? It should be reasonable to “short-circuit” and immediately return a “ham” score for that mail.

This new plugin implements that algorithm — and efficiently, too, which historically has been the hard part!

I’ve been using this for a while with a ruleset like this one — in my experience, it’s cut overall CPU time spent scanning mail by 20%.

It is pretty flexible, too; there’s a lot of tweakage that can be done with this functionality to suit your own setup.
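The general idea of short-circuiting, as opposed to the plugin’s actual implementation, can be sketched in a few lines. Everything here is hypothetical: the rule tuples, the scores, the message dicts and the “EXPENSIVE_BODY” rule are invented for illustration, and the real plugin is configured through SpamAssassin’s own rules files, not Python.

```python
def scan(message, rules, spam_threshold=5.0):
    """Run rules in priority order; stop early if a short-circuit rule hits.

    rules: list of (name, test_function, score, shortcircuit) tuples, where
    shortcircuit is None, "ham" or "spam". Returns (is_spam, names_of_hits).
    """
    total = 0.0
    hits = []
    for name, test, score, shortcircuit in rules:
        if not test(message):
            continue
        hits.append(name)
        total += score
        if shortcircuit in ("ham", "spam"):
            # Trusted verdict: skip every remaining (possibly expensive) rule.
            return shortcircuit == "spam", hits
    return total >= spam_threshold, hits


expensive_calls = []  # records which messages reached the expensive rule


def all_trusted(message):
    return message.get("all_trusted", False)


def expensive_body_test(message):
    expensive_calls.append(message["id"])
    return "cheap pills" in message["body"].lower()


RULES = [
    ("ALL_TRUSTED", all_trusted, -1.0, "ham"),       # cheap, checked first
    ("EXPENSIVE_BODY", expensive_body_test, 6.0, None),
]

internal = {"id": 1, "all_trusted": True, "body": "Lunch? (cheap pills joke)"}
external = {"id": 2, "all_trusted": False, "body": "CHEAP PILLS here"}

print(scan(internal, RULES))
print(scan(external, RULES))
print(expensive_calls)
```

The saving comes from ordering: the short-circuit rules have to be cheap and have to run before the expensive ones, otherwise there is nothing left to skip.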

Reduced memory footprint

One aim of this release has been to reduce the memory usage of SpamAssassin; the core code now uses less RAM than 3.1.x does, when tested with the same ruleset. (Unfortunately we’ve added lots more rules in the interim, so it’s a bit of a wash overall. ;)

The VBounce anti-bounce ruleset

Detects spurious bounce messages sent by broken mail systems in response to spam or viruses. More info about that here.

Apache-spamd

apache-spamd implements spamd as a mod_perl module. This was contributed by Radoslaw Zielinski, as a Google Summer of Code project last year. Thanks Radoslaw!

There are plenty more new, useful features and rules — these are just the top ones, in my opinion. Pretty cool stuff!


Using qpsmtpd for traps.spamassassin.org

Like many anti-spam systems these days, SpamAssassin operates a network of spamtraps. One set of these runs off traps.SpamAssassin.org, a server kindly donated by ISP Sonic.net.

Large-scale spam-trapping systems like this are generally run in quite a secretive manner, but we’re an open source project — so it may be interesting if I give some details of our setup. Here’s a potted history of how this spamtrap server has run over the years…

The beginning

The architecture was initially very simple. The MX was Postfix, delivering to the “trapper” user, which in turn ran procmail, which directly ran a perl script. This perl script then performed the trap actions, namely: DoS prevention, discarding viruses and malware, discarding backscatter bounces, extraction and cleanup of the incoming mails, then onward reporting, archival, and further distribution.

Given that this was a target for spam — and we want as much spam as possible here! — this would predictably run into load issues. Right at the beginning, back in around 2001/2002, I ran this on our shared server, where it pretty quickly caused trouble for delivery of other, more useful mail. It was around this time that Sonic kindly donated the server.

With dedicated hardware, we weren’t seeing much trouble — it was enough to just wait for the few hours for a traffic spike to pass, and the Postfix queue would then clear.

Clearing the queues

After a few months, though, this wasn’t enough: the queue would get consistently clogged, and the backlog became enough to result in the incoming spam being delayed for days before it made it from the MX to the trap archives. For a spamtrap, you want fresh spam, but not necessarily all spam, so I installed a cron job to simply clear the queue on a nightly basis. (I also had to restart the Postfix server, since it’d occasionally get hung and stop accepting connections on port 25, presumably due to load issues.)

IPC::DirQueue

The next problem was that the procmail/perl script end couldn’t process the mail fast enough for the MTA to keep up with the incoming connections, with follow-on problems caused by the perl script’s load impacting the MX’s activity. To work around these, I designed a new queueing backend, based around IPC::DirQueue. This allowed a new split architecture: the procmail-run perl script became extremely lightweight, delivering all inbound mail to a dirqueue and exiting quickly, so the MX could get back to the next inbound spam message; and the trap processing script was split into a web of dirqueues, allowing each individual part of the trap backend pipeline to operate independently.

There were several benefits to this:

  1. Since dirqueues operate as a batch-processing model, load spikes become irrelevant; the load incurred is limited by how many dequeuer processes are run.
  2. The time taken in backend tasks becomes irrelevant to the MX throughput, since that is bottlenecked only by the lightweight perl script and its write speed to the “incoming” dirqueue.
  3. By splitting the backend work into multiple queues, outages in the spam-reporting systems or onward forwardings become much less of a problem, since they won’t affect inbound spam, archival, outbound delivery to other reporting systems, forwards, etc.

Again, the dirqueues were cleared on a frequent basis, to discard the “spiky” traffic and ensure we were just seeing samples of the freshest spam. The dirqueues use a tmpfs as the backing storage directory, so queued mail never hits the disk at all.
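For readers who haven’t met the directory-as-queue idea, here is a minimal sketch of it. This is not IPC::DirQueue’s API or on-disk layout; the class, the “tmp”/“ready”/“active” directory names and the file naming are all assumptions for illustration. The only trick it relies on is that a rename within one filesystem is atomic, so a dequeuer either claims a file or it doesn’t.

```python
import itertools
import os
import tempfile
import time


class DirQueue:
    """Toy directory-backed queue: one file per message."""

    def __init__(self, root):
        self.tmp = os.path.join(root, "tmp")
        self.ready = os.path.join(root, "ready")
        self.active = os.path.join(root, "active")
        for d in (self.tmp, self.ready, self.active):
            os.makedirs(d, exist_ok=True)
        self._counter = itertools.count()

    def enqueue(self, data):
        """Write the message to tmp/, then rename it into ready/."""
        name = "%020d-%d-%06d" % (time.time_ns(), os.getpid(), next(self._counter))
        tmp_path = os.path.join(self.tmp, name)
        with open(tmp_path, "wb") as f:
            f.write(data)
        # Atomic on one filesystem: dequeuers never see a half-written file.
        os.rename(tmp_path, os.path.join(self.ready, name))
        return name

    def dequeue(self):
        """Claim and return the oldest ready message, or None if empty."""
        for name in sorted(os.listdir(self.ready)):
            claimed = os.path.join(self.active, name)
            try:
                os.rename(os.path.join(self.ready, name), claimed)
            except FileNotFoundError:
                continue  # another dequeuer claimed it first
            with open(claimed, "rb") as f:
                data = f.read()
            os.remove(claimed)  # fine for a spamtrap; a crash here loses the mail
            return data
        return None


with tempfile.TemporaryDirectory() as root:
    q = DirQueue(root)
    q.enqueue(b"spam one")
    q.enqueue(b"spam two")
    print(q.dequeue(), q.dequeue(), q.dequeue())
```

The enqueue side does almost nothing (one write and one rename), which is the property that let the procmail-run script get out of the MX’s way; how much load the backend generates is then set by how many dequeuer processes you choose to run.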

This worked pretty well for several years, from 80 megabytes of spam per day up to the current level of around 130MB per day. However, we still occasionally saw problems from load spikes, where high load caused the traps to refuse incoming SMTP connections, purely because the volume of inbound connections was too high for the Postfix MX to accept them all in a timely fashion.

qpsmtpd

Last weekend, I had a go at a project I’d been thinking of trying out for a long time — switching from Postfix to qpsmtpd. A while back, Matt Sergeant rewrote qpsmtpd to use Danga::Socket, Danga Interactive / Six Apart’s insanely scalable event-driven asynchronous socket class, as used in mogilefsd, perlbal and djabberd. This article notes that ‘two large antispam companies’ high-traffic spam traps have used this effectively since the second quarter of 2005, delivering concurrency as high as 10,000 on some occasions’, so it seemed likely to work ;)

Sure enough, results have been great… we now have a pure-perl system handling heavy volumes without breaking a sweat, certainly compared to the previous system. qpsmtpd’s plugin system was elegant, allowing me to annotate inbound spam with more details of the SMTP transaction, write plugins to deliver mail to a dirqueue directly instead of to an MTA, and do some conditional code (ie. basic “deliver this RCPT TO to this queue”) where needed.

Full details are over on the QpsmtpdSpamtrap page on the taint.org wiki, for the curious.


Don’t worry about Blacklist.ie

Irish techies — wondering what the next website to put the fear into your parents will be? Here it is: Blacklist.ie. It’s been getting a bit of coverage from the Irish technology press recently, it seems, as the new site from IE Internet.

(IE Internet are the Irish internet company that puts out a press release every month or so telling us how much of their mail is being filtered as spam, which Silicon Republic et al dutifully report as news, month after month.)

I got a call from my mother last week, telling me that she’d been “blacklisted”, and asking how to fix it. Sure enough, when I found out that she’d heard this on blacklist.ie, I went to the site, and her IP address was indeed listed — as was mine:

The IP address 212.2.169.61 is blacklisted.

RBLs checked:

Spam Haus not listed

Spam Cop not listed

Mailwall RBL not listed

Abuse At not listed

SORBS not listed

NJABL listed: Dynamic/Residential IP range listed by NJABL dynablock – http://njabl.org/dynablock.html

510 SG not listed

Naturally, that IP is listed — it’s entirely ok for a home-user broadband machine to appear in SORBS or NJABL as a dynablock-listed IP. (Dynablock, for those who don’t know, is a set of records for addresses which are known to be residential/end-user “dynamic” addresses, rather than mail relays — so obviously most end-user desktop machines would fall under this category.)

Unfortunately, this distinction isn’t mentioned anywhere on the blacklist.ie page… just a large, red, “The IP address is blacklisted” warning.

Worried readers might then reasonably go on to read the site’s Frequently Asked Questions list — which, incredibly, includes a helpful suggestion that you sign up with IE Internet to avoid being listed in future! I’d be curious how that’s supposed to help a home user get off the NJABL dynablock list… a little fishy, if you ask me!


Sender Address Verification considered harmful

(as an anti-spam technique, at least.)

Sender-address verification, also known as callback verification, is a technique to verify that mail is being sent with a valid envelope-sender return address. It is supported by Exim and Postfix, among others.

Some view this as a useful anti-spam technique. In my opinion, it’s not.

Spam/anti-spam is an adversarial “game”. Whenever you’re considering anti-spam techniques, it’s important to bear in mind game theory, and the possible countermeasures that spammers will respond with. Before SAV became prevalent, spam was often sent using entirely fake sender data; hence the initial attractiveness of SAV. Once SAV became worth evading, the spammers needed to find “real” sender addresses to evade it. And where’s the obvious place to find real addresses? On the list of target addresses they’re spamming!

Since the spam is now sent using forged sender addresses of “real” people, when a spam bounces (as much of it does), the bounce will be sent back not to an entirely fake address, but to a spam recipient’s address.

Hence, the spam recipients now get twice as much mail from each spam run – spam aimed at them, and bounce blowback from hundreds of spams aimed at others, forged to appear to be from them.

This is the obvious “next move” in response to SAV, which is one reason why we never implemented something like it in SpamAssassin.

On top of this — it doesn’t work well enough anymore. Verizon use SAV. Have you ever heard anyone talk about how great Verizon’s spam filtering is? Didn’t think so.

(This post is a little late, given that SAV has been used for years now, but better late than never ;)

By the way, it’s worth noting that it’s still marginally acceptable to use SAV as a general email acceptance policy for your site — ie. as a way to assert that you’re not going to accept mail from people who won’t accept mail to the envelope sender address used to deliver it. Just don’t be fooled into thinking it’s helping the spam problem, or is helping anyone else but yourself.

Finally, this Sender Address Verification is different from what Sendio calls Sender Address Verification. That’s just challenge-response, which is crap for an entirely different, and much worse, set of reasons.


Spam volumes at accidental-DoS levels

Both Jeremy Zawodny and Dale Dougherty at O’Reilly Radar are expressing some pretty serious frustration with the current state of SMTP. I have to say, I’ve been feeling it too.

A couple of months back, our little server came under massive load; this had happened before, and normally in those situations it was a joe-job attack. Switching off all filtering and just collecting the targeted domain’s mail in a buffer for later processing would work to ameliorate the problem, by allowing the load to “drain”. Not this time, though.

Instead, when I turned off the filtering, the load was still too high — the massive volume of spam (and spam blowback / backscatter) was simply too much for the Postfix MTA. The MTA could not handle all the connections and SMTP traffic in time to simply collect all the data and store it in a file!

Looking into the “attack” afterwards, once the load was back under control, it looked likely that it wasn’t really an attack — it was just a volume spike. Massive SMTP load, caused by spammers increasing the volume of their output for no apparent reason. (Since then, spam volumes have been increasing still further on a nearly weekly basis.)

This is the effect of botnets: the number of compromised hosts is now big enough to amplify spam attacks to server-swamping levels. Our server is not a big one, but it serves fewer than 50 users’ email, I’d say; the user-to-CPU-power ratio is pretty good compared to most ISPs’ servers.

So here’s the thing. New SMTP-based methods of delivering nonspam email — whether based on DKIM, SPF, webs of trusted servers, or whatever — will not be able to operate if they have to compete for TCP connection slots with spammers, since spammers can now swamp the SMTP listener for port 25 with connections. In effect, spam will DDoS legitimate email, no matter what authentication system that legit mail uses to authenticate itself.

This, in my opinion, is a big problem.

What’s the fix? A “new SMTP” on a whole different port, where only authed email is permitted? How do you make that DoS-resistant? Ideas?

(Obviously, counting on spammers to notice or care is not a good approach.)


MailScanner developer in hospital

According to this message, Julian Field, the main developer of MailScanner, was found collapsed at his home last Friday. More details via the SA list:

He is in ICU though stable condition. I’ll not go into any details, anyone interested and not on the MS list can read the thread on the MS archive.

Currently any plans for cards and such as are on hold until further instructions are given to the MS list. However Matt Hampton has setup a clustermap at this address.

Matt will also forward any well wishes left on the website along with the map. Visiting the page will show Julian and his family just how far reaching his software is and how many people appreciate his efforts.

Get well, Julian! :(
