Spam: So, CEAS was great fun, and very educational:
- Got to meet up with various antispammers, including Daniel and Theo from the SpamAssassin dev team, Jeff Chan from SURBL, Dan Kohn from Habeas, Catherine Hampton from The SpamBouncer, Miles Libbey, John Levine, Neil Schwartzman — lots of good chats.
- MS really know how to feed a conference! I hear rumours there was an extra-special tinned-meat-product-based dish at the banquet…
- But their firewalling tendencies put a serious damper on keeping in touch with the outside world, at least until we set up an SSH tunnel on port 443 ;)
- During a lull, Dan Kohn fired off a hands-up census — a good 75% of the attendees (roughly) admitted to using SpamAssassin!
My highlight papers:
IBM’s Chung-Kwei pattern-discovery system — the one which Mark dug up. Very interesting stuff; it turns out that bioinformatics is full of large corpora of data (genomes) which you then need to find patterns in. Funnily enough, so is SpamAssassin: s/genomes/spam/, s/patterns/regular expressions/. The more advanced pattern-discovery algorithms even allow complex patterns to contain alternative blocks, ‘don’t-cares’ and similar regular-expression-like features.
The really good bit of Chung-Kwei is the Teiresias algorithm (more pages, online demo). Of course, being IBM research, it’s probably patented to the hilt, and may be tricky to license; but it’s certainly pointed us in a whole new interesting direction — anyone know any bioinformaticians?
Another good paper was On Attacking Statistical Spam Filters, by Gregory L. Wittel and S. Felix Wu, which (similarly to Henry Stern’s submission, which I helped a little with) dealt with an attack on Bayesian filters.
This is interesting stuff; we’re pretty sure it’s not as serious as it could possibly be, in SpamAssassin’s implementation, but it’s still a serious attack.
- The Impact of Feature Selection on Signature-Driven Spam Detection was an interesting paper on AOL’s new signature schemes. (The conference was sponsored by Cloudmark, BTW, but those guys were nowhere to be seen — in which case they missed this presentation ;)
- Reputation Network Analysis for Email Filtering was interesting, in that it mirrors to a degree the thinking behind web-o-trust.org, but in my opinion suffered due to a lack of thought about avoiding spoofing (by including IP address information in the FOAF file, it could do this now). However, once SPF becomes pervasive, this could be combined with that to generate personalised webs of trust usable for email whitelisting.
- Resisting SPAM Delivery by TCP Damping was very nifty; plug a classifier into your MTA, and thereby detect connections from spam relays. Once you’ve found them, you then throttle down their connection as they attempt to deliver spam. Some other TCP-level tricks can do nifty stuff like massively increasing the bandwidth consumption of the spamming machines. Very very nice!
I took copious notes on the SpamAssassin wiki, if anyone’s curious.