I’ve just found Gary Robinson’s blog, which is a bit silly, as it’s the primary source, after Paul Graham’s ‘A Plan For Spam’ paper, for modern Bayesian spamfiltering techniques. I’d only read Gary’s page describing the Robinson-combining technique, but he’s been doing a good job of blogging the anti-spam world in general recently. Hence, he’s made the blogroll ;)
Some choice links from his blog:
The email thread that provoked this message will soon dissolve. Including [email protected] might have been useful, but the moment has passed. If I urgently need to contact [email protected] , I may have to grit my teeth and register to do so. But no ad-hoc communication is going to make it over that activation threshold.
And a different kind of whitelist — the IronPort Bonded Sender type, from Whitelists: the weapon of choice against spam (ZDNet):
After a one and half months of testing, IronPort identified hundreds of thousands of false-positives. At that rate, the mail generated by IronPort’s customers alone, which make up a small percentage of the total amount of e-mail that traverses the Internet, is resulting in over one million false-positives per year.
Hmm. Well, I’m not 100% convinced here — I did see Amazon.FR, who are apparently Bonded Sender customers, send a promotional mail to a mailing list. I also saw several reports from other places regarding the same mail. How often does a mailing list order goods from an e-commerce site? (But, having said that, that’s the only Bonded Sender issue I’ve seen in about 6 months — so let’s put that down to teething issues, or someone on the list who decided to act up when ordering some goods.)
Spamland.org, a new Wiki for spamfiltering.
Debra Bowen, a California State Senator, is proposing a hardcore new anti-spam bill. “It would bar unsolicited e-mail advertising and allow people who receive it to sue the senders for $500 per transmission. A judge could triple the penalty if he or she decided the violation was intentional. … ‘The ($500) fine’s really intended to get a whole generation of computer-savvy folks to help us do the enforcement,’ Bowen says. ‘Getting rid of spam is never going to be the district attorney’s first priority and it shouldn’t be.’” She notes also that she’s “seen estimates that it could grow to 50 percent in the next five years.” Too late: it’s already there, as far as I can tell.
FWIW, I like the sound of this — she’s requiring that commercial e-mail senders have an existing verified-opt-in relationship beforehand. Sounds good to me.
And finally, a very interesting set of tests on Robinson-combining strategies. Very interesting, that is, if you’re implementing a Bayesian spam filter. Otherwise quite boring. ;)