Hack: reassassinate

A coworker, back today from a couple of weeks’ holiday, bemoaned the quantity of spam he had to wade through. I mentioned a hack I often use in this situation: discard the mail already filed as spam, download the two weeks of supposed nonspam as one huge mbox, and rescan it all with SpamAssassin. The intervening two weeks give the URIBLs plenty of time to blacklist the URLs, and the DNSBLs time to list the sending IPs, so the rescan is generally more accurate, at least in terms of reducing false negatives (the “missed spam”). In other words, it gets rid of most of the remaining spam nicely.
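For anyone wondering why waiting helps: a DNSBL is queried over plain DNS, by reversing the octets of the sending IP and looking the result up under the blacklist's zone. If the name resolves, the IP is listed; if not, it isn't listed (yet). SpamAssassin does all of this internally, so the following is only a rough Python illustration of the lookup, not anything taken from the script. The function names and the zone `dnsbl.example.org` are made up for the example.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name that a DNSBL lookup for an IPv4 address uses."""
    # Validate the address first; raises ValueError for anything that
    # is not a well-formed IPv4 address.
    addr = ipaddress.IPv4Address(ip)
    # DNSBLs expect the octets in reverse order, followed by the zone.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the DNSBL zone has an A record for this IP.

    This performs a real DNS lookup. A listing that did not exist when
    the mail was delivered may well exist two weeks later, which is the
    whole point of rescanning.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # The name did not resolve: the IP is not listed.
        return False
```

So a mail from 192.0.2.99 checked against the hypothetical zone above turns into a lookup of `99.2.0.192.dnsbl.example.org`.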

Chatting about this, it occurred to us that it’d be easy enough to generalize this hack into something more widely useful by hooking up the Mail::IMAPClient CPAN module with Mail::SpamAssassin, and in fact, it’d be pretty likely that someone else would already have done so.

Sure enough, a search threw up this node on perlmonks.org, containing a script which did pretty much all that. Here’s a minor freshening: download

reassassinate – run SpamAssassin on an IMAP mailbox, then reupload

Usage: ./reassassinate --user jmason --host mail.example.com --inbox INBOX --junkfolder INBOX.crap

Runs SpamAssassin over all mail messages in an IMAP mailbox, skipping ones it’s processed before. It then reuploads the rewritten messages to two locations depending on whether they are spam or not; nonspam messages are simply re-saved to the original mailbox, spam messages are sent to the mailbox specified in “--junkfolder”.

This is especially handy if some time has passed since the mails were originally delivered, allowing more of the contents of spam mails to be blacklisted by third-party DNSBLs and URIBLs in the meantime.
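The flow described above (skip what has been processed, classify, then pick a destination folder) can be sketched roughly as follows. This is a Python illustration, not the script's Perl code. How the real script recognises already-processed messages isn't stated here, so the `X-Rescanned` marker header is purely an assumption, as are the function name and the `classify` callable standing in for SpamAssassin.

```python
from email import message_from_bytes

# Hypothetical marker header, used only in this sketch to recognise
# messages that have already been through a rescan.
MARKER = "X-Rescanned"


def plan_reupload(raw, inbox, junkfolder, classify):
    """Decide where one message should be re-uploaded.

    `classify` is a stand-in for the spam filter: it takes the raw
    message bytes and returns True for spam. Returns None when the
    message was processed before, otherwise a tuple of
    (target_folder, rewritten_message_bytes).
    """
    msg = message_from_bytes(raw)
    if msg[MARKER] is not None:
        # Already rescanned on an earlier run: leave it alone.
        return None
    is_spam = classify(raw)
    # Tag the message so a later run skips it.
    msg[MARKER] = "yes"
    target = junkfolder if is_spam else inbox
    return target, msg.as_bytes()
```

A caller would loop over the mailbox, call this for each message, and upload the returned bytes to the returned folder.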

Prerequisites:

  • Mail::IMAPClient
  • Mail::SpamAssassin

3 Comments

  1. Posted January 28, 2009 at 15:44

    Justin, this is a godsend! Thank you!

    Is this script being actively developed? Are you interested in bug reports?

    I ask because I found that the script doesn’t preserve existing IMAP flags. E.g. all rechecked messages were marked as unread after the recheck, which is a bit unfortunate.

  2. Posted January 28, 2009 at 15:49

    hi Stefan —

    oops, that sounds pretty nasty :( I doubt I’ll get a chance to fix it, but if you can come up with a patch that works, I’ll happily apply it…

  3. Posted January 28, 2009 at 16:06

    I’ll see what I can come up with. Keeping read/unread status shouldn’t be hard. BUT I discovered something nastier: because the mails are reinjected, the received date of each message is also changed to the current time (this is inherent to IMAP). Not sure whether one can modify that behaviour. Sure, one can set one’s mail client to sort by e.g. sent date instead of received date, but you know…
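
On the flags and received-date problems raised in the comments: the IMAP APPEND command takes an optional flag list and an optional date-time, so in principle both can be carried over when a message is re-uploaded. Here is a rough sketch of the idea in Python with the standard `imaplib` module. It is not a patch for the Perl script, and the helper name `append_args` is made up for the example. It assumes the flags and date were fetched with something like `conn.fetch(num, "(FLAGS INTERNALDATE)")`.

```python
import imaplib


def append_args(fetch_response: bytes):
    """Extract (flags, date_time) for IMAP4.append() from a FETCH line.

    `fetch_response` is one response line from a FETCH of
    (FLAGS INTERNALDATE), for example:
      b'1 (FLAGS (\\Seen) INTERNALDATE "14-Jan-2009 10:30:00 +0100")'
    """
    # ParseFlags returns a tuple of bytes, e.g. (b'\\Seen', b'\\Flagged').
    flags = [f.decode("ascii") for f in imaplib.ParseFlags(fetch_response)]
    # \Recent is managed by the server and cannot be set by a client,
    # so drop it before re-uploading.
    flags = [f for f in flags if f.lower() != "\\recent"]
    # Internaldate2tuple returns a local-time struct_time, or None if
    # the response carries no INTERNALDATE. With None, append() sends
    # no date and the server falls back to the current time.
    when = imaplib.Internaldate2tuple(fetch_response)
    return "(" + " ".join(flags) + ")", when
```

The re-upload would then look something like `conn.append(folder, flags, when, rewritten_message)`, with `flags` and `when` taken from the original message rather than left empty. Whether a given server honours the supplied date is worth testing before relying on it.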