The volume of spam continues to rise inexorably. Brightmail are now estimating that 54% of all mail messages are spam.
Nowadays, my personal mail account is getting about 70 spams a day, rising to over 200 a day at the weekends. It’s getting tiresome; pretty much all of it gets marked as spam and diverted, but I still have to wade through it ‘just in case’, and to build the corpus. I guess I need to extend my .procmailrc to divert high-scoring spams somewhere I can check even less frequently ;)
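As a rough illustration of that kind of diversion (written in Python rather than procmail syntax, since the actual .procmailrc isn't shown), here's a minimal sketch that reads the score SpamAssassin writes into its X-Spam-Status header and picks a folder. It assumes SpamAssassin has already run on the message; the threshold of 15.0 and the folder names `inbox`, `spam` and `spam-high` are hypothetical.

```python
import email
import re
from email.message import Message
from typing import Optional

# Hypothetical threshold: spams scoring at or above this go to a
# rarely-checked folder instead of the normal spam folder.
HIGH_SCORE_THRESHOLD = 15.0

# SpamAssassin 2.x writes "hits=N" in X-Spam-Status; later versions write "score=N".
_SCORE_RE = re.compile(r"\b(?:hits|score)=(-?\d+(?:\.\d+)?)")


def spam_score(msg: Message) -> Optional[float]:
    """Return the SpamAssassin score from X-Spam-Status, or None if absent."""
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    match = _SCORE_RE.search(str(status))
    return float(match.group(1)) if match else None


def choose_folder(raw: str) -> str:
    """Pick a (hypothetical) destination folder name for a raw message."""
    msg = email.message_from_string(raw)
    # Only messages SpamAssassin flagged as spam ("Yes, ...") are diverted.
    if not str(msg.get("X-Spam-Status", "")).strip().startswith("Yes"):
        return "inbox"
    score = spam_score(msg)
    if score is not None and score >= HIGH_SCORE_THRESHOLD:
        return "spam-high"
    return "spam"
```

The two-tier split keeps borderline spam in a folder worth skimming, while the obvious junk lands somewhere that can be checked (or just archived for the corpus) far less often.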
That’s not the really annoying thing, though. Most of the time, I use tagged addressing when I publish my email address. It works very well for identifying spam sources overall, and for diverting mail sent to ‘dead’ addresses that now only get spam into the spamtraps. That’s the plus.
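A minimal sketch of how tagged addressing can drive both of those uses, assuming the common `user+tag@domain` form (the actual tag delimiter and scheme aren't stated here, so `TAG_DELIMITER`, the `DEAD_TAGS` set and the example addresses are all hypothetical):

```python
from collections import Counter
from typing import Iterable, Optional

# Assumption: tags use the common "user+tag@domain" form.
TAG_DELIMITER = "+"

# Hypothetical set of tags whose addresses now receive only spam.
DEAD_TAGS = {"usenet2001", "oldforum"}


def address_tag(address: str) -> Optional[str]:
    """Return the tag from a 'user+tag@domain' address, or None if untagged."""
    local, _, _domain = address.partition("@")
    _user, sep, tag = local.partition(TAG_DELIMITER)
    return tag.lower() if sep and tag else None


def route(recipient: str) -> str:
    """Send mail for dead tags to the spamtrap; deliver everything else."""
    tag = address_tag(recipient)
    if tag is not None and tag in DEAD_TAGS:
        return "spamtrap"
    return "deliver"


def tally_sources(spam_recipients: Iterable[str]) -> Counter:
    """Count spams per tag, showing where each harvested address was published."""
    return Counter(t for t in map(address_tag, spam_recipients) if t is not None)
```

Because each tag corresponds to one place the address was published, the tally points straight at the leaky source, and retiring that tag into `DEAD_TAGS` stops its spam without touching the other addresses.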
But the curse of writing spam filters is that you need a good archive of spam, and one of our SpamAssassin corpus guidelines is to trim out duplicate spams where possible. Many spammers wind up sending more-or-less identical messages, modulo random subject lines, hash-busters, etc. With (let’s say) 8 of my tagged addresses on their lists, I’ll get 8 copies of that spam, and have to pay a little bit of attention to trim it down to 1 copy for the corpus.
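One way to take some of the manual effort out of that trimming is a near-duplicate filter. The sketch below is a swapped-in technique, word-shingle Jaccard similarity, and not necessarily what any SpamAssassin corpus tool does: it ignores headers (so random subject lines don’t matter) and treats two bodies as copies when they share most of their 3-word shingles, which tolerates a few hash-buster tokens. The 0.8 threshold and shingle size of 3 are hypothetical and would need tuning; heavy hash-busting can still push copies below the cut-off.

```python
import email
import re
from email.message import Message
from typing import List, Set, Tuple

# Hypothetical cut-off: bodies sharing at least this fraction of their
# word shingles are treated as copies of the same spam.
SIMILARITY_THRESHOLD = 0.8
SHINGLE_SIZE = 3

Shingles = Set[Tuple[str, ...]]


def body_text(msg: Message) -> str:
    """Concatenate the text/* parts of a message; headers (incl. Subject) are ignored."""
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def shingles(text: str, k: int = SHINGLE_SIZE) -> Shingles:
    """Return the set of k-word sequences in the lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Shingles, b: Shingles) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(raw_messages: List[str]) -> List[str]:
    """Keep the first message of each group of near-identical bodies."""
    kept: List[str] = []
    kept_shingles: List[Shingles] = []
    for raw in raw_messages:
        sh = shingles(body_text(email.message_from_string(raw)))
        if all(jaccard(sh, other) < SIMILARITY_THRESHOLD for other in kept_shingles):
            kept.append(raw)
            kept_shingles.append(sh)
    return kept
```

Comparing each message against every kept one is quadratic, which is fine for a day’s worth of spam but would want something like MinHash for a large archive.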
Damn spam-filter development! All this corpus building is hard work ;)
BTW, note how the spam load rises at the weekends (Tim Hunter, Paul Terry and Alan Judge of eircom.net also noted this in their paper presented at LISA ’03 yesterday ;). There’s a good reason for it: spammers try to deliver their spam while abuse staff are away from their desks. The same thing applies in the network security world, where many attacks have been launched over US holiday weekends.
Hallowe’en: my best too-late idea for a costume was ‘Top Gun GWB’ in his flight suit. In the end, I played half of the ‘Dr. Frankenstein and Monster’ pair (I was the monster, as C really is a scientist, and computer ‘science’ doesn’t count). Best costume seen: a very impressive onnagata kabuki player.