April 17, 2007 - Justin's Linklog

Like many anti-spam systems these days, SpamAssassin operates a network of spamtraps. One set of these run off traps.SpamAssassin.org, a server kindly donated by ISP Sonic.net.

Large-scale spam-trapping systems like this are generally run in quite a secretive manner, but we’re an open source project — so it may be interesting if I give some details of our setup. Here’s a potted history of how this spamtrap server has run over the years…

The beginning

The architecture was initially very simple. The MX was Postfix, delivering to the “trapper” user, which in turn ran procmail, which directly ran a perl script. This perl script then performed the trap actions, namely: DoS prevention, discarding viruses and malware, discarding backscatter bounces, extraction and cleanup of the incoming mails, then onward reporting, archival, and further distribution.

Given that this was a target for spam — and we want as much spam as possible here! — this would predictably run into load issues. Right at the beginning, back in around 2001/2002, I ran this on our shared server, where it pretty quickly caused trouble for delivery of other, more useful mail. It was around this time that Sonic kindly donated the server.

With dedicated hardware, we weren’t seeing much trouble — it was enough to just wait for the few hours for a traffic spike to pass, and the Postfix queue would then clear.

Clearing the queues

After a few months, though, this wasn’t enough — the queue would get consistently clogged, and the backlog became enough to result in the incoming spam being delayed for days before it made it from the MX to the trap archives. For a spamtrap, you want fresh spam, but not necessarily all spam — so I installed a cron job to simply clear the queue on a nightly basis. (I also had to restart the Postfix server, too, since it’d occasionally get hung and stop accepting connections on port 25, presumably due to load issues.)

IPC::DirQueue

The next level was an inability of the procmail/perl script end to process the mail fast enough for the MTA to keep up with the incoming connections, and follow-on problems, caused by load generated by the perl script impacting the MX’s activity. To work around these, I designed a new queueing backend, based around IPC::DirQueue. This allowed a new split architecture; the procmail-run perl script was extremely lightweight, delivering all inbound mail to a dirqueue and exiting quickly, allowing the MX to get back to the next inbound spam message, and the trap processing script was then split into a web of dirqueues, allowing each individual part of the trap backend pipeline to operate independently.

There were several benefits to this:

Since dirqueues operate as a batch-processing model, load spikes become irrelevant; the load incurred is limited by how many dequeuer processes are run.
The time taken in backend tasks becomes irrelevant to the MX throughput, since that is bottlenecked only by the lightweight perl script and its write speed to the “incoming” dirqueue.
By splitting the backend work into multiple queues, outages in the spam-reporting systems or onward forwardings become much less of a problem, since they won’t affect inbound spam, archival, outbound delivery to other reporting systems, forwards, etc.

Again, the dirqueues were cleared on a frequent basis, to discard the “spiky” traffic and ensure we were just seeing samples of the freshest spam. The dirqueues use a tmpfs as the backing storage directory, so it never hits the disk at all.

This worked pretty well for several years — from 80 megabytes of spam per day to the current level, which is around 130MB per day. However, we still occasionally saw problems from load spikes, where high load caused the traps to refuse incoming SMTP connections — purely because the load of inbound connections is too high for the Postfix MX to accept them all in a timely fashion.

qpsmtpd

Last weekend, I had a go at a project I’d been thinking of trying out for a long time — switching from Postfix to qpsmtpd. A while back, Matt Sergeant rewrote qpsmtpd to use Danga::Socket, Danga Interactive / Six Apart’s insanely scalable event-driven asynchronous socket class, as used in mogilefsd, perlbal and djabberd. This article notes that ‘two large antispam companies’ high-traffic spam traps have used this effectively since the second quarter of 2005, delivering concurrency as high as 10,000 on some occasions’, so it seemed likely to work ;)

Sure enough, results have been great… we now have a pure-perl system handling heavy volumes without breaking a sweat, certainly compared to the previous system. qpsmtpd’s plugin system was elegant, allowing me to annotate inbound spam with more details of the SMTP transaction, write plugins to deliver mail to a dirqueue directly instead of to an MTA, and do some conditional code (ie. basic “deliver this RCPT TO to this queue”) where needed.

Full details are over on the QpsmtpdSpamtrap page on the taint.org wiki, for the curious.

Archives

Using qpsmtpd for traps.spamassassin.org