Planet Antispam update

A brief update on Planet Antispam

I’ve just added MailChannels’ Anti-Spam Blog. Now — in the interests of disclosure — I’m a member of MailChannels’ Technical Advisory Board. However, that didn’t affect this — their blog has had consistently good, interesting posts dealing with anti-spam-related topics, and without too much plugging of their own products. ;)

Also added recently:

If you know of any other good email anti-spam-related blogs, drop a line in the comments here. (Note that I’m trying to keep it email-related, however, so we’re not covering web-spam.)

Comments (4)

Planet Antispam Update

Hey, some Planet Antispam updates. I’ve upgraded to Planet 2.0, and that seems to have solved some of the weirdness with consuming Atom feeds.

Also, there are two new antispam weblogs added to the subscription list:

Welcome guys!

(btw, if you’re wondering what happened to the music post — I moved it over here, to the mp3 blog where it was supposed to be posted in the first place, duh ;)

Comments (2)

Blogorrah

Blurred Keys: Blogorrah.com - the start of empire building with ‘very few overheads’. Blurred Keys, “an Irish media blog”, brings the revelation that Blogorrah “copies” Gawker.com.

Honestly, though, this is blatantly obvious — and I’d consider it unfair to call this “copying”. It’s simply taking a successful format and adapting it to the local market, and doing so very well indeed if you ask me.

Blogorrah is a hilarious read. If you’re Irish and you’re not subscribed, you’re really missing out… it’s the funniest thing on the Irish web these days.

Comments (3)

Blog Spam, and a ‘nofollow’ Post-Mortem

An interesting article on blog-spam countermeasures — Google’s embarrassing mistake. Quote:

I think it’s time we all agreed that the ‘nofollow’ tag has been a complete failure.

For those of you new to the concept, nofollow is a tag that blogs can add to hyperlinks in blog comments. The tag tells Google not to use that link in calculating the PageRank for the linked site. [...]

Since its enthusiastic adoption a year and a half ago, by Google, Six Apart, Wordpress, and of course the eminent Dave Winer, I think we can all agree that nofollow has done — nothing. Comment spam? Thicker than ever. It’s had absolutely no effect on the volume of spam. That’s probably because comment spammers don’t give a crap, because the marginal cost of spamming is so low. Also, nofollow-tagged links are still links, which means that humans can still click on them — and if humans can click, there’s a chance somebody might visit the linked sites after all.

I agree. At the time, I pointed at this comment from Mark Pilgrim:

Spammers have it in their heads now that weblog comments are a vector to exploit. They don’t look at individual results and tweak their software to stop bothering individuals. They write generic software that works with millions of sites and goes after them en masse. So you would end up with just as much spam, it would just be displayed with unlinked URLs.

Spammers don’t read blogs; they just write to them.

I still think he was spot on.

However, one part of the ‘Google’s embarrassing mistake’ article is a red herring: I think the chilling effect on “nonspam links” is not to be worried about. As Jeremy Zawodny said, life’s too short to worry about dropping links purely in the hopes of giving yourself PageRank. I don’t know if I really want links that people are leaving purely for that reason. ;)

In fact, I wouldn’t be surprised to hear that Google’s crawler starts treating “nofollow” links as mildly non-spammy in a future revision, due to their wide use in wikis, blogs etc.

To be honest, though — I don’t see the problem of blog-spam much anymore. As I said here:

[Weblog] comment spam should be a lot easier to deal with than SMTP spam. … With weblog comments, you control the protocol entirely, whereas with SMTP you’re stuck with an existing protocol and very little “wiggle room”.

On my WordPress weblog [ie. here] — which, admittedly, gets only about 1/4 of the traffic plasticbag.org does — I’ve instituted a very simple check stolen from Jeremy Zawodny. I simply include a form field which asks the comment poster for my first name, and if they fail to supply that, the comment is dropped. In addition, I’ve removed the form fields to post directly, requiring that all comments are previewed; this has the nice bonus of increasing comment quality, too.

Those are the only antispam measures I’m using there, and as a result of those two I get about 1 successful spam posted per week, which is a one-click moderation task in my email. That’s it.

The key is to not use the same measures as everyone else — if every weblog has a different set of protocols, with different form fields asking different simple questions, the only spammers that can beat that are the ones that write custom code for your site — or use human operators sitting down to an IE window.
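As a rough illustration of the two checks described above, here is a minimal Python sketch. It is not the actual WordPress code used on this site; the field names (`owner_first_name`, `previewed`) and the expected answer are hypothetical placeholders, and every site should pick its own.

```python
# Hypothetical field name and answer; the point is that each site uses its own.
CHALLENGE_FIELD = "owner_first_name"
EXPECTED_ANSWER = "alice"  # placeholder, not the answer used on any real site


def accept_comment(form: dict) -> bool:
    """Return True only if the poster answered the site-specific question
    and the comment arrived via the preview step."""
    answer = form.get(CHALLENGE_FIELD, "").strip().lower()
    if answer != EXPECTED_ANSWER:
        return False  # challenge missing or wrong: drop the comment
    # Direct posting is disabled; only previewed comments get through.
    return form.get("previewed") == "1"
```

A generic spam bot that fills in only the standard name, URL and comment fields never supplies the extra field, so its submission is dropped without any content analysis.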

Trackbacks, however — turn that off. The protocol was designed poorly, with insufficient thought given to its abuse potential; there’s no point keeping it around, now that it’s a spam vector.

Finally, a “perfect” solution to blog spam, while allowing comments, is unachievable. There will always be one guy who’s going to sit down at a real web browser to hand-type a comment extolling the virtues of some product or another. The goal is to get it to a level where you get one of those per week, and it’s a one-click operation to discard them.

(Update: This story got Slashdotted! The poor server’s been up and down repeatedly; looks like it needs an upgrade. In the meantime, WP-Cache has proven worth its weight in gold; recommended…)

Comments (30)

Poll: keep ‘Fixing Email Weblog’ in Planet Antispam?

I added the Fixing Email weblog to Planet Antispam a while back. However, I’m not entirely sure at this stage that its content (which seems to be primarily news syndication) fits with the “planet” concept (which is primarily intended for first-person posts).

So — quick poll. Let me know what you think, pro or con, Planet readers: should I remove the Fixing Email feed from that site?

Update: that was a pretty resounding ‘yes’. Done!

Comments (5)

Link-blog Networking

Cool — del.icio.us just added a feature whereby you can now see who has you in their network, and, of course, you can further view their networks and see who’s in them.

This’d be great to produce social-network graphs, although I daresay Joshua mightn’t be so keen on the spidering load. ;) I’ve optimistically requested some form of dump, anyway.

The social networking aspect of link collection and link-blogging via del.icio.us is emerging nicely; I’m keen to see what’s next in the pipeline.

A few interesting things:

  • Almost everyone who’s using del.icio.us seriously for link collection — ie. applying some quality control thresholds, and bothering to write one-line descriptions, at least — has filled out their ‘network’ by now.

  • It’d be useful to have “groups”, so that we can now assert things like “jm, boogah, n0wak, negatendo, tweebiscuit, leonardr, muckster and torrez form a group”. I’m sure that’d provide useful info, although could probably be inferred anyway. (People are attempting to hack it by using a shared tag on all their postings, like the “irishblogs” tag, but that’s an awful misuse of tagging in my opinion ;)

  • Also, it’ll be interesting to see what’ll happen once Google Co-op figures out a way to incorporate the del.icio.us network data. To be honest, I’m very surprised it wasn’t already in there — it seems like a no-brainer… maybe some Y!/G corporate rivalry is getting in the way.

Anyway, in the meantime it’s producing lots of good fodder for my SpicyLinks feed.

SpicyLinks is an implementation of something that I mentioned in a comment on this weblog entry, regarding future methods of reading weblogs; in essence, it’s an automated blog aggregation summariser. It reads other people’s link-blogs, so I don’t have to, and reports the stuff that proves popular in my personal collection of sources.
(Credit where due: HotLinks provided much of the inspiration, but doesn’t support personalisation, hence the reimplementation.)

SpicyLinks is similar to Populicious, but that app really misses the point, in my opinion. I don’t particularly want to know what everyone is pointing at; I want to know what a selected set of trusted sources (with good taste!) are pointing at.

This aggregation is pretty similar to the del.icio.us ‘network’ feed, but with much lower volume, and a higher signal/noise ratio, attained by dropping the ‘one-off’ items that only one person is pointing at. Initially, that may seem like a major failure, since you miss the ‘fresh bits’ — but as long as you’ve got the right people in your source network, it actually works very well.
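The filtering idea can be sketched in a few lines of Python. This is only an illustration of the “drop the one-offs” rule, not the SpicyLinks code itself; the function name, the `feeds` structure and the `min_sources` threshold are assumptions made for the example.

```python
from collections import defaultdict


def popular_links(feeds: dict[str, list[str]], min_sources: int = 2) -> list[tuple[str, int]]:
    """feeds maps a source (link-blogger) name to the URLs they have posted.

    Returns (url, number_of_sources) pairs for URLs posted by at least
    min_sources distinct sources, most widely shared first.
    """
    sources_by_url = defaultdict(set)
    for source, urls in feeds.items():
        for url in urls:
            # A set, so one source posting the same URL twice counts once.
            sources_by_url[url].add(source)
    ranked = [
        (url, len(sources))
        for url, sources in sources_by_url.items()
        if len(sources) >= min_sources  # drop the one-off items
    ]
    # Ties are broken alphabetically so the output order is stable.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Raising `min_sources` trades freshness for signal: a higher threshold means fewer items, each vouched for by more of the trusted sources.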

It’d be great if this was one of the features implemented in the del.icio.us ‘network’ system…

Comments (4)

Planet Antispam update

Quick update — I’ve added Ed Falk’s “Spam Diaries” to http://planet.spam.abuse.net/ .

Weblog Spam and Adversarial Classification

Dr. Dave, author of the Spam Karma WordPress antispam plugin, has posted an interesting article about new weblog-spammer tactics:

These spams do not present most of the idiotic traits of their lower colleagues: they do not try cramming hundreds of URLs or inserting hundreds of easily spotted junk keywords in the comment content. Instead, they use only the dedicated name and homepage fields to sneak in spam URL and keywords. The comment content is often perfectly innocuous, sometimes even topical (by copying parts of another comment or a trackbacking post). All in all, these spams could easily be missed by a human moderator who wouldn’t look carefully at the contact name and URL.

(Thanks to Kelson Vibber for the pointer to this.)

In other words, he is noting what we noticed in email anti-spam: what works well one year is likely to degrade over time as the spammers attempt to evade it, and one has to keep working to keep up.

The best term for this appears to be adversarial classification. Anti-spam activities fall into this category, and it often means that classic text classification algorithms aren’t suitable — after all, the Reuters-21578 dataset never tried to evade your classifier ;)

In a similar vein, this MS research paper is interesting:

Previous work on adversarial classification has made the unrealistic assumption that the attacker has perfect knowledge of the classifier. …. We present efficient algorithms for reverse engineering linear classifiers with either continuous or Boolean features and demonstrate their effectiveness using real data from the domain of spam filtering.

It’s akin to John Graham-Cumming’s work looking into how a spammer could get past a Bayesian filter “from the outside”, but with more techniques, and examining MS’ MaxEnt algorithm, too. PDF here, well worth a read.

(By the way, I’m in the process of moving house, so if you send me an email, it may take a while for me to reply. This situation is likely to prevail for the next few weeks, for what it’s worth — fun.)

Comments (2)

Planet Antispam: Beta No More

Planet Antispam has been working pretty nicely for the last couple of weeks — can’t say I’ve noticed any trouble, and its RSS feed is turning out to be a nice aggregation of anti-spam news. On top of that, John Levine was kind enough to set up a CNAME for it at a more appropriate URL — http://planet.spam.abuse.net/.

As a result, it’s now fully-fledged, and fit to lose the ‘beta’ qualifier. Please bookmark, subscribe to the feeds, and pass on the URL to others you think may be interested!

Planet Antispam

So a few weeks back, I mooted the idea of an anti-spam Planet site, similar to Planet GNOME, Planet Java, Planet Perl et al.

Here are the results: Planet Antispam.

It’s still got a few rough edges; notably, the URL is not permanent — I’d prefer something at a more spam-themed domain — and the logo is the generic “PlanetPlanet” one. But it’s up and running in a beta-ish fashion.

Feel free to bookmark, subscribe, post the URL on, etc.; and if you’d like to give it a better home with an A record at a spam-themed domain, drop me a line.

Update, Jan 17: Thanks to John Levine, it now has a permanent home at http://planet.spam.abuse.net/ . After several weeks of operation, I think it’s turning out to be pretty solid, too!

By the way, it also needs more source feeds. If you know of people with blogs, working on/writing about anti-spam (of the email variety), with RSS feeds that work, include the post text, and permit further redistribution of that text, drop us a line and I’ll add them.

Finally, here’s a picture of a Starbucks SPAM(r) Sandwich. (shudder)

Comments (49)

Del.icio.us ranking systems

Weblogs: there’s been a few attempts to mine ‘trend’ data from del.icio.us:

However, none consider how many links a user generates. A user who links to every single page on the web would quickly gain a good ‘trendsetting’ rating, and would also skew the website trends upwards, without actually providing useful data to others.
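One way to account for this, sketched below in Python, is to report a hit rate alongside the raw hit count. This is an assumed normalisation for illustration only, not something any of the sites mentioned here is known to implement; the function name and data shapes are hypothetical.

```python
def trendsetter_scores(posts: dict[str, set[str]], popular: set[str]) -> dict[str, tuple[int, float]]:
    """posts maps a user to the set of URLs they bookmarked early;
    popular is the set of URLs that later became popular.

    Returns, per user, (raw hit count, fraction of their links that were hits).
    """
    scores = {}
    for user, links in posts.items():
        hits = len(links & popular)
        # Dividing by volume penalises users who link to everything.
        hit_rate = hits / len(links) if links else 0.0
        scores[user] = (hits, hit_rate)
    return scores
```

With this measure, a user who posts eight links of which two turn out popular has the same raw count as a user who posts only those two, but a hit rate of 0.25 against 1.0, which separates the prolific poster from the selective one.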

A look at the hublog top posters does seem to indicate they’re linking prolifically to any old crap that looks likely to be popular, which is a more humanly-possible way to do that. ;)

However, populicious new links is quite cool — popular sites that are new in the last 24 hours. Especially handy to find out where one could download Daily Show torrents these days. ;)

There’s also the venerable Hot Links, which unfortunately tracks a very small population, but still gets interesting stuff.

Daev Walsh is blogging the deep sea!

Weblogs: Greenpeace: Mysteries of the Deep — ‘the SV Rainbow Warrior left Auckland, New Zealand, on a voyage around the surrounding waters. Our mission: To highlight the irreversible damage caused to deep sea life by bottom trawling.’ Official weblog maintainer for the voyage: one Daev Walsh. Nice one Daev!

Political Compass blog-mapping

Weblogs: Great, now I can figure out who my political neighbours are in blog-space. No wonder I like reading Crooked Timber: they’re fellow (-8 to -6.01, -8 to -6.01)-ers! (Catchy.)
