Blog Spam, and a ‘nofollow’ Post-Mortem

An interesting article on blog-spam countermeasures — Google’s embarrassing mistake. Quote:

I think it’s time we all agreed that the ‘nofollow’ tag has been a complete failure.

For those of you new to the concept, nofollow is a tag that blogs can add to hyperlinks in blog comments. The tag tells Google not to use that link in calculating the PageRank for the linked site. [...]

Since its enthusiastic adoption a year and a half ago, by Google, Six Apart, WordPress, and of course the eminent Dave Winer, I think we can all agree that nofollow has done — nothing. Comment spam? Thicker than ever. It’s had absolutely no effect on the volume of spam. That’s probably because comment spammers don’t give a crap, because the marginal cost of spamming is so low. Also, nofollow-tagged links are still links, which means that humans can still click on them — and if humans can click, there’s a chance somebody might visit the linked sites after all.

I agree. At the time, I pointed at this comment from Mark Pilgrim:

Spammers have it in their heads now that weblog comments are a vector to exploit. They don’t look at individual results and tweak their software to stop bothering individuals. They write generic software that works with millions of sites and goes after them en masse. So you would end up with just as much spam, it would just be displayed with unlinked URLs.

Spammers don’t read blogs; they just write to them.

I still think he was spot on.

However, one part of the ‘Google’s embarrassing mistake’ article is a red herring: I think the chilling effect on “nonspam links” is not to be worried about. As Jeremy Zawodny said, life’s too short to worry about dropping links purely in the hopes of giving yourself PageRank. I don’t know if I really want links that people are leaving purely for that reason. ;)

In fact, I wouldn’t be surprised to hear that Google’s crawler starts treating “nofollow” links as mildly non-spammy in a future revision, due to their wide use in wikis, blogs etc.

To be honest, though — I don’t see the problem of blog-spam much anymore. As I said here:

[Weblog] comment spam should be a lot easier to deal with than SMTP spam. … With weblog comments, you control the protocol entirely, whereas with SMTP you’re stuck with an existing protocol and very little “wiggle room”.

On my WordPress weblog [i.e. here], which admittedly gets only about 1/4 of the traffic plasticbag.org does, I’ve instituted a very simple check stolen from Jeremy Zawodny. I simply include a form field which asks the comment poster for my first name, and if they fail to supply that, the comment is dropped. In addition, I’ve removed the form fields to post directly, requiring that all comments be previewed; this has the nice bonus of increasing comment quality, too.
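As a rough illustration of the first-name check, here is a minimal Python sketch. It is not the code running on this weblog: the field name `owner_first_name` and the function `accept_comment` are hypothetical, and the only detail taken from the post is that the poster must supply the author's first name or the comment is dropped.

```python
# Hypothetical sketch of a "type my first name" check.
# CHALLENGE_FIELD and accept_comment are illustrative names, not real WordPress APIs.
EXPECTED_ANSWER = "justin"            # the weblog author's first name
CHALLENGE_FIELD = "owner_first_name"  # assumed name of the extra form field


def accept_comment(form: dict) -> bool:
    """Return True only if the submitted form answers the challenge correctly."""
    answer = form.get(CHALLENGE_FIELD) or ""
    # Be forgiving about case and stray whitespace; humans type sloppily.
    return answer.strip().lower() == EXPECTED_ANSWER


if __name__ == "__main__":
    print(accept_comment({"comment": "Nice post", CHALLENGE_FIELD: " Justin "}))
    print(accept_comment({"comment": "Cheap pills"}))
```

A generic spambot that only fills in the standard name, URL and comment fields never supplies the extra field, so its submission is silently discarded.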

Those are the only antispam measures I’m using there, and as a result of those two I get about 1 successful spam posted per week, which is a one-click moderation task in my email. That’s it.

The key is to not use the same measures as everyone else — if every weblog has a different set of protocols, with different form fields asking different simple questions, the only spammers that can beat that are the ones that write custom code for your site — or use human operators sitting down to an IE window.

Trackbacks, however — turn that off. The protocol was designed poorly, with insufficient thought given to its abuse potential; there’s no point keeping it around, now that it’s a spam vector.

Finally, a “perfect” solution to blog spam, while allowing comments, is unachievable. There will always be one guy who’s going to sit down at a real web browser to hand-type a comment extolling the virtues of some product or another. The goal is to get it to a level where you get one of those per week, and it’s a one-click operation to discard them.

(Update: This story got Slashdotted! The poor server’s been up and down repeatedly; looks like it needs an upgrade. In the meantime, WP-Cache has proven worth its weight in gold; recommended…)


30 Comments

  1. Posted May 31, 2006 at 16:54 | Permalink

    I implemented nofollow for Slashdot. And I did it not primarily to reduce comment spam (which our moderation system and other tools handle pretty well already, as Slashdot gets very little comment spam) but to reduce the effects of comment spam on search engines. If you post with a comment bonus (which you can get with high karma), you get no nofollow attribute, because we figure, chances are, your links will be useful to the search engines.

  2. Posted May 31, 2006 at 18:45 | Permalink

    one advantage to a homebrew weblog system is that my comment handling is sufficiently different from wordpress, moveabletype, and the rest that i don’t get any of the autoposted spam. all i get is the occasional obviously hand-entered spam, about one a week. (and it’s been such a non-problem that i haven’t even bothered to create the one-click tool, i just go in and delete those from the database manually when they happen.)

    i did get trackback spam, but i ripped out my trackback support and that solved that problem.

  3. artifex
    Posted May 31, 2006 at 20:47 | Permalink

    I’ve never seen the point of trackback. It’s basically just saying, “Hey, everyone, look, someone wrote about what I said! I rule!” Especially when they’re all segregated above the real comments. When they are sprinkled amongst them, instead, they just make the comments that much harder to skim.

    p.s. pudge, you rule. Although I think the URLs in the bodies of articles shouldn’t have nofollow, because if we’re /.ing them, at least they can get some benefit from it, right? And they’re unlikely to be spam, if the editor has done his job.

  4. Posted May 31, 2006 at 20:52 | Permalink

    In response to Jim up above, I also have a homebrew blog, but someone at some point took the time to write an automated poster to it, much to my surprise. I noticed that the IP address that it was posted from was completely unrelated to (often on the other side of the world from) the one that fetched the form (it made 3 connections, one to get the form, one to post, and one to check to see if it posted). My guess is that this was via Tor. The simple solution (which unfortunately also affected AOLers) was to require the form retrieval and the POST to happen from the same IP. Their automated tool still occasionally tries to post a couple hundred in a few seconds, but it doesn’t work anymore. It’s odd that someone would bother though.
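One way to require that the form retrieval and the POST come from the same IP is sketched below in Python. The comment above does not say how its check was built, so the approach here is an assumption: embed a keyed hash of the requesting IP in the form as a hidden field, then recompute it when the POST arrives. `SECRET_KEY`, `issue_token` and `post_allowed` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real site would load this from private config.
SECRET_KEY = b"change-me"


def issue_token(client_ip: str) -> str:
    """Token to embed as a hidden field when serving the comment form."""
    return hmac.new(SECRET_KEY, client_ip.encode("utf-8"), hashlib.sha256).hexdigest()


def post_allowed(client_ip: str, token: str) -> bool:
    """Accept the POST only if the token was issued to the same IP address."""
    expected = issue_token(client_ip)
    # Compare as bytes so arbitrary user-supplied input cannot raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

As the comment notes, this also rejects legitimate users whose requests leave through rotating proxies, so it trades some false positives for simplicity.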

  5. Posted May 31, 2006 at 21:51 | Permalink
    Trackbacks, however — turn that off. The protocol was designed poorly, with insufficient thought given to its abuse potential; there’s no point keeping it around, now that it’s a spam vector.

    For what it’s worth, you can pretty much stop spam by looking at the TrackBack URI to make sure that page links to yours. The TrackBack Validator plug-in for WordPress performs this check and yields excellent results (since, as you state, spammers don’t care about “nofollow” and other countermeasures: it’s easier to just spread the spam net wider).
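The link-back check described above can be sketched in Python as follows. This is not the TrackBack Validator plug-in's actual code; it only illustrates the idea of confirming that the page behind the TrackBack URI contains a link to your post. Fetching the page is left to the caller, and `page_links_to` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def page_links_to(html: str, my_url: str) -> bool:
    """Return True if the given HTML contains a hyperlink to my_url."""
    collector = _LinkCollector()
    collector.feed(html)
    target = my_url.rstrip("/")
    # Ignore a trailing slash so /post and /post/ count as the same page.
    return any(href.rstrip("/") == target for href in collector.hrefs)
```

A trackback whose source page fails this check can be dropped without a human ever seeing it.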

  6. Posted May 31, 2006 at 22:39 | Permalink

    Right, artifex, we don’t do nofollow for stories. That’d be silly. :-)

  7. Posted May 31, 2006 at 22:40 | Permalink

    just checking to see if I could post this blogspam : )

    seriously, after 27 years, they kicked me to the curb and sent my job to india.

    please visit my site and click the google ads, I’m trying to replace the income I’ve lost to Bangalorese swamis.

    thanks

  8. Posted May 31, 2006 at 23:29 | Permalink

    Wow. Simple, logical, thoughtful.

    Hey! I thought this was a blog!

    You know the rules. Say something idiotic, right now. (um — i’m agreeing)

    sabadash

  9. Posted May 31, 2006 at 23:47 | Permalink

    I have a message so unimpeachably important, so world-changingly disruptive, so awesomely anastrophic, that the rules of scientific full disclosure compel me to spread the Mentifex AI Memes via weblog Comments and all other what-have-you avenues of communication. Even so, eventually I expect to be asked, why did you not try even harder to tell us what was about to happen?

  10. Posted June 1, 2006 at 00:10 | Permalink

    Welcome back from your slashdotting!

    As Dan pointed out, checking the trackback site for links takes care of it pretty easily. I suppose eventually they might start putting temporary links back to the target pages, like those link exchange spams that get sent to webmaster accounts, and then both verified trackback and pingback will have the same problem.

    Manual spam does happen occasionally, though. I use Spam Karma on my blog, and it uses a captcha for borderline cases. I actually get spam where someone has filled out the captcha!

    I do think nofollow has its occasional uses, like linking to someone you don’t like and don’t want to increase their fame. There’s also the issue of sites that aren’t well-maintained. That came up in a nofollow discussion I had last year. For every site run by someone like you or me who obsessively checks for comments and removes the spam, there’s a leftover Geocities page from 1997 with a guestbook full of dubious links. It can also work as a less-severe moderation mechanism: Post the comment immediately with nofollow, and if you look at it and accept it, drop the attribute. That way if <spider-of-your-choice> drops in between the time the spammer posts the comment and the time you remove it, the spider doesn’t take the link as a vote for the site.

    Of course, none of those applications solves the presence of spam… just its impact on search rankings.
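The "post with nofollow, drop it on approval" mechanism from the comment above might look like this in Python. It is a sketch under assumptions: `render_comment_link` and the `approved` flag are hypothetical, and a real weblog would do this inside its own templating layer.

```python
from html import escape


def render_comment_link(url: str, text: str, approved: bool) -> str:
    """Render a commenter's link, adding rel="nofollow" until a moderator approves it."""
    rel = "" if approved else ' rel="nofollow"'
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'


if __name__ == "__main__":
    print(render_comment_link("http://example.com/", "my site", approved=False))
    # <a href="http://example.com/" rel="nofollow">my site</a>
    print(render_comment_link("http://example.com/", "my site", approved=True))
    # <a href="http://example.com/">my site</a>
```

The same switch could be driven by commenter reputation instead of manual approval, as in the Slashdot karma bonus described in the first comment.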

  11. Posted June 1, 2006 at 00:18 | Permalink

    Gee, I hope my page rank goes up!

    hahaha.

  12. Posted June 1, 2006 at 00:40 | Permalink

    Justin, thanks for the thoughtful comments on my post. Regarding the “chilling effect” — you and Jeremy are right, this is a pretty minor effect. But even small bits of friction can slow things down. Every time I see something on your blog that I want to respond to, I have a choice: Post a comment on your blog? Or put a post on my own blog that may or may not link back to your blog? A tiny shift in the perceived value of those two choices can, over time, make a big difference in the way people choose to communicate in this medium.

  13. Posted June 1, 2006 at 01:36 | Permalink

    I don’t much see the point of turning off trackbacks, a semi-stupid captcha like system (Justin indeed), or anything along those lines. Existing comment spam blockers work nicely. Bad Behavior stops nearly all automated spam, and Akismet catches the rest. Where’s the fire?

  14. Randy
    Posted June 1, 2006 at 06:05 | Permalink

    I profile IPs. I block you if you are not western. I’m sorry, i’m very sorry.

    deny brazil deny russia deny amsterdam deny india

    Hostas noches, senior!

  15. Posted June 1, 2006 at 06:06 | Permalink

    I’ve had tremendous success with Akismet. From one of my blogs:

    “Akismet has caught 5,211 spam for you since you first installed it.”

    False positives? One that I’m aware of. False negatives? A handful – 10 at most.

    p.s. Previewing unchecks the ‘Notify me’ box here…

  16. Posted June 1, 2006 at 08:33 | Permalink

    Justin, it’s somewhat ironic that you still have nofollow enabled for the comments here, then ;-)

    Even more so, since you neglected to quote the bits from Dylan and Jeremy where they discussed how nofollow has changed the economics of linking:

    Worse, nofollow has another, more pernicious effect, which is that it reduces the value of legitimate comments. Here’s how: Why should I bother entering a comment on your blog, after all? Well, I might comment because you’re my friend. But I might also want some tiny little reward for participating in a discussion, contributing to the content on your site, and generally enhancing the value of the conversational Web. That reward? PageRank, baby. But if your blog uses the nofollow tag, you’ve just eliminated that tiny little bit of reciprocity. Thanks, but no thanks. I’d rather just comment on my own blog. And maybe, if you’re lucky, I’ll link back to you.

  17. Posted June 1, 2006 at 12:49 | Permalink

    Wow, slashdotted! WP-Cache has been saving the day — my server was melting down before I got that installed ;)

    pudge — that /. implementation is a nice trick; a pretty logical way to deal with it.

    ceejayoz — thanks for the note, I hadn’t spotted that. If I get some tuits I’ll sort that out…

    Arto — re ‘it’s somewhat ironic that you still have nofollow enabled for the comments here, then’:

    The point I was making, though, is that “nofollow” is invisible to the people I (and spammers!) care about: human readers. The only readers that pay attention to “nofollow” are search engines that choose to respect it. So it really doesn’t matter to me if the links in my weblog comments are nofollow, since it won’t affect other people reading it. I’m not explicitly anti-nofollow; I just think it’s nearly pointless as a way to reduce comment spam volumes.

    In fact, I might as well leave it on, just in case one in a thousand spambots does notice it, and avoids wasting my CPU time and bandwidth. (Unlikely, but you never know.)

    Really, “nofollow” is just a way to help out Google; not a useful tactic to block or slow down comment spammers. In fact, as this story notes, one comment-spammer explicitly pointed out that he was spamming weblog comments in an attempt to garner clickthroughs, not Page Rank. If that’s the case, then the whole nofollow/Page Rank focus on comment spam is entirely misguided, since the spammers aren’t interested in Page Rank anyway, and just want links, nofollow or not, that humans might follow!

    By the way, there’s a couple of comments in the /. thread stating that “nofollow was never intended to solve comment spam”. Yeah right — in that case, someone should have told Google, given that the title of their announcement back in January 2005 was Preventing comment spam. oops. ;)

  18. Posted June 1, 2006 at 16:22 | Permalink

    Regarding the quote above from nofollow and the economics of linking, I find that attitude disgusting. People who see every human activity, even commenting on blogs, as a way for them to get a reward, and who aren’t comfortable doing anything without a reward should really reconsider the way they’re living life.

  19. Posted June 1, 2006 at 20:54 | Permalink

    This “enter my first name” trick is a great idea.

    I think I’ll have to implement something similar in my blog, since I’ve just been using the standard CAPTCHA and moderating everything. Well, I really got tired of that and turned off comments, which takes out a lot of the fun….

  20. Posted June 4, 2006 at 17:31 | Permalink

    testing your anti-spam test, I need one too.

    do you think passing the answer from the preview form as a hidden field without forcing it to be reentered lets spammers just insert from that point? i.e., could they post to your site automatically by skipping the initial form and submitting as if straight from a preview form?

    A way to make this harder for spammers but still easy for you is to have a small array of questions and answers to use. Then randomly use them. Pass the index value. That way there isn’t just one answer they have to use as an argument.
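The rotating-question idea above can be sketched in Python like this. The questions, answers and function names are all hypothetical; the only parts taken from the comment are the small array of question/answer pairs, the random choice, and passing the index back with the form.

```python
import random

# Hypothetical question/answer pairs; each site would pick its own.
CHALLENGES = [
    ("What is the weblog author's first name?", "justin"),
    ("What is two plus three, written as a digit?", "5"),
    ("What colour is a clear daytime sky?", "blue"),
]


def pick_challenge():
    """Choose a random question; return its index (for a hidden field) and its text."""
    index = random.randrange(len(CHALLENGES))
    return index, CHALLENGES[index][0]


def check_answer(index_field: str, answer: str) -> bool:
    """Validate the posted answer against the question named by the posted index."""
    try:
        index = int(index_field)
    except (TypeError, ValueError):
        return False
    if not 0 <= index < len(CHALLENGES):
        return False
    return answer.strip().lower() == CHALLENGES[index][1]
```

Because the index travels with the form, a bot that has learned one index and its answer can still replay that pair, so this raises the per-site effort only slightly.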

  21. Posted June 6, 2006 at 18:35 | Permalink

    “tester”: the idea is not to make a system that’s hard for a spammer to reverse-engineer; it’s just to make a system that a spammer would have to reverse for every site they wish to post spam on. It’s an economy-of-effort thing…

  22. Posted June 6, 2006 at 19:00 | Permalink

    I also agree adding other unique fields helps to decrease automated spam. Spam generally goes for the easy route, based upon repeatable patterns used by 1000s of blogs.

    RE: Chilling Effect. It’s not so much the chilling effect that matters, which is there; to me it’s a reciprocity factor. Relevant comments on your blog add weight in the search engines by added keywords and unique text, as well as value to your blog by (hopefully) adding insightful thoughts to the discussion. If you deem the comment relevant and approve it, then it seems fair to allow something to flow back to the commenter in terms of search engine juice.

    For some people, nofollow does lead to this “hoarding” mechanism and there have been cases of people using it to try and “channel pagerank” within their own sites. Anytime a tool like this is loosely designed it is bound to be used in ways other than anticipated.

    Michael @ SEOG

  23. Posted February 26, 2007 at 02:18 | Permalink

    Well hopefully no one is going to accuse me of spamming a page with a page rank of zero, although with the following unique text and additional keywords, maybe the rank will go up… ;-)

    Forget nofollow for the search engine spiders! How about NOREAD or NOCLICK for the human beings! I suggest you NOREAD the following, as it is much too boring. (Now imagine you’re a search engine, and consider how it feels to be ordered around in that way.)

    The nofollow / hoarding issue points to a basic flaw in Google’s whole approach to ranking pages on the basis of links. Whilst it did dramatically move search on from where it was, any algorithm, once understood, is a target for manipulation. Reliance on positive feedback tends to direct traffic down paths that are already established, and leave the new and unlinked pages sidelined. (1) We promote the pages that people are linking to. (2) People link to pages that they are able to find. (3) The pages they are able to find are the ones we are promoting… The same kind of positive feedback loop that left a generation using the inferior VHS in preference to the superior Betamax. What I wonder about, is how we can set up the web to, using this analogy, allow people to find VHS, but deliver Betamax.

  24. Posted February 26, 2007 at 11:41 | Permalink

    Bindon — this page has a PR of 4, according to http://www.mygooglepagerank.com . So not much use for spammers, then, but better than zero! Thanks for the comment.

  25. Posted June 7, 2007 at 16:24 | Permalink

    Hey, Justin. Maybe as a result of everyone posting such “interesting” comments, your nofollow pages’ rank has gone up to 5! … It is the Karma of linking. Please yes follow as much as you like.

  26. Posted September 9, 2007 at 01:28 | Permalink

    I must admit that at first I thought the nofollow tag would help, but people do just seem to ignore it! – You got a ( missing on your post a comment form by the way Justin! – email required bit…

  27. Posted November 10, 2007 at 08:52 | Permalink

    one advantage to a homebrew weblog system is that my comment handling is sufficiently different from wordpress, moveabletype, and the rest that i don’t get any of the autoposted spam. all i get is the occasional obviously hand-entered spam, about one a week. (and it’s been such a non-problem that i haven’t even bothered to create the one-click tool, i just go in and delete those from the database manually when they happen.)

    i did get trackback spam, but i ripped out my trackback support and that solved that problem.

  28. Posted April 10, 2008 at 10:19 | Permalink

    The official claim is that links with the rel=nofollow attribute do not influence the search engine rankings of the target page. In addition to Google, Yahoo and MSN also support the rel=nofollow attribute.

    i think it helps indexing

  29. Posted October 2, 2008 at 14:28 | Permalink

    It is criminal that we need to pay money for links to our websites. Most companies that need these links are start ups and have low budgets. It is really cool when a website allows the links to follow! There are many stories we can tell. Cheers guys!

  30. Posted May 19, 2009 at 20:21 | Permalink

    And the debate still rages on. My issue is that Google says that a site well linked will rank higher, but how to get those links? My own site is an example. The only real way to get relevant high quality links would be to link to my competitors. These guys use link volume to rank and some have blogs and websites crosslinked, which I thought was also frowned upon. So what’s to do?