Google Webmaster Tools now includes ‘goog-love.pl’

Back in 2006, I wrote a script I called “goog-love.pl”; it used Google’s now-dead SOAP search API (thanks, Nelson!) to figure out which Google queries your web site was “winning” on. Unfortunately, Google shut down new signups for the SOAP interface later that year.

I was just looking through Google’s Webmaster Tools page for taint.org, when I came across the Statistics / Top search queries page:

[image: screenshot of the Statistics / Top search queries page]

This is exactly what goog-love.pl produced. Hooray!


What’s on this site, April 2008 edition

It’s been a while since I’ve listed the various sub-sites of taint.org in one post. I’ve just updated the taint.org wiki’s index page to include them, so I might as well list them here, too:

Enjoy!


Taint.org Has Moved

I’m moving pretty much all my home sites and infrastructure from the venerable “dogma.boxhost.net” to a new host, “soman.fdntech.com”. This weblog has just made the jump. Please leave a comment if you notice anything awry.

There may be a few rough edges, since I upgraded to WordPress 2.2.2 in the process; for example, my sooper-s3kr1t “what is my name” anti-spam protocol was set to not require a preview of all posted comments, or the correct answer — in just over an hour I received 25 spam comments… so it’s good to know it’s working ;)


post-Digging stats

The “cool hack to solve a maze using Photoshop” post got, in turn, posted to reddit, Make.zine, then Digg, then Waxy’s links, fazed.net, then Boing Boing, then StumbleUpon. Pretty popular!

If you’re interested, here are my referrer graphs. Basically, Digg wins, in terms of quantity at least if not quality, with a massive 40,000 visits; there’s a real long tail in hits there…


Back

Hey, I’m back, rested and full of tasty, tasty Niçois and Provençal cuisine.

I got back just in time to vote, for what good that did with Bertie’s gang leading strongly in the current counts… argh!

For what it’s worth, I gave Patricia McKenna a preference, in the end. I was reminded that she’d been entirely on our side on software patents during her time as an MEP, so credit where it’s due there; on top of that, a vote for the Greens is better than a vote going to Sinn Féin, after all, no matter what. ;)


About the title change

The eagle-eyed may have spotted a change that took place a month or two ago in the taint.org configuration — I ditched the old weblog tagline.

Previously, this weblog was titled “taint.org: Happy Software Prole”. This title had been in place since around October 2003, when Daniel Lyons wrote a particularly idiotic article for Forbes entitled “Linux’s Hit Men”, which I took umbrage at:

Here we go again — the old ‘free software is communism’ line [...] The article goes on to bemoan how software companies who write proprietary extensions into GPL-licensed software, have to comply with the terms of the license. It’s all a bit of an obvious dig — but I am looking forward to the follow-up article — that’s the one where the author bemoans how commercial software companies send out their ‘enforcers’ to extort money from companies who don’t bother paying the royalties and runtime license fees their licenses require.

As a free/open-source-software guy, I happily adopted ‘happy software prole’ as an absurd tagline, in the spirit of détournement. Fast-forward 3.5 years, however, and I’d say most people can’t even remember the Forbes article, or that Daniel Lyons guy! So that tagline was a bit old and busted, really.

On top of this, I’d noticed something I do in my weblog reading — I’ve started renaming blogs in the feed reader from their fancy title, to simply the name of the author.

I’ve found that when reading blogs, I’m interested in who’s writing. When skimming through the feeds of a morning, having to spend 5 seconds to recall that “ByteSurgery.com” is Robin Blandford is just a wee bit superfluous, sorry Robin. ;)

As a favour for readers, I’ve saved them the trouble, and renamed the blog to be quite explicit about who’s writing; the taint.org tagline is now just “taint.org: Justin Mason’s Weblog”. Let’s face it — it’s a bit functional. Hopefully it’s helpful, though!

(And finally, it gives me the edge in the ongoing Google war against the non-me “Justin Masons” out there… and against a heart surgeon and a Texan basketball player, I need it. ;)


Irish Blog Awards

A quick note; the Irish Blog Awards shortlisting votes are about to end later today. I’ve been nominated in the long list (thanks!), for best technology blog — feel free to vote for me if you like ;)

Update: boo, no shortlisting. Still, probably my own fault, I was a bit too wishy-washy with the vote hustling! Maybe next year…


Back in one piece

Well, I’m back in Dublin in one piece, after a great honeymoon in Corsica. Lots of stuff to catch up on, so if you’re waiting on a response, sorry, it might take a little longer…


Unblocked

I just found an error in an Apache config file for taint.org, resulting in some of the legacy RSS feed URLs producing invalid data — this meant that anyone subscribed to the Feedburner feed, for example, had been missing out on my witterings. Fixed now — apologies!


A Little Downtime

Quick note: taint.org, and the other sites on the same host, will be down for somewhere between 30 minutes and an hour tomorrow, at 1000 UTC, as the host moves to a new datacenter (and a new IP address).

Handily, the host will also get a hefty RAM upgrade, which should improve matters the next time we get slashdotted ;)

(If you need to get in touch during the downtime, jmason at gmail dot com will be the best bet.)

Update: this is now complete.


Holidaze

Quick note — I’m off on vacation next week — so I probably won’t read any email while I’m there ;) Talk to you after the 17th.


Retroactive Tagging With TagThe.Net

Hacky hack hack.

Ever since I enabled tags on taint.org, I’ve been mildly annoyed by the fact that there were thousands of older entries deprived of their folksonomic chunky goodness. A way to ‘retroactively tag’ those entries somehow would be cool.

Last week, Leonard posted a link on his linkblog to TagThe.net, a web service which offers a nifty REST API; simply upload a chunk of text, and it’ll suggest a few tags for that text, like this:

echo 'Hi there, I am a tag-suggesting robot' | curl "http://tagthe.net/api/?text=`urlencode`"
<?xml version="1.0" encoding="UTF-8"?>
<memes>
  <meme source="urn:memanage:BAD542FA4948D12800AA92A7FAD420A1" updated="Tue May 30 20:20:39 CEST 2006">
    <dim type="topic">
      <item>robot</item>
    </dim>
    <dim type="language">
      <item>english</item>
    </dim>
  </meme>
</memes>

This looked promising.

Anyway, I’ve now implemented this — it worked great! If you’re curious, here’s details of how I did it. It’s a bit hacky, since I’m only going to be doing this once — and very UNIXy and perlish, because that’s how I do these things — but maybe somebody will find it useful.

How I Retroactively Tagged taint.org

This weblog runs WordPress — so all the entries are stored in a MySQL database. I took the MySQL dump of the tables, and a quick script figured out that out of somewhere over 1600-ish posts, there were 1352 that came from the pre-tag era, requiring tag inference. A mail to the TagThe.Net team established that they were happy with this level of usage.

I grepped the post IDs and text out of the SQL dump, threw those into a text file using the simple format ‘id=NNN text=SQLHTMLSTRING’ (where SQLHTMLSTRING was the nicely-escaped HTML text taken directly from the SQL dump), and ran them through this script.

That rendered the first 2k of each of those entries as a URL-encoded string, invoked the REST API with that, got the XML output, and extracted the tags into another UNIXy text-format output file. (It also added one tag for the ‘proto-tag’ system I used in the early days, where the first word of the entry was a single tag-style category name.)
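
The per-entry step described above can be sketched roughly as follows. This is a Python illustration, not the original Perl script: the function names, the reading of “2k” as 2048 characters, and the decision to read only the “topic” dimension are all assumptions, while the endpoint URL and the XML shape are taken from the curl example earlier in this post.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import List

API_URL = "http://tagthe.net/api/"  # endpoint as shown in the curl example
MAX_CHARS = 2048                    # assumption: "first 2k" means 2048 characters


def build_request_url(text: str) -> str:
    """Truncate the entry text and URL-encode it into a GET request URL."""
    return API_URL + "?" + urllib.parse.urlencode({"text": text[:MAX_CHARS]})


def extract_tags(xml_response: str, dim_type: str = "topic") -> List[str]:
    """Collect the <item> values from every <dim> element of the given type."""
    root = ET.fromstring(xml_response)
    return [
        item.text.strip()
        for dim in root.iter("dim")
        if dim.get("type") == dim_type
        for item in dim.findall("item")
        if item.text
    ]


# The response from the example above, used here instead of a live request.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<memes>
  <meme source="urn:memanage:BAD542FA4948D12800AA92A7FAD420A1" updated="Tue May 30 20:20:39 CEST 2006">
    <dim type="topic">
      <item>robot</item>
    </dim>
    <dim type="language">
      <item>english</item>
    </dim>
  </meme>
</memes>"""

print(build_request_url("Hi there, I am a tag-suggesting robot"))
print(extract_tags(SAMPLE_RESPONSE))
```

Keeping the URL building and the XML parsing in separate functions means the parsing half can be checked against a saved response without touching the network.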

Next, I ran this script, which in turn took that intermediate output and converted it to valid PHP code, like so:

cat suggestedtags | ./taglist-to-php.pl  > addtags.php
scp addtags.php my.server:taint.org/wp-admin/

The generated page ‘addtags.php’ looks like this:

<?php
  require_once('admin.php');
  global $utw;
  $utw->SaveTags(997, array("music","all","audio","drm-free",
      "faq","lunchbox","destination","download","premiere","quote"));
  [...]
  $utw->SaveTags(998, array("software","foo","swf","tin","vnc"));
  $utw->SaveTags(999, array("oses","eek","longhorn","ram",
    "winsupersite","windows","amount","base","dog","preview","system"));
?>
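
A generator for a page like this could be sketched as below. It is a Python stand-in, not the real taglist-to-php.pl: the intermediate line format ‘id=NNN tags=a,b,c’ and both function names are hypothetical, and only the emitted PHP lines are copied from the page shown above.

```python
import re


def php_string(value: str) -> str:
    """Quote a value as a double-quoted PHP string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("$", "\\$")
    return '"' + escaped + '"'


def taglist_to_php(lines):
    """Turn 'id=NNN tags=a,b,c' lines (hypothetical format) into a PHP page."""
    out = ["<?php", "  require_once('admin.php');", "  global $utw;"]
    for line in lines:
        match = re.match(r"id=(\d+)\s+tags=(.*)$", line.strip())
        if not match:
            continue  # skip blank or malformed lines
        post_id = int(match.group(1))
        tags = [t.strip() for t in match.group(2).split(",") if t.strip()]
        if not tags:
            continue  # nothing to save for this post
        out.append("  $utw->SaveTags(%d, array(%s));"
                   % (post_id, ",".join(php_string(t) for t in tags)))
    out.append("?>")
    return "\n".join(out) + "\n"


print(taglist_to_php(["id=998 tags=software,foo,swf,tin,vnc"]))
```

Escaping each tag before embedding it matters here, since the tags come back from a remote service and end up inside executable PHP.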

Once that page was in place, I just visited it in my (already logged in) web browser window, at http://taint.org/wp-admin/addtags.php, and watched as it gronked for a while. Eventually it stopped, and all those entries had been tagged. (If I wasn’t so hackish, I might have put in a little UI text here — but I didn’t.)

The results are very good, I think.

A success: http://taint.org/tag/research has picked up a lot of the interesting older entries where I discussed things like IBM’s Teiresias pattern-recognition algorithm. That’s spot on.

A minor downside: it’s not so good at nouns. This entry talks about Silicon Valley and geographical insularity, and mentions “Silicon Valley” prominently — one or both of those words would seem to be a good thing to tag with, but it missed them.

Still, that’s a minor issue — the tags it has suggested are generally very appropriate and useful.

Next, I need to find a way to auto-generate titles for the really old entries ;)


5 Years of taint.org

Five years ago, on 15 May 2001, I started writing this weblog.

The first post was a forward of something odd from the Forteana list: ‘Why Finns are sick of illnesses named after them’. I started the weblog to reduce the number of forwards I was passing on by email to other groups, hence the preponderance of forteana posts early on.

Nowadays, by contrast, I try to write original ramblings^Wresearch for the main part of the site, and the occasional “fresh bits” I unearth elsewhere are kept separate, posted to the link-blog at del.icio.us/jm.

However, the real reason I started the thing was to act as an experiment in using WebMake as a blog platform — at least, that was the excuse. It worked quite successfully, for what it’s worth — but in mid-August 2005, I eventually accepted that there weren’t enough hours in the day to maintain a weblogging CMS, and its templates, as well as everything else, and that I didn’t really need to test WebMake’s abilities any more, and switched to WordPress. I’m glad I did; WP is a great piece of software.

So what’s been the biggest hit on taint.org, by far? Here it is: http://taint.org/xfer/2004/kittens.jpg . Lots and lots of Google Image referrers, MySpace hotlinkers, etc. etc. ;) It’s a top hit for a GIS search for [kittens], I think.

Random stats, based on April’s logs:

  • About 81247 hits were received during April to the RSS 2.0 feed (the default), 9921 to the Atom feed, and 7795 for the RSS 1.0 rendering. That indicates that format-wars-wise, people just use the default. ;)
  • Assuming the RSS reader apps average out to 1 HTTP GET every 30 mins (as Bloglines and Apple’s reader do), that means there are somewhere around (98963 / (30 * 24 * 2)) = 68 subscribers.
  • In terms of the old style browser-using readership — there were 44926 hits on the front page using web browsers.
  • AWStats claims 2700 visits per day, from around 33000 visitors per month. I find the latter figure hard to believe.
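
The subscriber estimate in the list above can be reproduced with a few lines of Python. This is just the same back-of-the-envelope arithmetic written out; the function and variable names are made up for illustration, and the 30-day month and 2 polls per hour are the assumptions already stated in the list.

```python
# April hit counts per feed format, as quoted in the list above.
HITS_PER_FEED = {"rss2": 81247, "atom": 9921, "rss1": 7795}


def estimate_subscribers(total_hits, days=30, polls_per_hour=2):
    """Divide total feed hits by the number of polls one reader makes per month."""
    polls_per_reader = days * 24 * polls_per_hour
    return total_hits / polls_per_reader


total = sum(HITS_PER_FEED.values())
estimate = estimate_subscribers(total)
print(total, int(estimate))
```

The estimate is very sensitive to the polling interval: readers that poll hourly rather than half-hourly would double the figure.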

After the front page and the feeds, the scraped RSS feeds at http://taint.org/scraped/ come second, Threadless beating out Perry Bible Fellowship by a little bit.

Top stories last month, based on hits:

  • http://taint.org/2006/04/29/230814a.html — Single-Letter Google Hits
  • http://taint.org/2006/01/20/220239a.html — the SweetheartsConnection.com Scam (still attracting comments from scammees!)
  • http://taint.org/2004/04/15/033025a.html — really outdated stats on GMail’s spam filtering accuracy
  • http://taint.org/2006/04/20/213624a.html — Automatically Invoking screen(1) on Remote Logins
  • http://taint.org/2006/04/15/134751a.html — Google Calendar
  • http://taint.org/2006/04/03/121837a.html — A Gotcha With perl’s “each()”
  • http://taint.org/2005/08/06/024026a.html — The Life of a SpamAssassin Rule
  • http://taint.org/2006/04/21/133432a.html — Phishing and Inept Banks
  • http://taint.org/2006/04/06/210519a.html — RSS Feeds for Events in Dublin
  • http://taint.org/2006/04/13/140841a.html — BT DSL’s Daily Disconnects

Technorati says there are 514 links from 105 sites. I still don’t know what the hell that means. ;)

Update: I’ve remembered that, before I started blogging at taint.org, I kept a diary at Advogato, which dates all the way back to March 2000!

Also, here are some pretty graphs from the graph-top-referers script:

The several slashdottings and a Boing Boinging are quite clear ;)


RFID in the Grauniad, and back in Dublin

Greetings from sunny Dublin, Ireland! (really!)

I’m now back in taint.org’s native timezone, although precariously set up and experiencing occasional interruptions. If you’re waiting for a mail from me, it may take a little more time.

I did have time to be interviewed last week by Karlin Lillington for this Guardian story:

To make sure customs agents could read his cat’s chip to match him to his Pet Passport on return to Europe, Mason bought his own scanner at a cost of some £200. “I didn’t want to risk the cat being impounded for six months’ quarantine at Heathrow,” he sighs.

It’s true.

Happy to be back — I think. Looking forward to my first pints, in over a year, of creamy Guinness in its native habitat. I also have a couple of half-written weblog entries I wrote on the plane, too…


Wisdom Teeth — Complete!

On Friday, I got my lower-left wisdom tooth extracted. That’s the last one that should cause any trouble; there’s only one remaining, and it’s fully out so shouldn’t act up. After a few years of on-again-off-again twinges, and lots of irresponsible putting-off of surgery, I’ve finally taken care of it.

The downside: I’m totally zonked on painkillers, so I won’t be doing much for the next few days apart from what’s required for day-to-day day-job stuff.


Running on WordPress!

I’ve decided to try out the real deal — a ‘proper’ weblogging platform, namely WordPress. Be sure to comment if you spot problems…


Grumpiness and Cigarettes

Meta: My apologies if you wound up running into me online at some stage this week — I’ve been in a lousy mood.

I gave up smoking cigarettes at the end of May, and switched to patches. That went pretty well, dropping from 21mg patches, to 14mg, to 7mg. But this week I finally hit the end of the line, stopped applying a patch every morning, and became fully nicotine-free. Only, ouch — it’s not quite as easy as I thought!

Cigarette addiction is (apparently) composed of two conceptual lumps – the physical addiction to nicotine, and the mental addiction to the ‘idea’ of smoking. Through the patches, I’ve successfully nailed the mental addiction, but I’m now facing the physical withdrawal. I’m sweating, dizzy, can’t focus my eyes, can’t concentrate, my skin is going crazy, and I’m INCREDIBLY grouchy. It’s amazing how much havoc the act of withholding nicotine can cause, especially when you consider that it’s not a required nutrient for the human body — it’s an ‘optional extra’ that I never should have gone near in the first place.

Weirdly, though, I don’t want a cigarette. Instead, I want a patch ;)


Where I’d gotten to

Meta: You might have noticed things being a bit quiet around here recently. Unfortunately, it wasn’t for good reasons.

A close family member in Ireland died suddenly on Good Friday. Once we found out, being in Death Valley (of all places) that weekend, we made a mad dash back home for the removal, funeral, and so on. The past two weeks have been not so much fun, all in all.

I’m torn between eulogising here, and keeping it offline. All in all, I think it’d be better to not use this weblog for that; I don’t think it’d be appropriate. But he’ll be greatly missed.


Back, in the flurry of a mini-tornado

Meta: Back. Not even ‘mini-tornados’ at Dublin Airport can keep me away, although they gave it a damn good try, with a 3 hour delay, a missed connection, and an overnight stay in Chicago. Arggh.

Mail: I generally leave the laptop at home when on vacation, to do some proper winding down. Not sure it was a great idea this time, since I was joe-jobbed by some pretty extensive spam runs recently, resulting in over 30,000 bounces sitting unread in my email when I got back.

Thankfully, Tim Jackson’s bogus-virus-warnings.cf SpamAssassin ruleset (with a few updates) got most of them, with only a few hundred getting past. I should really hack on making those more complete, but some of the bounces are really obscure; along the lines of ‘Hi from J Random Luser, Esq.! I no longer use this address because it gets too much spam! Please send to this new one instead: jrluser98@example.com!’, generally without any obvious identifying headers that indicate it’s an autoresponse.

Sigh — each of those messages is just utterly random, and I can’t see much recourse but to come up with some nasty phrase-based content filtering rules, which I was hoping to avoid. But 29,500 hits isn’t bad ;)

I’m not sure they’d be suitable yet for use as default SpamAssassin rules, since they now generally just match any kind of bounce message, not specifically joe-job or virus-forgery blowback. But that suits me just fine — I can live without bounces, as long as I don’t have to suffer the bounce blow-back.

Science: Good news from New Scientist: they’re opening up their archives! NS consistently has the best science journalism around, and I’ve been a subscriber for years. But until recently, they had a lousy approach to their website: most of the useful stuff, like the archives, was walled off as subscriber-only features; a classic case of missing the Clue Train. Well, here’s an archive search for ‘spam’. It’s pretty impressive, and most of the short articles are available in full, with only the full text for features and opinion pieces requiring a login.

In addition, they’ve added a massive batch of RSS feeds. Sadly, no full article text excerpts, however. But still — getting the clue, eventually — this way they may actually get links on the web, in place of the mangled and chinese-whispered versions of their articles republished in the UK newspapers…

Ireland: Due to monopolistic pricing of Irish GIS data, consumer GPS maps of Ireland’s road system are appalling, and this page collects a few great demos — for example, MS Autoroute quintuples the distance from Galway to Roundstone! That’s a major tourist route, BTW. I knew it was bad, but not that bad…

Anyway, I’m still waaay behind, but slowly catching up.


Xmas hols

Meta: I’m back in Dublin for a couple of weeks over xmas, so I won’t be updating this weblog very much. See you in January!

BTW I flew back via Chicago, which is obviously the stopover of choice to Dublin from Silicon Valley — surrounded by 1 iBook per every 8 passengers. ;)

PS: looks like they forgot Poland!


Linkblogging

Meta: Yes, I’ve joined the lazy-sods club. Here’s my del.icio.us linkblog. Blame that luscious posting bookmarklet which just makes it sooo easy…


taint.org got NTK’d!

Meta: NTK this week linked to my closed-group filesharing roundup — thanks!

One I’d missed — Shinkuro. Looks very interesting, although pretty proprietary at a glance. It remains to be seen what their availability and prices will be like…
