Google Webmaster Tools now includes ‘goog-love.pl’

Back in 2006, I wrote a script I called “goog-love.pl”; it used Google’s now-dead SOAP search API (thanks, Nelson!) to figure out which Google queries your web site was “winning” on. Unfortunately, Google shut down new signups for the SOAP interface later that year.

I was just looking through Google’s Webmaster Tools page for taint.org, when I came across the Statistics / Top search queries page:

[image: screenshot of the Top search queries page]

This is exactly what goog-love.pl produced. hooray!


Google now include Code Search in normal results

Latest Google curiosity… I hadn’t spotted this before: it appears Google is now including ‘Code Snippet’ results in the results for its normal search. For example, a search for XSLoader gives this result:

[image: screenshot of the search result for XSLoader]

The results highlighted on the page are for a local variable in a Java module, rather than the much more common XSLoader perl module. I guess ‘Code Snippet’ search is case-sensitive.


“What’s New” archaeology

jwz has, incredibly, resurrected home.mcom.com, the WWW site of the Mosaic Communications Corporation, as it was circa Oct 1994.

Edmund Roche-Kelly was kind enough to get in touch and note this link – http://home.mcom.com/home/whatsnew/whats_new_0993.html:

September 3, 1993

IONA Technologies (whose product, Orbix, is the first full and complete implementation of the Object Management Group’s Common Object Request Broker Architecture, or CORBA) is now running a Web server.

An online pamphlet on the Church of the SubGenius is now available.

Guess who was responsible for those two ;)

I was, indeed, running the IONA web server — it was set up in June 1993, and ran Plexus, a HTTP server written in Perl. IONA’s server was somewhere around public web server number 70, world-wide.

The SubGenius pamphlet is still intact, btw, although at a more modern, “hyplan”-less URL these days. It’ll be 15 years old in 6 months… how time flies!


Sharing, not consuming, news

The New York Times yesterday had a great article about modern news consumption:

According to interviews and recent surveys, younger voters tend to be not just consumers of news and current events but conduits as well — sending out e-mailed links and videos to friends and their social networks. And in turn, they rely on friends and online connections for news to come to them. In essence, they are replacing the professional filter — reading The Washington Post, clicking on CNN.com — with a social one.

“There are lots of times where I’ll read an interesting story online and send the URL to 10 friends,” said Lauren Wolfe, 25, the president of College Democrats of America. “I’d rather read an e-mail from a friend with an attached story than search through a newspaper to find the story.”

[Jane Buckingham, the founder of the Intelligence Group, a market research company] recalled conducting a focus group where one of her subjects, a college student, said, “If the news is that important, it will find me.”

In other words, as Techdirt put it, this generation of news readers now focuses on sharing the news, rather than just consuming it — and if you want to share a news story, there’s no point passing on a subscription-only URL that your friends and contacts cannot read.

What newspapers need to do to remain relevant for this generation of news consumers is not to hide their content behind paywalls and registration-required screens. The Guardian got their heads around this a few years back, and have come along in leaps and bounds since then. I wonder if the Irish Times is listening?


Google’s CAPTCHA - not entirely broken after all?

A couple of weeks ago, WebSense posted this article with details of a spammer’s attack on Google’s CAPTCHA puzzle, using web services running on two centralized servers:

[...] It is observed that two separate hosts active on same domain are contacted during the entire process. These two hosts work collaboratively during the CAPTCHA break process. [...]

Why [use 2 hosts]? Because of variations included in the Google CAPTCHA image, chances are that host 1 may fail breaking the code. Hence, the spammers have a backup or second CAPTCHA-learning host 2 that tries to learn and break the CAPTCHA code. However, it is possible that spammers also use these two hosts to check the efficiency and accuracy of both hosts involved in breaking one CAPTCHA code at a time, with the ultimate goal of having a successful CAPTCHA breaking process.

To be specific, host 1 has a similar concept that was used to attack Live mail CAPTCHA. This involved extracting an image from a victim’s machine in the form of a bitmap file, bearing BM.. file headers and breaking the code. Host 2 uses an entirely different concept wherein the CAPTCHA image is broken into segments and then sent as a portable image / graphic file bearing PV..X file headers as requests. [...]

While it doesn’t say as such, some have read the post to mean that Google’s CAPTCHA has been solved algorithmically. I’m pretty sure this isn’t the case. Here’s why.

Firstly, the FAQ text that appears on “host 1” (thanks Alex for the improved translation!):

[image: screenshot of the FAQ page on host 1]

FAQ

If you cannot recognize the image or if it doesn’t load (a black or empty image gets displayed), just press Enter.

Whatever happens, do not enter random characters!!!

If there is a delay in loading images, exit from your account, refresh the page, and log in again.

The system was tested in the following browsers: Internet Explorer Mozilla Firefox

Before each payment, recognized images are checked by the admin. We pay only for correctly recognized images!!!

Payment is made once per 24 hours. The minimum payment amount is $3. To request payment, send your request to the admin by ICQ. If the admin is free, your request will be processed within 10-15 minutes, and if he is busy, it will be processed as soon as possible.

If you have any problems (questions), ICQ the admin.

That reads to me a lot like instructions to human “CAPTCHA farmers”, working as a distributed team via a web interface.

Secondly, take a look at the timestamps in this packet trace:

[image: packet trace showing the request timestamps]

The interesting point is that there’s a 40-second gap between the invocation on “Captcha breaking host 1” and the invocation on “Captcha breaking host 2”. There is then a short gap of 5 seconds before the invocations occur on the Gmail websites.

Here’s my theory: “host 1” is a web service gateway, proxying for a farm of human CAPTCHA solvers. “host 2”, however, is an algorithm-driven server, with no humans involved. A human may take 40 seconds to solve a CAPTCHA, but pure code should be a lot speedier.

Interesting to note that they’re running both systems in parallel, on the same data. By doing this, the attackers can

  1. collect training data for a machine-learning algorithm (this is implied by the ‘do not enter random characters!’ warning from the FAQ — they don’t want useless training data)

  2. collect test cases for test-driven development of improvements to the algorithm

  3. measure success/failure rates of their algorithms, “live”, as the attack progresses

Worth noting this, too:

Observation*: On average, only 1 in every 5 CAPTCHA breaking requests are successfully including both algorithms used by the bot, approximating a success rate of 20%. The second algorithm (segmentation) has very poor performance that sometimes totally fails and returns garbage or incorrect answers.

So their algorithm is unreliable, and hasn’t yet caught up with the human farmers. Good news for Google — and for the CAPTCHA farmers of Romania ;)

Update: here’s the NYTimes’ take, with broadly agreeing comments from Brad Taylor of Google. (The Register coverage is off-base, however.)


post-Digging stats

The “cool hack to solve a maze using Photoshop” post got, in turn, posted to reddit, Make.zine, then Digg, then Waxy’s links, fazed.net, then Boing Boing, then StumbleUpon. Pretty popular!

If you’re interested, here are my referrer graphs. Basically, Digg wins, in terms of quantity at least if not quality, with a massive 40,000 visits — there’s a real long tail in hits there…


Moin Moin attachment spam

Here’s a new trick used by the web spammers — attachments on a Moin Moin wiki. The taint.org/wk RecentChanges list illustrates it well:

2007-05-07  set bookmark
[UPDATED]       UserPreferences         04:17   Info    ?StepStep [1-21]        
  #01 Upload of attachment ‘big-cocks.html’.
  #02 Upload of attachment ‘big-cock.html’.
  #03 Upload of attachment ‘big-boobs.html’.
  #04 Upload of attachment ‘big-ass.html’.
  #05 Upload of attachment ‘bdsm.html’.
  #06 Upload of attachment ‘bbw.html’.
  #07 Upload of attachment ‘bang-bros.html’.
  #08 Upload of attachment ‘bangbros.html’.
  #09 Upload of attachment ‘baby.html’.
  #10 Upload of attachment ‘asian-porn.html’.
  #11 Upload of attachment ‘asian-girls.html’.
  #12 Upload of attachment ‘anime-porn.html’.
  #13 Upload of attachment ‘anime-girls.html’.
  #14 Upload of attachment ‘angelina-jolie.html’.
  #15 Upload of attachment ‘amature.html’.
  #16 Upload of attachment ‘amatuer.html’.
  #17 Upload of attachment ‘adult-videos.html’.
  #18 Upload of attachment ‘adult-stories.html’.
  #19 Upload of attachment ‘adult-games.html’.
  #20 Upload of attachment ‘69.html’.
  #21 Upload of attachment ‘3d.html’.

Great. Lots of spam. This first started appearing on Feb 27 2007, in a multi-upload attack on a single page (“FindPage”), from IP address 212.26.129.162; then reoccurred on Apr 27 and May 7 from the (insecure open proxy) proxy.drevlanka.ru.

Annoyingly my “subscribe to wiki changes” patch doesn’t catch this – these aren’t gatewayed through as “changes” via mail for review. I need to fix that in my copious free time. :(

Also, the RecentChanges RSS feed doesn’t list them, although the HTML form does.

So unfortunately, the only way I can see to block this is either to review by visiting the RecentChanges page in a web browser regularly (how retro!), and delete them retrospectively, or simply to turn off attachments entirely – which is what I’ve done, by editing “wikiconfig.py” and adding:

    actions_excluded = ['AttachFile']

It looks like quite a few other wikis around the web are running into the issue too :(
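For anyone who keeps attachments switched on, the manual review step could be partly automated. This is a minimal sketch in Python, assuming you already have the RecentChanges page as plain text in roughly the format of the listing above; fetching the page is not shown, and the function names and the "flag .html uploads" heuristic are illustrative assumptions, not part of Moin Moin.

```python
import re

# Matches lines such as:  #21 Upload of attachment ‘3d.html’.
# Accepts straight or curly quotes, and tolerates stray spaces inside the quotes.
UPLOAD_RE = re.compile(r"Upload of attachment\s+['‘]\s*(.+?)\s*['’‘]")


def attachment_uploads(recent_changes_text):
    """Return the attachment filenames mentioned in a RecentChanges text dump."""
    return [m.group(1) for m in UPLOAD_RE.finditer(recent_changes_text)]


def suspicious(filenames, extensions=(".html", ".htm")):
    """Flag uploads whose extension suggests a spam landing page (assumed heuristic)."""
    return [name for name in filenames if name.lower().endswith(extensions)]
```

Run from cron against a saved copy of the page, the output of `suspicious()` could be mailed out for review, which would cover the gap left by the RSS feed.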


HOWTO block editing of pages in Moin Moin

A useful Moin Moin anti-spam tip, via Upayavira at the ASF: adding ACLs to pages so that only certain users can edit them. This is an easy way to interfere with the wiki spammers who get past the existing (quite good) Moin Moin anti-spam subsystems. They tend to aim for the common Wiki pages, such as WikiSandBox, RecentChanges, and FrontPage, so if you make those pages uneditable, that’ll cause them more trouble — and hopefully cause them to move on to easier targets, instead of defacing your wiki. Here’s how to do it (at least for Moin Moin >= 1.5.1).

Open a shell on the machine where the Moin Moin software is installed. Edit your “wikiconfig.py” file (in my case this is at /home/moinmoin/moin-1.5.1/share/moin/jmwiki/wikiconfig.py), and change the “acl_rights_before” line to read:

    acl_rights_before = u"JustinMason:read,write,delete,revert,admin"

Replace “JustinMason” with your wiki login name, of course.

Create an administrative group of trusted users. Do this by creating a page called “AdminGroup” containing

    #acl All:read
    These are the members of this group, who can edit certain restricted pages:
     * JustinMason

Now, for the sensitive pages (like FrontPage etc.), edit each one and add an access-control list line at the top of each page containing:

    #acl AdminGroup:read,write All:read

That’s it. Users who are not in the AdminGroup will no longer be able to edit those pages. That should help… at least for a while ;)

Update: you should also use this in wikiconfig.py:

    acl_rights_default = u'Known:read,write,revert All:read'

This blocks non-logged-in users from writing to pages.
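To see how the three settings interact, here is a rough model in Python of the way the ACL lines are evaluated. It is a simplified sketch, not Moin Moin's own code: it assumes entries are checked in order and the first matching entry decides, that a page's own "#acl" line replaces the default, and it ignores "+"/"-" modifiers and "acl_rights_after". The function names are made up for illustration.

```python
ACL_BEFORE = "JustinMason:read,write,delete,revert,admin"
ACL_DEFAULT = "Known:read,write,revert All:read"
FRONT_PAGE_ACL = "AdminGroup:read,write All:read"
GROUPS = {"AdminGroup": {"JustinMason"}}  # contents of the AdminGroup page


def parse_acl(line):
    """Split 'Name:right,right Other:right' into [(name, {rights})]."""
    entries = []
    for chunk in line.split():
        name, _, rights = chunk.partition(":")
        entries.append((name, set(filter(None, rights.split(",")))))
    return entries


def allowed(user, right, page_acl, groups, logged_in=True):
    """First matching entry decides; a page ACL replaces the default when present."""
    lines = [ACL_BEFORE, page_acl if page_acl is not None else ACL_DEFAULT]
    for line in lines:
        for name, rights in parse_acl(line):
            matches = (
                name == "All"
                or (name == "Known" and logged_in)
                or name == user
                or user in groups.get(name, ())
            )
            if matches:
                return right in rights
    return False
```

With these settings, `allowed("SomeUser", "write", FRONT_PAGE_ACL, GROUPS)` is false because the "All:read" entry is the first one that matches, while the same user can still write to a page that has no ACL line of its own.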


Wikipedia and rel=”nofollow”

Apparently, Wikipedia has (possibly temporarily) decided to re-add the rel=”nofollow” attribute to outbound links from their encyclopedia pages.

There’s been a lot of heat and light generated about this, most missing one thing: there’s no reason why Google needs to pay attention.

Google, or any other search engine, can treat links in the Wikipedia pages any way they like — including ignoring ‘nofollow’, applying extra anti-spam heuristics of their own, or even trusting the links more highly.
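To make that concrete, here is a minimal Python sketch of a crawler's link-extraction step. The point is that 'nofollow' is just a token the crawler sees; whether to honour it is a policy switch on the crawler's side. The class and function names, and the `honour_nofollow` flag, are hypothetical; nothing here reflects how any real search engine works.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, has_nofollow) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        # rel is a space-separated token list; compare case-insensitively
        rel_tokens = (attrs.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def links_to_count(html, honour_nofollow=True):
    """Return the hrefs a crawler would credit under the chosen policy."""
    return [
        href
        for href, nofollow in extract_links(html)
        if not (honour_nofollow and nofollow)
    ]
```

Calling `links_to_count(page, honour_nofollow=False)` returns every link regardless of the attribute, which is all it would take for a search engine to ignore Wikipedia's decision.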

‘Nofollow’ has had pretty much no effect on web-spam, and now is generally festooned all over weblog posts across the internet, both spammed and non-spammed posts, at that. It’d be interesting to see if it’s yet flipped to mean a higher correlation with nonspam than spam content…

Update: It appears Wikipedia used ‘nofollow’ before, so this is not exactly new, either.


Script: new-referrer-rss

new-referrer-rss.pl - generate RSS feed of new referrer URLs from access_log

SYNOPSIS

new-referrers-rss nameofsite [source ...] > new-referrers.xml

DESCRIPTION

Given the name of a web site, and a selection of Apache combined log format ‘access_log’ files containing referrer URL data, this will generate an RSS feed containing the latest referrers.

The script should be run periodically with ‘fresh’ access_log data, from cron.
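The general approach can be sketched in a few lines. This is a Python illustration of the idea, not the Perl script itself: the regular expression, the function names, and the minimal RSS layout are all assumptions, and persisting the `seen` set between cron runs is left out.

```python
import re
from urllib.parse import urlsplit
from xml.sax.saxutils import escape

# Apache "combined" format:
#   host ident user [time] "request" status bytes "referer" "user-agent"
# Lines with escaped quotes inside a field will not match and are skipped.
COMBINED_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"'
)


def new_referrers(site, log_lines, seen):
    """Return external referrer URLs not yet in `seen`; adds them to `seen`."""
    fresh = []
    for line in log_lines:
        match = COMBINED_RE.match(line)
        if not match:
            continue
        url = match.group("referrer")
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue
        # Skip "-" (no referrer), the site's own pages, and known URLs
        if not host or host == site or host.endswith("." + site) or url in seen:
            continue
        seen.add(url)
        fresh.append(url)
    return fresh


def to_rss(site, urls):
    """Render the URLs as a bare-bones RSS 2.0 document."""
    items = "".join(
        f"<item><title>{escape(u)}</title><link>{escape(u)}</link></item>"
        for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<rss version="2.0"><channel>'
        f"<title>New referrers for {escape(site)}</title>"
        f"<link>http://{escape(site)}/</link>"
        "<description>Referrer URLs newly seen in access_log</description>"
        f"{items}</channel></rss>"
    )
```

A wrapper would read the access_log files named on the command line, call `new_referrers()`, and print `to_rss()` to stdout, matching the synopsis above.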


Peoplefeeds and Quick Aggregation

peoplefeeds is cool.

I’ve been looking for something that can aggregate my Flickr, Wordpress blog, and del.icio.us feeds into one venue where I can look up items by tag, in a single page-load.

Suprglu was my leading contender, although they weren’t there yet since they didn’t seem to support importing my blog posts with tags preserved — pretty much everything wound up tagged as “uncategorized”. Disappointing. :( So I was waiting for them to fix that.

This post by Richard MacManus pointed at another couple of options: 43Things and Peoplefeeds. I hadn’t actually noticed that 43Things was doing this kind of aggregation too; unfortunately, as far as I can see, they don’t support tag preservation and browsing, so there goes my desired feature. Shame.

However, Peoplefeeds was right on target, offering a ‘Unified Tagspace’ and a ‘Search All-Personal-Content’ mechanism. It works nicely, too. Here’s my personal aggregator, combining my Flickr feed, my weblog feed, and my del.icio.us feed into one — and with a unified tag-space; here’s my ‘hiking’ tag, hitting all 3 feeds. Perfect.

One other use for this — I’ve forgotten why I was looking for one of these, but I know I did want one ;) — it can be used to make a “private planet”. If you have 3 or 4 feeds that you need to combine into one, this provides a very easy way to do that; just set up a userid at Peoplefeeds for that purpose.


Google Calendar

So I’ve been using this for a few days now — and I’m loving it. A calendaring system that deals coherently with the web:

I keep finding little things that make perfect sense, and just feel more logical than what I’ve used elsewhere. This rocks!

One thing still needs work, though: the links to Mapping fail spectacularly, for non-US addresses at least. But that’s pretty minor.

By the way, I have a feeling that Mac.com had parts of this, but really, you had to drink a lot of Apple kool-aid to use that, and I just didn’t go for that. Sorry Jobs fans.

Do you know what would be cool now? If Upcoming.org published venue/location-specific iCal feeds. Oh look, they do! Awesome…


Planet Antispam at abuse.net

Planet Antispam now has a better URL — http://planet.spam.abuse.net/ . Much better!


Allowing users to have steak knives

This post on the Wikipedia/Seigenthaler spat at Corante.com contains this excellent comment from Wikipedia’s Jimmy Wales:

Imagine that we are designing a restaurant. This restaurant will serve steak. Because we are going to be serving steak, we will have steak knives for the customers. Because the customers will have steak knives, they might stab each other. Therefore, we conclude, we need to put each table into separate metal cages, to prevent the possibility of people stabbing each other.

What would such an approach do to our civil society? What does it do to human kindness, benevolence, and a positive sense of community?

When we reject this design for restaurants, and then when, inevitably, someone does get stabbed in a restaurant (it does happen), do we write long editorials to the papers complaining that “The steakhouse is inviting it by not only allowing irresponsible vandals to stab anyone they please, but by also providing the weapons”?

No, instead we acknowledge that the verb “to allow” does not apply in such a situation. A restaurant is not allowing something just because they haven’t taken measures to forcibly prevent it a priori. It is surely against the rules of the restaurant, and of course against the laws of society. Just. Like. Libel. If someone starts doing bad things in a restaurant, they are forcibly kicked out and, if it’s particularly bad, the law can be called. Just. Like. Wikipedia. I do not accept the spin that Wikipedia “allows anyone to write anything” just because we do not metaphysically prevent it by putting authors in cages.


Windows Live Local and Firefox

Windows Live Local, with its isometric, Sim City, “bird’s eye” view, is quite nice.

However, what gets me is — do MS do this deliberately? I’m referring, of course, to the way it’s broken on Firefox 1.5, requiring you to drag twice to get it scrolling around the viewport, and the jumpy, clunky UI on that browser.

Pretty lame — and lazy, too. By now, it’s essential for a new fancy website to work under Firefox; even if only 20% of your users will be using it, a good proportion of those are the bleeding-edge, ‘taste-maker’ types who’ll be blogging about it, writing reviews for newspapers and news sites, and generally generating buzz for you, and thereby attracting the other 80%.

I’m told it works great in IE, but there’s no way I’m starting Windows and opening up that app. If I want to be infected by 700 different malwares within seconds, I’ll ask. ;)

On top of that, coverage seems spotty — Ireland is AWOL, of course.

As a result, my one line summary would have to be: idea = cool, dataset = probably cool, execution = half-assed and crappy. I’m looking forward to Google doing a much better job with their implementation of the Sim City viewpoint.


Urban Dead HUD

I’ve been playing a bit of Urban Dead recently. Urban Dead is a very low-key, web-based MMORPG — you play a 3-minute turn once every 24 hours. It needs some rebalancing and some new features, especially given the organised nature of some of the bigger marauding zombie hordes, but I’m still finding it fun.

To scratch a couple of itches, I’ve written a Greasemonkey user script for UD called the Urban Dead HUD. It adds several nifty features to the user interface:

  • keyboard accelerator access keys for the action buttons, and your inventory — very handy when you’re attacking an enemy repeatedly;
  • an on-page long-distance map of the surrounding squares;
  • a distance tracker, which tracks the distances to “important” locations for you

There’s screenshots on the download page, so you can see what I’m talking about.

Greasemonkey is a fantastic tool, as is Mark Pilgrim’s Dive Into Greasemonkey, which has repeatedly turned out to be an excellent, well-written reference while hacking this. Thanks guys!


Flickr as a ‘TypePad service for groups’

Web: a while back, I posted some musings about a web service to help authenticate users as members of a private group, similarly to how TypeKey authenticates users in general.

Well, Flickr have just posted this draft authentication API which does this very nicely — it now allows third-party web apps to authenticate against Flickr, TypeKey-style, and perform a limited subset of actions on the user’s behalf.

This means that using Flickr as a group authentication web service is now doable, as far as I can see…


Dot-coms and geographical insularity

Web: I caught sight of a post (8 June 2005, Interconnected) on the geographical insularity of the dot-com boom. A good read:

The huge influx of cash at the turn of the millennium led to the whole Web being built in the image of the Bay area. The website patterns that started there and - just by coincidence - happened to scale to other environments, those were the ones that survived.

Lots to think about. He’s spot on, of course — many of the web’s big commercial success stories are almost shamelessly US-oriented, and if they work outside that, it’s purely by accident.

I’d love to see more web businesses that work well for other parts of the world, but that’ll take money — and from what I saw in Dublin, the money either (a) just isn’t there, or (b) frequently goes to the companies that talk the talk, but then piddle it away on ludicrous ‘e-business architectures’ and get nothing useful out the other end.

On both counts, Silicon Valley has an ace up its sleeve. The VCs are smart and well-funded, and the developers have experience, and know which tools are right for the job.

I’d be curious to hear how other high-tech hotspots in the US (Boston, for example) find this.


IBM patents web transcoding proxies

Web: I link-blogged this, but it’s generated some email already, so it deserves a proper posting.

One thing you quickly learn about IBM where software patents are concerned, is that if IBM Research is making noise about a new software technique, they’ve probably patented it already. A few years ago, IBM was keen on HTTP transcoding — rewriting web content in a proxy, to be more suitable for display and access from less-capable devices, like PDAs and mobile phones.

So I probably should not have been surprised today when I came across USPTO patent 6,886,013, which is an IBM patent on a ‘HTTP caching proxy to filter and control display of data in a web browser’. It was applied for on Sep 11 1997, and finally granted on Apr 26 of this year.

The first claim covers:

  1. A method of controlling presentation on a client of a Web document formatted according to a markup language and supported on a server, the client including a browser and connectable to the server via a computer network, the method comprising the steps of:

    as the Web document is received on the client, parsing the Web document to identify formatting information;

    altering the formatting information to modify at least one display characteristic of the Web document; and

    passing the Web document to the browser for display.

Notice that there’s actually no mention of a HTTP proxy there — in other words, an in-browser rewriting element, such as Greasemonkey or Trixie, may be covered by that claim. However, the claim does indicate that the document is passed from the ‘client’ to the ‘browser’, so perhaps having the ‘client’ inside the ‘browser’ evades that.

It appears this really wasn’t original research even when the patent was applied for — there’s probable prior art, even if the patent itself doesn’t cite it. For example, WWW4 in 1995 included Application-Specific Proxy Servers as HTTP Stream Transducers, which discusses ‘transduction’ of the HTTP traffic and gives an example of ‘A “rewriting” OreO (transducer element) that encapsulates each anchor inside the Netscape Blink extension, making anchors easier to spot on monochrome displays’. On top of that, Craig Hughes notes that his ‘senior project at Stanford in 1992 was an implementation of a content-modifying HTTP proxy. It re-worked HTML in http streams to add some markup to enable full navigability through touch screen or voice control, for screen-only kiosks.’

Add this to the ever-growing list of over-broad software patents.


More ways malware damages internet infrastructure: DNS servers

Malware: spotted on NANOG — Six PCs caused BigPond problems:

Disconnecting six compromised personal computers on Tuesday evening eased the difficulties caused by bogus requests which clogged BigPond’s domain name servers (DNS), slowing customer e-mail and Web site access, Telstra said.

A Telstra spokesperson said the carrier had narrowed the list of malware that could have infected the computers to three, adding the problem could have been caused by a combination of those viruses or Trojans. He declined to name the suspects.

He said the PCs generated 95 percent of the bogus requests which caused the problems that evening.

The ‘problems’ in question are described here:

One forum participant (on Aussie forum Whirlpool), who claimed to be a BigPond customer, said on Monday: ‘I’m in Canberra and it’s been almost unusable all afternoon. I’m snowed under at the moment and it is really driving me crazy. Three out of four links fail to load first time and sometimes take eight or nine tries before it does.’

Another said: ‘I am having problems loading Web pages, I get the 404 error. I have to retry five to 10 times to get some places.’

Petri Helenius, in a post to NANOG, notes:

Consumer ISP’s who don’t proactively take care of security/abuse usually end up with harvesting-bots which consume significant amount of DNS resources, typically doing anything from a few dozen to a thousand queries a second. A few hundred of these will seriously hamper an usually provisioned recursive server.

Interesting. It’s been a long time since I’ve relied on an ISP’s recursive DNS servers; in my recent experience (Comcast, Cox.net) they’ve always been overloaded, and take aaaages to give me answers. Maybe this is why.

It makes sense; most Windows machines will indeed use the ISP’s NSes, because that’s what DHCP tells you to do; and setting up a BIND or djbdns instance locally to query the roots directly is still a UNIX-only trick, as far as I know.

The upshot?

  1. Yet another good reason why ISPs should proactively disconnect infected customers, as they deny service to other users of the ISP.
  2. A good demonstration of yet another way the techie community’s experience of web surfing and internet use differs from that of the unwashed masses in the hinternet — that ‘shanty-town of pop-ups and porn adware’, as Danny O’Brien puts it.
  3. Sometime soon, if it hasn’t happened already, someone’s going to bundle up an ‘Internet Accelerator’ lump of shareware that sets up a local recursive NS on Windows which queries the roots, and it’ll become the latest popular Windows download. Then the load on the root servers will really start rising.

(PS: top tip — ever wanted a publicly-queryable recursive nameserver, or a good IP address for pinging, that’s easy to remember? 4.2.2.1 is what you’re after.)


Open API for online group-based services maintenance

Web: I’ve been doing a little thinking about group-based networking and services.

Here’s the situation. Let’s say you have a small group of people, and want to offer some kind of online service to them (like a private chat area, mailing list, etc. etc.) That’s all well and good, but maintenance of ‘who’s in the group’ is hard. You need:

  • the ability to let other ‘admins’ add/remove people
  • a nice UI for doing so
  • a nice UI for people to request to sign up
  • possibly, multiple groups
  • privacy for group members
  • possibly, some public groups
  • decent authentication, username/password
  • the usual stuff that goes with that — ‘I’ve forgotten my password, please email it to my listed address’
  • did I mention a nice UI?

The traditional approach is to code all that up myself, in my copious free time presumably. Urgh, talk about wheel reinvention on a massive scale.
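Just to pin down what ‘who’s in the group’ maintenance means, here’s a bare-bones Python sketch of the data model behind that list. Everything in it (the `Group` class, its method names, the visibility rule) is a made-up illustration; authentication, password recovery and the UI, which are the genuinely tedious parts, are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    public: bool = False
    admins: set = field(default_factory=set)
    members: set = field(default_factory=set)
    pending: set = field(default_factory=set)  # outstanding sign-up requests

    def request_join(self, user):
        """Anyone may ask to join; an admin has to approve."""
        if user not in self.members:
            self.pending.add(user)

    def _require_admin(self, actor):
        if actor not in self.admins:
            raise PermissionError(f"{actor} is not an admin of {self.name}")

    def approve(self, actor, user):
        self._require_admin(actor)
        self.pending.discard(user)
        self.members.add(user)

    def remove(self, actor, user):
        self._require_admin(actor)
        self.members.discard(user)

    def roster_for(self, viewer):
        """Members are visible to insiders, or to anyone if the group is public."""
        if self.public or viewer in self.members or viewer in self.admins:
            return sorted(self.members)
        return []
```

Even this toy version assumes that `actor` and `viewer` have already been authenticated by something else, which is exactly the piece worth offloading to an outside service.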

I’d prefer to use something like TypeKey, a web service that exposes an API I can offload all this hard work to. Initially, I was in the ‘ugh, Typekey 0wnz my auth data’ camp, but I’ve eventually realised that (a) they’re not quite as evil as MS, (b) they’re not quite as stupid as MS (deleting Passport accounts if you don’t log in to Hotmail, which is only one of the supposedly many services, including third party services? hello?!), and (c) it’s actually really convenient having a single-sign-on for weblog commenting after all.

Having said all that — TypeKey’s out. Unfortunately, it only does authentication, without dealing with group maintenance.

However, social networking services are all about groups and group maintenance.

Running through the options — LinkedIn, Friendster and Orkut are all grabby and gropy and ‘my data! mine!’, so they’re out immediately.

The next step was to take a look at Tribe.net, which seems kind of nice and had a good rep for open APIs — but as far as I can see, all they’ve got really in that department is FOAF output, and a simple server-side-include thing called TribeCast. I could list all the group members in a FOAF file, but without authentication, that’s pretty useless since anyone could claim to be one of the FOAFs.

That leaves Flickr, which has a great set of APIs. Using that is looking quite promising. If you’re curious, I’ve gone into detail on this at the taint.org wiki.


Greasemonkey: transcoding extension for Firefox

Web: Now this is very cool stuff: ‘Greasemonkey is a Firefox extension which lets you to add bits of DHTML (“user scripts”) to any webpage to change it’s behavior.’

In other words, you can rewrite any page viewed in Firefox, as it transits between the server and your client’s display; a form of transcoding.

Traditionally, transcoding is performed using a HTTP proxy which applies the transformation, or a specialised HTTP user agent which transcodes and outputs a whole new set of documents with the results.

That was all a little hacky for full-scale integration into your web browser, though, so Greasemonkey is a big improvement for that use-case.
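For comparison, the transformation step of the traditional proxy approach might look something like this. It’s a minimal Python sketch using the standard library’s `html.parser`: it re-emits the page as it streams through, dropping `<img>` elements along the way. The class name and the choice of "strip images" as the transformation are assumptions for illustration; the proxy plumbing around it is not shown, and end tags come out lowercased.

```python
from html.parser import HTMLParser


class StripImages(HTMLParser):
    """Re-emit HTML as it is parsed, leaving out <img> elements."""

    def __init__(self):
        # Keep character references as-is so they can be passed through untouched
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag != "img":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "img":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def transcode(html):
    parser = StripImages()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

A user script does the same kind of rewrite, but against the live DOM inside the browser, so there is no proxy to configure and no second copy of the document.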

Some good links:

And some demos:

Remember, these are single, sub-100-line JS scripts, running entirely locally in the user’s web browser. The last one gives you an idea of what coolness is possible…

My contribution: an ad-removal script for Metafilter. It took some 30 seconds of hacking to produce this — soooo easy. It’s a whole new world of site customisation and hackable filtering. You thought AdBlock was good? This is even niftier ;)


Continuations in perl

Code: Ugo Cei: Building Interactive Web Programs with Continuations quoting Phil Windley:

This leads to the question: what if I could write programs for the Web that were ‘structured’ in the programming sense of that word? The result would be Web programs that were more natural to write and easy to read. You’d no longer have to maintain the state of your program outside the language and the data could be kept in variables, where it belongs. The answer is: you can.

I hate the ‘save all state’ model imposed by developing for the web, and have been hoping for a way to do this for a while — and now I know what it’s called ;)

It seems Seaside is the leading continuations-based web-app framework, using Smalltalk, and (as Ugo noted) Apache Cocoon has it too, but there’s a whole load more. Can you tell I haven’t been following web-app development techniques much recently?

Never mind those other languages, though — Continuity looks promising as a Perl framework based around continuations. Perl 6 will reportedly have native continuation support, and Dan Sugalski gives a good write-up of how they’re implemented and their ramifications there.
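To give a flavour of what "state kept in variables" looks like, here is a rough sketch using Python generators as a stand-in for continuations (Python rather than Perl or Smalltalk, purely for illustration). It is not how Seaside, Cocoon or Continuity work internally, and generators are one-shot, so unlike real continuations they can’t be re-entered when the user hits the back button. The `Session` class and the example flow are hypothetical.

```python
def add_two_numbers():
    """A multi-page interaction written as ordinary straight-line code."""
    a = yield "First number?"    # page 1: suspended here until the form comes back
    b = yield "Second number?"   # page 2: `a` simply lives on in a local variable
    yield f"Sum: {int(a) + int(b)}"


class Session:
    """Holds one suspended flow per user between HTTP requests."""

    def __init__(self, flow):
        self._gen = flow()
        self.page = next(self._gen)  # run up to the first page

    def submit(self, value):
        """Resume the flow with the submitted form value; return the next page."""
        self.page = self._gen.send(value)
        return self.page
```

A session starts on "First number?"; submitting "2" moves it to "Second number?", and submitting "40" yields "Sum: 42", with no hidden form fields or session tables carrying `a` between requests.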


A Firefox Extension plug

Web: Urgh, I still have this damn cold I picked up in Ireland… sniffle cough etc. More vitamin C needed!

Anyway, just a quick plug for a very deserving Firefox extension, one I haven’t seen mentioned widely. It’s pretty common, when you wish to print out a web page, that you wish you could get rid of the obnoxious extra-wide sidebar tables, gigantic ads, or other extraneous parts of the page. Well, now you can:

Nuke Anything is a Mozilla/Firefox extension which offers two great features in the right-click context menu:

  • Remove this object: this will remove the object you’ve right-clicked on — a table TD, paragraphs, images, IFRAMEs, etc.
  • Remove selection: more usefully, this allows you to select exactly what you want to remove with a left-button drag, then right-click to remove it.

It’s really useful. I almost never print anything out these days without scrubbing off a few unwanted sidebars ;)


playing around with Google Suggest

Web: Google Suggest, a drop-down list of suggestions — with hitrates! The one-letter hits are interesting, too.

“spam” hitrates, the top 3 (aside from “spam” itself):

  • “spam filter”: 6,400,000 results
  • “spamcop”: 1,570,000
  • “spamassassin”: 1,350,000

So SpamAssassin is in the top 3. Getting there!

unfortunately, you have to get as far as “justin ma” before my name shows up, so not doing too great in that competition. ;)


New Scientist’s psychic website

Web: The lovely C sent me a link of note — it’s the eglu, ‘the world’s most stylish and innovative chicken house and is the perfect way to keep chickens as pets’. (She has a thing about keeping chickens.)

So I was all set to link to that on NoMoreSocks.newscientist.com, New Scientist’s nifty new xmas-pressies site; but — get this: it will not load in Firefox 1.0PR, 1.0, or Konqueror at all — in fact, using telnet, the site doesn’t actually respond to requests on port 80 from my linux desktop.

The only browser it seems to work with is MS Internet Explorer in VMWare, presumably using MSIE’s psychic powers to contact it without going through TCP/IP.

Mysteriously, it can be lynxed from my server in Ireland, but similarly doesn’t work for C’s Firefox installation on her desktop. How weird!


Indymedia cross-border takedown reaches Slashdot

Web: The slashdot story. The comments contain a massive amount of noise, but there are some highlights…

Some details of the backend; it appears Indymedia need more mirrors, and the imc-tech list and #tech channel are the best contact locations to get in touch. The comment also notes that the Mir CMS used by most IMCs generates static HTML — which is a good thing! I hereby withdraw my kvetching about server-side dynamic scripting in that case ;)

The techie who ‘had the contract with Rackspace’ comments, and provides a link to his weblog, which contains copies of the trouble tickets.

He also notes that the possible illegal posting was a newswire submission — therefore not ‘published’ per se, just uploaded in the same way an unmoderated-up slashdot comment is.

And finally — he notes that the EFF are offering to represent himself and Indymedia pro bono. Yay EFF!

The Electronic Frontier Foundation (EFF) is currently assisting Indymedia investigate possible responses to the seizure of its information. More than 20 Indymedia-related websites, along with Indymedia’s online radio, were hosted on the servers, which were dedicated machines provided by Rackspace.

‘This seizure has grave implications for free speech and privacy. The Constitution does not permit the government unilaterally to cut off the speech of an independent media outlet, especially without providing a reason or even allowing Indymedia the information necessary to contest the seizure,’ said EFF Staff Attorney Kurt Opsahl.

This is great news. Top-secret takedowns are not a good thing, especially when they span three national borders…


How to turn a stale project site into a useful Wiki

Web: Almost every project and organisation has, at some stage, bemoaned having stale data on their website, and wished there was a better way to keep it up to date; or wished their FAQ was more complete; or wished they had the time to HTML-ize all their know-how and get it up there.

Well, here’s what we did in SpamAssassin to deal with this problem. (Seeing as I’ve talked about this three times in the past month, I’ll write it up here so I can just point at the URL next time!)

First off, we experimented with having the site checked into CVS, FAQ-o-matic, and the Python FAQ software (which was pretty good). All were OK, but very specific in format, using the traditional question-answer FAQ layout — that’s good for FAQs, but not so good for a lot of other stuff — and keeping it updated was still limited to a small group, therefore the info got stale again.

So we moved to a Wiki. Here are my tips for Wiki-izing your website so that the end results are better than what went in.

Use good wiki software: unusable software will be a pain to use, and the info will still go stale. We used Moin Moin - http://moin.sourceforge.net/ - partly because I like Python (it’s nearly perl! ;), it can produce RSS, and it was pretty easy to install.

Don’t worry: people won’t vandalise it (much). It turns out that vandalism and people throwing up crappy info isn’t a serious problem at all. You should increase the barrier, in the following ways:

Require user accounts: set the security policy so that a user account must be set up before editing is possible. This means you won’t get wiki-spammed, and also has the side effect of imposing a pretty big barrier to casual vandals.

Send changes to a list: set all changes to be mailed to a mailing list as diffs. This is the most important tip. If you already have a mailing list with the knowledgeable part of the community on it, use that list — because they’re the ones who’ll be able to recognise if erroneous info is put up, and will be annoyed about this enough to bother fixing it. There’s a bonus side-effect of this; even if some people didn’t like the wiki to start with, they’ll eventually be needled into using it by wanting to fix stuff they perceive as wrong. And then they get sucked in ;)

Use diff for the mailed changes: Moin by default will only send out change messages saying ‘something changed on this page!’. That’s not good enough, unfortunately — you want to mail out what the new text looks like, and highlight exactly where the change happened. Moin can do this nicely, with this patch, which adds a mail_commits_address, where all diffs on every page are sent, using the normal diff mechanism.

Ensure the wiki software can revert quickly: If someone does make a bad change, Moin supports one-click reversion of the page to what it was beforehand. That’s great for dealing with spam, or clueless vandalism.

Keep one or two static pages: If you’re worried about some script kiddie thinking that defacing a wiki makes them look cool, then keep one or two of the primary user-facing pages as static data. For example, take a look at the link-bar at the top of http://spamassassin.apache.org/ ; five of the ten links are to static pages, the other five are now wiki-ized. In particular, our front page and our downloads page are both static, but our docs are predominantly Wiki’d.

Publicize Mozex: most techie groups will have techie users, and we hate using browser text-boxes to edit text. Mozex — http://mozex.mozdev.org/ — saves the day here — it’s a godsend.

Shepherd new changes: in the early stages, you want one or two people who tidy up changes from Wiki newbies, as they go in. They need to keep it looking pretty, and perform Refactoring of stuff that could be laid out better or should become multiple pages. Eventually, others will get the hang of that (and do a much better job than you do ;).

That’s the lot. Most of these are to, essentially, migrate aspects of your already-existing and already-working community into this new outlet. In our experience, it’s worked really well — our Wiki is now the most reliable source of info about SpamAssassin, and is extensive and up-to-date.
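
For the "send changes to a list" and "use diff" tips, here is roughly what a diff-style change mail looks like when built by hand. This is a minimal Python sketch using only the standard library; it is not the Moin patch, and `build_change_mail`, the subject format, and the example.org addresses are all placeholders of my own.

```python
import difflib
from email.message import EmailMessage


def build_change_mail(page_name, old_text, new_text, editor, list_address):
    """Return a mail whose body is a unified diff of one wiki page edit."""
    diff_lines = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"{page_name} (previous)",
        tofile=f"{page_name} (current)",
        lineterm="",
    )
    message = EmailMessage()
    message["Subject"] = f"[wiki] {page_name} changed by {editor}"
    message["From"] = "wiki@example.org"  # placeholder sender address
    message["To"] = list_address
    message.set_content("\n".join(diff_lines) + "\n")
    return message


mail = build_change_mail(
    "FrontPage",
    "SpamAssassin is a mail filter.\nSee the FAQ.\n",
    "SpamAssassin is a mail filter.\nSee the Wiki.\n",
    "alice",
    "dev@example.org",
)
print(mail["Subject"])
print(mail.get_content())
```

Handing the message to the list is then a matter of something like `smtplib.SMTP("localhost").send_message(mail)`, assuming a local mail server is listening.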


The Flickr Eye Thing

Web: Flickr’s latest trend — using just an eye (or similar minimalist face part) as your avatar pic:


Planetary Backgrounds now using Coral

Web: My Nearly-Live Planetary Desktop Backgrounds site is now using NYU’s Coral Content Distribution Network instead of FreeCache.org. (FreeCache wasn’t caching the files, because they were too small. drat.)

Coral is a ‘decentralized, self-organizing, peer-to-peer web-content distribution network’, using a distributed sloppy hash table and peer-to-peer DNS redirection infrastructure.

At least, apparently. ;) I haven’t read the papers yet, but what I do know is that so far, it seems to be working perfectly — each file is requested exactly once by the CDN servers:

  193.10.133.129 - - [31/Aug/2004:16:50:31 +0100] "GET
  /xplanet/tmp/200408311455.399750/day_clouds_800x600.png
  HTTP/1.1" 200 706936 "-" "CoralWebPrx/0.1 (See
  http://www.scs.cs.nyu.edu/coral/)"

and never requested again. That’s a big saving… nifty!
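
If you want to check the "exactly once" claim against your own logs, something like the following Python sketch would do. It assumes the Apache combined log format with plain ASCII quotes, one entry per line, and it picks out Coral's fetches by the `CoralWebPrx` user-agent prefix seen in the entry above; `coral_fetch_counts` is just a name I made up.

```python
import re
from collections import Counter

# Apache "combined" log format, one entry per line.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def coral_fetch_counts(lines, agent_prefix="CoralWebPrx"):
    """Count how many times each path was fetched by the Coral proxies."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match["agent"].startswith(agent_prefix):
            counts[match["path"]] += 1
    return counts


sample = [
    '193.10.133.129 - - [31/Aug/2004:16:50:31 +0100] '
    '"GET /xplanet/tmp/200408311455.399750/day_clouds_800x600.png HTTP/1.1" '
    '200 706936 "-" "CoralWebPrx/0.1 (See http://www.scs.cs.nyu.edu/coral/)"',
]
for path, hits in coral_fetch_counts(sample).items():
    print(hits, path)
```

Any path with a count above 1 would mean the CDN came back to the origin server for a file it should already have cached.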


Image Watermarking With ‘pamcomp’

Web: My Dad runs a couple of websites — his architectural photography business, and Andalucia Photo Gallery, a side project selling some lovely photos from the Andalusia region of Spain.

Needless to say, as the family geek, guess who coded all that up? Using WebMake, naturally ;) This was the main reason I wrote the ‘thumbnail_tag’ plugin.

You’ll note, however, that the image to the right is watermarked, quite small, and encoded with a low quality setting. It turned out, after a couple of years of operation, that the images were being downloaded and used in print all over the place — from both sites!

It seems photo piracy is rampant. Even with terms of use clearly linked on the sites, it’s still commonplace for print publications to swipe the images — and not just the little guys, either — some big commercial names have apparently used the images without asking (or paying licensing fees).

The Andalucia gallery site was a favourite; being a good hit for ‘travel photos spain’ meant lots of images being used for holiday pages in magazines, newspapers, and so on.

Needless to say, digital watermarking software doesn’t work — it’s trivial to load an image into Photoshop, resize or crop, and resave, apparently. Even if PS did respect the watermarks, netpbm doesn’t, and a watermarked image isn’t identifiable as such once it appears in print anyway! So we went for the blunt-tool approach, adding visible watermarks to the images.

It’s pretty easy — pamcomp allows you to overlay one image on top of another, using a third as an ‘alpha mask’ to control transparency. The results are pretty nice and not too intrusive.
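
As a rough illustration, here is how the pamcomp call might be wrapped in Python. This is a sketch rather than the script the sites actually use: the file names are hypothetical, and it assumes the photo and the watermark overlay have already been converted to PNM format and the mask to PGM, since pamcomp works on Netpbm formats rather than JPEGs.

```python
import subprocess


def pamcomp_command(overlay, alpha_mask, photo):
    """Build the argv for: pamcomp -alpha=MASK OVERLAY PHOTO."""
    return ["pamcomp", f"-alpha={alpha_mask}", str(overlay), str(photo)]


def watermark(overlay, alpha_mask, photo, output):
    """Composite the overlay onto the photo and write the result to output."""
    with open(output, "wb") as out:
        # pamcomp writes the composited image to stdout when no output
        # file argument is given, so redirect stdout into the target file.
        subprocess.run(pamcomp_command(overlay, alpha_mask, photo),
                       stdout=out, check=True)


# Show the command that would be run (hypothetical file names).
print(" ".join(pamcomp_command("watermark.ppm", "watermark-mask.pgm", "photo.ppm")))
```

Calling `watermark("watermark.ppm", "watermark-mask.pgm", "photo.ppm", "photo-marked.ppm")` would then need Netpbm installed; the mask's brightness controls how strongly the overlay shows through at each pixel.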

It’s a shame it has to be done, though… :(


Doonesbury Bookmarklet

Web: in passing — here’s a bookmarklet for the current day’s Doonesbury comic strip: Today’s Doonesbury.
