Google Webmaster Tools now includes ‘goog-love.pl’

Back in 2006, I wrote a script I called “goog-love.pl”; it used Google’s now-dead SOAP search API (thanks, Nelson!) to figure out which Google queries your web site was “winning” on. Unfortunately, Google shut down new signups for the SOAP interface later that year.

I was just looking through Google’s Webmaster Tools page for taint.org, when I came across the Statistics / Top search queries page:

[screenshot: the Statistics / Top search queries page]

This is exactly what goog-love.pl produced. Hooray!


Comments

Google Calendar ‘Quick Add’ smart keyword bookmark

Google Calendar has a nifty feature, “Quick Add”: you enter a natural-language string like “lunch with Justin, 1pm 20/4/08”, and it parses it and adds an appointment to your calendar. However, the link in the Calendar UI can’t be bookmarked; you have to go to the Calendar page, wait for it to sloooowly load all its AJAX bits, hit the link, and only then type the appointment details, by which time I’ve forgotten them anyway, ADD-style. ;)

Elias Torrez came up with a Firefox extension to use the Quick Add feature in one keypress, but in my opinion that’s overkill — I don’t want the overhead of another extension, the upgrade worries, and I don’t want it using up a keyboard shortcut either. I’d prefer to just have this as a Firefox Smart Keyword – and thankfully the trick is in the comments for his blog post, from someone called Bjorn. So here’s the deal:

Name: Google Calendar Quick Add

Location: http://www.google.com/calendar/event?ctext=+%s+&action=TEMPLATE&pprop=HowCreated%3AQUICKADD

Keyword: newcal

Description: add a new event in Google Calendar
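For the curious, here is a rough Python sketch of what the browser does with that Location when you type "newcal" followed by some text. It assumes the %s substitution simply URL-encodes whatever you typed; the function name expand_keyword is made up for illustration, and real Firefox may encode spaces slightly differently.

```python
from urllib.parse import quote_plus

# The Location template from the bookmark above; the browser swaps %s for the typed text.
TEMPLATE = ("http://www.google.com/calendar/event"
            "?ctext=+%s+&action=TEMPLATE&pprop=HowCreated%3AQUICKADD")

def expand_keyword(template: str, typed: str) -> str:
    """Approximate the smart-keyword substitution (assumption: plain URL-encoding)."""
    return template.replace("%s", quote_plus(typed))

url = expand_keyword(TEMPLATE, "lunch with Justin, 1pm 20/4/08")
print(url)
```

So typing "newcal lunch with Justin, 1pm 20/4/08" in the location bar should land on a Calendar page with the Quick Add text already filled in.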

enjoy!


Comments (4)

Google now include Code Search in normal results

Latest Google curiosity… I hadn’t spotted this before: it appears Google is now including ‘Code Snippet’ results in the results for its normal search. For example, a search for XSLoader gives this result:

[screenshot: search result for XSLoader]

The results highlighted on the page are for a local variable in a Java module, rather than the much more common XSLoader Perl module. I guess ‘Code Snippet’ search is case-sensitive.


Comments (2)

Google’s CAPTCHA - not entirely broken after all?

A couple of weeks ago, WebSense posted this article with details of a spammer’s attack on Google’s CAPTCHA puzzle, using web services running on two centralized servers:

[...] It is observed that two separate hosts active on same domain are contacted during the entire process. These two hosts work collaboratively during the CAPTCHA break process. [...]

Why [use 2 hosts]? Because of variations included in the Google CAPTCHA image, chances are that host 1 may fail breaking the code. Hence, the spammers have a backup or second CAPTCHA-learning host 2 that tries to learn and break the CAPTCHA code. However, it is possible that spammers also use these two hosts to check the efficiency and accuracy of both hosts involved in breaking one CAPTCHA code at a time, with the ultimate goal of having a successful CAPTCHA breaking process.

To be specific, host 1 has a similar concept that was used to attack Live mail CAPTCHA. This involved extracting an image from a victim’s machine in the form of a bitmap file, bearing BM.. file headers and breaking the code. Host 2 uses an entirely different concept wherein the CAPTCHA image is broken into segments and then sent as a portable image / graphic file bearing PV..X file headers as requests. [...]

While it doesn’t say as such, some have read the post to mean that Google’s CAPTCHA has been solved algorithmically. I’m pretty sure this isn’t the case. Here’s why.

Firstly, the FAQ text that appears on “host 1” (thanks Alex for the improved translation!):


FAQ

If you cannot recognize the image or if it doesn’t load (a black or empty image gets displayed), just press Enter.

Whatever happens, do not enter random characters!!!

If there is a delay in loading images, exit from your account, refresh the page, and log in again.

The system was tested in the following browsers: Internet Explorer Mozilla Firefox

Before each payment, recognized images are checked by the admin. We pay only for correctly recognized images!!!

Payment is made once per 24 hours. The minimum payment amount is $3. To request payment, send your request to the admin by ICQ. If the admin is free, your request will be processed within 10-15 minutes, and if he is busy, it will be processed as soon as possible.

If you have any problems (questions), ICQ the admin.

That reads to me a lot like instructions to human “CAPTCHA farmers”, working as a distributed team via a web interface.

Secondly, take a look at the timestamps in this packet trace:

[image: packet trace]

The interesting point is that there’s a 40-second gap between the invocation on “Captcha breaking host 1” and the invocation on “Captcha breaking host 2”. There is then a short gap of 5 seconds before the invocations occur on the Gmail websites.

Here’s my theory: “host 1” is a web service gateway, proxying for a farm of human CAPTCHA solvers. “host 2”, however, is an algorithm-driven server, with no humans involved. A human may take 40 seconds to solve a CAPTCHA, but pure code should be a lot speedier.

Interesting to note that they’re running both systems in parallel, on the same data. By doing this, the attackers can:

  1. collect training data for a machine-learning algorithm (this is implied by the ‘do not enter random characters!’ warning from the FAQ — they don’t want useless training data)

  2. collect test cases for test-driven development of improvements to the algorithm

  3. measure success/failure rates of their algorithms, “live”, as the attack progresses
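To make the three points above concrete, here is a purely hypothetical Python sketch of the bookkeeping such a setup would allow. Nothing here comes from the WebSense write-up: the Attempt record, the field names and the sample answers are all invented, and it assumes the human answer is treated as ground truth, with an empty answer meaning the solver just pressed Enter.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    image_id: str
    human_answer: str   # "" means the solver pressed Enter (unreadable image)
    algo_answer: str

def split_and_score(attempts):
    """Return (training_pairs, algo_accuracy), treating human answers as truth."""
    # Only images a human actually solved are usable as labelled data.
    scored = [a for a in attempts if a.human_answer]
    training = [(a.image_id, a.human_answer) for a in scored]
    hits = sum(1 for a in scored if a.algo_answer == a.human_answer)
    accuracy = hits / len(scored) if scored else 0.0
    return training, accuracy

# Invented sample data: six images, one of them unreadable to the human.
attempts = [
    Attempt("c1", "xkqpt", "xkqpt"),
    Attempt("c2", "mwnrd", "mvvnrd"),
    Attempt("c3", "", "zzzz"),
    Attempt("c4", "hbgel", "hbge1"),
    Attempt("c5", "rtyus", "rtvus"),
    Attempt("c6", "plmok", "p1mok"),
]
training, accuracy = split_and_score(attempts)
print(len(training), accuracy)
```

Random characters typed by a bored solver would poison both the training pairs and the accuracy figure, which would explain the stern warning in the FAQ.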

Worth noting this, too:

Observation*: On average, only 1 in every 5 CAPTCHA breaking requests are successfully including both algorithms used by the bot, approximating a success rate of 20%. The second algorithm (segmentation) has very poor performance that sometimes totally fails and returns garbage or incorrect answers.

So their algorithm is unreliable, and hasn’t yet caught up with the human farmers. Good news for Google — and for the CAPTCHA farmers of Romania ;)

Update: here’s the NYTimes’ take, with broadly agreeing comments from Brad Taylor of Google. (The Register coverage is off-base, however.)


Comments (2)

GNOME, Google and the UNIX user interface

Recently, after a flurry of annoying user interface issues, I’ve switched my RSS reader from Liferea to Google Reader. Interestingly, I’ve found that Google Reader actually fits better with the traditional UNIX user interface concept.

What triggered this was an upgrade from Liferea 1.0.x to 1.4.4 as part of Ubuntu Gutsy; this brought with it a lot of changed behaviours, such as ‘drag-and-drop of feed URL to HTML view no longer subscribes’, and one crucial UI issue, ‘”Skim through articles” only works with ctrl+space’.

I’ve been a long-time UNIX user, dating back to the days when curses-based interfaces were the norm. As such, I tend to drive commonly-used applications using keyboard commands where possible. (This isn’t a purely UNIX thing; Windows has the phenomenon of the keyboard-wielding “power user”, too.)

Liferea was attractive, since it offered the ability to skim through articles quickly by just pressing the “Space” key; simply press space to page down, or to skip to the next unread article if at the end of the current one. Unfortunately, Liferea 1.4.x breaks this, and it wasn’t going to be fixed, since apparently a GNOME app shouldn’t behave this way:

GTK explicitely does implement <Space> as a key binding for several of it’s widgets. Rebinding means to break the default behaviour for such widgets (tree views, buttons, input fields). [....] Liferea as a web-browsing application should behave like any other web browser and like every other GNOME/GTK application as much as possible.

Now, I don’t know if it’s GNOME’s fault, or what, but for a UNIX desktop app to break with UNIX UI conventions, that’s a bad move in my opinion. I gave it a bit of argument in the bug tracker, but eventually gave up as I clearly wasn’t getting anywhere. :(

Instead, based on recommendation from friends, I gave Google Reader a try, and quickly figured out its extensive collection of keyboard shortcuts. Now, I’m skimming through my feeds in even less time than it took with Liferea, simply by hitting “ga” to go to my “all unread items” list, then “j”, “j”, “j” to skip through the postings one by one. Sweet!

It’s interesting to note that other Google web apps use the same concepts; Gmail also has a hefty set, and can be driven using them in a manner very reminiscent of the classic UNIX mailreader, Mutt. So, despite being designed with end-users in mind by extremely clever professional user experience designers, these apps still find space for power-user keyboard operation. Take note, GNOME.

Anyway, I’m not too bothered. Google Reader brings other benefits, such as fixing this bug: ‘please add ability to go to previous entry in Unread feed’, avoiding ‘constant memory leak requires daily restarts’, and, of course, the utility of being able to track the same set of feeds and keep track of which items I’ve read in two places (work and home).

If only it was open source ;)


Comments (4)

Spammers “giving up” according to Google

According to this Wired story, Google reckons spammers are giving up on spam:

a remarkable trend is underfoot, according to Brad Taylor, a staff software engineer at Google: The number of spam attempts — that is, the number of junk messages sent out by spammers — is flat, and may even be declining for the first time in years.

Actually, this is a wilful misunderstanding of what the Googler in question really said, which was that ‘attempts to spam Gmail users have been leveling off over the last year and more recently, even declining slightly’. In other words, they didn’t make an observation about the state of the spam problem on an internet-wide basis — just about the “local” situation as it pertains to Gmail. Bad reporting there, Wired.

But, in passing…

David Berlind at ZDNet recently blogged a rather grumpy response to InfoWorld coverage of CEAS 2007. He raised a very important point:

If I could say something to the author of that story, it would be that so long as any anti-spam solution is not deployed universally throughout the Internet’s e-mail system (in other words, so long as some anti-spam tech is not a standard), that anti-spam solution actually makes the spam problem worse. You read that right. Worse. Proprietary anti-spam solutions make the global spam problem worse. They are digging us deeper into the hole that the Internet is already in because everyone who makes those solutions is under the false belief that “s/he who is finally successful at filtering out all spam while allowing the legitimate mail in wins.”

Google’s blog post is a case in point: ‘we’re keeping more spam out of your inbox than ever before, so more and more, you can use Gmail for things you enjoy without even realizing that the spam filter is there most of the time.’

That’s great — but it doesn’t help anyone except Gmail. It’s a myopic view of the spam problem, and David’s point stands.

(I disagree with his later conclusion that the only way forward is for Google, MS, AOL and Yahoo! to get together and ‘commit to jointly supporting the same technical solutions’ — when the usual BigCos get together, they tend to focus on their own priorities. Take what happened back in 2005 with nofollow for blog-spam — while it helped the search giants with their own overriding priority, which was to tweak their algorithms to filter out the spam on the search results page, it did nothing to slow the spam flood itself, which has continued unabated.)

We need more open-source, and open-data, anti-spam work.


Comments (9)

Hog’s Chip

Hey Google –

Since Fido.ie is throwing errors at me, and since you’re probably a more searchable (and more global) database anyway — the Trovan FDX-B RFID transponder number 956000000659388 is that of “Hog Dempsey”, a small female black and white cat, whose owners can be contacted via any address on this page. Cheers!


Comments (3)

Wikipedia and rel=”nofollow”

Apparently, Wikipedia has (possibly temporarily) decided to re-add the rel=”nofollow” attribute to outbound links from their encyclopedia pages.

There’s been a lot of heat and light generated about this, most missing one thing: there’s no reason why Google needs to pay attention.

Google, or any other search engine, can treat links in the Wikipedia pages any way they like — including ignoring ‘nofollow’, applying extra anti-spam heuristics of their own, or even trusting the links more highly.

‘Nofollow’ has had pretty much no effect on web-spam, and now is generally festooned all over weblog posts across the internet, both spammed and non-spammed posts, at that. It’d be interesting to see if it’s yet flipped to mean a higher correlation with nonspam than spam content…

Update: It appears Wikipedia used ‘nofollow’ before, so this is not exactly new, either.


Comments (2)

The vagaries of Google Image Search

Remember the C=64-izer, the quick hack to display an image in the style of the Commodore 64?

Recently, I’ve started getting hits to this demo image of the “O RLY?” owl — lots of ‘em.

It turns out that the C=64-ized rendition of this image is now the top hit for “O RLY” on Google Image Search; pretty bizarre, since there are obviously better images on the first search page, one result along in fact. What’s more, the page listed as the ‘origin page’, http://taint.org/tag/today, doesn’t even use that text.

This has resulted in lots of Myspace kiddies etc. obliviously using the C=64 rendering. Yay for Commodore ;)


Comments

a plug for Map24

Nat at O’Reilly Radar mentions that Multimap have added a public API. It’s great to see more sites adding public APIs, but sadly, as I note in a comment there, Multimap isn’t any use for me: they, along with Google and Yahoo!, have really crappy Irish mapping. Their geocoders (the part that turns an English-language address into a GIS coordinate pair) are pretty much non-functional for Ireland.

I moved from the US to Ireland earlier this year and found this pretty frustrating, after the joys of using the US mapping sites to get driving directions etc.

Thankfully, another contender has emerged recently — Map24.

They have a great geocoder for Ireland, and very reliable directions, which are even accurate for some of the more baroque one-way-system traffic-management changes that Dublin’s city planning department have come up with recently. The look and feel of the website is a little clunky in Firefox — not as smooth as Google’s — but it has some nice AJAXy touches now and seems to be heading in the right direction.

Interestingly, they now offer a public API for third-party mashups, and even offer an API for their geocoder — so someone preferring the Google look and feel could mash that up, using Map24 to find the coordinates and Google to display an area map! (Actually, I think that may be how John Handelaar’s earlier hack worked – I note in the comments that he mentions Map24 provide Lycos’ mapping backend. aha.)

Anyway — Map24 — if you’re looking for a good Irish mapping/driving-directions site, it’ll do the trick.


Comments

Searching GMail with a Firefox Smart Keyword

Here’s a Firefox Smart Keyword to search your GMail:

https://mail.google.com/mail/?search=query&view=tl&q=%s

Usage example, assuming you use ‘mail’ as the keyword: (CTRL-L) mail whatever
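If you're wondering how the keyword part works, here is a small Python sketch of the lookup, assuming the browser splits the location bar text at the first space and treats the first word as the keyword. The KEYWORDS table and the resolve function are hypothetical names for illustration, not anything Firefox exposes.

```python
from urllib.parse import quote

# Hypothetical keyword table; 'mail' is the keyword assumed in the usage example.
KEYWORDS = {
    "mail": "https://mail.google.com/mail/?search=query&view=tl&q=%s",
}

def resolve(location_bar_text):
    """Split 'keyword rest of query' and fill in the bookmark's %s slot."""
    keyword, _, query = location_bar_text.partition(" ")
    template = KEYWORDS.get(keyword)
    if template is None:
        return None  # not a smart keyword; the browser would handle it normally
    return template.replace("%s", quote(query, safe=""))

print(resolve("mail whatever"))
```

Anything after the keyword is passed through as the search, so Gmail operators such as from: should work too, once URL-encoded.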


Comments (2)

SpamAssassin in the Google Summer of Code 2006

Are you a student, and interested in earning $4,500 for contributing to open source, and fighting spam, over the course of the summer?

If so, get thee hence to the Google Summer of Code 2006 site, and propose a project!

Last year, we in SpamAssassin didn’t get it together to mentor SoC projects. This year, however, we have a few prospective mentors (including myself), and a few sample project ideas lined up; we’re all ready to go! Here’s the Student FAQ. Be quick; applications end in a week and a bit.

Here’s hoping we get some interesting submissions ;)


Comments

Single-Letter Google Hits

Here’s what happens when you search for single letters on Google:

Interestingly I got to see the new Google search results page, with the sidebar, once. It must be in the process of rolling out…


Comments (8)

Google Calendar

So I’ve been using this for a few days now — and I’m loving it. A calendaring system that deals coherently with the web:

I keep finding little things that make perfect sense, and just feel more logical than what I’ve used elsewhere. This rocks!

One thing still needs work, though: the links to Mapping fail spectacularly, for non-US addresses at least. But that’s pretty minor.

By the way, I have a feeling that Mac.com had parts of this, but really, you had to drink a lot of Apple kool-aid to use that, and I just didn’t go for that. Sorry Jobs fans.

Do you know what would be cool now? If Upcoming.org published venue/location-specific iCal feeds. Oh look, they do! Awesome…


Comments (7)

Another script: goog-love.pl

A quick hack –

goog-love.pl - find out where your site’s google juice comes from

This script will grind through your web site’s “access.log” file (which must be in the “combined” log format). It’ll pick out the top 100 Google searches found in the referer field, re-run those searches, and determine which ones are giving your website all the linky Google love — in other words, the searches that your site ‘wins’ on.

The output is in plain text and a chunk of HTML.

usage:

goog-love.pl sitehost google-api-key < access.log > out.html

e.g.

cat /var/www/logs/taint.org.* | goog-love.pl \
  taint.org 0xb0bd0bb5yourgoogleapikeyhere0xdeadbeef | tee out.html

NOTE: this script requires the SOAP::Lite module be installed. Install it using apt-get install libsoap-lite-perl or cpan SOAP::Lite. It also requires a Google API key.
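The log-grinding half of the idea is easy to sketch. What follows is not the goog-love.pl source (that is Perl and uses SOAP::Lite); it is a minimal Python approximation of the first step only, pulling Google search terms out of the referer field of "combined" format lines and counting them. The regular expression, the function names and the sample log lines are my own assumptions, and the host check is deliberately loose.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Combined log format ends with: "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"$')

def google_query(referer):
    """Return the search terms from a Google referer URL, or None."""
    parts = urlsplit(referer)
    host = parts.hostname or ""
    # Loose check: google.com, www.google.<tld>, or any *.google.com host.
    if not (host == "google.com" or host.startswith("www.google.")
            or host.endswith(".google.com")):
        return None
    terms = parse_qs(parts.query).get("q")
    return terms[0].strip().lower() if terms else None

def top_queries(lines, limit=100):
    """Count Google searches seen in the referer field, most frequent first."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.rstrip("\n"))
        if match:
            query = google_query(match.group("referer"))
            if query:
                counts[query] += 1
    return counts.most_common(limit)

# Invented sample lines in combined log format.
sample = [
    '1.2.3.4 - - [10/Apr/2006:10:00:00 +0000] "GET / HTTP/1.1" 200 512 '
    '"http://www.google.com/search?q=beardy+justin&hl=en" "Mozilla/5.0"',
    '5.6.7.8 - - [10/Apr/2006:10:00:05 +0000] "GET /x HTTP/1.1" 200 99 '
    '"http://www.google.ie/search?q=Beardy+Justin" "Mozilla/5.0"',
    '9.9.9.9 - - [10/Apr/2006:10:00:09 +0000] "GET /y HTTP/1.1" 404 0 '
    '"http://example.com/page" "Mozilla/5.0"',
]
print(top_queries(sample))
```

The second step, re-running each search and checking where the site ranks, is what needed the Google API key, so it is left out of this sketch.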

For example, here are the current results for this site. You can immediately see some interesting stuff that’s not immediately obvious otherwise, such as my site being the top hit for [beardy justin] ;)

Download here (5 KiB perl script).

Notes:

  • if you see a lot of “502 Bad Gateway” errors, it’s probably over-zealous anti-bot ACLs on Google’s side. Try from another host.

  • Read the comments for notes on a bug in recent releases of SOAP::Lite; please let me know if you hear of them getting fixed ;)


Comments (5)

Vint Cerf speaking at Google on Thursday

Heads-up, Dublin geeks: Vint Cerf will be speaking at the Dublin Googleplex on Thursday.

Sadly, I won’t be able to make it myself — I had to visit the UK this week. Pity; I would have loved to hear him speak :(


Comments

Google DRM and WON Authentication

So, Google have invented their own DRM, apparently. I’m keen to find out more details; Techdirt and Plasticbag.org are so far the only places I can find in the blogosphere to discuss it in any detail.

One tidbit worth noting from the LA Times coverage:

The Google copy-protection software also imposes a big restriction: The CBS shows, NBA games and other material protected by the software can be watched only on a computer that’s connected to the Internet.

“I think it’s going to be a problem,” said Li, the Forrester analyst, adding that Google executives told her they were trying to fix it.

That’s interesting. Given that quote, I’ll bet Google’s DRM is something similar to the copy-protection systems used for many games since about id’s Quake 3 and Valve’s Half-Life: an online “key server” which validates codes and tracks player IDs and who’s viewing what, “live”, as the video is cued up and played.

Some more info on the Half-Life WON authentication system can be found in this GamaSutra article; subscription required — try viewing this google-cache version with Javascript off if you don’t have a sub. That’s historical now, of course, since that WON system has been replaced by a new auth protocol as part of Valve’s ‘Steam’ system.

The key factor is the network, separating the dangerous, untrustworthy user machine from the trusted key server. Since the online key server can act as a platform for trusted, known-insubvertable code to run, along with the video server, both being under Google’s control, it’s actually possible to build reasonably solid DRM on this model. That’s as opposed to the usual case, where a reasonably determined teenager can break it in a week of school-nights. ;)

Anyway, that’s speculation. It remains to be seen if they’ve come up with something along the lines of WON authentication — and if it’s still easily subvertable or not.

Update: Aristotle Pagaltzis has a pretty good point in the comments:

Watching video, unlike playing a multiplayer game, is not an activity that inherently requires connecting to a server. Playing a multiplayer game, OTOH, inherently is.

So cracking a multiplayer game’s key check is fruitless, because then you can’t play online anymore, which was the whole point of the game in the first place. In contrast, a video player with a cracked key check still fulfills its purpose just fine.

I think he’s right. That’s a key point, demonstrating how WON authentication still can’t help — media playback, as a task, is itself fundamentally crackable.


Comments (7)

Windows Live Local and Firefox

Windows Live Local, with its isometric, Sim City, “bird’s eye” view, is quite nice.

However, what gets me is — do MS do this deliberately? I’m referring, of course, to the way it’s broken on Firefox 1.5, requiring you to drag twice to get it scrolling around the viewport, and the jumpy, clunky UI on that browser.

Pretty lame — and lazy, too. By now, it’s essential for a new fancy website to work under Firefox; even if only 20% of your users will be using it, a good proportion of those are the bleeding-edge, ‘taste-maker’ types who’ll be blogging about it, writing reviews for newspapers and news sites, and generally generating buzz for you, and thereby attracting the other 80%.

I’m told it works great in IE, but there’s no way I’m starting Windows and opening up that app. If I want to be infected by 700 different malwares within seconds, I’ll ask. ;)

On top of that, coverage seems spotty — Ireland is AWOL, of course.

As a result, my one line summary would have to be: idea = cool, dataset = probably cool, execution = half-assed and crappy. I’m looking forward to Google doing a much better job with their implementation of the Sim City viewpoint.


Comments (8)

Congressional Open URL Redirectors

Spam: Matthew Wilson at Boomer Consulting has been having a field day — it looks like some smart google hacking has thrown up some doozies of places that should have fixed this by now:

and my favourites:

Of course, all of these are immaterial to SpamAssassin — we catch spammers using them anyway. But still, a surprising number of these out there.


Comments

Spam and Broken Windows, and wecanstopspam.org

Spam: Spam Chongqing: Spamming Experiment:

Kasia at unix-girl.com decided to run a spamming experiment on her blog. She posted a couple spams to her own blog and waited to see what would happen. In less than 24 hours she received 356 more spams.

The chongqing guys confirm this, and I’ve noticed this as well (although just in passing, I’ve never tried testing it).

Interestingly, I’m pretty sure the same thing can happen with mailing lists, if the mailing list archives are allowed to contain the mailing list’s posting address, and the list allows open posting. It works like this:

  • spammer A posts a spam to the list
  • spam is archived
  • google finds archived spam
  • list-builders B, C, D google for search terms, find archive page for that mail message
  • B, C, D scrape the addresses from that page and pick up the list posting address
  • they then either sell on to spammers E, F, and G, who spam that address, or they spam the address themselves
  • and redo loop from the start.

One key factor is the search terms B, C, and D use. My theory is that they are intending to generate ‘targeted’ lists, and in spamming, most targeted lists are simply lists of addresses scraped from pages that show up in a google search for a specific keyword — ‘meds’, ‘viagra’, ‘degree’, etc.

Joe at Chongqing surmises that it may be through the Broken Windows Theory: that spam appearing in a weblog’s comments, or in a wiki page, indicates that the administrator is asleep at the wheel and more spam can be posted with impunity. In my opinion, that’s probably more likely for google-spam and wiki-spam than for email spam, but it’s undoubtedly a factor.

PS: wecanstopspam.org has been allowed to lapse and has been stolen by a spammer (http://chongq.blogspot.com/2005/04/another-spammer-owned-antispam-site.html). Oh dear.


Comments

Echo chamber goes crazy about ‘nofollow’

Blogs: Just to expand on a linkblog posting I made yesterday, Google’s search team have announced support for a new piece of Google functionality; they’ll fix their crawlers to ignore links with a rel="nofollow" attribute, for PageRank calculations, the idea being that spammers will stop blog-spamming once they can’t get PageRank out of it.

The blog world has been all aflutter:

BurningBird is right, to a degree. In fact, it’s been solved before.

Here’s a taint.org posting from November 2003 where I point out that by using a trivial Javascript URL one can link to another page without conferring PageRank. The format is:

javascript:document.location=target

The result looks like this, and works in any browser with a basic JS engine, from IE 3.02 and Netscape Navigator 2 onwards. I’ve been using it for my referrer logs, among other things, for over a year. I wrote a patch that implemented it for external links in the Moin Moin wiki software.
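As a rough illustration, here is how a blog or wiki engine might emit the two kinds of link, sketched in Python. This is not the Moin Moin patch; the function names are invented, and it assumes the target is embedded in the javascript: URL as a quoted string literal.

```python
import json
from html import escape

def js_link(target, text):
    """Anchor whose href is a javascript: URL, in the format shown above."""
    # json.dumps produces a quoted string literal that is also valid JavaScript.
    href = "javascript:document.location=" + json.dumps(target)
    return '<a href="%s">%s</a>' % (escape(href, quote=True), escape(text))

def nofollow_link(target, text):
    """Ordinary anchor carrying the rel="nofollow" attribute."""
    return '<a href="%s" rel="nofollow">%s</a>' % (
        escape(target, quote=True), escape(text))

print(js_link("http://example.com/", "example"))
print(nofollow_link("http://example.com/", "example"))
```

The first form hides the destination from a crawler that does not run scripts; the second leaves a normal, crawlable href and relies on the search engine choosing to honour the attribute.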

Amazingly, despite my plugging this idea at virtually every opportunity, it seems nobody noticed! At least, nobody among the people who (it would seem) should be looking into comment spam, thinking about how to deal with it, etc.

Disappointing — the echo chamber keeps talking to itself, once again. Maybe I’ll stick with dealing with email spam instead ;)

Ah, whatever. Anyway, this is a nicer fix; relying on JS isn’t a good thing. So nice work, Google.

(PS: worth noting that while this is a good plan, comment spam won’t be going away any time soon, as Mark Pilgrim noted. Still, here’s hoping it’ll help in the long term…)


Comments

playing around with Google Suggest

Web: Google Suggest, a drop-down list of suggestions — with hitrates! The one letter hits are interesting, too.

“spam” hitrates, the top 3 (aside from “spam” itself):

  • “spam filter”: 6,400,000 results
  • “spamcop”: 1,570,000
  • “spamassassin”: 1,350,000

in the top 3. getting there!

unfortunately, you have to get as far as “justin ma” before my name shows up, so not doing too great in that competition. ;)


Comments

Patents in an open source world

Patents: Newsforge: Patents in an open source world, by Lawrence Rosen (founding partner of Rosenlaw and Einschlag).

Interesting article, but I’m not sure summary point number 2 (‘continue to document our own “prior art” to prevent others from patenting things they weren’t the first to invent’) really helps, when the patent examiners clearly haven’t performed the simplest Google check. In the past, I’ve found obvious prior art in 30 seconds by plugging 3 words from a patent’s claims into Google (and yes, I have a reasonable idea how to read patent claims by now).

Point number 3 is interesting, since it contradicts most other advice I’ve read regarding patent searches: ‘Conduct a reasonably diligent search for patents we might infringe. At least search the portfolios of our major competitors. (This, by the way, is also a great way to make sure we’re aware of important technology advances by our competitors.) Maintain a commercially reasonable balance between doing nothing about patents and being obsessed with reviewing every one of them.’

However, this comment really is interesting and raises something major that I’d never heard of before: users of proprietary software can also face a significant risk from the patent threat. In particular, according to the linked comment, Microsoft licensed some patented technology from a company called Timeline Inc., but the license was not sublicenseable; in other words, it did not grant their customers the rights to fully use the technology! (In fairness to MS, this was established later in court.) Result: MS SQL server OEMs and ISVs are now being sued (http://trends.newsforge.com/comments.pl?sid=39443&cid=96153).


Comments

Hacking Netflix

Movies: Hacking Netflix, via torrez.

Jason Kottke points out a great quote on a Friendster cross-site scripting attack: ‘We have a policy that we are not being hacked.’

He also speculates that Google used the GMail invite-network data for whitelisting — but whitelisting based on email address alone is trivially exploitable, so I’d doubt it.

I’m just back from a trip over to Cape Cod to meet family (halfway between here and Ireland, y’see ;) — lots and lots of luvverly lobster and sundry shellfish — and after a 6 day trip, had 5000 spams and a couple of thousand nonspam mails to deal with. Thankfully SpamAssassin dealt with the spams (only about 5 false negatives, no false positives I could spot) – but I’m going to have to do something about that volume of mail. drowning in the stuff. argh.


Comments

Microsoft 0wnz ‘http’

Web: Back in 2002, it occurred to someone to check the Google search results for ‘http’, to figure out what the most popular sites were.

Looks like it’s changed — here’s the top five results from a Google search for ‘http’ now:

  • 1: Microsoft
  • 2: AltaVista (!!)
  • 3: Yahoo!
  • 4: My Excite
  • 5: Google

My guess: older links are getting good PageRank, using whatever new tweaked algorithm they’re using. But AltaVista beating Google? ;)


Comments

Bloomsday!

Literature: Happy Bloomsday Centenary! Google agrees:

Google Bloomsday logo

You can have a read of Joyce’s masterpiece online at online-literature.com, although this is certainly one text that works better on paper, to be pored over and parsed slowly. But regardless of whether it’s readable on-screen or not, the legality of that copy is dubious, anyway.

As this Telegraph article notes, the copyright situation on Ulysses is, sadly, a total mess. Even 84 years after it was written, and promptly banned in the US, UK and Ireland for ‘obscenity’, Ulysses remains a thorny legal subject.

The novel was first published in 1922, and as such, fell into public domain in the UK in 1992, but was apparently ‘pulled back’ in 1996. According to this mail, due to recent copyright term extensions, the 1922 text will now remain in copyright in the EU until the end of 2011, and may not expire until 2032 in the US. And this Irish Times article notes that in Ireland, ‘copyright on Joyce’s works ran out on December 31st, 1991, 50 years after his death. However, EU regulations revived copyright from July 1995 when it extended the lifetime of copyright to 70 years.’
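The EU and Irish dates above are simple arithmetic on the year of Joyce’s death (1941), assuming the term runs to the end of the calendar year. A tiny Python check follows; the function name is mine, and the US 2032 date follows different rules that this does not model.

```python
AUTHOR_DEATH_YEAR = 1941  # James Joyce died in January 1941

def copyright_expiry_year(death_year, term_years):
    """Year in which a life-plus-N term ends (on 31 December of that year)."""
    return death_year + term_years

old_term = copyright_expiry_year(AUTHOR_DEATH_YEAR, 50)   # pre-1995 term
new_term = copyright_expiry_year(AUTHOR_DEATH_YEAR, 70)   # extended EU term
print(old_term, new_term)
```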

Reportedly, the Dail even had to pass emergency legislation last week to prevent an exhibition at Dublin’s National Library from being sued by the Joyce Estate:

The threat to the exhibition has been caused by the 2000 Copyright Act which creates a doubt about its ability to display manuscripts bought by the State because the Joyce estate still holds copyright.

Hilarious. Recent overzealous copyright extension legislation snares governments too! But they get to rewrite the laws in emergency session to fix it ;)

All very ironic, considering Ulysses’ structure was deliberately derived from The Odyssey in the first place.


Comments

Egosurfing images.google.com

Comments

GMail

Mail: Google announces new mail service. This is not an April Fool’s Day joke — just terrible timing. ;) It’s for real.

Diego has some good comments.

My thoughts:

  • Privacy: ‘we do not disclose your personally identifying information to third parties unless we believe we are required to do so by law or have a good faith belief that such access, preservation or disclosure is reasonably necessary to … (c) detect, prevent, or otherwise address fraud, security or technical issues (including, without limitation, the filtering of spam)’. They’re going to build one hell of a spam-filtering corpus this way ;)
  • A nice ToS clause: ‘Your Intellectual Property Rights. Google does not claim any ownership in any of the content, including any text, data, information, images, photographs, music, sound, video, or other material, that you upload, transmit or store in your Gmail account. We will not use any of your content for any purpose except to provide you with the Service.’


Comments

The ‘Hog Bog’

Architecture: For reasons which I won’t go into here, I wound up doing a Google Image Search for ‘toilet’, which turned up a link to this page: Toilets of the World. However, its author is missing one very important variety: the world-famous Goan ‘Hog Bog’.

Here’s a tasteful pic of an expectant pig waiting for lunch (local mirror) — and then, if your stomach can take it, a rather more graphic account here. (warning: not safe for lunch)


Comments

Orkut Down for Tweakage

Social: orkut - under construction: ‘ Based on your suggestions, I’m taking orkut.com back to the lab for some fine-tuning and improvements. It will likely take a few days to finish them. None of your data will be lost and I should have some nice surprises for you when I bring it back online. I’ll email you when it’s ready and running again.’

Probably taken offline mainly to deal with this wee buglet ;)

Orkut.com is interesting on a few levels:

  • the Google link paid off massively. It has a lot more geek cred than it would have had otherwise (especially given the in-my-opinion fugly MSN-style design, and — ugh — .aspx URLs ;)

    As far as I can see, it’s not really Google-affiliated; just written by a Googler in his spare time. The Google names I know don’t seem to be in there, and no games of ‘Six Degrees of Sergey Brin’ are possible ;)

  • the invite-only startup gave it some good initial buzz.

But IMO it needs a few tweaks, and the main one is export. Friendster, Tribe.net et al. all give the impression that they want to lock you in the trunk so they can ‘monetize’ your network, or something. If that’s the way it’ll work, fine: it’s a toy, and that’s all they’re getting from me.

These things are just toys until I can get my data back out again in a machine-readable format (FOAF, RDF, etc.). I want to augment it with other social data, like an anti-spam web of trust based on who I know, and to be able to graphviz my social network, dammit! ;)

Brian McCallister has a few more useful comments.

Puzzles: a UK crypto guy says the Voynich manuscript is gibberish and reckons he’s figured out how it was made. ‘They have shown that its various words, which appear regularly throughout the script, could have been created using table and grille techniques. The different syllables that make up words are written in columns, and a grille - a piece of cardboard with three squares cut out in a diagonal pattern - is slid along the columns. The three syllables exposed form a word. The grille is pushed along to expose three new syllables, and a new word is exposed.’
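For the curious, the table-and-grille trick described in that quote can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions, not the researcher's actual method: the syllables in `TABLE` are invented, the names `GRILLE`, `word_at` and `generate` are hypothetical, and I've assumed the grille wraps around when it runs off the bottom of the table.

```python
# Hypothetical syllable table: each row holds a prefix, a middle and a suffix.
# These syllables are invented for illustration, not taken from the manuscript.
TABLE = [
    ("qo", "ke", "dy"),
    ("o", "te", "y"),
    ("ch", "ol", "in"),
    ("sh", "ai", "ey"),
]

# A diagonal grille with three holes, given as (row offset, column) pairs:
# hole k exposes column k, k rows below the grille's top edge.
GRILLE = [(0, 0), (1, 1), (2, 2)]


def word_at(table, grille, top_row):
    """Join the syllables exposed when the grille's top edge sits on top_row.

    Assumption: rows wrap around at the bottom of the table.
    """
    return "".join(table[(top_row + dr) % len(table)][col] for dr, col in grille)


def generate(table, grille):
    """Slide the grille down the table one row at a time, one word per position."""
    return [word_at(table, grille, row) for row in range(len(table))]


words = generate(TABLE, GRILLE)
print(words)
```

A bigger table, or a grille with the holes cut in a different pattern, gives a different stream of plausible-looking "words" from the same few syllables, which is the gist of the gibberish theory.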

Spam: NY Times on the Spam Conf 2004.


Comments

Ma, Google won’t leave me alone

Bizarre: OK, OK, Google, I’m planning to! Geesh, all I wanted was a search engine, not health advice. They’re not even my ads!
