Lest we forget

Regarding Google Wave’s similarity to Lotus Notes, which is a meme I’ve heard from several angles — David Jones hits the nail on the head:

Well, I used Notes from 1994 to 1999. It did have a database backend for e-mail and a rich collaborative editing model. But it didn’t have realtime shared editing, or instant annotation.

And it was shit. No-one in their right minds would have wanted the future of the web to have been Notes. Even though, and I completely agree, it did things that the web is now only just getting round to.

+1 to that!

Comments (4)

User script: add my delicious search results to Google

For years now, I’ve been collecting bookmarks at delicious.com/jm — nearly 7000 of them by now. I’ve been scrupulous about tagging and describing each one, so they’re eminently searchable, too. I’ve frequently found this to be a very useful personal reference resource.

I was quite pleased to come across the Delicious Search Results on Google Greasemonkey userscript, accordingly. It intercepts Google searches, adding Delicious tag-search results at the top of the search page, and works pretty well. Unfortunately though, that searches all of delicious, not specifically my own bookmarks.

So here’s a quick hack fix to do just that:

my_delicious_search_results.user.js – My Delicious Search Results on Google

Shows tag-search results from my Delicious account on Google search pages, with links to more extensive Delicious searches. Use ‘User Script Commands’ -> ‘Set Delicious Username’ to specify your username.

Screenshot:

Enjoy!

Comments (2)

Angry GAA Examiner

hahaha. a lovely Google AI “doh” moment:

Needless to say, “Angry GAA Fans” is not a recurring section on the Irish Examiner’s site.

Comments (4)

Google Reader productivity hack: change your Home

So, if you use Google Reader, read your news with the “All items” page, and are subscribed to hundreds of feeds, it can be pretty overwhelming. I’ve found a better way to deal with this.

Select a ‘most important’ subset of feeds. For each of those, click through to the feed details page, hit the “Feed Settings…” menu, and select “Change folders…”. Put the feed into a new “top” folder (creating it if necessary).

Now go to “Settings” -> “Preferences” and check out the “Start page” preference. By default, it’s set to “Home”; change it to “Folders and Tags: top”.

Hey presto — now, when you load Google Reader, it’ll come up with your “top” items. You can get through those quickly enough, and get on to other more important tasks. When you’re bored and need something to read, though, just hit “Navigation” -> “All items” (or even just type ‘ga’), and every other feed is now there for your delectation. Sweet!

Comments (2)

Fixing the Gmail Tasks window bug

Hey Gmail users! If you’re using Tasks, there’s a slightly annoying bug in Gmail right now — you may see the “Use this link to open Tasks” tip window appear every time you access the inbox page.

Several other people have reported it, and apparently the Google guys are ‘working to resolve it’ at the moment. In the meantime, though, here’s a way to work around the issue without losing Tasks (you will, unfortunately, lose the offline-gmail functionality, though). Simply disable Offline Gmail (Settings -> Offline -> “Disable Offline Gmail for this computer”), and the bug no longer manifests itself.

You can allow Gmail to keep the stored mail on your computer if you like, which will be handy for when the bug is fixed and Offline can be re-enabled — hopefully sooner rather than later.

Comments (2)

Google.ie HTTPS fail

Check out what happens when you visit https://www.google.ie/ :

Clicking through Firefox’s ridiculous hoops gets me these dialogs:

Good work, Google and Firefox respectively!

Comments (3)

AppEngine — only useful for toys

Noted on Twitter:

simonw: So apparently http://www.news.com.au/ used json-time for their Beijing countdown widget and blew my App Engine quota! They’ve stopped now.

uh, great. That’s useful.

Google — how are we supposed to host useful services with those limits?

Comments (3)

Links for 2008-08-01

TechCrunch UK campaigning for a “Digital Hub”: I have to say, the Digital Hub is actually a great place to work; it’s well worth duplicating, if such a thing is possible.

419eater anti-scammers fool 419ers into performing the Dead Parrot sketch: “Possibly, he is pining for the fee-ords”

Google taking action against Nigerian/419 fraud spammers: Good news. About time, too ;)

Comments

More details on the “GMail forwarding hole”

Those INSERT guys who’ve been talking about a GMail security hole allowing spammers to relay spam have released more previously-redacted details here. (Thanks to the MailChannels blog for pointing that out.)

In essence, the attack works by allowing a spammer to set the “forward to” address in GMail to point at a target address, send a spam to the GMail account, then change the “forward to” address to the next target and repeat.

My response:

  1. it’d be trivial for Google to impose stringent rate limits on “forward to” address changes, and I’d be surprised if they haven’t already.

  2. ditto rate-limiting on the rate of forwarding messages for each GMail account.

  3. as they say in the paper — if Google required up-front confirmation of the target address before forwarding any mail, that would also cut this out neatly.

  4. It’s worth noting that while GMail’s outbound servers may be whitelisted by some recipient sites, others are treating them negatively; word on the anti-spam “street” is that GMail is becoming a festering pit of 419 scammers these days.
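To make point 1 a bit more concrete, here’s a rough sketch of what a per-account limit on “forward to” changes could look like. This is purely illustrative: the class name, the limits (3 changes per day) and the in-memory storage are all my own assumptions, and I have no idea how Google actually implements any of this.

```python
import time
from collections import defaultdict, deque


class ForwardChangeLimiter:
    """Sliding-window cap on 'forward to' address changes per account.

    Hypothetical sketch: names and default limits are invented for illustration.
    """

    def __init__(self, max_changes=3, window_seconds=86400, clock=time.time):
        self.max_changes = max_changes
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._history = defaultdict(deque)  # account -> times of accepted changes

    def allow_change(self, account):
        now = self.clock()
        history = self._history[account]
        # Forget accepted changes that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_changes:
            return False  # too many recent changes: refuse this one
        history.append(now)
        return True
```

With a cap like that, the “change address, send spam, repeat” loop would be throttled to a handful of targets per account per day, which would presumably make it uneconomical. The same shape would work for point 2, counting forwarded messages instead of address changes.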

Comments (10)

Google Webmaster Tools now includes ‘goog-love.pl’

Back in 2006, I wrote a script I called “goog-love.pl”; it used Google’s now-dead SOAP search API (thanks, Nelson!) to figure out which Google queries your web site was “winning” on. Unfortunately, Google shut down new signups for the SOAP interface later that year.

I was just looking through Google’s Webmaster Tools page for taint.org, when I came across the Statistics / Top search queries page:

[screenshot: the Top search queries page]

This is exactly what goog-love.pl produced. hooray!

Comments

Google Calendar ‘Quick Add’ smart keyword bookmark

Google Calendar has a nifty feature, “Quick Add”: you enter a natural-language string like “lunch with Justin, 1pm 20/4/08”, and it parses it and adds an appointment to your calendar. However, the link in the Calendar UI can’t be bookmarked; you have to go to the Calendar page, wait for it to sloooowly load all its AJAX bits, hit the link, and only then type the appointment details, by which time I’ve forgotten it anyway, ADD-style. ;)

Elias Torrez came up with a Firefox extension to use the Quick Add feature in one keypress, but in my opinion that’s overkill — I don’t want the overhead of another extension, the upgrade worries, and I don’t want it using up a keyboard shortcut either. I’d prefer to just have this as a Firefox Smart Keyword – and thankfully the trick is in the comments for his blog post, from someone called Bjorn. So here’s the deal:

Name: Google Calendar Quick Add

Location: http://www.google.com/calendar/event?ctext=+%s+&action=TEMPLATE&pprop=HowCreated%3AQUICKADD

Keyword: newcal

Description: add a new event in Google Calendar
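For the curious, here’s roughly what the browser does with that Location when you type “newcal something”: the %s is swapped for the URL-encoded text after the keyword. A minimal sketch, assuming plain percent-encoding (Firefox’s exact encoding rules may differ slightly); the function name is made up for illustration.

```python
from urllib.parse import quote

# The Location from the bookmark above.
TEMPLATE = ("http://www.google.com/calendar/event?ctext=+%s+"
            "&action=TEMPLATE&pprop=HowCreated%3AQUICKADD")


def expand_keyword(template, terms):
    """Substitute the typed terms for the %s placeholder (hypothetical helper)."""
    # str.replace touches only the literal "%s"; the "%3A" escape is left alone,
    # which Python's %-formatting would not manage.
    return template.replace("%s", quote(terms, safe=""))


print(expand_keyword(TEMPLATE, "lunch with Justin, 1pm 20/4/08"))
```

The same substitution applies to any Smart Keyword, so it’s an easy way to check what URL a given template will produce before bookmarking it.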

enjoy!

Comments (8)

Google now include Code Search in normal results

Latest Google curiosity… I hadn’t spotted this before: it appears Google is now including ‘Code Snippet’ results in the results for its normal search. For example, a search for XSLoader gives this result:

[screenshot: search results for XSLoader]

The results highlighted on the page are for a local variable in a Java module, rather than the much more common XSLoader Perl module. I guess ‘Code Snippet’ search is case-sensitive.

Comments (2)

Google’s CAPTCHA – not entirely broken after all?

A couple of weeks ago, WebSense posted this article with details of a spammer’s attack on Google’s CAPTCHA puzzle, using web services running on two centralized servers:

[...] It is observed that two separate hosts active on same domain are contacted during the entire process. These two hosts work collaboratively during the CAPTCHA break process. [...]

Why [use 2 hosts]? Because of variations included in the Google CAPTCHA image, chances are that host 1 may fail breaking the code. Hence, the spammers have a backup or second CAPTCHA-learning host 2 that tries to learn and break the CAPTCHA code. However, it is possible that spammers also use these two hosts to check the efficiency and accuracy of both hosts involved in breaking one CAPTCHA code at a time, with the ultimate goal of having a successful CAPTCHA breaking process.

To be specific, host 1 has a similar concept that was used to attack Live mail CAPTCHA. This involved extracting an image from a victim’s machine in the form of a bitmap file, bearing BM.. file headers and breaking the code. Host 2 uses an entirely different concept wherein the CAPTCHA image is broken into segments and then sent as a portable image / graphic file bearing PV..X file headers as requests. [...]

While it doesn’t say as such, some have read the post to mean that Google’s CAPTCHA has been solved algorithmically. I’m pretty sure this isn’t the case. Here’s why.

Firstly, the FAQ text that appears on “host 1” (thanks Alex for the improved translation!):

[screenshot of the FAQ page]

FAQ

If you cannot recognize the image or if it doesn’t load (a black or empty image gets displayed), just press Enter.

Whatever happens, do not enter random characters!!!

If there is a delay in loading images, exit from your account, refresh the page, and log in again.

The system was tested in the following browsers: Internet Explorer, Mozilla Firefox

Before each payment, recognized images are checked by the admin. We pay only for correctly recognized images!!!

Payment is made once per 24 hours. The minimum payment amount is $3. To request payment, send your request to the admin by ICQ. If the admin is free, your request will be processed within 10-15 minutes, and if he is busy, it will be processed as soon as possible.

If you have any problems (questions), ICQ the admin.

That reads to me a lot like instructions to human “CAPTCHA farmers”, working as a distributed team via a web interface.

Secondly, take a look at the timestamps in this packet trace:

[screenshot of the packet trace]

The interesting point is that there’s a 40-second gap between the invocation on “Captcha breaking host 1” and the invocation on “Captcha breaking host 2”. There is then a short gap of 5 seconds before the invocations occur on the Gmail websites.

Here’s my theory: “host 1” is a web service gateway, proxying for a farm of human CAPTCHA solvers. “host 2”, however, is an algorithm-driven server, with no humans involved. A human may take 40 seconds to solve a CAPTCHA, but pure code should be a lot speedier.

Interesting to note that they’re running both systems in parallel, on the same data. By doing this, the attackers can

  1. collect training data for a machine-learning algorithm (this is implied by the ‘do not enter random characters!’ warning from the FAQ — they don’t want useless training data)

  2. collect test cases for test-driven development of improvements to the algorithm

  3. measure success/failure rates of their algorithms, “live”, as the attack progresses
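Point 3 is simple enough to sketch. Assuming (and this is only my guess at their setup) that the attackers log each CAPTCHA with both the human’s answer and the algorithm’s answer, scoring the algorithm against the humans takes a few lines. The function name and the sample data below are invented for illustration.

```python
def agreement_rate(pairs):
    """Fraction of usable samples where the algorithm matched the human.

    `pairs` is a list of (human_answer, algorithm_answer) strings. An empty
    human answer means the solver skipped the image ("just press Enter"),
    so that sample is left out rather than counted against the algorithm.
    """
    usable = [(human, algo) for human, algo in pairs if human]
    if not usable:
        return 0.0
    hits = sum(1 for human, algo in usable if human == algo)
    return hits / len(usable)


# Invented sample data: two matches out of four usable samples.
samples = [("abc", "abc"), ("xyz", "xyq"), ("", "foo"), ("qwe", "qwe"), ("rty", "")]
print(agreement_rate(samples))
```

This also shows why random characters from the humans would be so damaging: every bogus human answer both lowers the measured rate and pollutes the training set.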

Worth noting this, too:

Observation*: On average, only 1 in every 5 CAPTCHA breaking requests are successfully including both algorithms used by the bot, approximating a success rate of 20%. The second algorithm (segmentation) has very poor performance that sometimes totally fails and returns garbage or incorrect answers.

So their algorithm is unreliable, and hasn’t yet caught up with the human farmers. Good news for Google — and for the CAPTCHA farmers of Romania ;)

Update: here’s the NYTimes’ take, with broadly agreeing comments from Brad Taylor of Google. (The Register coverage is off-base, however.)

Comments (5)

GNOME, Google and the UNIX user interface

Recently, after a flurry of annoying user interface issues, I’ve switched my RSS reader from Liferea to Google Reader. Interestingly, I’ve found that Google Reader actually fits the traditional UNIX user interface concept better.

What triggered this was an upgrade from Liferea 1.0.x to 1.4.4 as part of Ubuntu Gutsy; this brought with it a lot of changed behaviours, such as ‘drag-and-drop of feed URL to HTML view no longer subscribes’, and one crucial UI issue, ‘”Skim through articles” only works with ctrl+space’.

I’ve been a long-time UNIX user, dating back to the days where curses-based interfaces were the norm. As such, I tend to drive commonly-used applications using keyboard commands where possible. (This isn’t a purely UNIX thing; Windows has the phenomenon of the keyboard-wielding “power user”, too.)

Liferea was attractive, since it offered the ability to skim through articles quickly by just pressing the “Space” key; simply press space to page down, or to skip to the next unread article if at the end of the current one. Unfortunately, Liferea 1.4.x breaks this, and it wasn’t going to be fixed, since apparently a GNOME app shouldn’t behave this way:

GTK explicitely does implement <Space> as a key binding for several of it’s widgets. Rebinding means to break the default behaviour for such widgets (tree views, buttons, input fields). [....] Liferea as a web-browsing application should behave like any other web browser and like every other GNOME/GTK application as much as possible.

Now, I don’t know if it’s GNOME’s fault, or what, but for a UNIX desktop app to break with UNIX UI conventions, that’s a bad move in my opinion. I gave it a bit of argument in the bug tracker, but eventually gave up as I clearly wasn’t getting anywhere. :(

Instead, based on recommendation from friends, I gave Google Reader a try, and quickly figured out its extensive collection of keyboard shortcuts. Now, I’m skimming through my feeds in even less time than it took with Liferea, simply by hitting “ga” to go to my “all unread items” list, then “j”, “j”, “j” to skip through the postings one by one. Sweet!

It’s interesting to note that other Google web apps use the same concepts; Gmail also has a hefty set, and can be driven using them in a manner very reminiscent of the classic UNIX mailreader, Mutt. So, despite being designed with end-users in mind by extremely clever professional user experience designers, these apps still find space for power-user keyboard operation. Take note, GNOME.

Anyway, I’m not too bothered. Google Reader brings other benefits, such as fixing this bug: ‘please add ability to go to previous entry in Unread feed’, avoiding ‘constant memory leak requires daily restarts’, and, of course, the utility of being able to track the same set of feeds and keep track of which items I’ve read in two places (work and home).

If only it was open source ;)

Comments (4)

Spammers “giving up” according to Google

According to this Wired story, Google reckons spammers are giving up on spam:

a remarkable trend is underfoot, according to Brad Taylor, a staff software engineer at Google: The number of spam attempts — that is, the number of junk messages sent out by spammers — is flat, and may even be declining for the first time in years.

Actually, this is a wilful misunderstanding of what the Googler in question really said, which was that ‘attempts to spam Gmail users have been leveling off over the last year and more recently, even declining slightly’. In other words, they didn’t make an observation about the state of the spam problem on an internet-wide basis — just about the “local” situation as it pertains to Gmail. Bad reporting there, Wired.

But, in passing…

David Berlind at ZDNet recently blogged a rather grumpy response to InfoWorld coverage of CEAS 2007. He raised a very important point:

If I could say something to the author of that story, it would be that so long as any anti-spam solution is not deployed universally throughout the Internet’s e-mail system (in other words, so long as some anti-spam tech is not a standard), that anti-spam solution actually makes the spam problem worse. You read that right. Worse. Proprietary anti-spam solutions make the global spam problem worse. They are digging us deeper into the hole that the Internet is already in because everyone who makes those solutions is under the false belief that “s/he who is finally successful at filtering out all spam while allowing the legitimate mail in wins.”

Google’s blog post is a case in point: ‘we’re keeping more spam out of your inbox than ever before, so more and more, you can use Gmail for things you enjoy without even realizing that the spam filter is there most of the time.’

That’s great — but it doesn’t help anyone except Gmail. It’s a myopic view of the spam problem, and David’s point stands.

(I disagree with his later conclusion that the only way forward is for Google, MS, AOL and Yahoo! to get together and ‘commit to jointly supporting the same technical solutions’ — when the usual BigCos get together, they tend to focus on their own priorities. Take what happened back in 2005 with nofollow for blog-spam — while it helped the search giants with their own overriding priority, which was to tweak their algorithms to filter out the spam on the search results page, it did nothing to slow the spam flood itself, which has continued unabated.)

We need more open-source, and open-data, anti-spam work.

Comments (9)

Hog’s Chip

Hey Google –

Since Fido.ie is throwing errors at me, and since you’re probably a more searchable (and more global) database anyway — the Trovan FDX-B RFID transponder number 956000000659388 is that of “Hog Dempsey”, a small female black and white cat, whose owners can be contacted via any address on this page. Cheers!

Comments (3)

Wikipedia and rel=”nofollow”

Apparently, Wikipedia has (possibly temporarily) decided to re-add the rel=”nofollow” attribute to outbound links from their encyclopedia pages.

There’s been a lot of heat and light generated about this, most missing one thing: there’s no reason why Google needs to pay attention.

Google, or any other search engine, can treat links in the Wikipedia pages any way they like — including ignoring ‘nofollow’, applying extra anti-spam heuristics of their own, or even trusting the links more highly.
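To illustrate: whether ‘nofollow’ counts is just a policy switch on the crawler’s side. Here’s a minimal sketch of a link extractor with that switch, using only the Python standard library; the class and function names are my own, and real crawlers are obviously far more involved.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects <a href> targets, optionally skipping rel="nofollow" links."""

    def __init__(self, honor_nofollow=True):
        super().__init__()
        self.honor_nofollow = honor_nofollow  # the crawler's choice, not the page's
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated token list, e.g. "external nofollow".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if self.honor_nofollow and "nofollow" in rel_tokens:
            return
        self.links.append(href)


def links_to_count(page_html, honor_nofollow=True):
    """Return the links a crawler with this policy would count (hypothetical)."""
    parser = LinkCollector(honor_nofollow)
    parser.feed(page_html)
    return parser.links
```

Flip honor_nofollow to False and every link counts again, no matter what the page’s markup asks for.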

‘Nofollow’ has had pretty much no effect on web-spam, and now is generally festooned all over weblog posts across the internet, both spammed and non-spammed posts, at that. It’d be interesting to see if it’s yet flipped to mean a higher correlation with nonspam than spam content…

Update: It appears Wikipedia used ‘nofollow’ before, so this is not exactly new, either.

Comments (2)

The vagaries of Google Image Search

Remember the C=64-izer, the quick hack to display an image in the style of the Commodore 64?

Recently, I’ve started getting hits to this demo image of the “O RLY?” owl — lots of ‘em.

It turns out that the C=64-ized rendition of this image is now the top hit for “O RLY” on Google Image Search; pretty bizarre, since there are obviously better images on the first search page, one result along in fact. What’s more, the page listed as the ‘origin page’, http://taint.org/tag/today, doesn’t even use that text.

This has resulted in lots of Myspace kiddies etc. obliviously using the C=64 rendering. Yay for Commodore ;)

Comments

a plug for Map24

Nat at O’Reilly Radar mentions that Multimap have added a public API. It’s great to see more sites adding public APIs, but sadly, as I note in a comment there, Multimap isn’t any use for me: they, along with Google and Yahoo!, have really crappy Irish mapping. Their geocoders (the part that turns an English-language address into a GIS coordinate pair) are pretty much non-functional for Ireland.

I moved from the US to Ireland earlier this year and found this pretty frustrating, after the joys of using the US mapping sites to get driving directions etc.

Thankfully, another contender has emerged recently — Map24.

They have a great geocoder for Ireland, and very reliable directions, which are even accurate for some of the more baroque one-way-system traffic-management changes that Dublin’s city planning department have come up with recently. The look and feel of the website is a little clunky in Firefox — not as smooth as Google’s — but it has some nice AJAXy touches now and seems to be heading in the right direction.

Interestingly, they now offer a public API for third-party mashups, and even offer an API for their geocoder — so someone preferring the Google look and feel could mash that up, using Map24 to find the coordinates and Google to display an area map! (Actually, I think that may be how John Handelaar’s earlier hack worked – I note in the comments that he mentions Map24 provide Lycos’ mapping backend. aha.)

Anyway — Map24 — if you’re looking for a good Irish mapping/driving-directions site, it’ll do the trick.

Comments

Searching GMail with a Firefox Smart Keyword

Here’s a Firefox Smart Keyword to search your GMail:

https://mail.google.com/mail/?search=query&view=tl&q=%s

Usage example, assuming you use ‘mail’ as the keyword: (CTRL-L) mail whatever

Comments (2)

SpamAssassin in the Google Summer of Code 2006

Are you a student, and interested in earning $4,500 for contributing to open source, and fighting spam, over the course of the summer?

If so, get thee hence to the Google Summer of Code 2006 site, and propose a project!

Last year, we in SpamAssassin didn’t get it together to mentor SoC projects. This year, however, we have a few prospective mentors (including myself), and a few sample project ideas lined up; we’re all ready to go! Here’s the Student FAQ. Be quick; applications end in a week and a bit.

Here’s hoping we get some interesting submissions ;)

Comments

Single-Letter Google Hits

Here’s what happens when you search for single letters on Google:

Interestingly I got to see the new Google search results page, with the sidebar, once. It must be in the process of rolling out…

Comments (8)

Google Calendar

So I’ve been using this for a few days now — and I’m loving it. A calendaring system that deals coherently with the web:

I keep finding little things that make perfect sense, and just feel more logical than what I’ve used elsewhere. This rocks!

One thing still needs work, though: the links to Mapping fail spectacularly, for non-US addresses at least. But that’s pretty minor.

By the way, I have a feeling that Mac.com had parts of this, but really, you had to drink a lot of Apple kool-aid to use that, and I just didn’t go for that. Sorry Jobs fans.

Do you know what would be cool now? If Upcoming.org published venue/location-specific iCal feeds. Oh look, they do! Awesome…

Comments (7)

Another script: goog-love.pl

A quick hack –

goog-love.pl – find out where your site’s google juice comes from

This script will grind through your web site’s “access.log” file (which must be in the “combined” log format). It’ll pick out the top 100 Google searches found in the referer field, re-run those searches, and determine which ones are giving your website all the linky Google love — in other words, the searches that your site ‘wins’ on.

The output is in plain text and a chunk of HTML.
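If you just want the flavour of the first half of that (picking the Google searches out of the referer field and counting them), here’s a rough sketch in Python rather than Perl. It is not the script itself: the regex, the host check and the lower-casing of queries are my simplifications, and the “re-run the searches” half is left out entirely since it depends on the Google API.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Tail of a "combined" format line: "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"\s*$')


def google_query(referer):
    """Return the search terms if the referer is a Google search, else None."""
    parts = urlsplit(referer)
    host = parts.hostname or ""
    # Simplified host check (an assumption): google.com or www.google.<tld>
    if not (host == "google.com" or host.startswith("www.google.")):
        return None
    terms = parse_qs(parts.query).get("q")
    return terms[0].strip().lower() if terms else None


def top_queries(lines, limit=100):
    """Count Google queries seen in the referer field; return the top `limit`."""
    counts = Counter()
    for line in lines:
        match = LOG_TAIL.search(line)
        if not match:
            continue  # not a combined-format line
        query = google_query(match.group(1))
        if query:
            counts[query] += 1
    return counts.most_common(limit)
```

Feeding it sys.stdin gives the same “pipe your access.log in” usage as the real script.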

usage:

goog-love.pl sitehost google-api-key < access.log > out.html

e.g.

cat /var/www/logs/taint.org.* | goog-love.pl \
  taint.org 0xb0bd0bb5yourgoogleapikeyhere0xdeadbeef | tee out.html

NOTE: this script requires the SOAP::Lite module be installed. Install it using apt-get install libsoap-lite-perl or cpan SOAP::Lite. It also requires a Google API key.

For example, here are the current results for this site. You can immediately see some interesting stuff that’s not immediately obvious otherwise, such as my site being the top hit for [beardy justin] ;)

Download here (5 KiB perl script).

Notes:

  • if you see a lot of “502 Bad Gateway” errors, it’s probably over-zealous anti-bot ACLs on Google’s side. Try from another host.

  • Read the comments for notes on a bug in recent releases of SOAP::Lite; please let me know if you hear of them getting fixed ;)

Comments (5)

Vint Cerf speaking at Google on Thursday

Heads-up, Dublin geeks: Vint Cerf will be speaking at the Dublin Googleplex on Thursday.

Sadly, I won’t be able to make it myself — I had to visit the UK this week. Pity; I would have loved to hear him speak :(

Comments

Google DRM and WON Authentication

So, Google have invented their own DRM, apparently. I’m keen to find out more details; Techdirt and Plasticbag.org are so far the only places I can find in the blogosphere to discuss it in any detail.

One tidbit worth noting from the LA Times coverage:

The Google copy-protection software also imposes a big restriction: The CBS shows, NBA games and other material protected by the software can be watched only on a computer that’s connected to the Internet.

“I think it’s going to be a problem,” said Li, the Forrester analyst, adding that Google executives told her they were trying to fix it.

That’s interesting. In my opinion, given that quote, I’ll bet Google’s DRM is something similar to the copy-protection systems used for many games since about id’s Quake 3 and Valve’s Half-Life; an online “key server” which validates codes, tracks player IDs, and who’s viewing what, “live”, as the video is cued up and played.

Some more info on the Half-Life WON authentication system can be found in this GamaSutra article; subscription required — try viewing this google-cache version with Javascript off if you don’t have a sub. That’s historical now, of course, since that WON system has been replaced by a new auth protocol as part of Valve’s ‘Steam’ system.

The key factor is the network, separating the dangerous, untrustworthy user machine from the trusted key server. Since the online key server can act as a platform for trusted, known-insubvertable code to run, along with the video server, both being under Google’s control, it’s actually possible to build reasonably solid DRM on this model. That’s as opposed to the usual case, where a reasonably determined teenager can break it in a week of school-nights. ;)

Anyway, that’s speculation. It remains to be seen if they’ve come up with something along the lines of WON authentication — and if it’s still easily subvertable or not.

Update: Aristotle Pagaltzis has a pretty good point in the comments:

Watching video, unlike playing a multiplayer game, is not an activity that inherently requires connecting to a server. Playing a multiplayer game, OTOH, inherently is.

So cracking a multiplayer game’s key check is fruitless, because then you can’t play online anymore, which was the whole point of the game in the first place. In contrast, a video player with a cracked key check still fulfills its purpose just fine.

I think he’s right. That’s a key point, demonstrating how WON authentication still can’t help — media playback, as a task, is itself fundamentally crackable.

Comments (7)

Windows Live Local and Firefox

Windows Live Local, with its isometric, Sim City, “bird’s eye” view, is quite nice.

However, what gets me is — do MS do this deliberately? I’m referring, of course, to the way it’s broken on Firefox 1.5, requiring you to drag twice to get it scrolling around the viewport, and the jumpy, clunky UI on that browser.

Pretty lame — and lazy, too. By now, it’s essential for a new fancy website to work under Firefox; even if only 20% of your users will be using it, a good proportion of those are the bleeding-edge, ‘taste-maker’ types who’ll be blogging about it, writing reviews for newspapers and news sites, and generally generating buzz for you, and thereby attracting the other 80%.

I’m told it works great in IE, but there’s no way I’m starting Windows and opening up that app. If I want to be infected by 700 different malwares within seconds, I’ll ask. ;)

On top of that, coverage seems spotty — Ireland is AWOL, of course.

As a result, my one line summary would have to be: idea = cool, dataset = probably cool, execution = half-assed and crappy. I’m looking forward to Google doing a much better job with their implementation of the Sim City viewpoint.

Comments (8)

Congressional Open URL Redirectors

Spam: Matthew Wilson at Boomer Consulting has been having a field day — it looks like some smart google hacking has thrown up some doozies of places that should have fixed this by now:

and my favourites:

Of course, all of these are immaterial to SpamAssassin — we catch spammers using them anyway. But still, a surprising number of these out there.

Comments

Spam and Broken Windows, and wecanstopspam.org

Spam: Spam Chongqing: Spamming Experiment:

Kasia at unix-girl.com decided to run a spamming experiment on her blog. She posted a couple spams to her own blog and waited to see what would happen. In less than 24 hours she received 356 more spams.

The chongqing guys confirm this, and I’ve noticed this as well (although just in passing, I’ve never tried testing it).

Interestingly, I’m pretty sure the same thing can happen with mailing lists, if the mailing list archives are allowed to contain the mailing list’s posting address, and the list allows open posting. It works like this:

  • spammer A posts a spam to the list
  • spam is archived
  • google finds archived spam
  • list-builders B, C, D google for search terms, find archive page for that mail message
  • B, C, D scrape the addresses from that page and pick up the list posting address
  • they then either sell on to spammers E, F, and G, who spam that address, or they spam the address themselves
  • and redo loop from the start.

One key factor is the search terms B, C, and D use. My theory is that they are intending to generate ‘targeted’ lists, and in spamming, most targeted lists are simply lists of addresses scraped from pages that show up in a google search for a specific keyword — ‘meds’, ‘viagra’, ‘degree’, etc.

Joe at chongqing surmises that it may be through the Broken Windows Theory: that spam appearing in a weblog’s comments, or in a wiki page, indicates that the administrator is asleep at the wheel and more spam can be posted with impunity. In my opinion, that’s probably more likely for google-spam and wiki-spam than for email spam, but it undoubtedly is a factor.

PS: wecanstopspam.org has been allowed to lapse and has been stolen by a spammer (http://chongq.blogspot.com/2005/04/another-spammer-owned-antispam-site.html). Oh dear.

Comments

Echo chamber goes crazy about ‘nofollow’

Blogs: Just to expand on a linkblog posting I made yesterday, Google’s search team have announced support for a new piece of Google functionality; they’ll fix their crawlers to ignore links with a rel="nofollow" attribute, for PageRank calculations, the idea being that spammers will stop blog-spamming once they can’t get PageRank out of it.

The blog world has been all aflutter:

BurningBird is right, to a degree. In fact, it’s been solved before.

Here’s a taint.org posting from November 2003 where I point out that by using a trivial Javascript URL one can link to another page without conferring PageRank. The format is:

javascript:document.location=target

The result looks like this, and works in any browser with a basic JS engine, from IE 3.02 and Netscape Navigator 2 onwards. I’ve been using it for my referrer logs, among other things, for over a year. I wrote a patch that implemented it for external links in the Moin Moin wiki software.
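To make the format concrete, here is a small sketch of how such a link could be generated server-side. It is written in Python purely for illustration; the helper name is hypothetical and this is not the Moin Moin patch itself:

```python
import json
from html import escape

def js_redirect_link(target_url: str, label: str) -> str:
    """Build an anchor whose href is a javascript: URL, so there is no
    plain hyperlink to the target for a crawler to count."""
    # json.dumps produces a quoted string that is also a valid
    # JavaScript string literal.
    js = "javascript:document.location=" + json.dumps(target_url)
    # Browsers percent-decode javascript: URLs before running them, so
    # protect any literal '%' in the target from being decoded early.
    js = js.replace("%", "%25")
    # HTML-escape the attribute value and the visible label.
    return '<a href="{}">{}</a>'.format(escape(js, quote=True), escape(label))
```

For example, `js_redirect_link("http://example.com/", "example")` yields an anchor whose href is `javascript:document.location=&quot;http://example.com/&quot;`.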

Amazingly, despite my plugging this idea at virtually every opportunity, it seems nobody noticed! At least, nobody among the people who (it would seem) should be looking into comment spam, thinking about how to deal with it, etc.

Disappointing — the echo chamber keeps talking to itself, once again. Maybe I’ll stick with dealing with email spam instead ;)

Ah, whatever. Anyway, this is a nicer fix; relying on JS isn’t a good thing. So nice work, Google.

(PS: worth noting that while this is a good plan, comment spam won’t be going away any time soon, as Mark Pilgrim noted. Still, here’s hoping it’ll help in the long term…)


playing around with Google Suggest

Web: Google Suggest, a drop-down list of suggestions — with hitrates! The one letter hits are interesting, too.

“spam” hitrates, the top 3 (aside from “spam” itself):

  • “spam filter”: 6,400,000 results
  • “spamcop”: 1,570,000
  • “spamassassin”: 1,350,000

SpamAssassin is in the top 3. Getting there!

unfortunately, you have to get as far as “justin ma” before my name shows up, so not doing too great in that competition. ;)


Patents in an open source world

Patents: Newsforge: Patents in an open source world, by Lawrence Rosen (founding partner of Rosenlaw and Einschlag).

Interesting article, but I’m not sure summary point number 2 (‘continue to document our own “prior art” to prevent others from patenting things they weren’t the first to invent’) really helps, when the patent examiners clearly haven’t performed the simplest Google check. In the past, I’ve found obvious prior art in 30 seconds by plugging 3 words from patent claims into Google (and yes, I have a reasonable idea how to read patent claims by now).

Point number 3 is interesting, since it contradicts most other advice I’ve read regarding patent searches: ‘Conduct a reasonably diligent search for patents we might infringe. At least search the portfolios of our major competitors. (This, by the way, is also a great way to make sure we’re aware of important technology advances by our competitors.) Maintain a commercially reasonable balance between doing nothing about patents and being obsessed with reviewing every one of them.’

However, this comment really is interesting and raises something major that I’d never heard of before: users of proprietary software can also face a significant risk from the patent threat. In particular, according to the linked comment, Microsoft licensed some patented technology from a company called Timeline Inc., but the license was not sublicenseable. In other words, it did not grant their customers the rights to fully use the technology! (In fairness to MS, this was established later in court.) Result: MS SQL Server OEMs and ISVs are now being sued (http://trends.newsforge.com/comments.pl?sid=39443&cid=96153).


Hacking Netflix

Movies: Hacking Netflix, via torrez.

Jason Kottke points out a great quote on a Friendster cross-site scripting attack: ‘We have a policy that we are not being hacked.’

He also speculates that Google used the GMail invite-network data for whitelisting — but whitelisting based on email address alone is trivially exploitable, so I’d doubt it.
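To show why, here is a minimal sketch of a whitelist that trusts the From header alone, and a forged message that sails through it. The whitelist contents and function name are invented for the example, and nothing here is meant to describe what GMail actually does:

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical whitelist built from an invite network.
WHITELIST = {"friend@example.com"}

def naive_whitelisted(msg: EmailMessage) -> bool:
    """Accept a message purely because its From address is whitelisted."""
    _name, addr = parseaddr(msg.get("From", ""))
    return addr.lower() in WHITELIST

# A spammer can write anything into the From header: SMTP itself does
# nothing to verify it.
forged = EmailMessage()
forged["From"] = "Friend <friend@example.com>"
forged["Subject"] = "cheap meds"
forged.set_content("buy now")

print(naive_whitelisted(forged))
```

The check passes for the forged mail, which is the whole problem: the address proves nothing about who sent it.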

I’m just back from a trip over to Cape Cod to meet family (halfway between here and Ireland, y’see ;) — lots and lots of luvverly lobster and sundry shellfish — and after a 6 day trip, had 5000 spams and a couple of thousand nonspam mails to deal with. Thankfully SpamAssassin dealt with the spams (only about 5 false negatives, no false positives I could spot) – but I’m going to have to do something about that volume of mail. drowning in the stuff. argh.


Microsoft 0wnz ‘http’

Web: Back in 2002, it occurred to someone to check the Google search results for ‘http’, to figure out what the most popular sites were.

Looks like it’s changed — here’s the top five results from a Google search for ‘http’ now:

  • 1: Microsoft
  • 2: AltaVista (!!)
  • 3: Yahoo!
  • 4: My Excite
  • 5: Google

My guess: older links are getting good PageRank, using whatever new tweaked algorithm they’re using. But AltaVista beating Google? ;)
