Month: March 2006

Feed43 Rocks

I’ve just given Feed43 a go. It’s very nifty.

Basically, it’s a pattern-based HTML-to-RSS scraper — similar to my own Sitescooper in that respect ;) — but built entirely as a web app.

Until now, I’ve been hacking up scrapers one by one, using either Sitescooper or WWW::Mechanize, run from cron, and putting the output up on taint.org; for example, http://taint.org/scraped/ has the public ones: Threadless, Perry Bible Fellowship, and White Ninja comics.

Today, I came across a case where I wanted a new RSS feed, and since I’d been hearing about Feed43, I thought I’d give it a try, to save running yet another cron job on our server. It was reasonably simple, although it still required a fair bit of knowledge of the concepts of scraping via pattern matching against HTML; but the UI was fantastic, with everything previewed in a clean AJAX interface, and within 3 minutes I had a new feed.
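
For anyone who hasn’t met pattern-based scraping before, here is a rough sketch of the general idea in Python. It is not Feed43’s or Sitescooper’s actual code; the sample page, the pattern and the function names are all made up for illustration.

```python
import html
import re
from xml.sax.saxutils import escape

# Hypothetical page fragment; a real scraper would fetch this over HTTP.
SAMPLE_PAGE = """
<ul class="posts">
  <li><a href="/2006/03/one.html">First post</a></li>
  <li><a href="/2006/03/two.html">Second &amp; last post</a></li>
</ul>
"""

# One capture group per field: the link path, then the title.
ITEM_PATTERN = re.compile(r'<li><a href="([^"]+)">(.*?)</a></li>', re.S)


def scrape_items(page, base_url):
    """Return (title, link) tuples for every match of ITEM_PATTERN."""
    return [(html.unescape(title).strip(), base_url + path)
            for path, title in ITEM_PATTERN.findall(page)]


def to_rss(channel_title, channel_link, items):
    """Render the scraped items as a minimal RSS 2.0 document."""
    parts = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<rss version="2.0"><channel>',
             '<title>%s</title>' % escape(channel_title),
             '<link>%s</link>' % escape(channel_link),
             '<description>%s</description>' % escape(channel_title)]
    for title, link in items:
        # Escape the plain-text fields so the output stays well-formed XML.
        parts.append('<item><title>%s</title><link>%s</link></item>'
                     % (escape(title), escape(link)))
    parts.append('</channel></rss>')
    return "\n".join(parts)


if __name__ == "__main__":
    print(to_rss("Example feed", "http://example.org/",
                 scrape_items(SAMPLE_PAGE, "http://example.org")))
```

The fiddly part in practice is the pattern itself, which breaks whenever the site changes its markup; that is why a live preview is so handy.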

For the curious: the feed was for TCAL’s Ireland category, and the results are here: Feed43 (Feed For Free) : TCAL – Ireland. (Go ahead and sign up if you like ;)

New web pattern, by the way: there’s a trend towards using “secret URLs” instead of username/password authentication for “trivial” auth tasks, like editing feed-scraper details. Good idea.
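
A minimal sketch of how such a secret URL might be minted and checked, in Python. I don’t know how Feed43 actually does it; the base URL and function names here are invented, and the only real requirement is that the token is long and random enough to be unguessable.

```python
import hmac
import secrets

# Hypothetical base URL, for illustration only.
BASE_URL = "https://feeds.example.org/edit/"


def new_edit_url():
    """Mint an unguessable token and the edit URL that embeds it."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    return token, BASE_URL + token


def may_edit(stored_token, presented_token):
    """Compare tokens in constant time to avoid leaking timing hints."""
    return hmac.compare_digest(stored_token.encode("utf-8"),
                               presented_token.encode("utf-8"))
```

Whoever holds the URL can edit, so this only suits low-stakes settings, and the URL should be served over HTTPS and kept out of public links.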

Public Transit == Crime

I just received a very nice info-pack through my front door regarding the new Dublin Metro line, which is in planning at the moment; it seems they’re soliciting feedback from residents near the proposed routes. Nicely done.

Right now, Dublin has an embarrassment of good public transit, at least when compared to my previous home in Orange County. There, public transit is actively campaigned against.

My favourite claim: that it ‘increases crime’; in other words, that poor people from Santa Ana would come down to Irvine and steal stuff, which they apparently couldn’t manage by car, for some reason.

The OC Weekly thought it was pretty funny, too — and an opposing group comprehensively debunked it. Still, it seemed to work; while I was living in Irvine, I got to see the Centerline proposal gradually whittled down until it was finally killed off. During that time, in contrast, Dublin built the Luas.

Unfortunately it doesn’t exactly go where I want to go, but you can’t always have everything. ;)

DSL=GOT

finally!

Coffee and Trivia

Just got a new cafetiere, so I can finally switch back from instant coffee to the real deal again for my morning coffee. My productivity has doubled. Still no DSL, though — early next week is the current estimate, and I can hardly wait.

I went to a pub quiz last night with mates Macker, Tom and Alan — a benefit for a new Dublin theatre company, I think. The prizes were:

  • First prize: several 50 Euro vouchers for various Dublin eateries
  • Second prize: two fancy scarves, a Nivea women’s cosmetics kit, and a very metrosexual Nivea bath kit for a guy
  • Third prize: 4 bottles of nice wine

We did very nicely — “aglet” was correctly defined for instance — but not nicely enough. Put it this way: guess who’s wearing Nivea deodorant?

Buying Consumer Electronics Online, in Ireland?

Hey lazyweb, hear my plea! What are my options for buying consumer electronics online, now that I’m back in Ireland?

I like online shopping. I dislike Argos, and I really hate Dixons, Currys and all the rest of the consumer-electronics high-street operations. Get me on the net and out of the nasty little shops and I’m happy. ;)

All in all, I’m a bit of an Amazon fan. However, now that I’m back in Ireland, I’ve been brought back to earth with a bang on that count; the prices are OK for items at both Amazon.com and .co.uk — but shipping is turning out to be a total disaster.

Basically, I’ve put in two orders, paid through the nose for basic shipping, and neither has turned up. For example — I ordered this phone a week and a half ago, on the 9th March, ponying up UKP 27 for the item — and a painful UKP 7 for shipping by International Mail.

Delivery estimate on ordering was for between 5 and 7 days — 14th to the 16th March. That was long enough — but it still hasn’t turned up, and Amazon.co.uk is still claiming that that is the current estimate, despite the 16th of March being 4 days ago ;)

On top of that, it appears they don’t offer any way to track the packages using that shipping method, so who knows what’s happening with the damn thing right now.

Compare that with an order I made at Amazon.com last November, in which I nabbed a handy FM transmitter for my iPod. In that case, it was shipped by plain old US Postal Service for $4.51, which was handily discounted as Super Saver Shipping. That (as with pretty much all my Amazon.com orders) arrived in 3-4 days, and for a hell of a lot cheaper too. Even if I’d had to pay for shipping (which I didn’t), $4.51 vs UKP 7 works out as roughly a third of the price, no less.

I’m guessing this is mainly down to Amazon.co.uk being shoddy in terms of how it deals with shipping to Ireland, and there are probably sites that use better-quality shipping partners.

Surely there must be better deals with vendors in Ireland, or even elsewhere in the Eurozone? Anyone know? Please drop us a line in the comments!

Update: the items arrived — 14 days after ordering. This is a moot point now, though, since Amazon.co.uk are no longer selling ‘PC & Video Games, Toys & Games, Gift items, Electronics & Photo and Home & Garden items’ to Ireland; I guess it was easier to give up on the Irish market for now. Very disappointing — but I’m waiting to see what happens next.

VAST.com

So, my new employer just launched today!

It’s a new search service, VAST.com. As the blog says, ‘we are building a search service that extracts classified ads from across the web, structures them, and then makes them available via an open REST API for commercial and non-commercial uses.’

Now you can see why I’m excited ;)

Greetings from 1996!

    --> Sending: ATZ
    ATZ
    OK
    --> Sending: ATQ0 V1 E1 S0=0 &C1 &D2
    ATQ0 V1 E1 S0=0 &C1 &D2
    OK
    --> Sending: ATH1
    ATH1
    OK
    --> Modem initialized.
    --> Sending: ATDT1892150150
    --> Waiting for carrier.
    ATDT1892150150
    CONNECT 45333

45 measly kilobits per second! This is incredibly painful — and expensive at 5 cents a minute! I briefly considered getting around it by hiring a 3G data-card for the couple of weeks before my DSL is activated — but that too is insanely overpriced.

Hurry up, DSL…

Disclosure

As of yesterday, I have a new day-job.

I won’t be working on email spam as part of the job, which is an interesting turn of events. However, I’ll be sticking with the open-source Apache SpamAssassin project, and keeping up the rate of work on that [*].

I’m not sure how much I can blog about the new place just yet, but I will say it’s certainly looking like it’ll be very interesting work ;)

[*: modulo the next couple of weeks while I’m waiting for my bloody DSL to be installed. argh!]

Apple Attempting to Patent RSS Aggregation

Miguel de Icaza quotes Dave Winer, pointing out two patent applications from Apple which seem intended to grab major chunks of the feed syndication space as Apple “IP”.

The first application is news feed viewer, 20050289147, filed April 13 2005:

A computer-implemented method for displaying a plurality of articles, the method comprising: storing a first feed bookmark in a folder, the first feed bookmark indicating a first feed, the first feed comprising a first plurality of articles; storing a second feed bookmark in the folder, the second feed bookmark indicating a second feed, the second feed comprising a second plurality of articles; aggregating the first feed and the second feed to form a third feed; and displaying the third feed.

I think there were many RSS readers that implemented this, and others from the patent application, before April 2005. I know Liferea, the one I use, has had UI-level aggregation since September 2004, with its VFolders.

Next, news feed browser, 20050289468, filed April 13 2005. This one contains a wide range of claims, but here’s one that stands out as particularly trivial:

A computer-implemented method for discovering a feed, the method comprising: receiving a request to display a file; determining that the file includes relationship XML; determining that a Uniform Resource Locator (URL) within the relationship XML indicates a file that comprises the feed; and displaying one of a group containing the feed and a link to the feed.

That’s pretty much RSS autodiscovery, as described in 2002.
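
For reference, autodiscovery boils down to looking for a `<link rel="alternate">` element with a feed MIME type in the page’s HTML. Here is a rough Python sketch of that lookup; the class and function names are mine, and it makes no claim about what the patent application’s implementation looks like.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used to advertise feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised via <link rel="alternate" ...>."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        href = attr.get("href")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page's own URL.
            self.feeds.append(urljoin(self.page_url, href))


def discover_feeds(page, page_url):
    """Return the feed URLs a page advertises, in document order."""
    finder = FeedLinkFinder(page_url)
    finder.feed(page)
    return finder.feeds
```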

The listed inventors in both patents are: Kahn, Jessica; (San Francisco, CA) ; Alfke, Jens; (San Jose, CA) ; Wilkin, Sarah Anne; (Menlo Park, CA) ; Howard, Albert Riley JR.; (Sunnyvale, CA) ; Forstall, Scott James; (Mountain View, CA) ; Lemay, Stephen O.; (San Francisco, CA) ; Melton, Donald Dale; (San Carlos, CA) ; Loofbourrow, Wayne Russell; (San Jose, CA).

Thanks, Apple! and thanks, “inventors”!

It’s important to note that this is still in the application stage, and as such can be invalidated, or narrowed down to a saner level, by using the techniques described here. I strongly recommend that people working in the syndication field with sufficient knowledge and expertise who feel strongly enough about this should spend a little time doing so, before the patent is issued and it becomes a multi-million-dollar task to invalidate it. (however, IANApatentL of course ;)

We Win

ongoing: The ASF Server:

Tim Bray: Which Apache project burns the most resources?

Mads: Spamassassin by a wide margin. […]

Heh, we win ;)

Helios, the Zones server, has been an incredible resource for us. SpamAssassin isn’t a traditional open-source software project in one respect: we use a lot of centralized “phone home” infrastructure to support rule and score generation. Having a virtualized server of this quality and horsepower to use for this has been fantastic.

(thanks to John O’Shea for the pointer!)

IBM Patents Closed-Loop Confirmation

Another day, another absurd IBM software patent. Via the IP list, here’s United States Patent 7,003,497:

  1. A method for confirming an electronic transaction, comprising the steps of: performing an electronic transaction between a first party and a second party; providing, by the first party to the second party, contact information of a third party service provider associated with the first party; contacting, by the second party, the third party service provider to obtain a location of a predetermined, private mailbox associated with the first party; sending, by the second party, a request for confirmation of the electronic transaction to the predetermined, private mailbox associated with the first party; accessing the private mailbox by the first party; and sending, by the first party, a reply message to the request for confirmation to thereby confirm authorization of the electronic transaction, wherein information regarding the private mailbox is not communicated to the second party during the electronic transaction.

There’s lots of waffle in the background section about this being for e-commerce transactions, but that claim, and claims 2 and 3 at least, are easily broad enough to cover simple “confirmed opt-in” email subscription systems; in other words, the system whereby a potential newsletter subscriber clicks on a link in order to “confirm” that they want to subscribe to a newsletter. That’s the current best-practice email subscription method used by pretty much everyone.

Filed December 31, 2001. There was plenty of prior art before this date, but who would want to go up against IBM, no less, to attempt to get this invalidated, especially now that it’s been issued?
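
To show just how simple confirmed opt-in is, here is a bare-bones sketch in Python. The class and method names are invented, storage is an in-memory dict, and the step of mailing out the confirmation link is left as a comment.

```python
import secrets


class ConfirmedOptIn:
    """Toy subscription list: nobody is added until they confirm."""

    def __init__(self):
        self.pending = {}         # token -> address awaiting confirmation
        self.subscribers = set()

    def request(self, address):
        """Record a signup request and return the confirmation token."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = address
        # A real system would now mail a link containing the token
        # to `address`, and to nobody else.
        return token

    def confirm(self, token):
        """Called when the link is clicked; True if the token was valid."""
        address = self.pending.pop(token, None)
        if address is None:
            return False
        self.subscribers.add(address)
        return True
```

Only someone who can read mail sent to the address ever sees the token, which is the whole point of the scheme.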

Thanks USPTO, you’re doing a heck of a job!

US Things I Miss

So, I’ve been back in Ireland for several weeks now. How goes the culture shock? Well, let’s make a list of the stuff I’m missing from California:

  • C, who’s still back there finishing up her contract. Hurry up, C!

  • All my friends I left behind in the US :( Come visit!

  • The weather (well duh)

  • Trader Joes: low-cost, high-quality organic and near-organic food

  • The excellent Mexican and Southern food. Mmm, Taco Mesa

  • Super-cheap cocktails — although having good Guinness makes up for a lot of this

  • The back country — desert, mountains, snow, national parks. Ireland may have more surviving history dotted about, but it’s just flat. I miss the mountains

  • Netflix — haven’t spotted a replacement for this yet. There are companies in Ireland that use a similar idea, but it appears every one just about manages to screw it up and render it useless, generally by introducing throttling, late fees, or slow turnaround. meh

  • The way my Irish accent meant I could get away with pretty much anything. That trick doesn’t work in Ireland ;)

In other news: the broadband choices situation has pretty much gone to shit.

It turns out that all the good options are quite dependent on local-loop unbundling, which — somehow — still hasn’t gotten around to my local exchange. As a result, guess who’s going to be stuck on the wrong end of dialup, no less, for “2 to 3 weeks” until Eircom deign to switch on the bitstream access for my new BT-resold ADSL connection? Here’s hoping there’s a neighbour with broadband and wifi when I move back in. Joy.

DearAOL and GoodMail

Things have really been heating up recently around the AOL/Goodmail “pay to send” CertifiedMail scheme — the EFF and a host of other groups have launched dearaol.com, stating:

This system would create a two-tiered Internet in which affluent mass emailers could pay AOL a fee that amounts to an “email tax” for every email sent, in return for a guarantee that such messages would bypass spam filters and go directly to AOL members’ inboxes. Those who did not pay the “email tax” would increasingly be left behind with unreliable service. Your customers expect that your first obligation is to deliver all of their wanted mail, and this plan is a step away from that obligation.

While I dislike this proposal, too, as far as I can tell, AOL actually have pretty reasonable intentions with this program — nowhere near as bad as the DearAOL.com site makes out.

However, they’re doing a really crappy job of getting this information out there, or of committing to reasonable limits on the program, such as announcing that they will use it only for transactional emails, as Yahoo! have done.

I’d strongly recommend reading Carl Hutzler’s posting on the subject. Carl was AOL’s head of anti-spam operations until last year, so he really knows what he’s talking about, and he lays it out clearly — a lot more clearly than any corporate statements from AOL do. His blog contains a fair bit more on the subject, too.

But seriously: why isn’t there a press release on the AOL site about this scheme? Some front-channel communication about now might be useful, I’d suggest, before things really get hairy. This crapstorm is coming about partly because AOL’s comments are all filtering out in dribs and drabs via third parties, and (AOLers say) are being misconstrued and misrepresented in the process. It’s a classic case of missing the cluetrain.

I’d also really encourage the EFF people to tone down the rhetoric; a statement like “senders will have no guarantee that their emails will be delivered” is scare-mongering, given that SMTP email already provides no such guarantee.

Update: wow, MoveOn went really overboard — “threatening the Internet as we know it … The very existence of online civic participation and the free Internet as we know it are under attack.” OMG the sky is falling!

Side Issue: The Spam Definition

Also, another note to EFF: defining spam as “whatever you don’t want to read” is a terrible mistake to make. That confuses a good, clear, enforceable and automatable definition of spam — unsolicited bulk email — and makes it effectively unenforceable by law, unpoliceable by ISPs, impossible to detect automatically, and incompatible with existing, effective EU and Australian legislation.

Listen to your own Chairman of the Board; he’s right on this count.

PS: any luck fixing up the non-confirmed signups issue? Last time I checked I could still subscribe any address to the EFF Action Alerts without a cross-check, which is not a good thing.

Another script: goog-love.pl

A quick hack —

goog-love.pl – find out where your site’s google juice comes from

This script will grind through your web site’s “access.log” file (which must be in the “combined” log format). It’ll pick out the top 100 Google searches found in the referer field, re-run those searches, and determine which ones are giving your website all the linky Google love — in other words, the searches that your site ‘wins’ on.

The output is in plain text and a chunk of HTML.

usage:

goog-love.pl sitehost google-api-key < access.log > out.html

e.g.

cat /var/www/logs/taint.org.* | goog-love.pl \
  taint.org 0xb0bd0bb5yourgoogleapikeyhere0xdeadbeef | tee out.html

NOTE: this script requires the SOAP::Lite module be installed. Install it using apt-get install libsoap-lite-perl or cpan SOAP::Lite. It also requires a Google API key.

For example, here are the current results for this site. You can immediately see some interesting stuff that’s not obvious otherwise, such as my site being the top hit for [beardy justin] ;)

Download here (5 KiB perl script).

Notes:

  • if you see a lot of “502 Bad Gateway” errors, it’s probably over-zealous anti-bot ACLs on Google’s side. Try from another host.

  • Read the comments for notes on a bug in recent releases of SOAP::Lite; please let me know if you hear of them getting fixed ;)
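
For the curious, the first half of the job (grinding through a “combined” format log and counting Google searches from the referer field) can be sketched in a few lines of Python. This is not the goog-love.pl source, which is Perl; the regexes and function names below are my own assumptions, the hostname check is deliberately loose, and the step that re-runs each search is left out entirely.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Combined log format, up to and including the quoted referer field.
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "([^"]*)"')

# Loose match for google.com, www.google.ie, www.google.co.uk and so on.
GOOGLE_HOST = re.compile(r'(www\.)?google\.[a-z]{2,3}(\.[a-z]{2})?')


def google_query(referer):
    """Return the normalised search terms from a Google referer, or None."""
    try:
        parts = urlsplit(referer)
        host = parts.hostname or ""
    except ValueError:
        return None
    if not GOOGLE_HOST.fullmatch(host):
        return None
    terms = parse_qs(parts.query).get("q")
    return terms[0].strip().lower() if terms else None


def top_google_searches(lines, limit=100):
    """Count Google searches seen in the log; most frequent first."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if not match:
            continue
        query = google_query(match.group(1))
        if query:
            counts[query] += 1
    return counts.most_common(limit)
```

Feeding it `sys.stdin` in place of `lines` gives the same pipe-the-logs-in usage as the script above.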