Full-text RSS bookmarklet

This site offers a nifty utility for dealing with those annoying sites which offer only partial text content in their RSS and Atom feeds.

Given an RSS or Atom feed’s URL, the CGI will iterate through the posts in the feed, scrape the full text of each post from its HTML page, and re-generate a new RSS feed containing the full text.
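Roughly, that loop looks something like the sketch below. This is not the CGI's actual code. It handles only RSS 2.0 (not Atom), and it leaves both the page fetch and the "find the post body in the HTML" step as caller-supplied functions, since that extraction heuristic is the genuinely hard part.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def full_text_feed(rss_xml: str,
                   fetch: Callable[[str], str],
                   extract: Callable[[str], str]) -> str:
    """Return a copy of an RSS 2.0 feed in which each item's <description>
    is replaced by the full text scraped from the item's own HTML page.

    fetch:   hypothetical helper, URL -> page HTML
    extract: hypothetical helper, page HTML -> post body
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue  # no permalink, so nothing to scrape for this item
        full_text = extract(fetch(link.strip()))
        desc = item.find("description")
        if desc is None:
            desc = ET.SubElement(item, "description")
        desc.text = full_text  # serialised as escaped HTML, as RSS expects
    return ET.tostring(root, encoding="unicode")
```

Passing `fetch` in as a function also makes it easy to test against canned pages rather than the live site.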

The one thing it’s missing is a one-click bookmarklet version. So here it is:

Full-text RSS Bookmarklet

Drag that to your bookmarks menu, and next time you’re looking at a partial-text feed, click the bookmark to transform the viewed page into the full-text version. Enjoy!

“Threadless New Tees” feed needs fixing

My “Threadless New Tees” scraper feed is currently listing all items as being called ‘height="146"’. This is obviously not correct ;)

I’ll fix it ASAP…

Update: fixed!

Unblocked

I just found an error in an Apache config file for taint.org, resulting in some of the legacy RSS feed URLs producing invalid data — this meant that anyone subscribed to the Feedburner feed, for example, had been missing out on my witterings. Fixed now — apologies!

Link-blog Networking

Cool — del.icio.us just added a feature whereby you can now see who has you in their network, and, of course, you can further view their networks and see who’s in them.

This’d be great for producing social-network graphs, although I daresay Joshua mightn’t be so keen on the spidering load. ;) I’ve optimistically requested some form of dump, anyway.

The social networking aspect of link collection and link-blogging via del.icio.us is emerging nicely; I’m keen to see what’s next in the pipeline.

A few interesting things:

  • Almost everyone who’s using del.icio.us seriously for link collection — ie. applying some quality control thresholds, and bothering to write one-line descriptions, at least — has filled out their ‘network’ by now.

  • It’d be useful to have “groups”, so that we could assert things like “jm, boogah, n0wak, negatendo, tweebiscuit, leonardr, muckster and torrez form a group”. I’m sure that’d provide useful info, although it could probably be inferred anyway. (People are attempting to hack it by using a shared tag on all their postings, like the “irishblogs” tag, but that’s an awful misuse of tagging in my opinion ;)

  • Also, it’ll be interesting to see what’ll happen once Google Co-op figures out a way to incorporate the del.icio.us network data. To be honest, I’m very surprised it wasn’t already in there — it seems like a no-brainer… maybe some Y!/G corporate rivalry is getting in the way.

Anyway, in the meantime it’s producing lots of good fodder for my SpicyLinks feed.

SpicyLinks is an implementation of something that I mentioned in a comment on this weblog entry, regarding future methods of reading weblogs; in essence, it’s an automated blog aggregation summariser. It reads other people’s link-blogs, so I don’t have to, and reports the stuff that proves popular in my personal collection of sources.
(Credit where due: HotLinks provided much of the inspiration, but doesn’t support personalisation, hence the reimplementation.)

SpicyLinks is similar to Populicious, but that app really misses the point, in my opinion. I don’t particularly want to know what everyone is pointing at; I want to know what a selected set of trusted sources (with good taste!) are pointing at.

This aggregation is pretty similar to the del.icio.us ‘network’ feed, but with much lower volume, and a higher signal/noise ratio, attained by dropping the ‘one-off’ items that only one person is pointing at. Initially, that may seem like a major failure, since you miss the ‘fresh bits’ — but as long as you’ve got the right people in your source network, it actually works very well.
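The core filtering idea can be sketched in a few lines. This is a minimal illustration, not SpicyLinks' real code: it assumes you already have each trusted source's recent bookmarks as a list of URLs, and it uses a hypothetical `min_sources` threshold of 2 to drop the one-off items.

```python
from collections import defaultdict


def spicy_links(bookmarks_by_source: dict[str, list[str]],
                min_sources: int = 2) -> list[tuple[str, int]]:
    """Return (url, number_of_sources) pairs for URLs bookmarked by at
    least `min_sources` distinct sources, most popular first."""
    sources_for: dict[str, set[str]] = defaultdict(set)
    for source, urls in bookmarks_by_source.items():
        for url in urls:
            sources_for[url].add(source)  # a set, so repeat posts by one person count once
    popular = [(url, len(srcs)) for url, srcs in sources_for.items()
               if len(srcs) >= min_sources]
    # Sort by popularity, then alphabetically for a stable order.
    popular.sort(key=lambda pair: (-pair[1], pair[0]))
    return popular
```

Raising `min_sources` trades freshness for an even higher signal/noise ratio, which is the same trade-off described above.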

It’d be great if this was one of the features implemented in the del.icio.us ‘network’ system…

Peoplefeeds and Quick Aggregation

peoplefeeds is cool.

I’ve been looking for something that can aggregate my Flickr, WordPress blog, and del.icio.us feeds into one venue where I can look up items by tag, in a single page-load.

Suprglu was my leading contender, although it wasn’t quite there yet: it didn’t seem to support importing my blog posts with tags preserved, so pretty much everything wound up tagged as “uncategorized”. Disappointing. :( So I was waiting for them to fix that.

This post by Richard MacManus pointed at another couple of options: 43Things and Peoplefeeds. I hadn’t actually noticed that 43Things was doing this kind of aggregation too; unfortunately, as far as I can see, it doesn’t support tag preservation and browsing, so there goes my desired feature. Shame.

However, Peoplefeeds was right on target, offering a ‘Unified Tagspace’ and a ‘Search All-Personal-Content’ mechanism. It works nicely, too. Here’s my personal aggregator, combining my Flickr feed, my weblog feed, and my del.icio.us feed into one — and with a unified tag-space; here’s my ‘hiking’ tag, hitting all 3 feeds. Perfect.

One other use for this — I’ve forgotten why I was looking for one of these, but I know I did want one ;) — it can be used to make a “private planet”. If you have 3 or 4 feeds that you need to combine into one, this provides a very easy way to do that; just set up a userid at Peoplefeeds for that purpose.
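For the curious, the "merge several feeds, then browse by one unified tag-space" idea boils down to something like this. It's a toy sketch, not how Peoplefeeds works internally; the `Entry` type and the sample entries are made up, and feed parsing is assumed to have happened already.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Entry:
    """Hypothetical, already-parsed feed item."""
    title: str
    link: str
    published: datetime
    tags: set[str] = field(default_factory=set)


def merge_feeds(*feeds: list[Entry]) -> list[Entry]:
    """Combine several feeds into one 'private planet', newest first."""
    merged = [entry for feed in feeds for entry in feed]
    merged.sort(key=lambda entry: entry.published, reverse=True)
    return merged


def by_tag(entries: list[Entry], tag: str) -> list[Entry]:
    """Unified tag-space lookup: case-insensitive match across all sources."""
    wanted = tag.lower()
    return [e for e in entries if wanted in {t.lower() for t in e.tags}]
```

Folding tag case is one simple way to make "hiking" on Flickr and "Hiking" on the blog land in the same bucket.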

RSS Feeds for Events in Dublin

So, now that I’m back in Dublin, I’ve taken a quick look around for ways to keep up to date on upcoming live gigs — and found that the situation, frankly, sucks. In particular, almost none of the sites are offering RSS or Atom feeds yet.

Having said that, Waxy and Leonard’s Upcoming.org is doing quite nicely for the Dublin metro area:

And lots of credit to the promoter MCD, who seem to be just about the only Irish listings site that offers RSS:

This is fantastic, but — naturally — they don’t cover events put on by their competitors. ;)

Apart from that, it’s pretty shoddy. Lots of late-’90s-looking websites out there, and no feeds in sight. Thankfully, Feed43 and some Perl scripting are on hand to allow me to take matters into my own hands.

Entertainment Ireland offer a pretty good music news section — but sans feed. Feed43 saves the day:

And, surprisingly, Ticketmaster, of all sites, is turning out to be a great way to find out what’s on in Dublin, listing pretty much all ticketed events in a nice, clean, succinct format. Unfortunately, the highest location resolution it offers for Ireland is the country as a whole. However, this can be worked around by subscribing to individual venues, such as Crawdaddy or The Village. (This has a happy side-effect of narrowing down the types of music — I can skip finding out that The Eagles are playing, since they won’t be playing at Crawdaddy ;)

For some reason, though, Ticketmaster haven’t got around to offering their own RSS feeds. Not a problem — in response I’ve hacked up tm2rss.cgi, a little script which scrapes the venue pages and produces RSS:

For other venues, simply take the numeric ID from the end of the venue URL (for example, 198641 in http://www.ticketmaster.ie/venue/198641 for The Village), substitute it for NNNNN in this URL: http://taint.org/scraped/tm2rss.cgi?v=NNNNN , then use that as the feed URL in your feed reader.
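If you'd rather not do that substitution by hand, here's a tiny helper. It assumes venue URLs always end in `/venue/<digits>`, as in The Village example; the function name is made up for illustration.

```python
import re

TM2RSS_TEMPLATE = "http://taint.org/scraped/tm2rss.cgi?v={venue_id}"


def tm2rss_feed_url(venue_url: str) -> str:
    """Turn a Ticketmaster venue page URL into the matching tm2rss.cgi feed URL."""
    match = re.search(r"/venue/(\d+)/?$", venue_url)
    if match is None:
        raise ValueError(f"no numeric venue ID found in {venue_url!r}")
    return TM2RSS_TEMPLATE.format(venue_id=match.group(1))


print(tm2rss_feed_url("http://www.ticketmaster.ie/venue/198641"))
# prints http://taint.org/scraped/tm2rss.cgi?v=198641
```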

Feed43 Rocks

I’ve just given Feed43 a go. It’s very nifty.

Basically, it’s a pattern-based HTML-to-RSS scraper — similar to my own Sitescooper in that respect ;) — but built entirely as a web app.

Until now, I’ve been hacking up scrapers one by one, using either Sitescooper or WWW::Mechanize, run from cron, and putting the output up on taint.org; for example, http://taint.org/scraped/ has the public ones: Threadless, Perry Bible Fellowship, and White Ninja comics.

Today, I came across a case where I wanted a new RSS feed, and since I’d been hearing of Feed43, I thought I’d give it a try, to save running yet another cron job on our server. It was reasonably simple, although it still required a fair bit of knowledge of how scraping via pattern matching against HTML works; but the UI was fantastic, with everything previewed live in a clean AJAX interface, and within 3 minutes I had a new feed.
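To make "pattern matching against HTML" a bit more concrete, here's a rough sketch of the idea. As far as I recall, Feed43's item patterns use `{%}` for "capture this" and `{*}` for "skip this", but treat that syntax as an assumption here. This is a toy re-implementation, not Feed43's actual engine.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a scraping pattern into a regex: {%} captures text,
    {*} skips text, and everything else must match literally."""
    pieces = []
    for part in re.split(r"(\{%\}|\{\*\})", pattern):
        if part == "{%}":
            pieces.append("(.*?)")  # non-greedy capture
        elif part == "{*}":
            pieces.append(".*?")    # non-greedy skip, not captured
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces), re.DOTALL)


def scrape_items(html: str, pattern: str) -> list[tuple[str, ...]]:
    """Return the captured fields for every match of the pattern in the page."""
    return [m.groups() for m in pattern_to_regex(pattern).finditer(html)]
```

For example, `scrape_items(page, '<li><a href="{%}">{%}</a></li>')` yields one (link, title) pair per list item. The non-greedy matching is what stops one item's capture from swallowing the next.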

For the curious, the feed was for TCAL’s Ireland category, and the results are here: Feed43 (Feed For Free) : TCAL - Ireland. (Go ahead and sign up if you like ;)

New web pattern, by the way: there’s a trend towards using “secret URLs” instead of username/password authentication for “trivial” auth tasks, like editing feed-scraper details. Good idea.
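The trick is simple to implement. The main requirement is a long, unguessable token. Below is a minimal sketch using Python's `secrets` module; the base URL and the `/edit/` path layout are hypothetical, not Feed43's actual scheme.

```python
import secrets


def make_secret_edit_url(base_url: str) -> tuple[str, str]:
    """Generate an unguessable token and the 'secret URL' that embeds it.
    The token should be stored server-side alongside the feed definition."""
    token = secrets.token_urlsafe(16)  # 16 random bytes, about 22 URL-safe chars
    return token, f"{base_url.rstrip('/')}/edit/{token}"


def is_authorised(presented_token: str, stored_token: str) -> bool:
    """Constant-time comparison, so response timing doesn't leak the token."""
    return secrets.compare_digest(presented_token, stored_token)
```

Anyone holding the URL can edit, so it only suits low-stakes cases like this one. Secret URLs also tend to leak via browser history and Referer headers.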

Apple Attempting to Patent RSS Aggregation

Miguel de Icaza quotes Dave Winer, pointing out two patent applications from Apple which seem intended to grab major chunks of the feed syndication space as Apple “IP”.

The first application is “News feed viewer”, 20050289147, filed April 13 2005:

A computer-implemented method for displaying a plurality of articles, the method comprising: storing a first feed bookmark in a folder, the first feed bookmark indicating a first feed, the first feed comprising a first plurality of articles; storing a second feed bookmark in the folder, the second feed bookmark indicating a second feed, the second feed comprising a second plurality of articles; aggregating the first feed and the second feed to form a third feed; and displaying the third feed.

I think there were many RSS readers that implemented this, as well as other claims from the application, before April 2005. I know Liferea, the one I use, has had UI-level aggregation since September 2004, with its VFolders.

Next, “News feed browser”, 20050289468, filed April 13 2005. This one contains a wide range of claims, but here’s one that stands out as particularly trivial:

A computer-implemented method for discovering a feed, the method comprising: receiving a request to display a file; determining that the file includes relationship XML; determining that a Uniform Resource Locator (URL) within the relationship XML indicates a file that comprises the feed; and displaying one of a group containing the feed and a link to the feed.

That’s pretty much RSS autodiscovery, as described in 2002.
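To show just how trivial it is, here's a sketch of that claim's steps using only Python's standard library. It looks for the usual `<link rel="alternate" type="application/rss+xml" href="...">` (or Atom) element in an HTML page and returns the feed URLs. Real-world pages have more variants, so treat this as a minimal illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate"> elements."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        if ("alternate" in rels and a.get("type", "").lower() in FEED_TYPES
                and a.get("href")):
            # Resolve relative hrefs against the page's own URL.
            self.feeds.append(urljoin(self.base_url, a["href"]))


def discover_feeds(html: str, base_url: str) -> list[str]:
    """Return all feed URLs the page advertises, in document order."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds
```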

The listed inventors in both patents are: Kahn, Jessica; (San Francisco, CA) ; Alfke, Jens; (San Jose, CA) ; Wilkin, Sarah Anne; (Menlo Park, CA) ; Howard, Albert Riley JR.; (Sunnyvale, CA) ; Forstall, Scott James; (Mountain View, CA) ; Lemay, Stephen O.; (San Francisco, CA) ; Melton, Donald Dale; (San Carlos, CA) ; Loofbourrow, Wayne Russell; (San Jose, CA).

Thanks, Apple! and thanks, “inventors”!

It’s important to note that this is still in the application stage, and as such can be invalidated, or narrowed down to a saner level, by using the techniques described here. I strongly recommend that anyone working in the syndication field with sufficient knowledge and expertise, who feels strongly enough about this, spend a little time doing so before the patent is issued and it becomes a multi-million-dollar task to invalidate it. (However, IANApatentL of course ;)
