Links for 2008-09-16

Links for 2008-07-22

ZSFA — I Want The Mutt Of Feed Readers: Zed recommends Newsbeuter. Must take a look.

We Want A Dead Simple Web Tablet For $200. Help Us Build It. Having worked on a project to do just this, believe me, this is doomed. DOOMED.

Science Clouds: ‘compute cycles in the cloud for scientific communities .. allows you to provision customized compute nodes .. that you have full control over using a leasing model based on the Amazon’s EC2 service.’ Wonder if they’d like to give SA some time ;)

Full-text RSS bookmarklet

This site offers a nifty utility for dealing with those annoying sites which offer only partial text content in their RSS and Atom feeds.

Given an RSS or Atom feed’s URL, the CGI will iterate through the posts in the feed, scrape the full text of each post from its HTML page, and re-generate a new RSS feed containing the full text.

The one thing it’s missing is a one-click bookmarklet version. So here it is:

Full-text RSS Bookmarklet

Drag that to your bookmarks menu, and next time you’re looking at a partial-text feed, click the bookmark to transform the viewed page into the full-text version. Enjoy!
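The CGI's approach (iterate over the feed's items, fetch each post's page, pull out the full text, re-emit the feed) can be sketched in Python. This is a minimal illustration, not that site's actual code. It assumes an RSS 2.0 feed and assumes each post's full text lives in a `<div class="entry-content">`; that class name is hypothetical, and a real scraper needs a per-site rule. Fetching is passed in as a `fetch` callable so the sketch runs without network access.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import Callable


class _DivTextExtractor(HTMLParser):
    """Collects the text inside the first <div> carrying `css_class`.

    The class-based rule is a hypothetical stand-in for per-site extraction.
    """

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.depth = 0          # <div> nesting depth inside the target (0 = outside)
        self.done = False       # stop after the first matching <div> closes
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == "div":
                self.depth += 1
        elif tag == "div" and self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def full_text_feed(feed_xml: str, fetch: Callable[[str], str],
                   css_class: str = "entry-content") -> str:
    """Replace each RSS 2.0 item's <description> with text scraped from its page."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue
        parser = _DivTextExtractor(css_class)
        parser.feed(fetch(link))
        text = "".join(parser.chunks).strip()
        if not text:
            continue  # extraction failed: keep the original partial text
        desc = item.find("description")
        if desc is None:
            desc = ET.SubElement(item, "description")
        desc.text = text
    return ET.tostring(root, encoding="unicode")
```

In practice `fetch` might wrap `urllib.request.urlopen(url).read().decode()`. Note that this sketch stores plain text, dropping the post's markup.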

“Threadless New Tees” feed needs fixing

My “Threadless New Tees” scraper feed is currently listing all items as being called ‘height="146"’. This is obviously not correct ;)

I’ll fix it ASAP…

Update: fixed!

SpicyLinks and del.icio.us Network Summarization

Ross Mayfield:

Every time I see Gabe Rivera of TechMeme, I ask for the same thing — MeMeme. Give me TechMeme where the core index is based on who I read, about 150 people at any given time, to show me what my friends are interested in.

Funnily enough, that is exactly why I wrote SpicyLinks!

It works pretty well — in fact, nowadays I don’t really bother reading slashdot, Digg, Reddit, et al, particularly frequently, because I know that all the really interesting stuff will be at the top of my newsreader in the SpicyLinks feed.

Anyway, I’ve been calling SpicyLinks a ‘summarizing aggregator’, but the discussion that arose from Ross’ posting inspired me. A little bit of hacking has produced an interesting twist: take a del.icio.us social network, a CGI script called deliciousnetwork2opml.cgi, and 15 minutes of hacking on SpicyLinks to support inclusion of OPML via a remote URI, and hey presto, it’s now a social-network summarising aggregator. ;)
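The OPML side of that twist is simple enough to sketch. This is a hypothetical Python version, not the actual deliciousnetwork2opml.cgi: it takes a list of network members and emits one OPML `<outline>` per member's feed. The `FEED_URL_PATTERN` value is an assumption standing in for whatever per-user feed URL the service exposes.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical per-user feed URL pattern; substitute whatever the service really uses.
FEED_URL_PATTERN = "http://del.icio.us/rss/{user}"


def network_to_opml(owner: str, network: list[str]) -> str:
    """Render the members of `owner`'s network as an OPML subscription list."""
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        '<opml version="1.0">',
        f"  <head><title>{escape(owner)}'s network</title></head>",
        "  <body>",
    ]
    # One <outline> per distinct member, sorted so the output is stable.
    for user in sorted(set(network)):
        url = FEED_URL_PATTERN.format(user=user)
        lines.append(f'    <outline type="rss" text={quoteattr(user)} xmlUrl={quoteattr(url)}/>')
    lines += ["  </body>", "</opml>"]
    return "\n".join(lines)
```

An aggregator that accepts OPML from a remote URI can then be pointed at wherever this document is served.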

Unblocked

I just found an error in an Apache config file for taint.org, resulting in some of the legacy RSS feed URLs producing invalid data — this meant that anyone subscribed to the Feedburner feed, for example, had been missing out on my witterings. Fixed now — apologies!

Link-blog Networking

Cool — del.icio.us just added a feature whereby you can now see who has you in their network, and, of course, you can further view their networks and see who’s in them.

This’d be great to produce social-network graphs, although I daresay Joshua mightn’t be so keen on the spidering load. ;) I’ve optimistically requested some form of dump, anyway.

The social networking aspect of link collection and link-blogging via del.icio.us is emerging nicely; I’m keen to see what’s next in the pipeline.

A few interesting things:

  • Almost everyone who’s using del.icio.us seriously for link collection — ie. applying some quality control thresholds, and bothering to write one-line descriptions, at least — has filled out their ‘network’ by now.

  • It’d be useful to have “groups”, so that we can now assert things like “jm, boogah, n0wak, negatendo, tweebiscuit, leonardr, muckster and torrez form a group”. I’m sure that’d provide useful info, although it could probably be inferred anyway. (People are attempting to hack it by using a shared tag on all their postings, like the “irishblogs” tag, but that’s an awful misuse of tagging in my opinion ;)

  • Also, it’ll be interesting to see what’ll happen once Google Co-op figures out a way to incorporate the del.icio.us network data. To be honest, I’m very surprised it wasn’t already in there — it seems like a no-brainer… maybe some Y!/G corporate rivalry is getting in the way.

Anyway, in the meantime it’s producing lots of good fodder for my SpicyLinks feed.

SpicyLinks is an implementation of something that I mentioned in a comment on this weblog entry, regarding future methods of reading weblogs; in essence, it’s an automated blog aggregation summariser. It reads other people’s link-blogs, so I don’t have to, and reports the stuff that proves popular in my personal collection of sources.
(Credit where due: HotLinks provided much of the inspiration, but doesn’t support personalisation, hence the reimplementation.)

SpicyLinks is similar to Populicious, but that app really misses the point, in my opinion. I don’t particularly want to know what everyone is pointing at; I want to know what a selected set of trusted sources (with good taste!) are pointing at.

This aggregation is pretty similar to the del.icio.us ‘network’ feed, but with much lower volume, and a higher signal/noise ratio, attained by dropping the ‘one-off’ items that only one person is pointing at. Initially, that may seem like a major failure, since you miss the ‘fresh bits’ — but as long as you’ve got the right people in your source network, it actually works very well.

It’d be great if this was one of the features implemented in the del.icio.us ‘network’ system…
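The filtering rule described above (drop anything only one source is pointing at, rank the rest by how many sources point at it) fits in a few lines. A hedged Python sketch, not SpicyLinks' real code: it assumes the postings have already been collected from each source's feed as `(source, url)` pairs, and the default threshold of two distinct sources is an assumption.

```python
from collections import defaultdict


def summarise(postings: list[tuple[str, str]], min_sources: int = 2) -> list[tuple[str, int]]:
    """Rank URLs by how many distinct sources posted them.

    URLs posted by fewer than `min_sources` distinct sources (the 'one-offs')
    are dropped from the result.
    """
    sources_by_url: dict[str, set[str]] = defaultdict(set)
    for source, url in postings:
        sources_by_url[url].add(source)
    ranked = [(url, len(srcs)) for url, srcs in sources_by_url.items()
              if len(srcs) >= min_sources]
    # Most widely linked first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

Counting distinct sources per URL, rather than raw postings, means one person bookmarking the same link twice doesn't promote it.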

Peoplefeeds and Quick Aggregation

Peoplefeeds is cool.

I’ve been looking for something that can aggregate my Flickr, WordPress blog, and del.icio.us feeds into one venue where I can look up items by tag, in a single page-load.

Suprglu was my leading contender, although they weren’t quite there yet, since they didn’t seem to support importing my blog posts with tags preserved; pretty much everything wound up tagged as “uncategorized”. Disappointing. :( So I was waiting for them to fix that.

This post by Richard MacManus pointed at another couple of options: 43Things and Peoplefeeds. I hadn’t actually noticed that 43Things was doing this kind of aggregation too; unfortunately, as far as I can see, it doesn’t support tag preservation and browsing, so there goes my desired feature. Shame.

However, Peoplefeeds was right on target, offering a ‘Unified Tagspace’ and a ‘Search All-Personal-Content’ mechanism. It works nicely, too. Here’s my personal aggregator, combining my Flickr feed, my weblog feed, and my del.icio.us feed into one — and with a unified tag-space; here’s my ‘hiking’ tag, hitting all 3 feeds. Perfect.

One other use for this: it can be used to make a “private planet”. (I’ve forgotten why I was looking for one of these, but I know I did want one ;) If you have 3 or 4 feeds that you need to combine into one, this provides a very easy way to do that; just set up a userid at Peoplefeeds for that purpose.
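A “private planet” is essentially a merge of several feeds sorted by date, optionally filtered by tag. Here is a rough Python sketch of that idea, assuming RSS 2.0 input where tags are carried as `<category>` elements and items have an RFC 822 `<pubDate>`; it is an illustration, not how Peoplefeeds works internally.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from typing import Optional


def merge_feeds(feeds: list[str], title: str = "Private planet",
                tag: Optional[str] = None) -> str:
    """Combine several RSS 2.0 feeds into one, newest item first.

    If `tag` is given, only items carrying a matching <category> are kept,
    a rough stand-in for a unified tag space.
    """
    dated_items = []
    for feed_xml in feeds:
        for item in ET.fromstring(feed_xml).iter("item"):
            stamp = item.findtext("pubDate")
            if not stamp:
                continue  # undated items can't be ordered, so this sketch drops them
            categories = {c.text for c in item.findall("category")}
            if tag is None or tag in categories:
                dated_items.append((parsedate_to_datetime(stamp), item))
    dated_items.sort(key=lambda pair: pair[0], reverse=True)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for _, item in dated_items:
        channel.append(item)
    return ET.tostring(rss, encoding="unicode")
```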

Google Calendar

So I’ve been using this for a few days now — and I’m loving it. A calendaring system that deals coherently with the web:

I keep finding little things that make perfect sense, and just feel more logical than what I’ve used elsewhere. This rocks!

One thing still needs work, though: the links to Mapping fail spectacularly, for non-US addresses at least. But that’s pretty minor.

By the way, I have a feeling that Mac.com had parts of this, but really, you had to drink a lot of Apple kool-aid to use that, and I just didn’t go for that. Sorry Jobs fans.

Do you know what would be cool now? If Upcoming.org published venue/location-specific iCal feeds. Oh look, they do! Awesome…

RSS Feeds for Events in Dublin

So, now that I’m back in Dublin, I’ve taken a quick look around for ways to keep up to date on upcoming live gigs — and found that the situation, frankly, sucks. In particular, almost none of the sites are offering RSS or Atom feeds yet.

Having said that, Waxy and Leonard’s Upcoming.org is doing quite nicely for the Dublin metro area:

And lots of credit for the promoter, MCD, who seem to be just about the only Irish listings site who offer RSS:

This is fantastic, but — naturally — they don’t cover events put on by their competitors. ;)

Apart from that, it’s pretty shoddy. Lots of late-90’s-looking websites out there, and no feeds in sight. Thankfully, Feed43, and some perl scripting, is on hand to allow me to take matters into my own hands.

Entertainment Ireland offer a pretty good music news section — but sans feed. Feed43 saves the day:

And, surprisingly, Ticketmaster, of all sites, is turning out to be a great way to find out what’s on in Dublin, listing pretty much all ticketed events in a nice, clean, succinct format. Unfortunately, the highest location resolution it offers for Ireland is the country as a whole. However, this can be worked around by subscribing to individual venues, such as Crawdaddy or The Village. (This has a happy side-effect of narrowing down the types of music — I can skip finding out that The Eagles are playing, since they won’t be playing at Crawdaddy ;)

For some reason, though, Ticketmaster haven’t got around to offering their own RSS feeds. Not a problem — in response I’ve hacked up tm2rss.cgi, a little script which scrapes the venue pages and produces RSS:

For other venues, take the numeric venue ID from the venue’s URL (for example, 198641 in http://www.ticketmaster.ie/venue/198641, for The Village), substitute it for NNNNN in this URL: http://taint.org/scraped/tm2rss.cgi?v=NNNNN, then use the result as the feed URL in your feed reader.
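That recipe is mechanical enough to script. A small Python sketch, assuming (from the single example given) that Ticketmaster venue pages always have the form `/venue/NNNNN`:

```python
import re
from urllib.parse import urlparse

TM2RSS = "http://taint.org/scraped/tm2rss.cgi?v={venue_id}"


def tm2rss_feed_url(venue_url: str) -> str:
    """Turn a Ticketmaster venue page URL into a tm2rss.cgi feed URL."""
    # Assumed path shape: /venue/<digits>, optionally with a trailing slash.
    match = re.fullmatch(r"/venue/(\d+)/?", urlparse(venue_url).path)
    if match is None:
        raise ValueError(f"not a venue URL: {venue_url!r}")
    return TM2RSS.format(venue_id=match.group(1))
```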

Feed43 Rocks

I’ve just given Feed43 a go. It’s very nifty.

Basically, it’s a pattern-based HTML-to-RSS scraper — similar to my own Sitescooper in that respect ;) — but built entirely as a web app.

Until now, I’ve been hacking up scrapers one by one, using either Sitescooper or WWW::Mechanize, run from cron, and putting the output up on taint.org; for example, http://taint.org/scraped/ has the public ones: Threadless, Perry Bible Fellowship, and White Ninja comics.

Today, I came across a case where I wanted a new RSS feed, and since I’d been hearing of Feed43, I thought I’d give it a try, to save running yet another cron job on our server. It was reasonably simple, although it still required a fair bit of knowledge of how scraping via pattern matching against HTML works; but the UI was fantastic, with everything previewed in a clean AJAX interface, and within 3 minutes I had a new feed.

For the curious, the feed was for TCAL’s Ireland category, and the results are here: Feed43 (Feed For Free) : TCAL - Ireland. (Go ahead and sign up if you like ;)

New web pattern, by the way: there’s a trend towards using “secret URLs” instead of username/password authentication for “trivial” auth tasks, like editing feed-scraper details. Good idea.
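Pattern-based scraping of this kind boils down to turning a template of literal HTML plus placeholders into a regular expression. The sketch below is a hedged Python approximation, not Feed43's implementation; it assumes Feed43-style placeholders where `{%}` captures a field and `{*}` skips arbitrary text.

```python
import re


def compile_pattern(pattern: str) -> re.Pattern:
    """Compile a Feed43-style item pattern into a regular expression.

    Assumed convention: {%} captures a field, {*} skips any text, and
    everything else is literal HTML that must match exactly.
    """
    regex = []
    for part in re.split(r"(\{%\}|\{\*\})", pattern):
        if part == "{%}":
            regex.append("(.*?)")      # non-greedy capture
        elif part == "{*}":
            regex.append(".*?")        # non-greedy skip
        else:
            regex.append(re.escape(part))
    # DOTALL lets captures and skips span line breaks in the page source.
    return re.compile("".join(regex), re.DOTALL)


def scrape(html: str, item_pattern: str) -> list[tuple[str, ...]]:
    """Return one tuple of captured fields per item found in `html`."""
    return [m.groups() for m in compile_pattern(item_pattern).finditer(html)]
```

Non-greedy matching keeps each match confined to one item; a greedy `.*` would run on to the last `</li>` on the page and merge items together.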
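A secret-URL scheme needs two things: a token long and random enough to be unguessable, and a comparison that doesn't leak it through timing. A minimal Python sketch using the standard `secrets` and `hmac` modules; `BASE` is a hypothetical service URL, and storing the token server-side is left out.

```python
import hmac
import secrets

BASE = "https://feeds.example.com/edit/"  # hypothetical service URL


def new_edit_url() -> tuple[str, str]:
    """Mint an unguessable edit URL; the token in it is the only credential."""
    token = secrets.token_urlsafe(16)  # 16 random bytes -> 22 URL-safe characters
    return BASE + token, token


def may_edit(presented_token: str, stored_token: str) -> bool:
    """Constant-time comparison, so response timing doesn't leak the token."""
    return hmac.compare_digest(presented_token.encode(), stored_token.encode())
```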

Apple Attempting to Patent RSS Aggregation

Miguel de Icaza quotes Dave Winer, pointing out two patent applications from Apple which seem intended to grab major chunks of the feed syndication space as Apple “IP”.

The first application is news feed viewer, 20050289147, filed April 13 2005:

A computer-implemented method for displaying a plurality of articles, the method comprising: storing a first feed bookmark in a folder, the first feed bookmark indicating a first feed, the first feed comprising a first plurality of articles; storing a second feed bookmark in the folder, the second feed bookmark indicating a second feed, the second feed comprising a second plurality of articles; aggregating the first feed and the second feed to form a third feed; and displaying the third feed.

I think there were many RSS readers that implemented this, and others from the patent application, before April 2005. I know Liferea, the one I use, has had UI-level aggregation since September 2004, with its VFolders.

Next, news feed browser, 20050289468, filed April 13 2005. This one contains a wide range of claims, but here’s one that stands out as particularly trivial:

A computer-implemented method for discovering a feed, the method comprising: receiving a request to display a file; determining that the file includes relationship XML; determining that a Uniform Resource Locator (URL) within the relationship XML indicates a file that comprises the feed; and displaying one of a group containing the feed and a link to the feed.

That’s pretty much RSS autodiscovery, as described in 2002.
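For reference, autodiscovery amounts to a page advertising its feed with a `<link rel="alternate" type="application/rss+xml" href="...">` element, and a client scanning the page for it. A minimal Python sketch using only the standard library; it also accepts the Atom type `application/atom+xml`, and resolves relative `href` values against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised via <link rel="alternate" type="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.feeds: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            self.feeds.append(urljoin(self.base_url, a["href"]))


def discover_feeds(html: str, base_url: str) -> list[str]:
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds
```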

The listed inventors in both patents are: Kahn, Jessica; (San Francisco, CA) ; Alfke, Jens; (San Jose, CA) ; Wilkin, Sarah Anne; (Menlo Park, CA) ; Howard, Albert Riley JR.; (Sunnyvale, CA) ; Forstall, Scott James; (Mountain View, CA) ; Lemay, Stephen O.; (San Francisco, CA) ; Melton, Donald Dale; (San Carlos, CA) ; Loofbourrow, Wayne Russell; (San Jose, CA).

Thanks, Apple! and thanks, “inventors”!

It’s important to note that this is still in the application stage, and as such can be invalidated, or narrowed down to a saner level, by using the techniques described here. I strongly recommend that anyone working in the syndication field who has sufficient knowledge and expertise, and who feels strongly enough about this, spend a little time doing so before the patent is issued and it becomes a multi-million-dollar task to invalidate it. (However, IANApatentL of course ;)

Planet Antispam at abuse.net

Planet Antispam now has a better URL — http://planet.spam.abuse.net/ . Much better!

Threadless RSS

Clothing: I love Threadless. Unfortunately, they don’t have an RSS feed for new T-shirts. So I wrote a quick scraper:

with pictures, naturally. This is not going to help my Threadless habit. ;)

Here’s a preview of what the feed looks like:

Selves and Others now publishing RSS feeds

News: Selves and Others is a site that cropped up a couple of months ago, tracking the output of many of the left’s strongest voices, for example:

Well, one feature they were missing was RSS feeds, allowing users to track new articles by a specific author as they’re published. They’ve just added it; the good old orange XML button now appears on each author’s page. Excellent!

MS Patents sudo(8)

Patents: The varchars.com scraped RSS feeds now include new patent grants and applications by certain companies! Interesting, although given that most developers are advised not to look at patents, perhaps not advisable ;)

However, I glanced at the MS one — and immediately spotted this gem: US Patent 6,775,781, filed by Microsoft, is a patent on the concept of ‘a process configured to run under an administrative privilege level’ which, based on authorization information ‘in a data store’, may perform actions at administrative privilege on behalf of a ‘user process’.

This, and the patent claims, perfectly describe how sudo has fundamentally operated ever since it ran on a 4.1BSD VAX-11/750 in 1980.

20 years’ head start on a patent application: surely that must qualify as prior art ;)

Irish Dating Site, and TheyWorkForYou.com

Web: Bernie Goldbach points to a site that’s news to me: AnotherFriend.com. It’s an Irish dating site.

I’ve had the odd discussion comparing dating culture in the US (organised ‘dating’) and Ireland and the UK (where it’s a lot more casual), and I must say, I was really convinced that the Friendster/craigslist-style organised, web-mediated dating just wouldn’t fly.

Seems I was wrong! Right now, there are 157 people online on the site, with a good half of those being logged-in, chatting users, and about 75% of those in turn being premium, paying members. Wow, not bad.

Politics: TheyWorkForYou.com is a triumph. The most incredibly detailed, and web-aware, hypertextual database of political activity I’ve seen yet. The web-awareness — full of scraping, links, RSS and even community — is what makes it amazing; the concept of being able to read news of your representative’s latest speeches and voting record in your RSS aggregator is incredible. We need to get this out there for every country in the world.

It certainly beats Today in Parliament, that’s for sure ;)

Aside: nice choice of username for the ‘Site News’ weblog:

Some sites linking to this entry

An error occurred: Connection error: Access denied for user: ‘fawkesmt’@'localhost’ (Using password: YES)

Weird: Incredible footage (WMV stream) of a guy who went nuts, converted a Caterpillar earthmover into what is essentially a tank, and went on a GTA-style rampage through the streets of Granby, 15 miles west of Denver, Colorado. In the process, he destroys the local bank, the newspaper, and several stores, seemingly working on the basis of (several) personal grudges.

FOAF and social networking sites

Networking: FOAF is really building steam now.

In the meantime, Tribe.net plans to announce RSS feeds and Jabber support this Friday.

It’s good to see some open-standards based stuff being used to compete. Given this, I think we might see more useful possibilities emerging as these sites become true web services.

Clay Shirky on Complex Software Systems

Software: Shirky on the Semantic Web. Great snippet:

it turns out that people can share data without having to share a worldview, so we got the meta-data without needing the ontology. Exhibit A in this regard is the weblog world. In a recent paper discussing the Semantic Web and weblogs, Matt Rothenberg details the invention and rapid spread of ‘RSS autodiscovery’, where an existing HTML tag was pressed into service as a way of automatically pointing to a weblog’s syndication feed.

About this process, which went from suggestion to implementation in mere days, Rothenberg says:

Granted, RSS autodiscovery was a relatively simplistic technical standard compared to the types of standards required for the environment of pervasive meta-data stipulated by the semantic web, but its adoption demonstrates an environment in which new technical standards for publishing can go from prototype to widespread utility extremely quickly. …

This, of course, is the standard Hail Mary play for anyone whose technology is caught on the wrong side of complexity. People pushing such technologies often make the ‘gateway drug’ claim that rapid adoption of simple technologies is a precursor to later adoption of much more complex ones. Lotus claimed that simple internet email would eventually leave people clamoring for the more sophisticated features of CC:Mail (RIP), PointCast (also RIP) tried to label email a ‘push’ technology so they would look like a next-generation tool rather than a dead-end, and so on.

Here Rothenberg follows the script to a tee, labeling RSS autodiscovery ‘simplistic’ without entertaining the idea that simplicity may be a requirement of rapid and broad diffusion. The real lesson of RSS autodiscovery is that developers can create valuable meta-data without needing any of the trappings of the Semantic Web. Were the whole effort to be shelved tomorrow, successes like RSS autodiscovery would not be affected in the slightest.

Another good line: ‘There is a list of technologies that are actually political philosophy masquerading as code, a list that includes Xanadu, Freenet, and now the Semantic Web.’

Urban Design and Vogon Poetry

Via Boing Boing, stating the bleeding obvious: if you drive instead of walk, you get fat. Well, duh!

But the flip side is that if you walk or cycle instead of driving, you’ll get killed. ‘American pedestrians are roughly three times more likely to be killed by a passing car than are German pedestrians - and more than six times more likely than Dutch pedestrians. For bicyclists, Americans are twice as likely to be killed as Germans and more than three times as likely as Dutch cyclists.’

However, Irvine has some of the best cycling infrastructure (and weather) I’ve ever seen — except nobody uses it, apart from the weekender recreational cyclists.

Can’t figure out why — I guess it’s just a cultural thing; everyone drives, and people cycling or walking near some cars seems to give the drivers heart attacks. (Seriously. The other night, a driver honked and slowed to a crawl after spotting myself and Catherine walking along — on the sidewalk, 10 feet from the roadway. And not making any sudden movements, either.)

As Kasia said, s/Connecticut//:

You can do all sorts of weird things in Connecticut suburbs, from walking your cat on a leash to painting tiger stripes on your car — but strap a camera to your back and take out the two wheeler for a spin and you’re the weirdest thing since the Keebler elves.

The EU Software Patent protest makes Indymedia. Interesting intersection!

But I think they could have looked into the translation issues a bit more; ‘software patents kill efficient software development’ isn’t exactly urgent enough ;) Also, is the idea of the software patents song and mime a sort of ‘stop patents through Vogon poetry’ thing?

Baghdad Burning scraped RSS, via Sitescooper RSS feeds.

XULChannels.com

This is very cool. It’s a fully browser-based RSS aggregator, no installation required; it just runs in your Mozilla or Firebird browser window. Nifty.

Found via a referrer link on Jeremy’s blog — there are no secrets where public referrer data is involved ;)

Sitescooper and RSS

I did this a while ago, but I’ve been very busy in work and haven’t had time to mention it. But it’s worth doing some preliminary pointing at Sitescooper RSS.

Basically, I’ve added RSS output to Sitescooper, the venerable HTML-scraping script that can disassemble a news/blog/reading-material website efficiently, use a cache, log in, cope with redirects, figure out when stuff is new and when it’s old, perform diffs, confuse you with copious regular expressions, etc. etc.

Sitescooper was originally oriented entirely towards display on a Palm; then new PDAs came out that could do good text or HTML display, so they’re now supported too; and now, I’m no longer commuting and using an RSS aggregator instead for that kind of daily reading, so RSS is the natural next step.

In other words, for those annoying blogs that don’t include the full text in the item block, or those websites you like that don’t have an RSS feed, you can make a site file and scrape them into your aggregator yourself!

This code is present in the current Sitescooper CVS version; the only doco is really what’s in that RSS directory on sitescooper.org.

If your interest is piqued, take a look…

When tidying goes bad

Rod pointed out that my RSS feed was borked. Oops: WebMake and HTML::Parser had “tidied” it. Who knew that RDF was case-sensitive? Not I.

Ah well… now fixed.
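A quick Python demonstration of why the “tidying” broke things: XML element names are case-sensitive, so a lower-cased `rdf:rdf` root is still well-formed XML but is no longer the `rdf:RDF` element RSS 1.0 readers look for. The one-item feed is made up, and it's an assumption that the tidier lower-cased the root element specifically.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1_NS = "http://purl.org/rss/1.0/"

good = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="{RSS1_NS}">
  <channel rdf:about="http://example.org/"><title>t</title></channel>
</rdf:RDF>"""

# What an over-eager HTML "tidier" might produce: the root element name lower-cased.
tidied = good.replace("rdf:RDF", "rdf:rdf")


def is_rss1(doc: str) -> bool:
    """True if the document's root is rdf:RDF, which RSS 1.0 readers expect."""
    return ET.fromstring(doc).tag == f"{{{RDF_NS}}}RDF"


print(is_rss1(good))    # True
print(is_rss1(tidied))  # False: XML names are case-sensitive
```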

titles at last

I’ve added titles to this blog, since RSS looks silly without them. But I am not going back through all those entries… argh…

Scraping into RSS

Peerfear: a scraping sitefilter servlet for scraping sites into RSS.

RSS by mail

Aaron shares his rss-by-mail script. My reaction (cut from mail): “Together with my Mailman-archives-to-RSS script, and my blog (which is updated by mail), soon the semantic web will run entirely on SMTP…” (cackles evilly).

Well, maybe not yet, but it’s getting there. A bit.
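Aaron's script isn't reproduced here, but the core of any RSS-by-mail tool is turning feed items into email messages. A hypothetical Python sketch using the standard `email` package; the addresses are placeholders, and actually sending the messages (e.g. with `smtplib.SMTP.send_message`) is left out.

```python
import xml.etree.ElementTree as ET
from email.message import EmailMessage


def items_to_mail(feed_xml: str, to_addr: str, from_addr: str) -> list[EmailMessage]:
    """Turn each RSS 2.0 <item> into a plain-text email message."""
    messages = []
    for item in ET.fromstring(feed_xml).iter("item"):
        msg = EmailMessage()
        msg["From"] = from_addr
        msg["To"] = to_addr
        msg["Subject"] = item.findtext("title", default="(untitled)")
        body = item.findtext("description", default="")
        link = item.findtext("link")
        if link:
            body += f"\n\n{link}"  # append the permalink below the summary
        msg.set_content(body)
        messages.append(msg)
    return messages
```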
