Links for 2008-09-05


Links for 2008-07-21

O2 Leaking Customer Photos (updated) the JBoss/Tomcat install leaks the “secret” URLs through its default status page. this is the 3rd helping of FAIL for O2’s web team; 2 previous occasions in the last year exposed customer data through “secret” URL manipulation

Avant Window Navigator “a ‘dock-like’ (cough) navigator bar for the Linux desktop” (via Danny, again!)

trickle ‘user-space bandwidth shaper’, i.e. like nice(1) for network bandwidth (via Danny)

RFC 5218 - What Makes For a Successful Protocol? ‘Based on case studies, this document identifies some of the factors influencing success and failure of protocol designs.’ (via spicylinks)


VCS and the 1993 internet

Joey Hess suggests that current discussions about the superfluity of DVCS systems have a parallel in how the internet protocol world, circa 1993, played out:

I’m reminded of 1993. Using the internet at that time involved using a mishmash of stuff — Telnet, FTP, Gopher, strange things called Archie and Veronica. Or maybe this CERN “web” thing that Tim Berners-Lee had just invented a few years before, but that mostly was useful to particle physicists.

Then in 1994 a few more people put up web sites, then more and more, and suddenly there was an inflection point. Suddenly we were all browsing the web and all that other stuff seemed much more specialised and marginalised.

I would disagree, a little. Back in the early ’90s, I was a sysadmin playing around with internet- and intranet-facing TCP/IP services (although in those days, the term “intranet” hadn’t been coined yet), so I gained a fair bit of experience at the coal-face in this regard. The mish-mash of protocols (telnet, gopher, Archie, WAIS, FTP, NNTP, and so on) all had their own worlds and their own views of the ‘net. What changed this in 1993 was not so much the arrival of HTTP, but TimBL’s other creation: the URL.

The URL allowed all those balkanized protocols to be supported by one WWW client, and allowed a HTML document to “link” to any other protocol:

The WWW browsers can access many existing data systems via existing protocols (FTP, NNTP) or via HTTP and a gateway. In this way, the critical mass of data is quickly exceeded, and the increasing use of the system by readers and information suppliers encourage each other.

This was a great “embrace and extend” manoeuvre by TimBL, in my opinion — by embracing the existing base of TCP/IP protocols, the WWW client became the ideal user interface to all of them. Once NCSA Mosaic came along, there really was no alternative to rival the Web’s ease of use. This was the case even if you didn’t have a HTTP server of your own; you could still access HTML documents and remote URLs.

In essence, HTML and the URL were the trojan horse, paving the way for HTTP (as HTML’s native distribution protocol) to succeed. It wasn’t the web sites that helped the WWW “win”, but embrace-and-extend via the URL.

For what it’s worth, I think there is an interesting parallel in today’s DVCS world: git-svn.


Google Calendar

So I’ve been using this for a few days now — and I’m loving it. A calendaring system that deals coherently with the web:

I keep finding little things that make perfect sense, and just feel more logical than what I’ve used elsewhere. This rocks!

One thing still needs work, though: the links to Mapping fail spectacularly, for non-US addresses at least. But that’s pretty minor.

By the way, I have a feeling that Mac.com had parts of this, but really, you had to drink a lot of Apple kool-aid to use that, and I just didn’t go for that. Sorry Jobs fans.

Do you know what would be cool now? If Upcoming.org published venue/location-specific iCal feeds. Oh look, they do! Awesome…


IBM patents web transcoding proxies

Web: I link-blogged this, but it’s generated some email already, so it deserves a proper posting.

One thing you quickly learn about IBM, where software patents are concerned, is that if IBM Research is making noise about a new software technique, they’ve probably patented it already. A few years ago, IBM was keen on HTTP transcoding: rewriting web content in a proxy to make it more suitable for display and access from less-capable devices, like PDAs and mobile phones.

So I probably should not have been surprised today when I came across USPTO patent 6,886,013, which is an IBM patent on a ‘HTTP caching proxy to filter and control display of data in a web browser’. It was applied for on Sep 11 1997, and finally granted on Apr 26 of this year.

The first claim covers:

  1. A method of controlling presentation on a client of a Web document formatted according to a markup language and supported on a server, the client including a browser and connectable to the server via a computer network, the method comprising the steps of:

    as the Web document is received on the client, parsing the Web document to identify formatting information;

    altering the formatting information to modify at least one display characteristic of the Web document; and

    passing the Web document to the browser for display.

Notice that there’s actually no mention of a HTTP proxy there. In other words, an in-browser rewriting element, such as Greasemonkey or Trixie, may be covered by that claim. However, the claim does indicate that the document is passed from the ‘client’ to the ‘browser’, so perhaps having the ‘client’ inside the ‘browser’ evades that.

It appears this really wasn’t original research even when the patent was applied for; there’s probable prior art, even if the patent itself doesn’t cite it. For example, WWW4 in 1995 included Application-Specific Proxy Servers as HTTP Stream Transducers, which discusses ‘transduction’ of the HTTP traffic and gives an example of ‘A “rewriting” OreO (transducer element) that encapsulates each anchor inside the Netscape Blink extension, making anchors easier to spot on monochrome displays’. On top of that, Craig Hughes notes that his ‘senior project at Stanford in 1992 was an implementation of a content-modifying HTTP proxy. It re-worked HTML in http streams to add some markup to enable full navigability through touch screen or voice control, for screen-only kiosks.’
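To make the ‘transduction’ idea concrete, here’s a minimal sketch of the anchor-wrapping rewrite described in that OreO example, using only Python’s standard library. This is my own illustration, not code from the paper or from the patent; the class and function names (AnchorBlinker, transduce) are made up, and a real proxy would also have to deal with character encodings, streaming and malformed markup.

```python
from html.parser import HTMLParser


class AnchorBlinker(HTMLParser):
    """Re-emit an HTML document, wrapping every <a>...</a> in <blink>."""

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.out.append("<blink>")
        # get_starttag_text() returns the start tag exactly as it was written.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        if tag == "a":
            self.out.append("</blink>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def transduce(html_text):
    """Parse the document, alter its formatting, and return the result."""
    parser = AnchorBlinker()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Calling transduce() on a page body before handing it to the browser is the whole trick: parse, alter the formatting, pass it on.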

Add this to the ever-growing list of over-broad software patents.


Greasemonkey: transcoding extension for Firefox

Web: Now this is very cool stuff: ‘Greasemonkey is a Firefox extension which lets you to add bits of DHTML (“user scripts”) to any webpage to change it’s behavior.’

In other words, you can rewrite any page viewed in Firefox, as it transits between the server and your client’s display; a form of transcoding.

Traditionally, transcoding is performed using a HTTP proxy which applies the transformation, or a specialised HTTP user agent which transcodes and outputs a whole new set of documents with the results.

That was all a little hacky for full-scale integration into your web browser, though, so Greasemonkey is a big improvement for that use-case.

Some good links:

And some demos:

Remember, these are single, sub-100-line JS scripts, running entirely locally in the user’s web browser. The last one gives you an idea of what coolness is possible…

My contribution: an ad-removal script for Metafilter. It took some 30 seconds of hacking to produce this. Soooo easy. It’s a whole new world of site customisation and hackable filtering. You thought AdBlock was good? This is even niftier ;)


Microsoft 0wnz ‘http’

Web: Back in 2002, it occurred to someone to check the Google search results for ‘http’, to figure out what the most popular sites were.

Looks like it’s changed — here’s the top five results from a Google search for ‘http’ now:

  • 1: Microsoft
  • 2: AltaVista (!!)
  • 3: Yahoo!
  • 4: My Excite
  • 5: Google

My guess: older links are getting good PageRank under whatever newly tweaked algorithm they’re using. But AltaVista beating Google? ;)


Easy-peasy web scraping: HTTP::Recorder

Perl: I’ve been writing a few convenience web-scrapers recently using WWW::Mechanize, with great success.

So the latest development, HTTP::Recorder, looks very nifty too:

HTTP::Recorder is a browser-independent recorder that records interactions with web sites and produces scripts for automated playback. Recorder produces WWW::Mechanize scripts by default (see WWW::Mechanize by Andy Lester), but provides functionality to use your own custom logger.

… Simply speaking, HTTP::Recorder removes a great deal of the tedium from writing scripts for web automation. If you’re like me, you’d rather spend your time writing code that’s interesting and challenging, rather than digging through HTML files, looking for the names of forms and fields, so that you can write your automation scripts. HTTP::Recorder records what you do as you do it, so that you can focus on the things you care about.

No SSL support yet, as far as I can see, but for simple scraping, or as a good starting point for a more complex Mechanize script, it looks like it’ll work great.


Slurpie

Web: Slurpie - (another) distributed peer-to-peer downloading protocol (via HtP).

This looks pretty interesting; no special file server is required, so Slurpie can be used to download files from an ordinary HTTP/FTP server in a ‘swarming’ fashion similar to BitTorrent.

However, Slurpie does require a central server of its own, which it needs to ‘know about’ somehow in advance, and that server will then know who’s downloading what. Not sure how you’d do that effectively; in this case, a .torrent-type file format that contains the ‘main’ file URL and a URL for the Slurpie server, might be more effective.
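For illustration only, here’s roughly what I mean by a .torrent-type descriptor. This is entirely hypothetical: Slurpie defines no such format as far as I know, and the class name, field names and JSON encoding below are all my own assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SwarmDescriptor:
    """Hypothetical descriptor file; not part of Slurpie itself."""

    file_url: str         # the 'main' HTTP/FTP URL of the file
    coordinator_url: str  # where to find the Slurpie server for this file

    def dumps(self):
        """Serialise the descriptor to a small JSON document."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def loads(cls, text):
        """Parse a descriptor; raises KeyError if a field is missing."""
        data = json.loads(text)
        return cls(file_url=data["file_url"],
                   coordinator_url=data["coordinator_url"])
```

A client handed one of these would know both where the file lives and which server to ask about other downloaders, without needing to know about that server in advance.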


Annoying Non-spam Tricks, pt. XVIII

Spam: OK, I just noticed that I have a few hits for the SpamAssassin rule HTTP_ENTITIES_HOST in my corpus. This searches for obfuscated hostnames in the URL links in mail messages, and is generally a very reliable sign of spam — because who would want to hide a hostname apart from spammers?

Well, Buy4Now.IE, for one, it seems. WTF? I have a mail here that uses this markup:

  <a href="http://www&#46;buy4now&#46;ie/fbd">

Totally and utterly nuts. If they really wanted a way to tickle malware detectors, mail filters, and anti-spam measures, they could hardly pick a better one. I have no idea why they did this.
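For the curious, here’s a rough sketch of the kind of check involved: find href values whose hostname part contains an HTML entity. This is not SpamAssassin’s actual implementation of HTTP_ENTITIES_HOST (I haven’t reproduced its regexps); the function name and patterns are my own, and they only cover simple cases.

```python
import html
import re
from urllib.parse import urlsplit

# Grab the value of each href attribute (quoted or not).
HREF_RE = re.compile(r'href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)
# The raw (still-encoded) host part: everything between '://' and the path.
RAW_HOST_RE = re.compile(r'[A-Za-z][A-Za-z0-9+.-]*://([^/?]*)')
# Numeric or named HTML entities, e.g. &#46; &#x2e; &period;
ENTITY_RE = re.compile(r'&(?:#\d+|#x[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);')


def obfuscated_hosts(markup):
    """Return decoded hostnames that were written using HTML entities."""
    hits = []
    for raw_url in HREF_RE.findall(markup):
        raw_host = RAW_HOST_RE.match(raw_url)
        if not raw_host or not ENTITY_RE.search(raw_host.group(1)):
            continue
        # Decode the entities to see what host the link really points at.
        host = urlsplit(html.unescape(raw_url)).hostname
        if host:
            hits.append(host)
    return hits
```

Entities in the query string (a perfectly legitimate &amp;, say) are ignored; it’s only the hostname that nobody has a good reason to encode.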

grr….


Belkin’s Brain-damage, and Bye-bye Public Domain

Spam: The Reg reports that a Belkin Router software upgrade hijacks HTTP connections to spam the browser with ads. Here’s a screenshot of the ad page. Here’s a USENET post bemoaning the situation, and the followup from a Belkin PM.

This is amazing; a working piece of network infrastructure has been effectively modified to:

  • replace the expected HTTP responses with spam ‘for your convenience’
  • do this once every 8 hours until told to stop
  • report serial numbers, IP addresses and software revisions back ‘home’ as part of this

And, of course, web browsing is not the only thing that runs over port 80.

So, it’s a router that inserts spam into your packets due to a software upgrade; and if you want the bugfixes in that upgrade, you get the spam whether you want it or not. That spam could break quite a bit of legitimate port 80 traffic, such as automated download tools that aren’t a full web browser. And the spam is unannounced on the download page, or in the change log. I’d hope that’s pretty serious under consumer-protection law… it certainly should be.

Copyright: In case there was any doubt that Sonny Bono and Jack Valenti wanted to remove the legal concept of the public domain, check this quote from the Congressional record:

(Mary Bono): Actually, Sonny wanted the term of copyright protection to last forever. I am informed by staff that such a change would violate the Constitution. I invite all of you to work with me to strengthen our copyright laws in all of the ways available to us. As you know, there is also Jack Valenti’s proposal for term to last forever less one day. Perhaps the Committee may look at that next Congress.

Wow. More via an Eldred-related site.


open proxy referrer spam again

Googlebot using open proxies? Somehow, I doubt it. An interesting snippet from the access logs again. (Some details rewritten to avoid boosting PageRank.)

220.73.165.14 - - [25/Jul/2003:04:42:14 +0100] "GET /someurl/foo HTTP/1.0" 2147483647 0 "http://www dot gay-sex-men dot net/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
220.73.165.14 - - [25/Jul/2003:09:04:17 +0100] "GET /someurl/foo HTTP/1.0" 2147483647 0 "http://www dot gay-sex-men dot net/" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"
220.73.165.14 - - [25/Jul/2003:09:15:28 +0100] "GET /someurl/foo HTTP/1.0" 2147483647 0 "http://www dot baitbus dot ws/" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"
220.73.165.14 - - [25/Jul/2003:09:18:11 +0100] "GET /robots.txt HTTP/1.0" 200 130 "-" "GoogleBot"
220.73.165.14 - - [25/Jul/2003:09:27:57 +0100] "GET /someurl/foo HTTP/1.0" 2147483647 0 "http://www dot blowjobs-cumshots dot net/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
220.73.165.14 - - [25/Jul/2003:13:18:04 +0100] "GET /someurl/foo HTTP/1.0" 2147483647 0 "http://www dot hot-legs dot info/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)"
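Spotting this by eye gets old fast, so here’s a small sketch of how you might pull it out of a log automatically: group requests by client IP and flag any IP that claims to be GoogleBot in one request but sends a different User-Agent in another. It assumes combined-format log lines with straight double quotes; the regexp and the function name are mine, and the sample lines in the usage note use made-up documentation addresses rather than real log data.

```python
import re
from collections import defaultdict

# Combined log format: ip ident user [time] "request" status bytes "referrer" "agent"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d+) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def suspicious_clients(lines):
    """Map each IP that mixes a GoogleBot User-Agent with others to its agents."""
    agents_by_ip = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # silently skip lines that are not in the expected format
            agents_by_ip[match.group("ip")].add(match.group("agent"))
    return {
        ip: agents
        for ip, agents in agents_by_ip.items()
        if len(agents) > 1 and any("googlebot" in a.lower() for a in agents)
    }
```

For example, given these three lines, only 192.0.2.1 gets flagged:

```python
sample = [
    '192.0.2.1 - - [25/Jul/2003:09:15:28 +0100] "GET /someurl/foo HTTP/1.0" 200 0 "http://spam.example/" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"',
    '192.0.2.1 - - [25/Jul/2003:09:18:11 +0100] "GET /robots.txt HTTP/1.0" 200 130 "-" "GoogleBot"',
    '192.0.2.2 - - [25/Jul/2003:09:20:00 +0100] "GET / HTTP/1.0" 200 10 "-" "GoogleBot"',
]
flagged = suspicious_clients(sample)
```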


SOAP and firewalls

Taking a look at the referrers, I came across Mark O’Neill’s weblog, which lists taint.org on the blogroll; Mark’s the CTO of Vordel. They have a product called VordelSecure, which seems to be a SOAP firewall proxy, in the same way the Wonderwall product I wrote for Iona was a proxy for CORBA:

When a firewall examines a SOAP request received over HTTP, it might conclude that this is valid HTTP traffic and let it pass. Firewalls tend to be all-or-nothing when it comes to SOAP. A SOAP-level firewall should be capable of:
  1. Identifying if the incoming SOAP request is targeted at a Web service which is intended to be available

  2. Identifying if the content of the SOAP message is valid. This is analogous to what happens at the Network Layer, where IP packet contents are examined. However, at the Application Layer it requires data that the Web service expects.
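The two checks in that list can be sketched in a few lines. To be clear, this is my own toy illustration and has nothing to do with how VordelSecure actually works: the ALLOWED policy table, the urn:example:stock namespace and the check_soap name are all invented, it assumes SOAP 1.1 envelopes, and a real firewall would need a hardened XML parser rather than plain ElementTree.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace

# Hypothetical policy: (namespace, operation) -> parameter names it expects.
ALLOWED = {
    ("urn:example:stock", "GetQuote"): {"symbol"},
}


def check_soap(xml_text):
    """Return True only if the request targets an exposed service (check 1)
    and carries exactly the data that service expects (check 2)."""
    try:
        # NOTE: ElementTree is not safe against malicious XML; sketch only.
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    if root.tag != f"{{{SOAP_ENV}}}Envelope":
        return False
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) != 1:
        return False
    operation = body[0]
    if not operation.tag.startswith("{"):
        return False  # operation element has no namespace
    namespace, _, name = operation.tag[1:].partition("}")
    expected = ALLOWED.get((namespace, name))
    if expected is None:
        return False  # check 1 failed: not a service we intend to expose
    params = {child.tag.rpartition("}")[2] for child in operation}
    return params == expected  # check 2: content matches what is expected
```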

Cool!

I hear Wonderwall is still around, but rewritten from the ground up. Sorry about that to whoever had to rewrite it ;)


weird referrers

308 referrer hits from www.xxxstoryarchive.com, 282 from amateur-porn.us, 282 from nude-lesbians.us, etc. Somehow I doubt it. All the hits are 404s, looking for e.g.

nn.nn.nn.nn - - [12/Jan/2003:18:52:13 +0000] GET /pics54754-96 HTTP/1.1 404 284 http://www.celebrity-nude-pics.com/ "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"

Hits from hosts at AT&T WorldNet Services and an SBC PPPoX pool. They’re all MSIE 6 on Windows, and it’s been going on for a month or so.

Theory: sounds like MSIE’s download-to-‘view’-offline functionality has bugs; when it hits a 404, maybe it requeues that request but then sends it to entirely the wrong IP.

Alternative theory: it’s a pathetically underpowered DDoS. ouch!

Anyone else seen this?


ICAP

ICAP-server, an (imaginatively-named) daemon which implements ICAP. This seems to be a transcoding proxy server; in other words, it will convert HTML content on the fly, while you browse.

ICAP itself seems to be a protocol for rewriting HTTP responses; in other words, it allows a proxy server to include a small snippet of ICAP client code, and call out to an ICAP server to do the rewriting. Nifty.

Sounds like this could be very handy for low-bandwidth situations; use ICAP to “downshift” web pages into low-bandwidth versions. For example, banner ads can be trimmed out, heavy images converted to small, low-quality JPEGs, etc. One to watch (or help out with).
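To give a feel for what the “small snippet of ICAP client code” in a proxy has to do, here’s a sketch that builds a response-modification (RESPMOD) request by hand. It’s based on my reading of the ICAP spec (RFC 3507), not on ICAP-server’s code, so treat the details as assumptions; the function name and the icap.example.net service URL in the usage note are made up, and a real client would also need to send the message and parse the reply.

```python
from urllib.parse import urlsplit


def build_respmod(icap_url, http_headers, http_body):
    """Wrap an HTTP response (header block + body, as bytes) in an ICAP
    RESPMOD request asking the ICAP server to rewrite it."""
    if not http_headers.endswith(b"\r\n\r\n"):
        raise ValueError("HTTP header block must end with a blank line")
    host = urlsplit(icap_url).netloc
    # 'Encapsulated' lists byte offsets of each embedded section: the HTTP
    # response headers start at 0 and the body starts right after them.
    icap_head = (
        f"RESPMOD {icap_url} ICAP/1.0\r\n"
        f"Host: {host}\r\n"
        f"Encapsulated: res-hdr=0, res-body={len(http_headers)}\r\n"
        "\r\n"
    ).encode("ascii")
    # The encapsulated body is sent using chunked encoding.
    chunks = b""
    if http_body:
        chunks = b"%x\r\n%s\r\n" % (len(http_body), http_body)
    return icap_head + http_headers + chunks + b"0\r\n\r\n"
```

A proxy would call something like build_respmod("icap://icap.example.net/downshift", headers, body) for each response it wants “downshifted”, and hand whatever comes back to the browser.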

Ericsson used to have a commercial product which did something similar, but I can’t find it now…


The top 100 PageRanked CGI scripts

Similar to the much-discussed-elsewhere http search trick, which figures out the top 100 websites according to PageRank, here’s the top 100 CGI scripts according to PageRank. They’re incomplete, since only scripts with “cgi-bin” in the URL will show up, but hey ho. The top ten:

And the winner is:

boo.
