Neocon search terms

I’m back from a week in Cornwall. I’d like to say I was rested, but chasing after an 11-month-old baby in a caravan isn’t all that restful. Still, it was sunny, and good for a change of pace ;)

Via b1ff.org, here’s the Nexis search that US Department of Justice White House liaisons ran on job candidates to determine their political leanings:

[first name of a candidate] and pre/2 [last name of a candidate] w/7 bush or gore or republican! or democrat! or charg! or accus! or criticiz! or blam! or defend! or iran contra or clinton or spotted owl or florida recount or sex! or controvers! or racis! or fraud! or investigat! or bankrupt! or layoff! or downsiz! or PNTR or NAFTA or outsourc! or indict! or enron or kerry or iraq or wmd! or arrest! or intox! or fired or sex! or racis! or intox! or slur! or arrest! or fired or controvers! or abortion! or gay! or homosexual! or gun! or firearm!

This Nexis reference says the “w/n” keyword searches for ‘words .. within 5 or 10 words of each other, Ex: “Enron w/5 investigation”‘.

This is just a smidgen away from the concept of a SpamAssassin-style scoring filter. Crazy stuff.

Best of all, it’s buggy and over-sensitive, according to one librarian: ‘If that is really their search string, they were going through 99% unrelated citations. There need to be a very nested set of parentheses to make the terms work, starting with one after the w/7. Fired and sex are OR’ed twice and need to be nested, at least in the case of Fired and the OR’d terms immeadiately following.’

Update: good Slashdot comment thread here. This comment indicates that the above librarian might be off-base regarding the w/7 parentheses, since the OR operator has higher priority. Here is an even better walkthrough of the query statement logic. Finally, here’s an explanation of the “spotted owl” curiosity…

Tags: , , , , , ,

Comments

Google Webmaster Tools now includes ‘goog-love.pl’

Back in 2006, I wrote a script I called “goog-love.pl”; it used Google’s now-dead SOAP search API (thanks, Nelson!) to figure out which Google queries your web site was “winning” on. Unfortunately, Google shut down new signups for the SOAP interface later that year.

I was just looking through Google’s Webmaster Tools page for taint.org, when I came across the Statistics / Top search queries page:

img

This is exactly what goog-love.pl produced. hooray!

Tags: , , , , ,

Comments

Single-Letter Google Hits

Here’s what happens when you search for single letters on Google:

Interestingly I got to see the new Google search results page, with the sidebar, once. It must be in the process of rolling out…

Tags: , ,

Comments (8)

Another script: goog-love.pl

A quick hack –

goog-love.pl – find out where your site’s google juice comes from

This script will grind through your web site’s “access.log” file (which must be in the “combined” log format). It’ll pick out the top 100 Google searches found in the referer field, re-run those searches, and determine which ones are giving your website all the linky Google love — in other words, the searches that your site ‘wins’ on.

The output is in plain text and a chunk of HTML.

usage:

goog-love.pl sitehost google-api-key < access.log > out.html

e.g.

cat /var/www/logs/taint.org.* | goog-love.pl \
  taint.org 0xb0bd0bb5yourgoogleapikeyhere0xdeadbeef | tee out.html

NOTE: this script requires the SOAP::Lite module be installed. Install it using apt-get install libsoap-lite-perl or cpan SOAP::Lite. It also requires a Google API key.

For example, here are the current results for this site. You can immediately see some interesting stuff that’s not immediately obvious otherwise, such as my site being the top hit for [beardy justin] ;)

Download here (5 KiB perl script).

Notes:

  • if you see a lot of “502 Bad Gateway” errors, it’s probably over-zealous anti-bot ACLs on Google’s side. Try from another host.

  • Read the comments for notes on a bug in recent releases of SOAP::Lite; please let me know if you hear of them getting fixed ;)

Tags: , , , , , ,

Comments (5)