Another script: goog-love.pl
A quick hack: goog-love.pl, which finds out where your site’s Google juice comes from.
This script will grind through your web site’s “access.log” file (which must be in the “combined” log format). It’ll pick out the 100 most frequent Google searches found in the referer field, re-run those searches, and determine which ones are giving your website all the linky Google love; in other words, the searches that your site ‘wins’ on.
The output is in plain text and a chunk of HTML.
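As a rough illustration of the first step (pulling Google searches out of the referer field and keeping the 100 most frequent), here is a minimal Perl sketch. It is not the code in goog-love.pl and does not touch the Google API. It assumes Google search referers look like http://www.google.<tld>/search?...&q=..., and the script name top-google-queries.pl and the helpers google_query_from_line, tally_line and top_n are hypothetical.

```perl
#!/usr/bin/perl
# Minimal sketch, NOT goog-love.pl: count the Google searches found in the
# referer field of a "combined"-format access log and print the top 100.
use strict;
use warnings;

# Return the decoded search terms if the line's referer is a Google search,
# or undef otherwise. A combined-format line ends with:
#   status bytes "referer" "user-agent"
sub google_query_from_line {
    my ($line) = @_;
    my ($referer) = $line =~ /\s\d{3}\s\S+\s"((?:[^"\\]|\\.)*)"\s"(?:[^"\\]|\\.)*"\s*$/
        or return;

    # Assumption: Google search referers look like http://www.google.<tld>/search?...
    my ($qs) = $referer =~ m{^https?://(?:www\.)?google\.[a-z.]+/search\?(.*)$}i
        or return;

    for my $pair (split /[&;]/, $qs) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $value && $key eq 'q';
        $value =~ tr/+/ /;                               # '+' encodes a space
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # undo %XX escapes
        $value =~ s/^\s+|\s+$//g;                        # trim whitespace
        return length($value) ? $value : undef;
    }
    return;
}

# Add one to the tally for this line's Google query, if it has one.
sub tally_line {
    my ($count, $line) = @_;
    my $query = google_query_from_line($line);
    $count->{$query}++ if defined $query;
    return;
}

# Return up to $n [query, hits] pairs, most frequent first (ties sorted by text).
sub top_n {
    my ($count, $n) = @_;
    my @queries = sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
    splice(@queries, $n) if @queries > $n;
    return map { [ $_, $count->{$_} ] } @queries;
}

# Only runs when log files (or '-' for STDIN) are named on the command line.
if (@ARGV) {
    my %count;
    while (my $line = <>) {
        tally_line(\%count, $line);
    }
    printf "%6d  %s\n", $_->[1], $_->[0] for top_n(\%count, 100);
}
```

Run it as perl top-google-queries.pl /var/www/logs/access.log, or pass - to read standard input, e.g. cat /var/www/logs/taint.org.* | perl top-google-queries.pl -. Queries are counted exactly as typed, so differently-capitalised searches are tallied separately; the results below likewise list [SKY NEWS IRELAND] and [sky news ireland] as separate entries.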
usage:
goog-love.pl sitehost google-api-key < access.log > out.html
e.g.
cat /var/www/logs/taint.org.* | goog-love.pl \
taint.org 0xb0bd0bb5yourgoogleapikeyhere0xdeadbeef | tee out.html
NOTE: this script requires the SOAP::Lite module. Install it using apt-get install libsoap-lite-perl or cpan SOAP::Lite.
It also requires a Google API key.
For example, here are the current results for this site. You can see at a glance some interesting stuff that’s not obvious otherwise, such as my site being the top hit for [beardy justin] ;)
- #1 for kriskat225: http://taint.org/2006/01/20/220239a.html
- #1 for kriskat224: http://taint.org/
- #1 for mailman rss: http://taint.org/mmrss/index.html
- #1 for ray is naked: http://taint.org/2005/05/27/195421a.html
- #1 for beardy justin: http://taint.org/2005/09/10/002323a.html
- #1 for threadless rss: http://taint.org/2005/05/25/060857a.html
- #1 for louis fitzgerald: http://taint.org/2005/05/12/020118a.html
- #1 for download JusteTune: http://taint.org/index.php?tag=apple
- #1 for mobile repair delhi: http://taint.org/2005/11/11/032651a.html
- #1 for site:taint.org mythtv: http://taint.org/index.php?tag=hdtv
- #1 for “Google Map” IDS rulesets: http://taint.org/2005/09/
- #1 for spam email “prank a friend”: http://taint.org/2004/11/
- #1 for site:taint.org mythtv freevo: http://taint.org/index.php?tag=mythtv
- #1 for world map desktop background: http://taint.org/xplanet/
- #1 for kate thornton + Samuel L jackson: http://taint.org/2003/12/10/185721a.html
- #1 for when did chris horn leave iona technologies?: http://taint.org/2003/05/
- #2 for natkat224: http://taint.org/
- #2 for itms linux: http://taint.org/2005/09/20/022107a.html
- #2 for msn IDs hacking software: http://taint.org/index.php?tag=hacking
- #3 for gmail spam filter: http://taint.org/2004/04/15/033025a.html
- #3 for live world map on desktop: http://taint.org/xplanet/
- #4 for moin mozex: http://taint.org/2004/10/08/081409a.html
- #4 for editable p45: http://taint.org/2005/01/27/025238a.html
- #4 for urban dead exploits: http://taint.org/index.php?tag=games
- #4 for gmail spam filtering: http://taint.org/2004/04/15/033025a.html
- #4 for world map desktop wallpaper: http://taint.org/xplanet/
- #5 for cdwow.ie: http://taint.org/2003/12/04/185038a.html
- #5 for life hacking: http://taint.org/2005/10/17/210751a.html
- #5 for Adelphi Charter: http://taint.org/index.php?tag=politics
- #6 for irish SME: http://taint.org/2005/06/23/212513a.html
- #6 for urbandead: http://taint.org/index.php?tag=hacks
- #6 for SKY NEWS IRELAND: http://taint.org/2004/05/12/205717a.html
- #7 for daniel cuthbert: http://taint.org/2005/10/12/205836a.html
- #7 for SAMUEL L. JACKSON QUOTES: http://taint.org/2003/12/10/185721a.html
- #7 for cool background pictures: http://taint.org/xplanet/
- #8 for CDWOW: http://taint.org/2003/12/04/185038a.html
- #8 for urban dead: http://taint.org/2005/10/29/224403a.html
- #8 for korea porn: http://taint.org/2003/07/12/031422a.html
- #8 for BBC port 8998: http://taint.org/2003/08/
- #8 for iftop documentation wrt: http://taint.org/index.php?tag=freevo
- #8 for php mail injection spam: http://taint.org/2005/12/08/202248a.html
- #8 for fake open source software : http://taint.org/index.php?tag=open-source
- #9 for faad symbian: http://taint.org/index.php?tag=apple
- #9 for sky news ireland: http://taint.org/2004/05/12/205717a.html
- #9 for telemarketing counter speech: http://taint.org/2002/11/12/130851a.html
- #10 for “Scratch Heads Over”: http://taint.org/2003/07/12/031422a.html
- #10 for web scraper linux console: http://taint.org/2004/06/05/023726a.html
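Each line above has the form “#rank for query: URL”. Here is a hedged Perl sketch of how such lines could be produced once each query has been re-run: given the ordered result URLs for a query, find the first result hosted on your site and report its position. Fetching the results (done via the Google API and SOAP::Lite in the real script) is left out; the { query => [url, ...] } structure and the helpers site_rank and format_hits are hypothetical, and counting subdomains such as www.taint.org as a hit is an assumption.

```perl
#!/usr/bin/perl
# Minimal sketch, NOT goog-love.pl: given the ordered result URLs for each
# re-run query, report where (if anywhere) our site shows up.
use strict;
use warnings;

# Return (rank, url) for the first result hosted on $site or one of its
# subdomains (an assumption), or an empty list if the site is absent.
sub site_rank {
    my ($site, @urls) = @_;
    for my $i (0 .. $#urls) {
        my ($host) = $urls[$i] =~ m{^https?://([^/:?#]+)}i or next;
        return ($i + 1, $urls[$i])
            if lc($host) eq lc($site) || $host =~ /\.\Q$site\E$/i;
    }
    return;
}

# $results is a hashref of query => [ url, url, ... ] in result order.
# Returns "- #rank for query: url" lines, best rank first.
sub format_hits {
    my ($site, $results) = @_;
    my @hits;
    for my $query (keys %$results) {
        my ($rank, $url) = site_rank($site, @{ $results->{$query} }) or next;
        push @hits, [ $rank, $query, $url ];
    }
    return map { "- #$_->[0] for $_->[1]: $_->[2]" }
        sort { $a->[0] <=> $b->[0] || $a->[1] cmp $b->[1] } @hits;
}
```

Queries where the site does not appear in the supplied results are simply dropped, so only the searches your site “wins” on end up in the output.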
Download here (5 KiB Perl script).
Notes:
If you see a lot of “502 Bad Gateway” errors, it’s probably overzealous anti-bot ACLs on Google’s side. Try from another host.
Read the comments for notes on a bug in recent releases of SOAP::Lite; please let me know if you hear of it getting fixed ;)
Tags: goog-love.pl, google, hacks, perl, scripts, searching, software

Yoav Shapira said,
March 2, 2006 @ 3:03 pm
Useful, thank you.
Justin said,
March 2, 2006 @ 4:03 pm
more useful now that I’ve uploaded it ;)
Sebastian Bergmann said,
March 2, 2006 @ 7:29 pm
Justin,
I don’t speak Perl, so I don’t know if
useprefix has been deprecated. if you wish to turn off or on the use of a default namespace, then please use either ns(uri) or defaultns(uri) at /usr/lib/perl5/vendor_perl/5.8.8/SOAP/Lite.pm line 858, <> line 31029.
is a problem within your code or with my system’s Perl installation.
Justin said,
March 3, 2006 @ 10:30 am
Sebastian –
It looks like this is a bug in the latest version of SOAP::Lite (0.67); if you google for [useprefix SOAP::Lite defaultns] there are quite a few reports, including:
http://rt.cpan.org/Public/Bug/Display.html?id=16780
http://rt.cpan.org/Public/Bug/Display.html?id=16898
I would suggest maybe downgrading to an earlier version, if you can — perhaps using the distribution copy via “apt-get” instead of loading it via CPAN.
Disappointing! Sorry about that! Damn external dependencies: live by the CPAN, die by the CPAN ;)
Yoav Shapira said,
July 13, 2006 @ 2:53 pm
SOAP::Lite v0.68 seems to work fine out of the box.