Cache Control Using Redirects

This is a handy technique used on the nearly-live planetary desktop backgrounds site; I've been asked for a brief write-up on how it works, so here goes.

Essentially, the problem is that you have a file or set of files with a known validity period. (For example, on the site above, these are images of the world's current cloud patterns overlaid on a world map, regenerated every 3 hours.) You want to ensure that downloaders get an up-to-date file, but you are also shipping a lot of data and want to take advantage of caching.

The Technique

The technique is to put in place a redirector URL, which redirects to a new URL once every period. So, in the example above, http://taint.org/xplanet/day_clouds_800x600.png is the redirector. This is set up to redirect, using a temporary HTTP redirect code, to something like http://taint.org.nyud.net:8090/xplanet/tmp/200503141756.432933/day_clouds_800x600.png, which is the cacheable target file. (This URL will be invalid, so there's no need to try clicking it.)
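
To see the redirect from the client's side, a small sketch like the following can help. It reads raw HTTP response headers and prints the Location value only when the status is 302, which is the code Apache sends for a temporary redirect. The function name redirect_target is hypothetical and not part of the site's scripts.

```shell
#!/bin/sh
# redirect_target: hypothetical helper, not part of the original scripts.
# Reads raw HTTP response headers on stdin and prints the Location value,
# but only when the status line reports a 302 (temporary) redirect.
redirect_target() {
  tr -d '\r' | awk '
    NR == 1 { ok = ($2 == 302) }                      # status line: "HTTP/1.1 302 Found"
    ok && tolower($1) == "location:" { print $2; exit }
  '
}

# Example use against the redirector (needs network access and curl):
#   curl -sI http://taint.org/xplanet/day_clouds_800x600.png | redirect_target
```

Checking for the temporary status matters here: a permanent redirect could itself be cached by clients, which would defeat the once-per-period switch.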

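Note that the target URL has other features: it is served from the nyud.net:8090 host rather than directly from taint.org, and its directory name combines a UTC timestamp with a random number, so every period gets a fresh URL.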

In addition, since the redirector URL is on your server, and since a downloader must download that URL to get the current target file's URL, you'll get a usable hit-count from that.
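
As a rough sketch of how that hit count could be extracted, the function below counts 302 responses for one redirector path in an Apache access log. It assumes the common/combined log format, where the request path is the 7th whitespace-separated field and the status is the 9th; the function name and the log location are hypothetical.

```shell
#!/bin/sh
# count_redirector_hits: hypothetical helper, assuming Apache's
# common/combined log format (field 7 = request path, field 9 = status).
#   $1 = access log file, $2 = redirector URL path
count_redirector_hits() {
  awk -v path="$2" '$7 == path && $9 == 302 { n++ } END { print n + 0 }' "$1"
}

# Example use (the log path is an assumption and varies by installation):
#   count_redirector_hits /var/log/apache/access_log /xplanet/day_clouds_800x600.png
```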

Cache Expiration Control

In addition, files in the target URL's directories are served with explicit cache-control headers, thanks to an Apache .htaccess file and Apache's 'mod_expires', using these directives:

    ExpiresActive On
    ExpiresByType image/png "modification plus 1 day"

Note that ExpiresByType has a very flexible syntax to specify validity periods. In this case, it uses an expiry time longer than the required 3 hours, just in case a cache's clock is off by several hours or has faulty timestamp handling.
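
To make the "period plus safety margin" reasoning concrete, here is a minimal sketch that builds the directive from a regeneration period and a margin, both in hours. The function name expires_rule and the margin parameter are hypothetical; the site itself simply hard-codes "1 day".

```shell
#!/bin/sh
# expires_rule: hypothetical helper that prints a mod_expires directive.
#   $1 = MIME type, $2 = regeneration period (hours), $3 = safety margin (hours)
expires_rule() {
  printf 'ExpiresByType %s "modification plus %d hours"\n' "$1" $(( $2 + $3 ))
}

# A 3-hour period with a 21-hour margin is equivalent to the "1 day" used above.
expires_rule image/png 3 21
```

The generous margin is safe because the URL itself changes every period, so a stale copy cached under an old URL is never what a fresh request to the redirector resolves to.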

The Code

If you're planning to implement a similar scheme, here's the shell script that generates the per-period target directory and the redirects:

#!/bin/sh

# stop here if the working directory is missing, rather than running elsewhere
cd "$HOME/shared/xplanet" || exit 1
PATH=$PATH:/usr/local/bin:$HOME/bin

. config.sh

mkdir output > /dev/null 2>&1

(
cd state

# [.... generation of output into "../output" omitted ....]

date
cookiedir=`date -u +%Y%m%d%H%M`.`../gen_rand_999999`
outputdir=$PLAIN_PATH_BASE/tmp/$cookiedir

mv $PLAIN_PATH_BASE/tmp $PLAIN_PATH_BASE/tmp.OLD
mkdir -p $outputdir

# ensure that the tmp dir is unlistable
touch $PLAIN_PATH_BASE/tmp/index.html

# these are what gets requested (and cached)
cp -p ../output/* $outputdir/.

files=`ls $outputdir`

# generate .htaccess
(
  echo '
    ExpiresActive On
    ExpiresByType image/png "modification plus 1 day"
  '
  for f in $files ; do
    echo "Redirect temp /xplanet/$f ${CACHED_URLS_BASE}tmp/$cookiedir/$f"
  done
) > $PLAIN_PATH_BASE/.htaccess

# and these are never actually accessed, but make for good wget targets
for f in $files ; do
  touch $PLAIN_PATH_BASE/$f
done

rm -rf $PLAIN_PATH_BASE/tmp.OLD

date

) > LOG

and the config.sh file it sources contains:

  PLAIN_PATH_BASE=$HOME/taint.org/xplanet/
  PLAIN_URLS_BASE=http://taint.org/xplanet/
  CACHED_URLS_BASE=http://taint.org.nyud.net:8090/xplanet/
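
To show what the ".htaccess" generation step produces, here is a sketch of that step in isolation, printing to standard output instead of writing the real file. The function name gen_htaccess is hypothetical, and night_clouds_800x600.png is a made-up second file name used only for illustration.

```shell
#!/bin/sh
# gen_htaccess: hypothetical stand-alone version of the .htaccess step.
#   $1 = cached URL base (ends with "/"), $2 = per-period directory name,
#   remaining arguments = file names to redirect
gen_htaccess() {
  cached_base=$1
  cookiedir=$2
  shift 2
  echo 'ExpiresActive On'
  echo 'ExpiresByType image/png "modification plus 1 day"'
  for f in "$@"; do
    echo "Redirect temp /xplanet/$f ${cached_base}tmp/$cookiedir/$f"
  done
}

# night_clouds_800x600.png is an assumed file name, for illustration only.
gen_htaccess "http://taint.org.nyud.net:8090/xplanet/" "200503141756.432933" \
  day_clouds_800x600.png night_clouds_800x600.png
```

Each file gets its own Redirect line, so the redirector URLs stay stable while the targets change every period.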

gen_rand_999999 is a short perl script to generate a random number between 0 and 999999 inclusive:

#!/usr/bin/perl
srand (time^$$^ unpack "%L*", `ps axww | gzip`);
print int rand(1000000);

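Given perl's weak PRNG, it's important to seed it properly; the srand call above mixes the current time, the process ID, and a checksum of the compressed process listing.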
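
If perl is not available, a similar number can be produced in plain shell. The sketch below is a swapped-in alternative, not the site's method: it reads 4 bytes from /dev/urandom (assumed to exist on the host) and reduces them modulo 1000000, which introduces a very slight bias that does not matter for naming directories. The function name rand_0_999999 is hypothetical.

```shell
#!/bin/sh
# rand_0_999999: hypothetical shell alternative to gen_rand_999999.
# Assumes /dev/urandom exists and that shell arithmetic is wider than 32 bits.
rand_0_999999() {
  # read 4 random bytes as one unsigned decimal integer
  n=$(od -An -N4 -tu4 /dev/urandom | tr -d ' \n')
  echo $(( n % 1000000 ))
}

rand_0_999999
```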

(The details of how the image generation takes place are omitted here, since that's not what's important for the purposes of this page.)

CacheControlUsingRedirects (last edited 2006-02-19 22:38:05 by 83-70-68-161)