January 15, 2002 - Justin Mason's Weblog

Wow! Lossy zip compression reduces all files down to 10% or even 0% of their original size! The FAQ:

It utilizes a two-pass bit-sieve to first remove all unimportant data from the data set. Lzip implements this quite effectively by eliminating all of the 0’s. It then sorts the remaining bits into increasing order, and begins searching for patterns. The number of passes in this search is set to (10-N) in lzip, where N is the numeric command-line argument we’ve been telling you about.

For every pattern of length (10/N) found in the data set, the algorithm makes a mark in its hash table. By keeping the hash table small, we can reduce memory overhead. Lzip uses a two-entry hash table. The data in this table is then plotted in three dimensions, and a discrete cosine transform transforms it into frequency and amplitude data. This data is filtered for sounds that are beyond the range of the human ear, and the result is transformed back (via an indiscrete cosine) into the hash table, in random order.

Take each pattern in the original data set, XOR it with the log of its entry in the new hash table, then shuffle each byte two positions to the left and you’re done!

As you can see, there is some very advanced thinking going on here. It is no wonder this algorithm took so long to develop!

Very impressive! ;) (fwded by Joe on the ILUG list)
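
For the curious, here is a minimal Python sketch of just the first step the FAQ describes: eliminate all of the 0’s, then sort the remaining bits into increasing order. It is not taken from lzip itself; the function name `bit_sieve` is hypothetical, and the later steps (hash table, cosine transforms, XOR) are left out.

```python
def bit_sieve(data: bytes) -> str:
    """Drop every 0 bit, then sort what is left in increasing order."""
    # Spell out the input as a string of bits, 8 per byte.
    bits = "".join(f"{byte:08b}" for byte in data)
    # "Eliminating all of the 0's": keep only the 1 bits.
    ones = [bit for bit in bits if bit == "1"]
    # Sorting a run of identical 1s leaves it unchanged, so only
    # the count of 1 bits survives this step.
    return "".join(sorted(ones))


print(bit_sieve(b"Hi"))
```

Everything about the input except the number of 1 bits is gone after this step, which goes some way toward explaining the compression ratios.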
