
Re: [RFC] An rsync friendly gzip



On Fri, Nov 19, 1999 at 03:23:30AM +1100, Martijn van Oosterhout wrote:
> Hi,
> 
> I've heard about this idea for a while now and I've talked
> to Andrew Tridgell about it and so finally I got around to
> writing a very simple version to do some (simple) tests on it.

Great!  I also thought about this a while back, but never got around
to finishing it.  I found the spot in gzip/zlib that needs
modification, and I think it should do much better than a 3%
increase.  Furthermore, the gzip format is flexible enough that you
can produce compressed files that can be uncompressed with the
original gzip, at a cost of 5 bytes per restart.

For modifying gzip or zlib, the key functions are ct_tally and
_tr_tally, respectively.  For gzip (version 1.2.4), look at trees.c,
lines 958-1006; specifically, replace the code

    /* Try to guess if it is profitable to stop the current block here */
    if (level > 2 && (last_lit & 0xfff) == 0) {
	...
    }
    return (last_lit == LIT_BUFSIZE-1 || last_dist == DIST_BUFSIZE);
    /* We avoid equality with LIT_BUFSIZE because of wraparound at 64K
     * on 16 bit machines and because stored blocks are restricted to
     * 64K-1 bytes.
     */

with an appropriate version of your own.  The analogous code is at line
1044 of trees.c in zlib-1.1.3.
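
As a rough illustration of what "an appropriate version of your own"
might look like, here is a minimal sketch in C of a content-defined
restart rule: keep a running sum of the most recent input bytes and end
the block whenever that sum hits a fixed residue.  This is an assumption
about the approach, not code from gzip or zlib.  The names rs_state,
rs_init, rs_tally and RS_WINDOW are hypothetical, and the 4096-byte
window is an arbitrary choice.  The sketch is fed raw input bytes,
whereas ct_tally only sees (dist, lc) pairs, so it would have to be
wired into the compressor separately.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RS_WINDOW 4096u   /* assumed window size; not a gzip constant */

/* Hypothetical state: running sum over the last RS_WINDOW input bytes. */
typedef struct {
    unsigned char ring[RS_WINDOW]; /* the last RS_WINDOW bytes seen */
    size_t        pos;             /* next slot in ring to overwrite */
    size_t        count;           /* bytes held in ring, at most RS_WINDOW */
    unsigned long sum;             /* sum of the bytes currently in ring */
} rs_state;

static void rs_init(rs_state *s)
{
    memset(s, 0, sizeof *s);
}

/* Feed one input byte.  Returns 1 if the current block should end here,
 * 0 otherwise.  Once the window is full, the answer depends only on the
 * last RS_WINDOW bytes, not on anything earlier in the file.
 */
static int rs_tally(rs_state *s, unsigned char c)
{
    assert(s->pos < RS_WINDOW);
    if (s->count == RS_WINDOW)
        s->sum -= s->ring[s->pos];   /* drop the byte leaving the window */
    else
        s->count++;
    s->ring[s->pos] = c;
    s->sum += c;
    s->pos = (s->pos + 1) % RS_WINDOW;
    return s->count == RS_WINDOW && s->sum % RS_WINDOW == 0;
}
```

Because the decision depends only on the last RS_WINDOW bytes, two
inputs that share a long run of identical data end their blocks at the
same places within that run, which is the property rsync needs.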

You also have to change the flushes so that they end up byte-aligned.
The way to do this within the gzip file format is to output a
zero-length stored (uncompressed) block; unfortunately, this wastes 5
bytes.  It would be a trivial (and otherwise useful) extension to the
file format to add a byte-aligned block type and avoid wasting the 5
bytes, but I think it's best to avoid that, so that the original gzip
can still uncompress the output.
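
To make the 5-byte figure concrete, here is a small C sketch of such a
flush.  It assumes a hypothetical bit_writer with put_bits and
flush_byte_aligned helpers; these are illustrative names, not gzip's
own output routines.  A stored block is a 3-bit header, padding up to
the next byte boundary, then a 16-bit length and its one's complement,
so with a length of zero the overhead is about 5 bytes, depending on
how many bits were pending.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit writer: deflate packs bits LSB-first into bytes. */
typedef struct {
    unsigned char *out;    /* caller-owned output buffer */
    size_t         len;    /* whole bytes written so far */
    unsigned       bitbuf; /* pending bits, least significant first */
    int            nbits;  /* number of pending bits, 0..7 between calls */
} bit_writer;

/* Append the low n bits of value to the output. */
static void put_bits(bit_writer *w, unsigned value, int n)
{
    assert(n >= 0 && n <= 16);
    w->bitbuf |= (value & ((1u << n) - 1u)) << w->nbits;
    w->nbits += n;
    while (w->nbits >= 8) {
        w->out[w->len++] = (unsigned char)(w->bitbuf & 0xff);
        w->bitbuf >>= 8;
        w->nbits -= 8;
    }
}

/* Emit an empty stored block so the next block starts on a byte
 * boundary.  Returns the number of bytes added to the output.
 */
static size_t flush_byte_aligned(bit_writer *w)
{
    size_t before = w->len;
    put_bits(w, 0, 1);                 /* BFINAL = 0: more blocks follow */
    put_bits(w, 0, 2);                 /* BTYPE = 00: stored block */
    if (w->nbits > 0)
        put_bits(w, 0, 8 - w->nbits);  /* pad to the next byte boundary */
    put_bits(w, 0x0000, 16);           /* LEN  = 0 */
    put_bits(w, 0xffff, 16);           /* NLEN = one's complement of LEN */
    return w->len - before;
}
```

The trailing bytes 00 00 ff ff are the same marker that zlib leaves at
the end of its output after a sync flush.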

Alternatively, you could use the published interface to zlib, telling
it to flush appropriately.

What block size were you using in your tests?

There was a thread a few weeks ago on deity where I talked about this
with Jason Gunthorpe.  He was sceptical that there would be good
savings for real .debs; have you tried it?  If not, I will.  I've been
collecting the .debs I've downloaded over the past few weeks.

Best,
	Dylan Thurston

