
Re: Xdelta (Was: About the GPL)



Steve Dunham <dunham@cse.mse.edu> writes:

> Rob Browning <rlb@cs.utexas.edu> writes:
> 
> > gsstark@mit.edu (Gregory S. Stark) writes:
> > 
> > > Actually, Xdelta is cool; it deals with this problem, and it has
> > > native support for .deb files.  One of these days I'll go play with
> > > it and figure out how to set up our archives so that people can
> > > download xdelta patches instead of complete packages.  This will
> > > really help people track those large packages that keep changing
> > > packaging details (hi Branden:).
> 
> > I doubt that xdelta will actually give you any space savings with
> > .deb files.  .deb files are compressed, so even a minor change to one
> > file in the package is likely to make the package incomparable to the
> > previous version at the binary level.
> 
> "Native support for .deb files" probably means that it knows a .deb
> file is an ar archive of two .tar.gz files.  (It can successfully diff
> RPM files, because it finds the cpio file and decompresses it before
> doing the diff.)
> 
> Nonetheless, the last time I checked out xdelta, it took a lot of
> memory to take a diff of big files (where big is circa 4MB, and a lot
> of memory is over 20MB), which would make taking diffs of Branden's
> packages rather difficult.
> 
> 
> A different approach, which wouldn't require the user to have the
> original package, would be to develop "patch packages" which contain:
> 
>  * A list of files to delete.
>  * A .tar.gz of new files.
>  * A .tar.gz of xdeltas of changed files.
>  * A normal control.tar.gz file.
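
Rob's point about compressed files is easy to see concretely.  Here
is a small Python illustration (nothing to do with xdelta's
internals, just a demonstration) using zlib, the same deflate
compression gzip uses: compress two inputs that differ by one word
and see where the compressed streams diverge.

    import random, zlib

    random.seed(0)
    words = [random.choice(["alpha", "beta", "gamma", "delta"])
             for _ in range(20000)]
    a = " ".join(words).encode()
    words[10000] = "epsilon"          # one small change in the middle
    b = " ".join(words).encode()

    ca, cb = zlib.compress(a), zlib.compress(b)
    prefix = next((i for i, (x, y) in enumerate(zip(ca, cb)) if x != y),
                  min(len(ca), len(cb)))
    print("compressed sizes:", len(ca), len(cb))
    print("identical prefix:", prefix, "bytes")

The identical prefix ends near the change, and essentially nothing
after it matches byte for byte, so a binary delta of the compressed
files captures very little of the similarity.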
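
Applying such a patch package might look something like this (a
hypothetical sketch: the member names, the flat delta layout, and the
"xdelta patch" command line are assumptions, not an existing tool):

    import os, subprocess, tarfile

    # Hypothetical members of a "patch package", per the list above:
    #   deleted.list   - files to delete
    #   new.tar.gz     - complete new files
    #   deltas.tar.gz  - xdeltas of changed files
    #   (control.tar.gz would be handled by dpkg as usual)

    def apply_patch_package(pkgdir, root="/"):
        # 1. Delete the files listed for removal.
        with open(os.path.join(pkgdir, "deleted.list")) as f:
            for name in f.read().split():
                path = os.path.join(root, name)
                if os.path.exists(path):
                    os.remove(path)

        # 2. Unpack the wholly new files.
        with tarfile.open(os.path.join(pkgdir, "new.tar.gz")) as t:
            t.extractall(root)

        # 3. Apply an xdelta to each changed file (a flat layout is
        #    assumed for brevity; a real tool would keep directory
        #    paths and verify checksums before replacing anything).
        deltadir = os.path.join(pkgdir, "deltas")
        os.makedirs(deltadir, exist_ok=True)
        with tarfile.open(os.path.join(pkgdir, "deltas.tar.gz")) as t:
            t.extractall(deltadir)
        for name in os.listdir(deltadir):
            old = os.path.join(root, name)
            subprocess.run(["xdelta", "patch",
                            os.path.join(deltadir, name),
                            old, old + ".new"], check=True)
            os.rename(old + ".new", old)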

What Steve says is true: Xdelta detects an ar archive and then
uncompresses its contents, if they are compressed.  This was a
special-case hack for the .deb format.
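
In outline, that special case amounts to the following (a simplified
Python sketch of the idea, not the actual xdelta source, which is C
and handles considerably more detail):

    import gzip

    AR_MAGIC = b"!<arch>\n"

    def expand_members(data):
        """If data is an ar archive (e.g. a .deb), return its member
        bodies, gunzipped where needed; otherwise return it whole."""
        if not data.startswith(AR_MAGIC):
            return [data]                  # not an archive: use as-is
        members, off = [], len(AR_MAGIC)
        while off + 60 <= len(data):
            header = data[off:off + 60]    # fixed 60-byte member header
            size = int(header[48:58])      # decimal ASCII size field
            body = data[off + 60:off + 60 + size]
            if body[:2] == b"\x1f\x8b":    # gzip magic: uncompress it
                body = gzip.decompress(body)
            members.append(body)
            off += 60 + size + (size & 1)  # members are 2-byte aligned
        return members

The delta is then computed against the uncompressed member contents
rather than the compressed bytes.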

Actually, there is a completely rewritten version of the Xdelta code,
which is already complete.  The first version, released as the 0.x
series, was originally an experiment; it contains a number of
problems that evolved out of those early experiments, along with a
general lack of concern for memory use.  The new version removes all
the vestigial code and optimizes the resulting algorithm.  Further,
it stream-processes its files page by page, so the amount of memory
needed is now drastically lower than before.  For a pair of
uncompressed files where the FROM file is N bytes, the minimum memory
required is approximately N/4 (for an in-memory index), not counting
any page caching done for the FROM file.  The FROM file, for obvious
reasons, requires seeking, while the TO file does not, so only the
FROM file requires any page caching.  Formerly, all files were read
into memory: the old version mmaps files where it can, but cannot do
that for compressed files, so computing deltas between compressed
files is worse than between uncompressed ones.  In the new version,
the TO file can be decompressed on the fly, but the FROM file cannot,
so computing a delta from a compressed FROM file still requires
uncompressing it into memory for seeking purposes.
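
To make that figure concrete: for the circa-4MB files Steve mentions,
the index is roughly 1MB.  The idea is approximately the following
(an illustrative sketch only, not the real implementation; the block
size, the adler32 checksum, and the instruction encoding are my own
choices, and a real version uses a rolling checksum so matches can
start at any offset).  In C the index is a packed array, which is
where the N/4 figure comes from; the Python dict here is just for
clarity.

    import zlib

    BLOCK = 16   # one index entry per 16-byte block; at ~4 bytes of
                 # packed index state per block, that is N/4

    def make_index(from_data):
        # Map block checksum -> offset in FROM, one entry per block.
        index = {}
        for off in range(0, len(from_data) - BLOCK + 1, BLOCK):
            index.setdefault(zlib.adler32(from_data[off:off + BLOCK]),
                             off)
        return index

    def delta(from_data, to_data):
        """Emit ('copy', from_offset, length) and ('insert', bytes)."""
        index = make_index(from_data)
        out, lit, pos = [], bytearray(), 0
        while pos + BLOCK <= len(to_data):
            off = index.get(zlib.adler32(to_data[pos:pos + BLOCK]))
            if (off is not None and
                    from_data[off:off + BLOCK] ==
                    to_data[pos:pos + BLOCK]):
                n = BLOCK                  # verified match: extend it
                while (off + n < len(from_data) and
                       pos + n < len(to_data) and
                       from_data[off + n] == to_data[pos + n]):
                    n += 1
                if lit:
                    out.append(("insert", bytes(lit)))
                    lit.clear()
                out.append(("copy", off, n))
                pos += n
            else:
                lit.append(to_data[pos])   # no match: literal byte
                pos += 1
        lit.extend(to_data[pos:])          # trailing partial block
        if lit:
            out.append(("insert", bytes(lit)))
        return out

    def patch(from_data, instructions):
        out = bytearray()
        for ins in instructions:
            if ins[0] == "copy":
                out += from_data[ins[1]:ins[1] + ins[2]]
            else:
                out += ins[1]
        return bytes(out)

Note the asymmetry: FROM is indexed and read at arbitrary offsets
(hence the seeking and page caching), while TO is only walked from
front to back, which is why only the TO side can be decompressed on
the fly.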

I have not publicly released the new code because I am still tweaking
the interface for the upcoming client/server version of PRCS, and I
do not want to get myself into a backward-compatibility hole like I
did the last time.  Also, the new version does not deal with
compression or file formats the way the old library does; that will
have to be handled outside the library.  I will release the new code
in library form sometime in the next few months, once I decide it is
finished, and I am hoping then to get out of the business of
maintaining end-user applications.  With luck, the library can be
used by the standard diff and patch utilities (the issue of expanding
compressed and archived files will then need to be dealt with there).

-josh

