
Re: IDEA to SERIOUSLY reduce download times!



On Thu, Jul 08, 1999 at 12:57:02AM -0400, Fabien Ninoles was heard to say:
> How do you handle configuration files? You should put them directly in
> shipping. Do you also check for changes in permissions, etc.? That's an
> important part of security updates. [Sorry, I don't have much time
> for checking your scripts. You have done great work just by designing
> this. I'm pretty sure my suggestions will be easy to implement if not already
> there :) ]

  Yes, the configuration files are put directly into shipping.  I overlooked
file permissions, thanks for pointing that out :-)  I may have to switch to
a real programming language for this, unless I can find a program to copy
one file's permissions to another.
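
  A minimal sketch of that last option, assuming GNU chmod, whose --reference
option copies the mode bits from another file.  The name copy_perms is made
up for illustration, and ownership would need a similar chown --reference
call:

```shell
# copy_perms REF TARGET: give TARGET the same permission bits as REF.
# Assumption: GNU chmod; --reference is not part of POSIX chmod.
copy_perms() {
  chmod --reference="$1" "$2"
}
```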

  I'm also trying to come up with a clever way of detecting when files simply
moved [to avoid including the whole file in the patch].  No luck so far :)
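
  One possible approach, sketched under the assumption that a moved file
keeps exactly the same content: checksum both unpacked trees, pair up equal
checksums, and report the pairs whose old path no longer exists in the new
tree.  find_moved is a hypothetical helper, and paths containing whitespace
are not handled:

```shell
# find_moved OLDDIR NEWDIR: print "old-path new-path" for each regular file
# whose content is unchanged but whose old path is gone from NEWDIR.
find_moved() {
  old_sums=$(mktemp) && new_sums=$(mktemp) || return 1
  # One "checksum path" line per file, sorted on the checksum for join.
  (cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k1,1) > "$old_sums"
  (cd "$2" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k1,1) > "$new_sums"
  # join pairs up lines that share a checksum: "checksum old-path new-path".
  LC_ALL=C join "$old_sums" "$new_sums" |
  while read -r sum old new
  do
    # Only a move if the old path does not exist in the new tree.
    [ -e "$2/$old" ] || echo "$old $new"
  done
  rm -f "$old_sums" "$new_sums"
}
```

  A patch could then record the rename instead of shipping the file again.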

> > 
> >   This is the easy bit :-)  All that's still needed is clean handling of
> > versioning.
> > 
> >   I still cannot find a clean way to actually apply the patches.  Ideally, it
> > would be quite simple: you would execute dpkg --install on the new file.  In
> > the 'unpack' phase of dpkg, dpkg would unpack data.tar.gz as usual, but then do
> > an 'xdelta patch' for all contents of delta.tar.gz, creating backups of
> > originals as with data.tar.gz [I don't actually know what mechanism is used
> > normally for this; are the old files renamed or do the new ones get .dpkg-new
> > appended, or is something else done?].  This way, if something goes horribly
> > wrong in the patching you can complain about an error and restore things to the
> > way they were.  There would have to be a way to indicate patching information
> > elsewhere, of course.  Perhaps a Patches: control item could be added; I don't
> > know what would be done about Packages.gz and apt, or whether distributing
> > patches on the FTP mirrors is a good use of space.
> > 
> >   Of course, that's a pipe dream :-)  I've also considered hackery using
> > preinst scripts to do the patch [and therefore having to include delta.tar.gz
> > somewhere inside data.tar.gz] but this would get nasty -- in order to have
> > dpkg's file list come out correctly, data.tar.gz would have to contain entries
> > for all files that were really in the package.  Another option (probably the
> > best for now) is to use dpkg-repack: create a temporary 'old' package, extract
> > it and the new data.tar.gz to a temporary directory, apply the patches and
> > copy the patched files to where the new data.tar.gz was extracted, recreate
> > data.tar.gz with the patched files, and create the new deb from debian-binary,
> > control.tar.gz, and the rebuilt data.tar.gz .
> 
> That's the easier/safer way, I think. For sure, being able to simply apply
> the patches would be better, but it would make the system harder to repair.
> We should also decide which versioning policy to apply so that we don't
> bloat the archives unnecessarily [as if they weren't already]. I think that
> is the major reason why no bin-diff ever made it into Debian. Whatever
> solution you come up with for diffing binaries, you still have to find a
> way to make binary diffs practical. Maybe we should think about a way to get
> back to the original package. So, the procedure would be similar to yours
> except for a new middle step:

  Hmm, yes...but the problem there is that if we really want to get back to the
original package we have to include any deleted files verbatim.  This could
either be no problem at all or doubly bloat the patch.  Another possibility is
to generate the reverse patch in the process of applying the patch to the
original deb. (I like this better :-) )

  Anyway, I have a script now which will patch a .deb file with dpkg-repack;
interestingly, it doesn't generate an identical package to the one it should,
because xdelta changes (increases) the compression level of .gz files
(eg, README.gz) when it patches them.  I don't imagine that this is really a
problem, except that there might be problems with programs that compare md5sums
with what's in the archive.  (actually this whole scheme is problematic that
way :) )  I think I could turn xdelta's look-inside-gzfiles option off, but
that would probably give us less efficient patches..
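
  A rough sketch of a check that tolerates the recompression: compare the
decompressed contents instead of the .gz files themselves.  same_gz_content
is a hypothetical helper, not something dpkg or xdelta provides:

```shell
# same_gz_content A.gz B.gz: succeed if both files decompress to the same
# bytes, even when the compressed files differ (e.g. gzip -1 versus -9).
same_gz_content() {
  a=$(gzip -dc "$1" | md5sum)
  b=$(gzip -dc "$2" | md5sum)
  [ "$a" = "$b" ]
}
```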

> 1- dpkg-repack the package.
> 2- remove the old diff so we can have an 'orig.deb' package.
> 3- apply the new patch.

  Ahhh, I see.  Clever :-)  We'd want to have a good way to decide when to
make patches and when to do normal packages..your suggestion below sounds
good to me.

> This has some disadvantages, though. We should keep the install-diffs
> for all packages installed this way so we can remove them later.
> Also, the maintainer should be able to start over with new diffs if major
> modifications are made. This can be done simply by providing an 'orig.deb'
> package with no diffs. I also suggest adding a control field indicating
> which version, if any, a patch is provided against (e.g. Version: 1.2.3-5,
> Patches: 1.2.3-1). This field would be preserved in both the available
> list and the status list. So, when apt decides to download a package,
> it should check:

  Wow, I already added the Patches: field.  Convergence :-) [actually, I
went to some trouble to delete it in the patched package, but I think leaving
it in sounds like a good idea now that I think about it]

> 1) Does the package provide a patch?
> 2) If yes, do we have the orig.deb in status?
>    (indicated in the new control field of the available package).
> 3a) Yes, good! Download the patch and apply it to orig.deb.
> 3b) No; then do we have a patch against this orig.deb?
> 4) Yes, good! Remove the old patch and apply the new one we just downloaded.
>
> If we keep a simple rule that patches should always be made against original
> packages (the ones with no patch), we can even ask dinstall to make
> the patches itself when the Patches field is present. We should, however,
> not stick too rigidly to rules. Providing diffs against old-stable can
> also be a good thing.

  True..
  It may also be worth considering the case where the patch is larger than
the original (although I hope this would never happen :) )
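
  The checks quoted above, plus the size comparison, could be sketched
roughly like this.  choose_download and its four arguments are hypothetical;
the sketch only compares version strings for equality and does not model
apt's real data structures or the "remove the old patch first" case:

```shell
# choose_download PATCHES INSTALLED PATCH_SIZE FULL_SIZE
#   PATCHES     value of the Patches: field of the available package ("" if none)
#   INSTALLED   version recorded in the status list
#   PATCH_SIZE  size of the patch file in bytes
#   FULL_SIZE   size of the full package in bytes
# Prints "patch" when the patch applies to the installed version and is
# smaller than the full package; otherwise prints "full".
choose_download() {
  if [ -n "$1" ] && [ "$1" = "$2" ] && [ "$3" -lt "$4" ]
  then
    echo patch
  else
    echo full
  fi
}
```

  For example, "choose_download 1.2.3-1 1.2.3-1 40000 250000" prints "patch",
while a patch that is larger than the package falls back to "full".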

> Even if the patches break for whatever reason, we can always fall back on
> the old method: a full package download. Remember that this step will be
> done in the download phase of dselect/apt, not when upgrading packages,
> because the full download is still available.

  [confusion]  I don't think I quite got that.  You mean that the full package
download has to be done in the download phase (as opposed to during a dpkg
run?)

> because a lot of packages will stick to orig.deb. Also, we can let dinstall
> decide whether it keeps the diffs or removes them (updating the control file
> at the same time). The diffs can also be on another server, with their
> own Packages.gz (containing solely the Package:, Version:, and Patches:
> fields). Apt will merge them on update and then pick the best choice for
> the user.

  Aptable patches...like it...

> ------------------------------------------------------------------------
> Fabien Ninoles                                             GULUS founder
> aka Corbeau aka le Veneur Gris               Debian GNU/Linux maintainer
> E-mail:                                                    fab@tzone.org
> WebPage:                      http://www.callisto.si.usherb.ca/~94246757
> RSA PGP KEY [E3723845]: 1C C1 4F A6 EE E5 4D 99  4F 80 2D 2D 1F 85 C1 70
> ------------------------------------------------------------------------

  I'll attach the latest versions of the scripts (the last one had some bugs
that, naturally, didn't turn up until I tried to apply the patches it
generated :-) )  I'm thinking that I may switch in the next iteration to
a more heavy-duty language than bash (probably Python..)

  This version still doesn't do file-permission corrections.

  Daniel

-- 
I haven't lost my mind, I know exactly where I left it.

#!/bin/bash

#  Makes a delta between two Debian packages
#
#  Call as makepatch [fromfile] [tofile]
#
#  Note that this preserves the new DEBIAN/ directory [control.tar.gz]

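abort()
{
  # Remove the scratch directory and exit with a failure status.  Callers
  # pass -1, which is not a portable exit status, so exit 1 regardless.
  rm -rf "$TMPDIR"
  exit 1
}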

if [ "$#" != 2 ]
then
  echo "Error: $0 must be called with two arguments" 1>&2
  exit 1
fi

TMPDIR=/tmp/makepatch_$$

mkdir $TMPDIR || abort -1

FROMDIR=$TMPDIR/from
TODIR=$TMPDIR/to

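# Resolve the package paths before changing directory, so that relative
# arguments keep working.
FROMDEB="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
TODEB="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")"

# Extract the packages:
# <dir>/data is the original data.tar.gz
# <dir>/delta is where we store the hierarchy of deltas
# <dir>/shipping is what's left of the data.tar.gz :-)
# <dir>/control is for the control files
mkdir $FROMDIR $TODIR &&
mkdir $FROMDIR/{data,control} $TODIR/{data,shipping,control,delta} &&
(cd $FROMDIR && ar x "$FROMDEB") &&
(cd $TODIR   && ar x "$TODEB") || abort -1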

# Stripping a trailing .deb leaves the name unchanged only when the suffix
# is missing.
if [ "${1%.deb}" = "$1" ]
then
  echo "Warning!  $1 does not end in .deb, filename guessing will be confused!" 1>&2
fi

if [ "${2%.deb}" = "$2" ]
then
  echo "Warning!  $2 does not end in .deb, filename guessing will be confused!" 1>&2
fi

FINALNAME="`basename ${1%.deb}`:`basename ${2%.deb}`.deb-diff"

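# The command substitutions are quoted so that a missing or empty
# debian-binary file yields a warning instead of a test syntax error.
if [ "`cat $FROMDIR/debian-binary`" != "2.0" ]
then
  echo "Warning!  $1 is not in Debian-binary-2.0 format.  Bad Things may happen!" 1>&2
fi

if [ "`cat $TODIR/debian-binary`" != "2.0" ]
then
  echo "Warning!  $2 is not in Debian-binary-2.0 format.  Bad Things[tm] may happen!" 1>&2
fi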

echo "Extracting archives..."

(cd $TODIR/control && tar zxf ../control.tar.gz) &&
(cd $TODIR/data && tar zxf ../data.tar.gz) &&
(cd $FROMDIR/data && tar zxf ../data.tar.gz) &&
(cd $FROMDIR/control && tar zxf ../control.tar.gz) || abort -1

CONFFILES=

if [ -f $TODIR/control/conffiles ]
then
  CONFFILES=`cat $TODIR/control/conffiles`
fi

FROMVERSION=`grep ^Version: $FROMDIR/control/control | sed 's/Version: *//'`
echo "Previous version was $FROMVERSION"

echo "Populating directories.."

for dir in `find $TODIR/data -type d -printf '%P\n'`
# Reproduce the directory structure
do
  echo $dir
  mkdir $TODIR/{delta,shipping}/$dir || abort -1
done

echo "Moving conffiles.."

for file in $CONFFILES
# Don't do deltas on the conffiles, let dpkg handle them!
do
  echo $file
  mv $TODIR/data/$file $TODIR/shipping/$file || abort -1
done

echo "Calculating deltas.."

for newfile in `find $TODIR/data -type f -printf '%P\n'`
do
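  # xdelta 1.x exits non-zero when the two files differ, so its status is
  # deliberately ignored (";" rather than "&&").  The delta is kept only if
  # it is smaller than the new file that would otherwise be shipped.
  if [ -f $FROMDIR/data/$newfile ] &&
     (xdelta delta $FROMDIR/data/$newfile $TODIR/data/$newfile $TODIR/delta/$newfile ;
      [ `find $TODIR/data/$newfile -printf %s` -gt `find $TODIR/delta/$newfile -printf %s` ] )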
  then
    echo "Delta created for $newfile"
    rm -f $TODIR/data/$newfile || abort -1
  else
    echo "No delta created for $newfile"
    rm -f $TODIR/delta/$newfile &&
    mv $TODIR/data/$newfile $TODIR/shipping/$newfile || abort -1
  fi
done

# Create the patched archive

echo "Creating archive:"
echo "stamp.."
echo "2.0.patch-0.1" > $TODIR/debian-binary
echo "data..."
(cd $TODIR/shipping && tar czf ../data.tar.gz *) || abort -1
echo "delta..."
(cd $TODIR/delta && tar czf ../delta.tar.gz *) || abort -1
echo "control..."
if grep ^Installed-Size: $TODIR/control/control > /dev/null 2>&1
then
  INSERT=Installed-Size
else
  INSERT=Maintainer
fi

sed "s/^$INSERT:/Patches: $FROMVERSION\\
$INSERT:/" < $TODIR/control/control > $TMPDIR/control &&
cat $TMPDIR/control &&
mv $TMPDIR/control $TODIR/control/control &&
(cd $TODIR/control && tar czf ../control.tar.gz *) || abort -1

(cd $TODIR && ar r "/tmp/$FINALNAME" debian-binary control.tar.gz data.tar.gz delta.tar.gz) || abort -1

echo "Done, patchfile is in /tmp/$FINALNAME"

# Clean up
rm -rf "$TMPDIR"

#!/bin/sh

# applies a patch to a Debian file on the system; the patch must have been
# generated by makepatch.

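abort() {
  # Remove the scratch directory and exit with a failure status.  Callers
  # pass -1, which is not a portable exit status, so exit 1 regardless.
  rm -rf "$TMPDIR"
  exit 1
}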

if [ "$#" != 1 ]
then
  echo "$0: must be called with one argument (the patchfile to apply)" 1>&2
  exit 1
fi

if ! [ -e "$1" ]
then
  echo "$0: $1 does not exist" 1>&2
  exit 1
fi

TMPDIR=/tmp/applydebpatch_$$

mkdir $TMPDIR || abort -1

OLDDIR=$TMPDIR/old
NEWDIR=$TMPDIR/new

mkdir $OLDDIR $NEWDIR || abort -1

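# Extract the patch archive.  Resolve the path first, since a relative
# argument would not be found after changing directory.
PATCHFILE="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
cd $NEWDIR && ar x "$PATCHFILE" || abort -1
if [ "`cat $NEWDIR/debian-binary`" != "2.0.patch-0.1" ]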
then
  echo "I don't know how to use this type of file to patch packages!" 1>&2
  abort -1
fi

echo "Analyzing patch.."

mkdir control data delta &&
(cd control && tar zxf ../control.tar.gz) &&
(cd data && tar zxf ../data.tar.gz) &&
(cd delta && tar zxf ../delta.tar.gz) || abort -1

PACKAGE=`grep ^Package: control/control | sed 's/^Package: *//'`
PATCHVER=`grep ^Patches: control/control | sed 's/^Patches: *//'`
VERSION=`grep ^Version: control/control | sed 's/^Version: *//'`

echo "Rebuilding old package $PACKAGE [should be version $PATCHVER].."
cd $OLDDIR &&
dpkg-repack $PACKAGE &&
ar x *.deb &&
rm *.deb || abort -1

mkdir control data &&
(cd control && tar zxf ../control.tar.gz) &&
(cd data && tar zxf ../data.tar.gz) || abort -1

OLDVER=`grep ^Version: control/control | sed 's/^Version: *//'`

if [ "$PATCHVER" != "$OLDVER" ]
then
  echo "Error!  This patch was made to update version $PATCHVER of $PACKAGE, you"
  echo "have version $OLDVER installed.  Bailing out.."
  abort -1
fi

cd $NEWDIR &&
sed '/^Patches:/d' < control/control > ctl &&
mv ctl control/control || abort -1

for file in `find $NEWDIR/delta -type f -printf '%P\n'`
do
  xdelta patch $NEWDIR/delta/$file $OLDDIR/data/$file $NEWDIR/data/$file || abort -1
  # Create a patched version
done

# recreate the package
outfile=${PACKAGE}_${VERSION}.deb

mkdir $NEWDIR/tmp &&
mv $NEWDIR/control $NEWDIR/tmp/DEBIAN &&
mv $NEWDIR/data/* $NEWDIR/tmp &&
dpkg-deb -b $NEWDIR/tmp /tmp/$outfile || abort -1

echo "Done!  Output is /tmp/$outfile"

rm -rf $TMPDIR
