
Bug#128818: [patch] packages.gz diff support for apt



On Wed, Nov 24, 2004 at 04:49:34AM +1000, Anthony Towns wrote:
> Michael wrote:
> > The code will download until it finds an empty patch; it then assumes
> > that the index is now up-to-date and stops. If it does not find a
> > patch it will auto-fallback to Packages.bz2 and then to
> > Packages.gz. The code is diffed against the arch repository at:
> > http://people.debian.org/~mdz/arch/apt@packages.debian.org
> > (apt@packages.debian.org/apt--main--0)
> 
> FWIW, what I was considering last I looked at this (Dec 2003 
> apparently...) was a combination of an index file and gzipped --ed 
> diffs. The index file gives you a bit more control over your patches, 
> and some redundancy so you can check if you've gotten everything screwed 
> up; --ed style diffs happen to kick ass for this problem.

Thanks for your answer. I'm happy about your comments. As I wrote in
the original mail, most of the patch is based on the ideas in your
blog. 

It should be easy enough to modify the code to generate/apply --ed
style diffs. 

I set up a cron job running a simple script on
http://people.debian.org/~mvo/pdiffs to see whether the code is stable
in the real world (still using normal diffs, not --ed style). ed-style
diffs should halve the size of the diffs again :)
 
> So the index file I was imagining looked like:
[..]

While all the information is certainly useful, I wonder if it's all
needed. One problem I see is that, even with the index file, the client
still needs to download a bunch of patches. I wonder if Jeroen van
Wolffelaar's idea of using only one ed-style diff is workable. It would
indeed give much better performance for the client.

Below I outline my thoughts on the index file. I would very much
appreciate your comments. My current feeling is that we may go without
an explicit index file. But I may be wrong here, of course.

> The History section tells you what the original file you're patching 
> from was, and the Patches section lets you validate the patch you're 
> about to apply. Knowing the md5sum/size of the original file is 
> obviously crucial, since that's how you know what patch to apply. 

The current approach calculates the md5sum of the local Packages
file. Then it checks whether there is a patch on the server that
matches this md5sum. That is just one attempt to download a file. If
the file is not found, it will fall back to the Packages.gz file
anyway.
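
The lookup described above can be sketched roughly as follows. This is
an illustration in Python, not the code from the patch: update_index()
and its three callbacks (fetch_patch, apply_patch, fetch_full) are
hypothetical names, fetch_full stands in for the Packages.bz2 /
Packages.gz fallback, and the max_steps guard is an extra assumption
added here to keep the loop bounded.

```python
import hashlib
from typing import Callable, Optional


def md5_hex(data: bytes) -> str:
    """Hex md5sum of the current index contents."""
    return hashlib.md5(data).hexdigest()


def update_index(
    local: bytes,
    fetch_patch: Callable[[str], Optional[bytes]],
    apply_patch: Callable[[bytes, bytes], bytes],
    fetch_full: Callable[[], bytes],
    max_steps: int = 100,
) -> bytes:
    """Patch `local` forward until the server returns an empty patch."""
    current = local
    for _ in range(max_steps):
        # One download attempt, keyed by the md5sum of what we have now.
        patch = fetch_patch(md5_hex(current))
        if patch is None:
            # No patch for this md5sum: fall back to the full download.
            return fetch_full()
        if patch == b"":
            # Empty patch: the index is assumed to be up to date.
            return current
        current = apply_patch(current, patch)
    # Too many steps (assumed guard): give up and download the full file.
    return fetch_full()
```

Because every step is keyed by the md5sum of the current file, a patch
that produced a wrong result simply finds no successor and ends in the
full download.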

> Knowing the md5sum/size of what you're going to end up with is a useful 
> sanity check, so that you can stop halfway through if you've somehow 
> managed to get yourself into a loop or similar. 

If the patch produces a wrong result for some reason, the next
calculated md5sum will not match any file on the server and the code
will fall back to downloading the Packages.gz file. If patch itself
fails, apt will notice and likewise fall back to downloading the
Packages.gz file.

> Knowing the md5sum of the patches is useful just in case diff has a
> root exploit. 

I'm not sure I understand this correctly. Do you think that someone
could sneak in a rogue diff to exploit apt?

> Knowing the size of the patches you need to download is good for
> progress bars.

HTTP/FTP will tell us the size, and it should already work with the
current patch.

> Knowing the date of the resulting Packages file you're going to
> create at each step is useful for debugging -- while you might
> expect daily patches for testing/unstable, they'll come at much more
> irregular intervals for stable or security updates.

That's indeed useful.

thanks,
 Michael

-- 
The first rule of holes is: when you find yourself in one, stop digging. - PJ
Linux is not The Answer. Yes is the answer. Linux is The Question. - Neo


