
Archive Restructuring - Package Pool



Hello again world,

(I guess I work on Jovian hours or something. My apologies)

I'll break this proposal into a couple of moderately disjoint
sections, primarily for ease of exposition but hopefully also for ease
of comprehension.

The first thing I'd like to ask you to consider, then, is converting
the archive to a package pool and symlink tree structure.



The Package Pool
================

0. Introduction
---------------

A new directory should be created in the /pub/debian hierarchy:
package-pool.

This directory should contain the physical copies of (not symlinks to)
all the packages currently distributed by the Debian group.

1. Structure of the Package Pool
--------------------------------

The directory hierarchy should be as follows:

	/pub/debian/package-pool
		main/
		contrib/
		non-free/
		non-us/			-- where applicable
		disks/

[please see below for further explanation of the "where applicable"
rider]

Within each of package-pool/{main,contrib,non-free,non-us} should be a
structure of the following form:

	  /pub/debian/package-pool/main/
		source/
			admin/
			base/
			...
		binary-all/
			admin/
			base/
			...
		binary-alpha/
			admin/
			base/
			...
		binary-i386/
			...
		...

(that is a "source" directory, and "binary-" directories for each
architecture under development; each containing the usual single-level
classification of packages)

Some examples would probably be illustrative at this point.

1.1 Example: cruft
------------------

If we are currently making available version 0.9.6 of the package
cruft for i386, and version 0.9.6.1 for alpha and m68k, we would have
the following files available:

	/pub/debian/package-pool/main/
		source/
			admin/
				cruft_0.9.6.tar.gz
				cruft_0.9.6.dsc
				cruft_0.9.6.1.tar.gz
				cruft_0.9.6.1.dsc
		binary-alpha/
			admin/
				cruft_0.9.6.1.deb
		binary-i386/
			admin/
				cruft_0.9.6.deb
		binary-m68k/
			admin/
				cruft_0.9.6.1.deb

That is, we include the three compiled packages in their appropriate
subdirectories, and include the sources for all available versions in
the sources directory.
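
As a rough sketch only, the layout above amounts to a simple mapping
from (category, architecture, section, filename) to a location in the
pool. The function name pool_path and the convention that arch is
None for source files are assumptions of this sketch, not part of the
proposal.

```python
import posixpath

POOL_ROOT = "/pub/debian/package-pool"

def pool_path(category, arch, section, filename):
    """Return the pool location for one file.

    arch is None for source files (assumed convention), otherwise an
    architecture name such as "i386" or "all".
    """
    subdir = "source" if arch is None else "binary-" + arch
    return posixpath.join(POOL_ROOT, category, subdir, section, filename)

print(pool_path("main", "alpha", "admin", "cruft_0.9.6.1.deb"))
# /pub/debian/package-pool/main/binary-alpha/admin/cruft_0.9.6.1.deb
print(pool_path("main", None, "admin", "cruft_0.9.6.dsc"))
# /pub/debian/package-pool/main/source/admin/cruft_0.9.6.dsc
```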

1.2 Example: distributed-net-pproxy
-----------------------------------

The distributed.net personal proxy has two different versions
currently available -- the one that made it into hamm, and the current
one in slink. The former is available for the i386 and alpha
architectures, the latter for i386 and sparc.

	/pub/debian/package-pool/non-free/
		source/
			misc/
				distributed-net-pproxy_277b.orig.tar.gz
				distributed-net-pproxy_277b-1.dsc
				distributed-net-pproxy_277b-1.diff.gz
				distributed-net-pproxy_280.orig.tar.gz
				distributed-net-pproxy_280-1.dsc
				distributed-net-pproxy_280-1.diff.gz
		binary-alpha/
			misc/
				distributed-net-pproxy_277b-1.deb
		binary-i386/
			misc/
				distributed-net-pproxy_277b-1.deb
				distributed-net-pproxy_280-1.deb
		binary-sparc/
			misc/
				distributed-net-pproxy_280-1.deb

1.3 Example: the foobar package [fictitious]
--------------------------------------------

Finally, we consider a slightly trickier example. The foobar package
was released to bo at version 1.1-1, and rereleased to hamm at version
1.1-2 (a simple recompile to libc6). A couple of weeks later, however,
a horrible bug is found, and an update has to be done for both the bo
and hamm releases. 

I'll leave two options open here, one that seems mildly ugly but
should work with minimum effort, and the other which I think is the
Right Thing.

Option 1: Release foobar_1.1-3 (libc5) for bo, and foobar_1.1-4
(libc6) for hamm.

	/pub/debian/package-pool/non-free/
		source/
			misc/
				foobar_1.1.orig.tar.gz
				foobar_1.1-3.dsc
				foobar_1.1-3.diff.gz
				foobar_1.1-4.dsc
				foobar_1.1-4.diff.gz
		binary-i386/
			misc/
				foobar_1.1-3.deb
				foobar_1.1-4.deb

Option 2: Release foobar_1.1-3 for both hamm and bo, compiling two
separate packages, foobar_1.1-3_bo.deb (libc5 based) and
foobar_1.1-3.deb (current, libc6 based).

	/pub/debian/package-pool/non-free/
		source/
			misc/
				foobar_1.1.orig.tar.gz
				foobar_1.1-3.dsc
				foobar_1.1-3.diff.gz
		binary-i386/
			misc/
				foobar_1.1-3_bo.deb
				foobar_1.1-3.deb

Option 2 has the advantage that it doesn't require the maintainer to
think about old versions in advance -- s?he can just compile for hamm,
upload, belatedly think "Ooo! A bo upload would've been good too",
switch to a bo machine, recompile and reupload. No mucking about with
version numbers required. Further, the maintainer need not think about
this at all -- someone following the stable release can simply download
the sources, and upload a bo-only version.


2. The disks/ directory
-----------------------

In order to make the bootdisk images more conveniently located, they
should be moved out of main/ and into their own directory.

	/pub/debian/package-pool/disks/
		disks-alpha/
		disks-i386/
		disks-m68k/

The recommended layout is as follows:

	/pub/debian/package-pool/disks/disks-i386/
		2.0.10_1998-07-17/
			[images]
		2.0.10_1998-07-21/
			[images]

This mirrors the layout of /pub/debian/dists/main/disks-* as it
currently stands, minus the "current" link.



3. Changes to the dists/ hierarchy
----------------------------------

Having thus found a nice place to dump all our packages, we can
alter the current policy of having the distributions contain either a
copy of the package itself or a symlink to a copy of the package in a
previous release (*breath*), to just including a symlink to the
package in the package-pool.

Continuing the first of the three examples above, we would thus have:

	/pub/debian/dists/slink/
	     main/
	        binary-alpha/
	            admin/
	                cruft_0.9.6.1.deb -> ../../../../../package-pool/
				main/binary-alpha/admin/cruft_0.9.6.1.deb
	        binary-i386/
	            admin/
	                cruft_0.9.6.deb -> ../../../../../package-pool/
				main/binary-i386/admin/cruft_0.9.6.deb
	        binary-m68k/
	            admin/
	                cruft_0.9.6.1.deb -> ../../../../../package-pool/
				main/binary-m68k/admin/cruft_0.9.6.1.deb
	        source/
	            admin/
	                cruft_0.9.6.tar.gz -> ../../../../../package-pool/
	                        main/source/admin/cruft_0.9.6.tar.gz
	                cruft_0.9.6.dsc -> ../../../../../package-pool/
	                        main/source/admin/cruft_0.9.6.dsc
	                cruft_0.9.6.1.tar.gz -> ../../../../../package-pool/
	                        main/source/admin/cruft_0.9.6.1.tar.gz
	                cruft_0.9.6.1.dsc -> ../../../../../package-pool/
	                        main/source/admin/cruft_0.9.6.1.dsc
	       
Bootdisks are stored as a symlink to the appropriate subdirectory of
the package-pool.

	/pub/debian/dists/slink/
	    disks/
	        disks-i386 -> ../../../package-pool/disks/disks-i386/
	                            2.0.10_1998-07-21
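
The relative targets in the listings above need not be counted out by
hand. A minimal sketch, assuming both the link and the pool file are
given as absolute paths under /pub/debian (the function name
symlink_target is made up for illustration):

```python
import posixpath

def symlink_target(link_path, pool_path):
    """Relative target for a symlink at link_path pointing at pool_path."""
    return posixpath.relpath(pool_path, start=posixpath.dirname(link_path))

link = "/pub/debian/dists/slink/main/binary-i386/admin/cruft_0.9.6.deb"
pool = "/pub/debian/package-pool/main/binary-i386/admin/cruft_0.9.6.deb"
print(symlink_target(link, pool))
# ../../../../../package-pool/main/binary-i386/admin/cruft_0.9.6.deb

disks_link = "/pub/debian/dists/slink/disks/disks-i386"
disks_pool = "/pub/debian/package-pool/disks/disks-i386/2.0.10_1998-07-21"
print(symlink_target(disks_link, disks_pool))
# ../../../package-pool/disks/disks-i386/2.0.10_1998-07-21
```

Relative targets keep working when a mirror puts the archive somewhere
other than /pub/debian, which is presumably why one would want them.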
 

4. Procedures
-------------

There are a number of operations that can be performed on the
package-pool:


Developer Uploads:

	     * Make a new version of a package available
	         -- Upload foo_xy.zz.orig.tar.gz, 
			   foo_xy.zz-y.diff.gz,
			   foo_xy.zz-y.dsc, 
		    and    foo_xy.zz-y_arch.deb

             * Make a new port of a package available		 
	         -- Upload foo_xy.zz-y_arch.deb

    (possibly:
	     * Make a port to an old release available
	         -- Upload foo_xy.zz-y_arch_release.deb
    )
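
For what it's worth, an Incoming script could pick such upload names
apart as sketched below. This assumes the name_version_arch[_release].deb
form used in this section (the examples in section 1 leave the
architecture out of the filename) and relies on package names and
versions never containing an underscore. Upload and parse_deb_name are
hypothetical names.

```python
from typing import NamedTuple, Optional

class Upload(NamedTuple):
    package: str
    version: str
    arch: str
    release: Optional[str]  # None means "the current release"

def parse_deb_name(filename):
    """Split foo_1.1-3_i386.deb or foo_1.1-3_i386_bo.deb into its fields."""
    if not filename.endswith(".deb"):
        raise ValueError("not a .deb file: " + filename)
    parts = filename[:-len(".deb")].split("_")
    if len(parts) == 3:
        return Upload(parts[0], parts[1], parts[2], None)
    if len(parts) == 4:
        return Upload(*parts)
    raise ValueError("unexpected file name: " + filename)

print(parse_deb_name("foobar_1.1-3_i386.deb"))
print(parse_deb_name("foobar_1.1-3_i386_bo.deb"))
```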

    
Old Version Control:

(eg, when 2.0-1 is available, so 1.3-7 goes bye-bye)

	     * Make a particular port unavailable 
	     * Make a particular Debian revision unavailable 
	     * Make a particular upstream revision unavailable

Some sample use scenarios are:

	     * A new version of an i386 package is uploaded, making
	       the old one irrelevant: remove the i386 port, but leave
	       the source for the old version so the alpha and powerpc
	       ports can be rebuilt.

	     * A new version of an m68k package is uploaded, making
               the old one irrelevant: remove the m68k port, and note
               that all the other architectures have also moved on, so
               go a bit further and remove the Debian source, and the
               upstream source too.

	     * Someone notes that a particular release of the upstream
               source of a package is released under a license with
               the clause "Debian GNU/Linux may not distribute any
               part of this work." We get rid of the upstream source,
               any .diff.gz's we may have made, and the .deb's we've
               compiled.

[determining which versions of which packages need to be kept
available will be dealt with in a later message]
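
The first two scenarios above come down to one rule: the source for a
version can go once no port built from it remains. A small sketch of
that bookkeeping, assuming we track, per source version, the set of
architectures that still carry a .deb built from it (the data
structure and the name remove_port are assumptions of this sketch):

```python
def remove_port(binaries, version, arch):
    """Drop one port; return True if the source for version is now unused.

    binaries maps a source version to the set of architectures that
    still have a .deb built from it. It is modified in place.
    """
    archs = binaries.get(version, set())
    archs.discard(arch)
    if not archs:
        # No port left: the Debian source can be removed as well.
        binaries.pop(version, None)
        return True
    return False

pool = {"1.3-7": {"i386", "alpha", "powerpc"}}
print(remove_port(pool, "1.3-7", "i386"))     # False: alpha, powerpc remain
print(remove_port(pool, "1.3-7", "alpha"))    # False: powerpc remains
print(remove_port(pool, "1.3-7", "powerpc"))  # True: source may go too
```

Whether the upstream source can go as well depends on whether any
other Debian revision still uses it, which this sketch does not track.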


5. Authority
------------

The package pool should be controlled by the ftp maintainer, and the
people processing the Incoming queue.

In particular, processing the Incoming queue should become as
automated as possible [nb: I do not know enough about this particular
job to say whether or not any further automation is possible. I'm
happy to jump on the bandwagon and say that completely automated
updates would be nice, however. More on this later].

Given this, the difficulties associated with release management become
a matter of updating symlinks in the dists/ hierarchy.


6. Miscellaneous Considerations
-------------------------------

6.1 Coping with non-us
----------------------

We would like to integrate non-us into the distribution as well as we
possibly can -- Debian is primarily about a free GNU/Linux distribution,
not a free GNU/Linux distribution that follows US laws.

This gives us two conflicting desires -- to make non-us packages just
another part of the distribution for non-us users; and to make non-us
packages completely separate for US mirrors and users.

The above proposal aims at a compromise targeted more at the non-us
users than is currently implemented, and thus requires one of:

	* a non-us master that includes the entire archive as listed
	  above, and a US master mirror that mirrors everything but
	  non-us, and is the favoured machine from which US based mirrors
	  should mirror.

or	* a US master, that includes the entire archive except for non-us,
	  and a non-us master that mirrors master's copy of the main
	  distribution, and allows and installs uploads to non-us.

I would tend toward the former, since it makes it convenient to
synchronise the release of each different category (main, contrib,
non-us and non-free).

6.2 Unreleased architectures
----------------------------

Packages in the current "sid" distribution would be made a part of the
Package Pool, and would simply be symlinked from the dists/ hierarchy.
Removing an architecture from release would thus involve deleting (or
moving) a symlink tree, which is relatively light on mirrors. [More on
this later]

6.3 The Package Pool and GNU/HURD
---------------------------------

We do have a possible problem, however, with a single package-pool like
this. In particular, what do we do with HURD packages?

If we can build all our HURD packages from the same source as our GNU/Linux
packages, we're fine -- the HURD release is no different to the alpha or
i386 releases.

If, however, we decide we would prefer to keep the sources separate (to
reflect the different filesystem layout, different standard base system
or something else) then the HURD does not fit into the above layout --
at a minimum there would need to be two source directories.

There are two possible options here: one is to say "To heck with it, we're
talking about Debian GNU/Linux, Debian GNU/HURD should be *completely*
separate, in /pub/debian-hurd, say."

The other is to say "Well, look, we're probably going to support other
kernels sometime on our way to World Conquest! anyway, so why don't we
just accept it and integrate them in much the same way as we've already
done the various ports?". We could then change the package-pool layout
into something resembling:

	package-pool/
		main/
			linux/
				source/
				binary-alpha/
				binary-i386/
				binary-m68k/
			hurd/
				source/
				binary-i386/
				binary-sparc/
			freebsd/
				...

6.4 License Changes
-------------------

Something else that needs to be taken into consideration is what to
do when a package changes classification: ncftp gets rereleased under
GPL; Qt gets GPLed and KDE moves into main; the US export laws are
changed and GPG can go into main. (in decreasing order of likelihood,
I suppose)

As presented, this requires manual intervention by both the archive and
release managers -- packages in the package-pool need to be moved around,
and the symlinks from the dists/ hierarchy need to be changed to point to
the right place (contrib/.../kdebase -> .../contrib/.../kdebase becomes
main/.../kdebase -> .../main/.../kdebase)

This is significantly painful.

Any resolution requires, at a minimum, removing the categorisation of
the package-pool, and having a single tree for main, contrib, non-free,
and non-us, with packages from the different categorisations sitting
side by side, in some sort of liberated harmony.

We could still ensure that only the appropriate packages were put in
the appropriate areas with moderately complicated scripting (if you're
a US mirror, download this list of files, then download all the files in
that list *and nothing else*!!), which could be made to work acceptably
(especially with a US and a non-us master mirror).
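
To give an idea of the sort of scripting meant here, the list for a US
mirror could be produced from an index that records the category of
each file in the single tree. The index format and the name
mirror_list are purely hypothetical, and the non-us entry below is
made up for the example.

```python
def mirror_list(index, exclude=("non-us",)):
    """Return the sorted pool files a mirror may fetch.

    index maps each file name to its category (main, contrib,
    non-free or non-us); files in an excluded category are left out.
    """
    return sorted(name for name, category in index.items()
                  if category not in exclude)

index = {
    "cruft_0.9.6.deb": "main",
    "distributed-net-pproxy_280-1.deb": "non-free",
    "foobar_1.1-3.deb": "non-us",  # fictitious, for illustration only
}
print(mirror_list(index))
# ['cruft_0.9.6.deb', 'distributed-net-pproxy_280-1.deb']
```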


6.5 Past Revisions
------------------

In order to make life a little easier when release critical bugs are
found, it would be nice to be able to revert back to older source. This
is difficult at the moment, as old revisions are deleted immediately
upon installing a new revision.

With the package-pool, however, we can conveniently keep as many old
revisions as we may wish.

The question remains which old revisions to keep, and which to delete.
The suggested mechanism for this is a form of LRU algorithm, whereby
packages that have been stable for some time are kept for longer than
packages that were replaced a couple of days after their first upload.

Further, this algorithm should be based on the amount of disk space in use,
so that we may say "The package pool takes up xGB. And it's guaranteed to
stay that way for the next year or so, even as we add more architectures
and packages".
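
One possible reading of this, and only that, is sketched below: order
the superseded revisions by how long each was the current one, and
delete the shortest-lived first until the pool fits the budget. This
is not strictly LRU, and the OldRevision record and choose_evictions
are names invented for the sketch.

```python
from typing import NamedTuple

class OldRevision(NamedTuple):
    name: str
    size: int          # bytes on disk
    days_current: int  # how long it was the newest revision

def choose_evictions(revisions, total_size, budget):
    """Pick old revisions to delete until total_size fits within budget.

    Revisions that were current for the shortest time are chosen first.
    """
    evict = []
    for rev in sorted(revisions, key=lambda r: r.days_current):
        if total_size <= budget:
            break
        evict.append(rev)
        total_size -= rev.size
    return evict

old = [
    OldRevision("foo_1.0-1", 100, 200),
    OldRevision("foo_1.0-2", 100, 2),
    OldRevision("bar_2.0-1", 50, 30),
]
print([r.name for r in choose_evictions(old, total_size=1000, budget=880)])
# ['foo_1.0-2', 'bar_2.0-1']
```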

6.6 Load on mirrors
-------------------

The proposal contains no considerations as to how the changeover should be
managed to minimise load on the mirrors. Doing funky stuff with symlinks
remains a possibility. Simply making public announcements that Debian
is about to undergo a major archive rearrangement and that mirrors may
wish to watch their bandwidth would also be possible, and would allow
us to make a clean start.

Taking the longer term viewpoint, however, we should ask ourselves what
sort of load having a package-pool is likely to put on mirrors. It already
has a number of benefits: packages never have to be moved, even if they
stay exactly the same years after their first release has been rm-rf'ed;
and the package-pool directory can be mounted on another disk (even a
WORM disk) where there is more space, making the remainder of
/pub/debian significantly smaller (modulo CD images).

This does, however, leave those who only wish to do a partial mirror
somewhat out in the cold. Since it seems likely that the package-pool
will contain additional old revisions (of both source code, and known
stable versions of packages) this may become increasingly important. The
only real possibilities seem to be, however, ignoring symlinks completely
(and thus getting duplicated packages between releases), or doing some
complicated scripting on another mirror, and dereferencing some symlinks,
and retargeting others. Neither seems particularly optimal, but it at
least seems that a reasonable solution should be possible.
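
As a sketch of the first half of such scripting, a partial mirror
could work out which pool entries a single release actually needs by
resolving the symlinks under its dists/ directory. This assumes the
mirror has a local copy of the symlink tree to walk;
files_for_release is a made-up name.

```python
import os

def files_for_release(release_dir):
    """Return the resolved targets of all symlinks under release_dir."""
    needed = set()
    for dirpath, dirnames, filenames in os.walk(release_dir):
        # Symlinks to directories (such as disks-i386) show up in
        # dirnames; os.walk does not descend into them by default.
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                needed.add(os.path.realpath(path))
    return needed
```

Something like files_for_release("/pub/debian/dists/slink") would then
give the list of pool paths to fetch, leaving old revisions that no
release points at behind.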

7. Benefits
-----------

So the important question is: what does all this buy us?

First, it provides a clear separation between release management and
archive management -- deleting packages and uploading packages is a
simple matter of sticking them in the package pool; releasing packages or
removing them from release, or putting them in a different release, is a
simple matter of fiddling with a symlink. Further, changes to the release
(eg, a release expiring and all the files under dists/foo being deleted)
do not affect the availability of packages in other distributions.

Second, it gives us somewhere to store past revisions of packages. This
is useful if you're trying to find a working version of a package in
the lag time between a bug's discovery and its fix. It's also useful
around release time, when a package may need to be replaced by an
older version if a quick fix cannot be made in time.

Finally, it allows us to set an upper limit on the size of the Debian
archive, and be able to keep to that limit for some time.

8. Acknowledgements
-------------------

This proposal was initially raised by Bdale Garbee <bdale@gag.com>
a couple of months ago on this list. Almost all the ideas presented
above are his.

This proposal has been refined by comments from: (at least some of :)

        Bill Mitchell <debian@pny-fmail.webquest.com>
        Bdale Garbee <bdale@gag.com>
	Brian White <bcwhite@debian.org>
        Craig Sanders <cas@taz.net.au>
	Dale Scheetz <dwarf@polaris.net>
        David Engel <david@ods.com>
	Guy Maor <maor@ece.utexas.edu>
        Ian Jackson <ian@chiark.greenend.org.uk>
        Klee Dienes <klee@alum.mit.edu>
        Manoj Srivastava <srivasta@datasync.com>
	Philip Hands <phil@hands.com>
        "Rev. Joseph Carter" <knghtbrd@earthlink.net>
        Richard Braakman <dark@debian.org>
        Raul Miller <rdm@test.legislate.com>

Errors, omissions and any lack of clarity or forethought in the above
are of course mine.

Awaiting your comments.

Cheers,
aj

-- 
Anthony Towns <aj@humbug.org.au> <http://azure.humbug.org.au/~aj/>
I don't speak for anyone save myself. PGP encrypted mail preferred.

Remember to breathe.
