Hello again world,

(I guess I work on Jovian hours or something. My apologies)

I'll break this proposal into a couple of moderately disjoint sections,
primarily for ease of exposition but hopefully also for ease of
comprehension.

The first thing I'd like to ask you to consider, then, is converting
the archive to a package pool and symlink tree structure.

The Package Pool
================

0. Introduction
---------------

A new directory should be created in the /pub/debian hierarchy:
package-pool. This directory should contain the physical copies of (not
symlinks to) all the packages currently distributed by the Debian group.

1. Structure of the Package Pool
--------------------------------

The directory hierarchy should be as follows:

    /pub/debian/package-pool
        main/
        contrib/
        non-free/
        non-us/    -- where applicable
        disks/

[please see below for further explanation of the "where applicable"
rider]

Within each of package-pool/{main,contrib,non-free,non-us} should be a
structure of the following form:

    /pub/debian/package-pool/main/
        source/
            admin/
            base/
            ...
        binary-all/
            admin/
            base/
            ...
        binary-alpha/
            admin/
            base/
            ...
        binary-i386/
            ...
        ...

(that is, a "source" directory, and "binary-" directories for each
architecture under development; each containing the usual single-level
classification of packages)

Some examples would probably be illustrative at this point.

1.1 Example: cruft
------------------

If we are currently making available version 0.9.6 of the package cruft
for i386, and version 0.9.6.1 for alpha and m68k, we would have the
following files available:

    /pub/debian/package-pool/main/
        source/
            admin/
                cruft_0.9.6.tar.gz
                cruft_0.9.6.dsc
                cruft_0.9.6.1.tar.gz
                cruft_0.9.6.1.dsc
        binary-alpha/
            admin/
                cruft_0.9.6.1.deb
        binary-i386/
            admin/
                cruft_0.9.6.deb
        binary-m68k/
            admin/
                cruft_0.9.6.1.deb

That is, we include the three compiled packages in their appropriate
subdirectories, and include the sources for all available versions in
the source directory.
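As a rough illustration of the layout above, pool paths could be built
mechanically from a package's category, architecture and section. This
is only a sketch: the helper names (pool_dir, binary_path) are invented
for this example and are not part of any existing archive tooling.

```python
from pathlib import PurePosixPath
from typing import Optional

# Root of the pool as named in the proposal.
POOL_ROOT = PurePosixPath("/pub/debian/package-pool")


def pool_dir(area: str, arch: Optional[str], section: str) -> PurePosixPath:
    """Return the pool directory for a package.

    area    -- main, contrib, non-free or non-us
    arch    -- architecture name, or None for the source directory
    section -- the usual single-level classification (admin, base, ...)
    """
    top = "source" if arch is None else "binary-" + arch
    return POOL_ROOT / area / top / section


def binary_path(area: str, arch: str, section: str,
                package: str, version: str) -> PurePosixPath:
    """Return the full pool path of a compiled package (hypothetical helper)."""
    return pool_dir(area, arch, section) / f"{package}_{version}.deb"


if __name__ == "__main__":
    # The three cruft binaries from the example above.
    for arch, version in [("alpha", "0.9.6.1"),
                          ("i386", "0.9.6"),
                          ("m68k", "0.9.6.1")]:
        print(binary_path("main", arch, "admin", "cruft", version))
```

Run as a script, this prints the three binary paths from the cruft
example; passing arch=None to pool_dir gives the matching source
directory.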
1.2 Example: distributed-net-pproxy
-----------------------------------

The distributed.net personal proxy has two different versions currently
available -- the one that made it into hamm, and the current one in
slink. The former is available for the i386 and alpha architectures,
the latter for i386 and sparc.

    /pub/debian/package-pool/non-free/
        source/
            misc/
                distributed-net-pproxy_277b.orig.tar.gz
                distributed-net-pproxy_277b-1.dsc
                distributed-net-pproxy_277b-1.diff.gz
                distributed-net-pproxy_280.orig.tar.gz
                distributed-net-pproxy_280-1.dsc
                distributed-net-pproxy_280-1.diff.gz
        binary-alpha/
            misc/
                distributed-net-pproxy_277b-1.deb
        binary-i386/
            misc/
                distributed-net-pproxy_277b-1.deb
                distributed-net-pproxy_280-1.deb
        binary-sparc/
            misc/
                distributed-net-pproxy_280-1.deb

1.3 Example: the foobar package [fictitious]
--------------------------------------------

Finally, we consider a slightly trickier example. The foobar package
was released to bo at version 1.1-1, and rereleased to hamm at version
1.1-2 (a simple recompile to libc6). A couple of weeks later, however,
a horrible bug is found, and an update has to be done for both the bo
and hamm releases.

I'll leave two options open here, one that seems mildly ugly but should
work with minimum effort, and the other which I think is the Right
Thing.

Option 1: Release foobar_1.1-3 (libc5) for bo, and foobar_1.1-4 (libc6)
for hamm.

    /pub/debian/package-pool/non-free/
        source/
            misc/
                foobar_1.1.orig.tar.gz
                foobar_1.1-3.dsc
                foobar_1.1-3.diff.gz
                foobar_1.1-4.dsc
                foobar_1.1-4.diff.gz
        binary-i386/
            misc/
                foobar_1.1-3.deb
                foobar_1.1-4.deb

Option 2: Release foobar_1.1-3 for both hamm and bo, compiling two
separate packages, foobar_1.1-3_bo.deb (libc5 based) and
foobar_1.1-3.deb (current, libc6 based).

    /pub/debian/package-pool/non-free/
        source/
            misc/
                foobar_1.1.orig.tar.gz
                foobar_1.1-3.dsc
                foobar_1.1-3.diff.gz
        binary-i386/
            misc/
                foobar_1.1-3_bo.deb
                foobar_1.1-3.deb

This has the advantage that it doesn't require the maintainer to think
about old versions in advance -- s?he can just compile for hamm,
upload, and belatedly think "Ooo! A bo upload would've been good too",
and switch to a bo machine, recompile and reupload. No mucking about
with version numbers required.

Further, this does not need to be thought about by the maintainer at
all -- someone following the stable release can simply download the
sources, and upload a bo-only version.

2. The disks/ directory
-----------------------

In order to make the bootdisk images more conveniently located, they
should be moved out of main/ and into their own directory.

    /pub/debian/package-pool/disks/
        disks-alpha/
        disks-i386/
        disks-m68k/

The recommended layout is as follows:

    /pub/debian/package-pool/disks/disks-i386/
        2.0.10_1998-07-17/
            [images]
        2.0.10_1998-07-21/
            [images]

This mirrors the layout of /pub/debian/dists/main/disks-* as it
currently stands, minus the "current" link.

3. Changes to the dists/ hierarchy
----------------------------------

Having thus found a nice place to dump all our packages, we can alter
the current policy of having the distributions contain either a copy of
the package itself or a symlink to a copy of the package in a previous
release (*breath*), to just including a symlink to the package in the
package-pool.
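To illustrate the symlink scheme just described, here is a minimal
sketch of how a relative link target could be computed for a file in
the dists/ hierarchy. The function name and the example paths under
dists/slink are assumptions made for illustration; only posixpath from
the Python standard library is used, and nothing is written to disk.

```python
import posixpath


def symlink_target(link_path: str, pool_path: str) -> str:
    """Return the relative target a symlink at link_path would need
    in order to point at pool_path (hypothetical helper).

    The target is relative to the directory holding the link, so the
    tree keeps working wherever a mirror places /pub/debian.
    """
    link_dir = posixpath.dirname(link_path)
    return posixpath.relpath(pool_path, start=link_dir)


if __name__ == "__main__":
    # Illustrative paths only: a dists/ entry and its pool counterpart.
    link = "/pub/debian/dists/slink/main/binary-i386/admin/cruft_0.9.6.deb"
    pool = "/pub/debian/package-pool/main/binary-i386/admin/cruft_0.9.6.deb"
    print(symlink_target(link, pool))
    # prints ../../../../../package-pool/main/binary-i386/admin/cruft_0.9.6.deb
```

Relative targets are used here, rather than absolute ones, on the
assumption that mirrors may keep the archive somewhere other than
/pub/debian.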

Continuing the first of the three examples above, we would thus have:

    /pub/debian/dists/slink/
        main/
            binary-alpha/
                admin/
                    cruft_0.9.6.1.deb -> ../../../../../package-pool/main/binary-alpha/admin/cruft_0.9.6.1.deb
            binary-i386/
                admin/
                    cruft_0.9.6.deb -> ../../../../../package-pool/main/binary-i386/admin/cruft_0.9.6.deb
            binary-m68k/
                admin/
                    cruft_0.9.6.1.deb -> ../../../../../package-pool/main/binary-m68k/admin/cruft_0.9.6.1.deb
            source/
                admin/
                    cruft_0.9.6.tar.gz -> ../../../../../package-pool/main/source/admin/cruft_0.9.6.tar.gz
                    cruft_0.9.6.dsc -> ../../../../../package-pool/main/source/admin/cruft_0.9.6.dsc
                    cruft_0.9.6.1.tar.gz -> ../../../../../package-pool/main/source/admin/cruft_0.9.6.1.tar.gz
                    cruft_0.9.6.1.dsc -> ../../../../../package-pool/main/source/admin/cruft_0.9.6.1.dsc

Bootdisks are stored as a symlink to the appropriate subdirectory of
the package-pool.

    /pub/debian/dists/slink/
        disks/
            disks-i386 -> ../../../package-pool/disks/disks-i386/2.0.10_1998-07-21

4. Procedures
-------------

There are a number of operations that can be performed on the
package-pool:

Developer Uploads:

  * Make a new version of a package available
        -- Upload foo_xy.zz.orig.tar.gz, foo_xy.zz-y.diff.gz,
           foo_xy.zz-y.dsc, and foo_xy.zz-y_arch.deb
  * Make a new port of a package available
        -- Upload foo_xy.zz-y_arch.deb

  (possibly:
  * Make a port to an old release available
        -- Upload foo_xy.zz-y_arch_release.deb
  )

Old Version Control: (eg, when 2.0-1 is available, so 1.3-7 goes
bye-bye)

  * Make a particular port unavailable
  * Make a particular Debian revision unavailable
  * Make a particular upstream revision unavailable

Some sample use scenarios are:

  * A new version of an i386 package is uploaded, making the old one
    irrelevant: remove the i386 port, but leave the source for the old
    version so the alpha and powerpc ports can be rebuilt.

  * A new version of an m68k package is uploaded, making the old one
    irrelevant: remove the m68k port, and note that all the other
    architectures have also moved on, so go a bit further and remove
    the Debian source, and the upstream source too.

  * Someone notes that a particular release of the upstream source of
    a package is released under a license with the clause "Debian
    GNU/Linux may not distribute any part of this work." We get rid of
    the upstream source, any .diff.gz's we may have made, and the
    .deb's we've compiled.

[determining which versions of which packages need to be kept available
will be dealt with in a later message]

5. Authority
------------

The package pool should be controlled by the ftp maintainer, and the
people processing the Incoming queue. In particular, processing the
Incoming queue should become as automated as possible [nb: I do not
know enough about this particular job to say whether or not any further
automation is possible. I'm happy to jump on the bandwagon and say that
completely automated updates would be nice, however. More on this
later].

Given this, the difficulties associated with release management become
a matter of updating symlinks in the dists/ hierarchy.

6. Miscellaneous Considerations
-------------------------------

6.1 Coping with non-us
----------------------

We would like to integrate non-us into the distribution as well as we
possibly can -- Debian is primarily about a free GNU/Linux
distribution, not a free GNU/Linux distribution that follows US laws.

This gives us two conflicting desires -- to make non-us packages just
another part of the distribution for non-us users; and to make non-us
packages completely separate for US mirrors and users.

The above proposal aims at a compromise targeted more at the non-us
users than is currently implemented, and thus requires one of:

  * a non-us master that includes the entire archive as listed above,
    and a US master mirror that mirrors everything but non-us, and is
    the favoured machine from which US based mirrors should mirror.

or

  * a US master, that includes the entire archive except for non-us,
    and a non-us master that mirrors master's copy of the main
    distribution, and allows and installs uploads to non-us.

I would tend toward the former, since it makes it convenient to
synchronise the release of each different category (main, contrib,
non-us and non-free).

6.2 Unreleased architectures
----------------------------

Packages in the current "sid" distribution would be made a part of the
Package Pool, and would simply be symlinked from the dists/ hierarchy.
Removing an architecture from release would thus involve deleting (or
moving) a symlink tree, which is relatively light on mirrors. [More on
this later]

6.3 The Package Pool and GNU/HURD
---------------------------------

We do have a possible problem, however, with a single package-pool like
this. In particular, what do we do with HURD packages? If we can build
all our HURD packages from the same source as our GNU/Linux packages,
we're fine -- the HURD release is no different to the alpha or i386
releases.

If, however, we decide we would prefer to keep the sources separate (to
reflect the different filesystem layout, different standard base system
or something else) then the HURD does not fit into the above layout --
at a minimum there would need to be two source directories.

There are two possible options here: one is to say "To heck with it,
we're talking about Debian GNU/Linux, Debian GNU/HURD should be
*completely* separate, in /pub/debian-hurd, say." The other is to say
"Well, look, we're probably going to support other kernels sometime on
our way to World Conquest!
anyway, so why don't we just accept it and integrate them in much the
same way as we've already done the various ports?".

We could then change the package-pool layout into something resembling:

    package-pool/
        main/
            linux/
                source/
                binary-alpha/
                binary-i386/
                binary-m68k/
            hurd/
                source/
                binary-i386/
                binary-sparc/
            freebsd/
                ...

6.4 License Changes
-------------------

Something else that needs to be taken into consideration is what to do
when a package changes classification: ncftp gets rereleased under GPL;
Qt gets GPLed and KDE moves into main; the US export laws are changed
and GPG can go into main. (in decreasing order of likelihood, I
suppose)

As presented, this requires manual intervention by both the archive and
release managers -- packages in the package-pool need to be moved
around, and the symlinks from the dists/ hierarchy need to be changed
to point to the right place (contrib/.../kdebase ->
.../contrib/.../kdebase becomes main/.../kdebase ->
.../main/.../kdebase)

This is significantly painful.

Any resolution requires, at a minimum, removing the categorisation of
the package-pool, and having a single tree for main, contrib, non-free,
and non-us, with packages from the different categorisations sitting
side by side, in some sort of liberated harmony.

We could still ensure that only the appropriate packages were put in
the appropriate areas with moderately complicated scripting (if you're
a US mirror, download this list of files, then download all the files
in that list *and nothing else*!!), which could be made to work
acceptably (especially with a US and a non-us master mirror).

6.5 Past Revisions
------------------

In order to make life a little easier when release critical bugs are
found, it would be nice to be able to revert back to older source. This
is difficult at the moment, as old revisions are deleted immediately
upon installing a new revision. With the package-pool, however, we can
conveniently keep as many old revisions as we may wish.

The question remains which old revisions to keep, and which to delete.
The suggested mechanism for this is a form of LRU algorithm, whereby
packages that have been stable for some time are kept for longer than
packages that were replaced a couple of days after their first upload.

Further, this algorithm should be based on the amount of disk space in
use, so that we may say "The package pool takes up xGB. And it's
guaranteed to stay that way for the next year or so, even as we add
more architectures and packages".

6.6 Load on mirrors
-------------------

The proposal contains no considerations as to how the changeover should
be managed to minimise load on the mirrors. Doing funky stuff with
symlinks remains a possibility. Simply making public announcements that
Debian is about to undergo a major archive rearrangement and that
mirrors may wish to watch their bandwidth would also be possible, and
would allow us to make a clean start.

Taking the longer term viewpoint, however, we should ask ourselves what
sort of load having a package-pool is likely to put on mirrors. It
already has a number of benefits: packages never have to be moved, even
if they stay exactly the same years after their first release has been
rm-rf'ed; and the package-pool directory can be mounted on another disk
(even a WORM disk) quite happily where there is more space, making the
remainder of /pub/debian significantly smaller (modulo CD images).

This does, however, leave those who only wish to do a partial mirror
somewhat out in the cold. Since it seems likely that the package-pool
will contain additional old revisions (of both source code, and known
stable versions of packages) this may become increasingly important.
The only real possibilities seem to be, however, ignoring symlinks
completely (and thus getting duplicated packages between releases), or
doing some complicated scripting on another mirror, and dereferencing
some symlinks, and retargeting others.
Neither seems particularly optimal, but it at least seems that a
reasonable solution should be possible.

7. Benefits
-----------

So the important question is what does all this buy us?

First, it provides a clear separation between release management and
archive management -- deleting packages and uploading packages is a
simple matter of sticking them in the package pool; releasing packages
or removing them from release, or putting them in a different release,
is a simple matter of fiddling with a symlink. Further, changes to the
release (eg, a release expiring and all the files under dists/foo being
deleted) do not affect the availability of packages in other
distributions.

Second, it gives us somewhere to store past revisions of packages. This
is useful if you're trying to find a working version of a package in
the lag time between a bug's discovery and its fix. It's also useful
around release time, when a package may need to be replaced by an older
version if a quick fix cannot be made in time.

Finally, it allows us to set an upper limit on the size of the Debian
archive, and be able to keep to that limit for some time.

8. Acknowledgements
-------------------

This proposal was initially raised by Bdale Garbee <bdale@gag.com> a
couple of months ago on this list. Almost all the ideas presented above
are his.

This proposal has been refined by comments from: (at least some of :)

    Bill Mitchell <debian@pny-fmail.webquest.com>
    Bdale Garbee <bdale@gag.com>
    Brian White <bcwhite@debian.org>
    Craig Sanders <cas@taz.net.au>
    Dale Scheetz <dwarf@polaris.net>
    David Engel <david@ods.com>
    Guy Maor <maor@ece.utexas.edu>
    Ian Jackson <ian@chiark.greenend.org.uk>
    Klee Dienes <klee@alum.mit.edu>
    Manoj Srivastava <srivasta@datasync.com>
    Philip Hands <phil@hands.com>
    "Rev. Joseph Carter" <knghtbrd@earthlink.net>
    Richard Braakman <dark@debian.org>
    Raul Miller <rdm@test.legislate.com>

Errors, omissions and any lack of clarity or forethought in the above
are of course mine.

Awaiting your comments.

Cheers,
aj

-- 
Anthony Towns <aj@humbug.org.au> <http://azure.humbug.org.au/~aj/>
I don't speak for anyone save myself. PGP encrypted mail preferred.

Remember to breathe.