
Re: format 0.939000 for breaking the 9.3GB barrier



Hi!

On Sat, 2023-09-16 at 16:07:37 +0200, Adam Borowski wrote:
> And we're closer and closer to get there: the last time we spoke, max
> package size was 1.7GB, it's 5.5GB today.  In fact, judging by
> Installed-Size alone, some other packages would already breach this limit,
> had they shipped the data instead of fetching it from the Interwebs.
> 
> The current max is kicad-packages3d_7.0.7-1_all.deb with data.tar
> 5839452160 bytes in size.

While I think this should be solved, I don't think it is as pressing
a matter as it seems, because it really only affects binary packages
that contain a single file which exceeds that limit when compressed;
otherwise the packages can be split normally.

> There were suggestions of other package format, but to my knowledge none
> have been implemented or even researched.  Which leaves the deb-old format
> (version 0.939000; I'll round it to "1" hereafter).

This is really a non-starter.

> As far as I know, format 2.0 was devised with some undescribed extensions
> in mind; none of those extensions has appeared during 28 years since we
> made the switch -- any new stuff has gone into control.tar instead.

Yes, they have: deb signatures use that, and the tdeb specification
(which never got very far) uses it too. There could be custom
extensions around as well.

> Thus, I propose we revert to the old format.
> 
> Benefits:
>  * no 10¹⁰ data.tar limit
>  * it unpacks 1% faster than 2.0

Hmm, thanks, that's actually a bug in dpkg-deb, which I've now fixed
locally, as the old format is supposed to only use gzip, and not the
default xz.

> Concerns:
>  * no support for compressors other than cat/ncompress/gzip yet
>  * external tools may not know it

AFAIK, no external tool except dpkg-deb itself supports it, not even
file(1). Thus its coverage is extremely poor.

> As for external tools, those that properly call dpkg will work out of the
> box, this is fortunately most of them.  The rest would need to grow
> such support, I haven't done that research yet.

> So, before any of us commits more effort, please say if this is the way
> to go.

While ar has its set of limitations:

  - Might diverge format depending on the system (AIX small and big
    formats).
  - File size limitation.
  - Filename length limitation (not relevant for .deb files though,
    and one that can be overcome with the BSD or GNU variants).

the BSD and GNU variants have very wide support in many libraries and
languages. It is also extensible and quite compact.
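
To make the file size limitation concrete, here is a rough Python
sketch of the common ar member header. The field widths are those of
the common ar layout; the helper name ar_header and the fixed owner and
mode values are only illustrative. The size field holds 10 decimal
digits, which is where the ceiling just below 10^10 bytes comes from.

```python
import struct

# Common ar member header: fixed-width ASCII fields, 60 bytes in total.
# name(16) mtime(12) uid(6) gid(6) mode(8) size(10) magic(2)
AR_HEADER = struct.Struct("16s12s6s6s8s10s2s")

# Largest value that fits in the 10-digit decimal size field.
MAX_MEMBER_SIZE = 10**10 - 1


def ar_header(name: str, size: int, mtime: int = 0) -> bytes:
    """Build one member header (illustrative helper, short names only)."""
    size_field = str(size).encode("ascii")
    if len(size_field) > 10:
        raise ValueError(f"{size} bytes does not fit the 10-digit size field")
    return AR_HEADER.pack(
        name.encode("ascii").ljust(16),        # space-padded member name
        str(mtime).encode("ascii").ljust(12),  # modification time
        b"0".ljust(6),                         # uid (placeholder value)
        b"0".ljust(6),                         # gid (placeholder value)
        b"100644".ljust(8),                    # mode (placeholder value)
        size_field.ljust(10),                  # member size in decimal
        b"`\n",                                # header terminator
    )


header = ar_header("data.tar.xz", 5839452160)
print(len(header), header[48:58])
print(f"ceiling: {MAX_MEMBER_SIZE / 2**30:.2f} GiB")
```

The 5839452160 bytes quoted above already need all ten digits, so
there is less than a factor of two of headroom left.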

While I've had this problem in mind and pondered over various ideas,
I think the better option is to use sliced data parts within an ar
container. Using tar-in-tar seems like a waste due to the 512-byte
block padding, and using other custom formats means having to do
special custom handling in other tools, and makes handling this with
basic tools extremely cumbersome.
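
As a back-of-the-envelope comparison of the container overhead, here
is a small Python sketch. It assumes a plain ustar member with a short
name (one 512-byte header, no extension headers) and ignores any
record-size padding a tar implementation might add on top; the
function names are made up for this example.

```python
def tar_overhead(payload: int) -> int:
    """Bytes a minimal ustar archive adds around a single member."""
    header = 512                # one header block per member
    padding = -payload % 512    # data is padded to a 512-byte boundary
    end_marker = 2 * 512        # two zero blocks terminate the archive
    return header + padding + end_marker


def ar_overhead(payload: int) -> int:
    """Bytes an ar container adds around a single member."""
    header = 60                 # fixed-size member header
    padding = payload % 2       # data is padded to an even offset
    return header + padding


for size in (1, 1024, 5839452160):
    print(size, tar_overhead(size), ar_overhead(size))
```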

Such a "new format" could simply reuse the ar extensibility and would
actually be rather simple: it would only require slicing the
data.tar.COMP into pieces that then need to be reassembled. This means
that «dpkg-deb --fsys-tarfile» would work transparently, and that
handling such .deb files by hand would be trivial with cat and dd.
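
The slicing and reassembly could be sketched in Python roughly like
this. It works on in-memory bytes for brevity (a real tool would
stream), and the names slice_payload and reassemble, as well as the
choice of a fixed maximum slice size, are assumptions for illustration
and not anything dpkg implements.

```python
from typing import Iterable, List


def slice_payload(payload: bytes, max_slice: int) -> List[bytes]:
    """Cut an already-compressed payload into pieces of at most max_slice bytes."""
    if max_slice <= 0:
        raise ValueError("max_slice must be positive")
    pieces = [payload[i:i + max_slice]
              for i in range(0, len(payload), max_slice)]
    # An empty payload still yields one (empty) slice, so that every
    # package has at least one data member.
    return pieces or [b""]


def reassemble(pieces: Iterable[bytes]) -> bytes:
    """Concatenate the slices in order, which is all that cat would do."""
    return b"".join(pieces)


payload = bytes(range(256)) * 10          # stand-in for data.tar.COMP
pieces = slice_payload(payload, 1000)
print([len(p) for p in pieces])
print(reassemble(pieces) == payload)
```

Because the slices are plain byte ranges of the compressed stream, no
decompression is needed to put them back together.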

I think this could be a new format similar to split packages but in a
single .deb, with something perhaps like:

  ,---
  $ ar tv pkg-lfs_1.0_arch.deb
  debian-binslice
  control.tar.xz
  data-01.tar.xz
  data-02.tar.xz
  data-03.tar.xz
  $ ar p pkg-lfs_1.0_arch.deb debian-binslice
  1.0
  3
  `---

(or perhaps just «debian-sliced».)
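
A reader for such a layout could derive the expected member names from
the two-line member shown above. The following Python sketch assumes
the first line is the format version and the second the slice count,
and that slices are numbered from 01 with two digits as in the
example; what happens beyond 99 slices is left open here, and the
function name is hypothetical.

```python
from typing import List, Tuple


def expected_members(manifest: str, comp: str = "xz") -> Tuple[str, List[str]]:
    """Parse a two-line slice manifest and list the data members it implies."""
    lines = manifest.splitlines()
    if len(lines) < 2:
        raise ValueError("manifest needs a version line and a slice-count line")
    version, count = lines[0].strip(), int(lines[1])
    if count < 1:
        raise ValueError("slice count must be at least 1")
    # Assumed naming scheme: data-NN.tar.COMP, numbered from 01.
    names = [f"data-{i:02d}.tar.{comp}" for i in range(1, count + 1)]
    return version, names


version, names = expected_members("1.0\n3\n")
print(version, names)
```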

I guess one problem is that it fragments the format, which might mean
that support for it ends up being poor as well, more so if no such
binary packages actually exist in the wild. The other, more intrusive
option would be to make a 3.0 format that includes something like this
by default, so that there's a single thing to support for everything.
Say:

  ,---
  $ ar tv pkg-small_3.0_arch.deb
  debian-binary
  meta.tar.xz
  fsys-01.tar.xz
  $ ar p pkg-small_3.0_arch.deb debian-binary
  3.0
  1
  `---

(I find the -01 there annoying, but it would make the format uniform
regardless of the number of slices.)

  ,---
  $ ar tv pkg-lfs_3.0_arch.deb
  debian-binary
  meta.tar.xz
  fsys-01.tar.xz
  fsys-02.tar.xz
  fsys-03.tar.xz
  $ ar p pkg-lfs_3.0_arch.deb debian-binary
  3.0
  3
  `---

But this seems like too much disruption for something I'd expect not
to be used widely anyway.

Thanks,
Guillem

