Re: Validating tarballs against git repositories



On Mon, Apr 01, 2024 at 11:33:06AM +0200, Simon Josefsson wrote:
> Running ./bootstrap in a tarball may lead to different results than the
> maintainer running ./bootstrap in pristine git.  It is the same problem
> as running 'autoreconf -fvi' in a tarball does not necessarily lead to
> the same result as the maintainer running 'autoreconf -fvi' from
> pristine git.  The difference is what is pulled in from the system
> environment.  Neither tool was designed to be run from within a tarball,
> so this is just bad practice that never worked reliably and without a
> lot of complexity it will likely not become reliable either.

The practice of running "autoreconf -fi" or similar via dh-autoreconf
has worked extremely well at scale in Debian.  I'm sure there are
complex edge cases where it's caused problems, but it's far from being a
disaster area.

I don't think running ./bootstrap can be generalized as easily as
running autoreconf can, and it's definitely going to be tough to apply
to all packages that use gnulib; but I think the blanket statement that
it's bad practice is painting with too broad a brush.  For the packages
where I've applied it so far (most of which I'm upstream for,
admittedly), it's fine.

> I have suggested before that upstreams (myself included) should publish
> PGP-signed *-src.tar.gz tarballs that contain the entire pristine git
> checkout including submodules,

A while back I contributed support to Gnulib's bootstrap script to allow
pinning particular commits without using submodules.  I would recommend
this mode; submodules have a very strange UI.

> *.po translations,

As I noted in a comment on your blog, I think there is a case to be made
for .po files being committed to upstream git, and I'm not fond of the
practice of pulling them in only at bootstrap time (although I can
understand why that's come to be popular as a result of limited
maintainer time).  I have several reasons to believe this:

 * There used to be, at least, some edge cases where format string
   mismatches weren't caught by the gettext toolchain.  I've forgotten
   the details, but I remember running into one case where this turned
   into at least a translation-induced crash if not a security
   vulnerability.

 * Like just about everyone, translators make mistakes.  Since they're
   often working with technical text across a wide variety of domains,
   my experience is that they're more likely to make mistakes when
   dealing with package-specific terms, and these are often left
   untranslated, which means that the maintainer is in a much better
   position to catch those mistakes than you might think.  I don't want
   to cast shade on anyone in particular, but I find that I catch
   mistakes in a significant fraction of man-db translation updates just
   by looking at the diff without having to understand the target
   language; for example, if I add an item to a list and also make some
   other nearby textual changes, it's quite common for translators to
   miss adding the item to the list, and I can spot that sort of thing
   almost regardless of the language.

 * Actively malicious translations are rare, but they do happen.
   https://discourse.ubuntu.com/t/announcement-ubuntu-desktop-23-10-release-image-translation-incident-now-resolved/39365
   was a recent case of this.  I seem to remember that when I tracked
   down the original files it was fairly obvious that the "translations"
   had nothing to do with the source strings even without understanding
   Ukrainian.

 * If you're faced with a user report containing translated messages,
   then it's much easier to figure out what's going on if you can just
   look for them in git.  I've found this to be a source of frustration
   on several occasions when dealing with packages where ./bootstrap
   pulls in translations.
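
To make the first point above concrete, here is a minimal sketch of the
kind of check a maintainer could run over msgid/msgstr pairs when
reviewing a translation diff.  It is an illustration only: the function
names are hypothetical, it handles only C printf-style conversions, it
is not a reconstruction of the edge case mentioned above, and it is not
a substitute for gettext's own format checking.

```python
import re

# printf-style conversions such as %s, %-5d, %2$s, %ld.  "%%" is a literal
# percent sign and consumes no argument.
_CONVERSION = re.compile(
    r"%(?:(?P<pos>\d+)\$)?"          # optional positional index, e.g. 2$
    r"[-+ #0']*"                     # flags
    r"(?:\d+|\*)?"                   # width
    r"(?:\.(?:\d+|\*))?"             # precision
    r"(?P<length>hh|h|ll|l|L|j|z|t|q)?"
    r"(?P<conv>[diouxXeEfFgGaAcspn%])"
)


def conversions(text):
    """Map argument index -> conversion type for a C format string.

    Simplification (an assumption of this sketch): "*" widths and
    precisions, which consume an extra int argument, are not counted.
    """
    args = {}
    next_index = 1
    for match in _CONVERSION.finditer(text):
        if match.group("conv") == "%":
            continue
        if match.group("pos"):
            index = int(match.group("pos"))
        else:
            index = next_index
            next_index += 1
        args[index] = (match.group("length") or "") + match.group("conv")
    return args


def format_mismatch(msgid, msgstr):
    """Return True if a translation consumes different arguments."""
    if not msgstr:
        return False  # untranslated entries fall back to msgid
    return conversions(msgid) != conversions(msgstr)


if __name__ == "__main__":
    # Reordered without positional indexes: %s would receive an int.
    print(format_mismatch("%d files in %s", "%s: %d Dateien"))
    # Reordered with positional indexes: arguments still line up.
    print(format_mismatch("%d files in %s", "%2$s: %1$d Dateien"))
```

A check like this only works if the translations are available to
review alongside the source strings, which is one more reason to keep
them in git.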

-- 
Colin Watson (he/him)                              [cjwatson@debian.org]

