Packagability of crates with separate test data

To: debian-rust@lists.debian.org
Subject: Packagability of crates with separate test data
From: Andreas Molzer <andreas.molzer@gmx.de>
Date: Sun, 10 Apr 2022 00:43:47 +0200
Message-id: <YlIMI+YikeFqI82O@Chimera>

Hi,

As a crate author/maintainer, I'm wondering how I could improve my crate
organization to integrate better with your CI/packaging setup. Basically,
I received this issue report (non-actionable, informative) regarding
rust-weezl:

<https://github.com/image-rs/lzw/issues/29>

> Unit tests can't be run from the crate downloaded from crates.io
> 
> […] The unit tests depend on a file named /benches/binary-8-msb.lzw
> that isn't included in the crate uploaded to crates.io.

Similar issues evidently exist for `rust-png` and `rust-image`. The
underlying problem appears to be a conflict between the largely
automated packaging policy and the contents of crates.io archives. A
little more in-depth:

Speaking as a crate author, the artifacts published to crates.io are
mainly geared towards consumption as a cargo dependency. For this
reason, I strive to make them as small as possible, with no dev/test
data. (We, image-rs, once accidentally published ~1MB in image-tiff and
got an issue report for that within the day.) In any case, <crates.io>
has a hard limit of 10MB. For this reason, it does not seem reasonable
to mention test data in Cargo.toml, even through such mechanisms as
specifying an additional dev-dependency.

Thinking about the issue, I wanted to offer a potential solution. What
about adding a dev-dependency that ensures the proper data exists, and
then loading the test data either out-of-band or dynamically? So an idea
for a crate was born:
	xtest-data: <https://crates.io/crates/xtest-data/1.0.0-beta.2>
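
To make the "out-of-band" idea concrete, here is a minimal sketch using
only the standard library. It is not the xtest-data API; the environment
variable name `CRATE_TEST_DATA_DIR` and both helper functions are
hypothetical, chosen only for illustration. It assumes a packager
unpacks the test data somewhere and points the tests at it, while a
regular git checkout keeps using the in-tree files.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical variable a packager could set to the directory holding
/// separately unpacked test data. Neither cargo nor xtest-data define it.
const DATA_DIR_VAR: &str = "CRATE_TEST_DATA_DIR";

/// Resolve a test-data file: prefer the out-of-band directory when one is
/// given, otherwise fall back to the path inside the source checkout.
fn resolve_test_file(
    override_dir: Option<&Path>,
    manifest_dir: &Path,
    relative: &str,
) -> PathBuf {
    match override_dir {
        Some(dir) => dir.join(relative),
        None => manifest_dir.join(relative),
    }
}

/// Convenience wrapper that reads the environment at test run time.
/// `CARGO_MANIFEST_DIR` is set by cargo while running tests; the
/// current directory is used as a fallback outside of cargo.
fn test_file(relative: &str) -> PathBuf {
    let override_dir = env::var_os(DATA_DIR_VAR).map(PathBuf::from);
    let manifest_dir = env::var_os("CARGO_MANIFEST_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("."));
    resolve_test_file(override_dir.as_deref(), &manifest_dir, relative)
}
```

A test would then open `test_file("benches/binary-8-msb.lzw")` instead of
a hard-coded relative path, so the same test source works from the
repository and from a crates.io archive with data supplied separately.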

The premise was that downloading data over the network is not desirable.
Over time it became clear that the data should be available as an
archive. So, the package morphed into automation to create and load
minimal git pack-files that contain a shallow and sparse archive of the
test data. This makes it very easy to publish test data as a separate
release artifact, for example via CI/CD Actions/… See the documentation
of xtest-data for an example:

	<https://github.com/HeroicKatora/xtest-data#how-to-use-offline>

So, I'm left wondering whether this approach resonates with some of you.
Does it simplify any steps for packaging? What could be done to improve
it further? For instance, `Cargo.toml` allows some arbitrary metadata
fields. <docs.rs/about/metadata> utilizes this quite effectively. Would
it be of any help if a reference to this additional test data archive
were added to the Cargo.toml file (which would appear in the crate), and
if so, in what form?

Feel free to contribute via answers on the mailing list, or by opening
concrete proposals on the repository.

Greetings, and have a pleasant day,
Andreas

Follow-Ups:
- Re: Packagability of crates with separate test data
  - From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
