Re: Idea: frontend tool for more efficient license reviewing based on tree-structured IR
On Friday, 27 December 2019 02:56:07 CET Mo Zhou wrote:
> https://wiki.debian.org/CopyrightReviewTools
>
> I'm unfamiliar with most of them. I'm only describing the two I'm familiar
> with. Both licensecheck (Jonas) and debmake (Osamu) do template/regex
> matching.
I'd suggest you look at 'cme', as this tool is closer to what you want to
achieve. For details, please see
https://github.com/dod38fr/config-model/wiki/Updating-debian-copyright-file-with-cme
> * Tree structure is always missing. after importing a new upstream release
> with significant directory layout change, it will be inconvenient to
> locate the parts of debian/copyright should be updated. Things will become
> more complex when new licenses/copyrights emerged.
This use case is handled by 'cme update dpkg-copyright', which merges the
information from the current debian/copyright with the information provided
by the new release.
> * licensecheck dumps garbage when it encounters a binary file, e.g. PNG
> image. This is not a BUG, as ftp-masters indeed checks the possible
> metadata in a binary file to make sure whether there is extra
> copyright/license info. But this is something needs to be improved...
As cme relies on licensecheck, this is also a sore point, which can be
alleviated by tuning debian/fill.copyright.blanks.yml (see the URL above for
details).
> The core of my idea is a tree-structured intermediate representation (IR)
> for the "license reviewing tree". The IR is basically a directory tree with
> annotations on the file nodes. The IR can be stored as a, say, JSON file.
This IR tree is constructed by 'cme' when processing licensecheck information,
but is not saved after debian/copyright is updated.
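
To make the idea of a saved IR concrete, here is a minimal Python sketch of a
directory tree with annotations on the file nodes, serialized as JSON. It is
only an illustration: the layout, the key names ("type", "children",
"license", "copyright"), the build_ir helper and the sample paths are all
assumptions made for this example, and none of it reflects the structure cme
builds internally.

```python
import json


def build_ir(annotations):
    """Build a nested directory tree from a {path: annotation} mapping.

    Directory nodes are dicts with a "children" mapping; file nodes carry
    the annotation fields directly. The schema is hypothetical.
    """
    root = {"type": "dir", "children": {}}
    for path, info in sorted(annotations.items()):
        node = root
        parts = path.split("/")
        # Walk (and create if needed) every intermediate directory node.
        for part in parts[:-1]:
            node = node["children"].setdefault(
                part, {"type": "dir", "children": {}}
            )
        # Attach the file node with its annotation fields.
        node["children"][parts[-1]] = {"type": "file", **info}
    return root


# Made-up sample data, standing in for per-file backend output.
annotations = {
    "src/main.c": {"license": "GPL-2+", "copyright": "2019 Jane Doe"},
    "src/util/hash.c": {"license": "Expat", "copyright": "2018 John Roe"},
    "README": {"license": None, "copyright": None},
}

ir = build_ir(annotations)
ir_json = json.dumps(ir, indent=2, sort_keys=True)
print(ir_json)
```

Because the tree is plain dicts, it survives a JSON round trip unchanged, so a
frontend could reload it without running the backends again.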
> To build such an tree-shaped IR, we need a couple of "backend" tools for
> checking the copyright & license info for a SINGLE file. Such "backend"
> includes but not limited to:
>
> * `licensecheck`. Given a file FILE, `licensecheck FILE` produces the
> license name.
> * `grep` or `ripgrep`. For example, `rg -i copyright FILE` always works
> well.
> * "neighbor". For example, given a source file "F/I/L/E" without any
> copyright & license info, looking for F/I/L/LICENSE, F/I/LICENSE, ..., etc
> like git does for the ".git" directory will help.
cme also infers "global" license information from README and LICENSE files, to
use as default values when no specific license information is found in files.
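
The "neighbor" lookup quoted above can be sketched in a few lines of Python.
This is a rough illustration of walking up the directory tree, not a
description of how cme picks its defaults: the function name, the
LICENSE_NAMES tuple (COPYING in particular is my addition) and the sample
file list are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Hypothetical list of file names treated as carrying default license info.
LICENSE_NAMES = ("LICENSE", "COPYING", "README")


def find_default_license_file(path, existing_files):
    """Return the nearest license-like file at or above path's directory.

    `existing_files` is an iterable of relative POSIX paths in the source
    tree. Returns None when no candidate is found up to the tree root.
    """
    existing = set(existing_files)
    directory = PurePosixPath(path).parent
    while True:
        for name in LICENSE_NAMES:
            candidate = str(directory / name)
            if candidate in existing:
                return candidate
        # The parent of "." is "." again, so this marks the tree root.
        if directory == directory.parent:
            return None
        directory = directory.parent


# Made-up source tree for illustration.
files = {"F/I/L/E", "F/I/LICENSE", "LICENSE", "other/x.c"}
print(find_default_license_file("F/I/L/E", files))
print(find_default_license_file("other/x.c", files))
```

The search stops at the first match, so a LICENSE file in a subdirectory
takes precedence over the one at the top of the tree.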
> The formated+filtered output of any combination of these backends can be
> attached to the corresponding IR.
>
> In contrast, a "frontend" tool is also needed for dealing with such IR
> in a higher level. My imagined "frontend" tool is a `ranger`-like file
> browser with specific designs.
cme provides a frontend to the debian/copyright structure (shown as a tree in
this frontend) with 'cme edit dpkg-copyright'.
See https://github.com/dod38fr/config-model/wiki/Managing-Debian-packages-with-cme#maintaining-debian-copyright-file
> * the user can choose what backend(s) to use. If none is chosen, the
> frontend tool falls back into a general file browser with a preview panel.
cme only uses the licensecheck backend. I did not see a need to try another
tool to extract license information from files.
> How to proceed
> --------------
>
> * a group of interested contributors.
If needed, I'm willing to tweak cme (*) so you can re-use cme (or parts
thereof) for your project.
All the best
(*) Actually, the code that handles copyright information is in
libconfig-model-dpkg-perl, which is a cme plugin to handle dpkg files:
https://salsa.debian.org/perl-team/modules/packages/libconfig-model-dpkg-perl