
Re: Idea: frontend tool for more efficient license reviewing based on tree-structured IR



On Friday, 27 December 2019 02:56:07 CET Mo Zhou wrote:
> https://wiki.debian.org/CopyrightReviewTools
> 
> I'm unfamiliar with most of them. I'm only describing the two I'm familiar
> with.  Both licensecheck (Jonas) and debmake (Osamu) do template/regex
> matching.

I'd suggest you look at 'cme', as this tool is closer to what you want to
achieve. For details, please see
https://github.com/dod38fr/config-model/wiki/Updating-debian-copyright-file-with-cme

> * Tree structure is always missing. After importing a new upstream release
>   with a significant directory layout change, it is inconvenient to locate
>   the parts of debian/copyright that should be updated. Things become
>   more complex when new licenses/copyrights emerge.

This use case is handled by 'cme update dpkg-copyright' which merges 
information from current debian/copyright with the information provided by the 
new release.

> * licensecheck dumps garbage when it encounters a binary file, e.g. a PNG
>   image. This is not a BUG, as ftp-masters indeed check the possible
>   metadata in a binary file to see whether there is extra
>   copyright/license info. But this is something that needs to be improved...

As cme relies on licensecheck, this is also a sore point, which can be
alleviated by tuning debian/fill.copyright.blanks.yml (see the URL above for
details).

> The core of my idea is a tree-structured intermediate representation (IR)
> for the "license reviewing tree". The IR is basically a directory tree with
> annotations on the file nodes. The IR can be stored as a, say, JSON file.

This IR tree is constructed by 'cme' when processing licensecheck information, 
but is not saved after debian/copyright is updated.
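
For illustration only, here is a minimal Python sketch of what such a tree
could look like if it were saved as JSON. This is not cme's internal
representation: the function name build_tree, the "_info" key and the sample
paths and licenses are all hypothetical.

```python
import json


def build_tree(annotations):
    """Fold a flat {path: info} mapping into a nested directory tree."""
    root = {}
    for path, info in annotations.items():
        node = root
        parts = path.split("/")
        # Create (or reuse) one nested dict per directory component.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # File nodes carry their annotation under a reserved key.
        node[parts[-1]] = {"_info": info}
    return root


# Made-up per-file results, as a backend might report them.
scan = {
    "src/main.c": {"license": "GPL-2+", "copyright": "2019 Jane Doe"},
    "src/util/hash.c": {"license": "Expat", "copyright": "2018 John Roe"},
    "README": {"license": "UNKNOWN", "copyright": ""},
}

tree = build_tree(scan)
print(json.dumps(tree, indent=2, sort_keys=True))
```

Keeping the annotation under a reserved key lets directory nodes and file
nodes share the same dict type, at the cost of forbidding a file named
"_info".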

> To build such a tree-shaped IR, we need a couple of "backend" tools for
> checking the copyright & license info for a SINGLE file. Such "backends"
> include, but are not limited to:
> 
>  * `licensecheck`. Given a file FILE, `licensecheck FILE` produces the
>    license name.
>  * `grep` or `ripgrep`. For example, `rg -i copyright FILE` always works
>    well.
>  * "neighbor". For example, given a source file "F/I/L/E" without any
>    copyright & license info, looking for F/I/L/LICENSE, F/I/LICENSE, ..., etc.
>    like git does for the ".git" directory will help.

cme also infers "global" license information from README and LICENSE files,
to use as default values when no specific license information is found in
files.
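
For illustration only, the "neighbor" lookup quoted above could be sketched
in Python as follows. This is not how cme infers its defaults: the function
name nearest_license_file and the list of candidate file names are
assumptions made for the example.

```python
from pathlib import Path

# Hypothetical list of file names treated as license carriers.
LICENSE_NAMES = ("LICENSE", "COPYING", "README")


def nearest_license_file(source, root):
    """Return the closest license-like file at or above the directory of
    `source`, without climbing past `root`; None if nothing is found."""
    root = Path(root).resolve()
    directory = Path(source).resolve().parent
    while True:
        for name in LICENSE_NAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
        # Stop at the top of the source tree (or of the filesystem).
        if directory == root or directory == directory.parent:
            return None
        directory = directory.parent
```

For a file F/I/L/E this checks F/I/L first, then F/I, then F, so the most
specific license file wins, much like git searching upwards for ".git".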

> The formatted+filtered output of any combination of these backends can be
> attached to the corresponding IR.
> 
> In contrast, a "frontend" tool is also needed for dealing with such IR
> in a higher level. My imagined "frontend" tool is a `ranger`-like file
> browser with specific designs.

cme provides a frontend to the debian/copyright structure (shown as a tree in
this frontend) with 'cme edit dpkg-copyright'.

See https://github.com/dod38fr/config-model/wiki/Managing-Debian-packages-with-cme#maintaining-debian-copyright-file

>  * the user can choose what backend(s) to use. If none is chosen, the
> frontend tool falls back into a general file browser with a preview panel.

cme only uses the licensecheck backend. I did not see a need to try another
tool to extract license information from files.

> How to proceed
> --------------
> 
> * a group of interested contributors.

If needed, I'm willing to tweak cme (*) so you can re-use cme (or parts
thereof) for your project.

All the best

(*) Actually, the code that handles copyright information is in
libconfig-model-dpkg-perl, which is a cme plugin to handle dpkg files:
https://salsa.debian.org/perl-team/modules/packages/libconfig-model-dpkg-perl





