
Bug#863068: lintian: Merge pages from same contributor (but different email) into one page



Package: lintian
Version: 2.5.50.3
Severity: wishlist

Many contributors have multiple emails (the common case being a
"non-DD" email and a "DD" email).  However, lintian.d.o generates a
maintainer page per unique email and not per contributor.

We can merge these pages by exporting the identifier data from
contributors.debian.org[1].  Using this dataset we can merge multiple
emails into one contributor by comparing the "user.email" value.  If
two entries have the same "user.email" value, then they belong to
the same contributor.
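
A minimal sketch of that grouping step, assuming the entries have the
shape shown in the JSON example at the end of this mail.  The function
name group_emails_by_contributor is made up for illustration and is not
part of lintian or of contributors.debian.org:

```python
from collections import defaultdict


def group_emails_by_contributor(results):
    """Map each "user.email" value to the sorted emails that share it.

    `results` is assumed to be the "results" list of the JSON export.
    """
    groups = defaultdict(set)
    for entry in results:
        # Only "email" identifiers are relevant for merging pages.
        if entry.get("type") != "email":
            continue
        groups[entry["user"]["email"]].add(entry["name"])
    return {key: sorted(names) for key, names in groups.items()}


# Sample entries copied from the JSON example in this report.
sample = [
    {
        "type": "email",
        "name": "niels@thykier.net",
        "user": {"email": "nthykier@debian.org", "full_name": "Niels Thykier"},
    },
    {
        "type": "email",
        "name": "nthykier@debian.org",
        "user": {"email": "nthykier@debian.org", "full_name": "Niels Thykier"},
    },
]

merged = group_emails_by_contributor(sample)
```

Both sample entries end up in a single group because they share the
same "user.email" value.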

Access to the dataset is a privacy concern, so:
 * The data set (and related log files) should preferably be readable only by the
   lintian maintainers on lindsay.d.o
    - technically, lindsay is DD-only, but I would still feel better with 0750
      over a 0755 permission.
 * We should not present any data from the dataset except where this is already public and
   exposed by the current implementation.
   - Example: Currently, I don't use my nthykier@debian.org email for packaging, so that
     email must not appear on lintian.d.o even though the dataset lists it as an email
     associated with me.
   - On the flip side, lintian.d.o would expose niels@thykier.net as I use that for packaging
     (like it did previously).
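
The exposure rule above could be sketched roughly as follows.  This
assumes we already have the merged groups (keyed by "user.email") and
the set of emails that lintian.d.o already shows because they are used
for packaging; both inputs and the function name public_page_groups are
hypothetical:

```python
def public_page_groups(groups, packaging_emails):
    """Return one list of emails per merged page, limited to public emails.

    `groups` maps a private "user.email" key to all known emails of a
    contributor.  The key itself is never returned, and emails that are
    not already used for packaging are dropped.
    """
    public = set(packaging_emails)
    pages = []
    for emails in groups.values():
        visible = sorted(set(emails) & public)
        if visible:
            pages.append(visible)
    return sorted(pages)


groups = {"nthykier@debian.org": ["niels@thykier.net", "nthykier@debian.org"]}
pages = public_page_groups(groups, {"niels@thykier.net"})
```

With these inputs only niels@thykier.net would show up on the merged
page; the nthykier@debian.org address stays private.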

In the short term, we can do manual exports of the data (for
prototyping/testing).  Long term, we should set up some sort of batch
job with a service account to pull this data.  The latter probably needs
help from DSA and/or the maintainers of contributors.d.o.
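
For prototyping, a manual export might look roughly like the sketch
below.  It assumes a client certificate and key file obtained from
sso.d.o; the function names, the file name and the 0640 file mode are
my own choices for illustration (only the 0750 preference comes from the
list above):

```python
import os
import ssl
import urllib.request

EXPORT_URL = (
    "https://contributors.debian.org/api/identifiers/"
    "?format=json&type=email&limit=none"
)


def fetch_export(certfile, keyfile, url=EXPORT_URL):
    """Download the export, authenticating with a client certificate."""
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    with urllib.request.urlopen(url, context=context) as response:
        return response.read()


def save_export(data, directory, filename="identifiers.json"):
    """Store the export so that it is not world-readable."""
    os.makedirs(directory, exist_ok=True)
    # chmod explicitly, since the umask may alter the mode given at creation.
    os.chmod(directory, 0o750)
    path = os.path.join(directory, filename)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o640)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.chmod(path, 0o640)
    return path
```

A batch job with a service account would presumably replace the
certificate handling, but the storage part could stay the same.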

Thanks,
~Niels

[1] https://contributors.debian.org/api/identifiers/?format=json&type=email&limit=none

 * Warning: large data set - your browser/editor might not like it (omit "limit=none" if you are going
   to click on the link)
 * Access restrictions: DD-only (Privacy)
 * Authentication: SSL certificate (from sso.d.o)
 * Code: https://anonscm.debian.org/cgit/nm/dc.git/commit/?id=7c58b6ce8092fea3c6902dfe8a10428f7c4d1795
   - Plus some later commits
 * The example below uses "?user__email=nthykier%40debian.org&type=email" as filter
   - Data is about me, declassified by me, so its exposure is a non-concern.

JSON example:

{
    "count": 2,
    "next": null,
    "previous": null,
    "results": [
        {
            "type": "email",
            "name": "niels@thykier.net",
            "user": {
                "email": "nthykier@debian.org",
                "full_name": "Niels Thykier"
            }
        },
        {
            "type": "email",
            "name": "nthykier@debian.org",
            "user": {
                "email": "nthykier@debian.org",
                "full_name": "Niels Thykier"
            }
        }
    ]
}

