Bug#738342: lintian: checks/cruft - GFDL check is slow

To: Niels Thykier <niels@thykier.net>
Cc: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: Bug#738342: lintian: checks/cruft - GFDL check is slow
From: Bastien ROUCARIES <roucaries.bastien@gmail.com>
Date: Mon, 10 Feb 2014 13:30:22 +0100
Message-id: <CAE2SPAZgGpoBYS=7DDdPVfGz7RPGTnMs6Wge2YzPwJvWWEeBgQ@mail.gmail.com>
Reply-to: Bastien ROUCARIES <roucaries.bastien@gmail.com>, 738342@bugs.debian.org
In-reply-to: <20140209125159.5522.15233.reportbug@mangetsu.thykier.net>
References: <20140209125159.5522.15233.reportbug@mangetsu.thykier.net>

On 9 Feb 2014 13:54, "Niels Thykier" <niels@thykier.net> wrote:
>
> Package: lintian
> Version: 2.5.21
> Severity: normal
>
> A quick benchmark suggests that lintian spends nearly 2 minutes on the
> Linux source package (I tested with linux/3.10~rc7-1~exp1). Profiling
> Lintian with perl -d:NYTProf suggests that the vast majority of the time
> is spent in:
>
> """
> if ($cleanedblock =~ $gfdlpattern) {
> """
>
> Where $gfdlpattern is one of:
>
> """
> # classical gfdl matching pattern
> my $normalgfdlpattern = qr/
> (?'contextbefore'(?:
> (?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){1024}|
> (?:\s+ copy \s+ of \s+ the \s+ license \s+ is.{0,1024}?)))
> gnu \s+ free \s+ documentation \s+ license
> (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
> a \s+ copy \s+ of \s+ the \s+ license \s+ is
> /xsmo;
>
> # for first block we get context from the beginning
> my $firstblockgfdlpattern = qr/
> (?'rawcontextbefore'(?:
> (?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){1024}|
> \A(?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){0,1024}|
> (?:\s+ copy \s+ of \s+ the \s+ license \s+ is.{0,1024}?)
> )
> )
> gnu \s+ free \s+ documentation \s+ license
> (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
> a \s+ copy \s+ of \s+ the \s+ license \s+ is
> /xsmo;
> """
>
>
> The profiler suggests that 60% of the runtime is spent in the
> "CORE:match" operations inside "license_check" from c/cruft. The
> regex appears to be hit "only" 2452 times, but it spends an average of
> 55.9ms per time totalling 137s.
>
> Bastien, do you have any ideas for reducing the cost of the regex?

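Yes, I do.

Only run these regexps if the block can match "gnu free documentation license" in the first place. A cheap check for that phrase lets us skip the expensive patterns on every block that never mentions the GFDL.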
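A minimal sketch of that prefilter, assuming the cleaned block is already lowercased (the quoted patterns have no /i flag). The pattern below is a simplified, hypothetical stand-in for $normalgfdlpattern (the 1024-character context capture is dropped), and gfdl_match and $gfdl_prefilter are illustrative names, not existing lintian code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified stand-in for the expensive pattern quoted above
# (hypothetical: the contextbefore group is omitted).
my $gfdlpattern = qr/
    gnu \s+ free \s+ documentation \s+ license
    (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
    a \s+ copy \s+ of \s+ the \s+ license \s+ is
/xsmo;

# Cheap prefilter: just the license name, with no per-character
# negative lookaheads and no bounded repetition to backtrack through.
my $gfdl_prefilter = qr/gnu \s+ free \s+ documentation \s+ license/xsmo;

# Hypothetical helper: returns 1 if the full GFDL pattern matches, else 0.
sub gfdl_match {
    my ($cleanedblock) = @_;
    # Fast path: blocks that never name the GFDL skip the costly regex.
    return 0 unless $cleanedblock =~ $gfdl_prefilter;
    # Slow path: only reached when the license name is present.
    return $cleanedblock =~ $gfdlpattern ? 1 : 0;
}
```

For example, gfdl_match("permission is granted under the gnu free documentation license, version 1.3. a copy of the license is included") returns 1, while a block that never mentions the GFDL returns 0 after only the cheap check. If the cleaning step is known to collapse whitespace to single spaces (an assumption, not verified here), a plain index() lookup for the phrase would be cheaper still.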

Bastien
>
> ~Niels
>
