
Bug#1008130: lintian: support/use multi-threads (currently single threaded and slow)



Hi Samuel,

On Tue, Mar 22, 2022 at 3:15 PM Samuel Henrique <samueloph@debian.org> wrote:
>
> I believe there could be noticeable performance gains from using all
> the threads available.

I share your hope and have implemented two attempts to parallelize the
~300 or so checks.

My first attempt used IO::Async but failed. That module is probably
the best one currently available, but it replaces the SIGCHLD handler.
Lintian uses dozens of other modules that call external programs via
other means. Unfortunately, those do not interact well with IO::Async,
which causes the parallel execution to freeze or otherwise experience
strange bugs.
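
To illustrate one way such a conflict can arise, here is a minimal Python sketch. It is not Lintian code and makes no claim about how IO::Async is implemented: `reap_all` is a hypothetical handler standing in for any event loop that takes over SIGCHLD, and `run_child_and_wait` stands in for a library that starts a child and waits for it on its own. The fixed sleep only gives the child time to exit.

```python
import os
import signal
import time


def reap_all(signum, frame):
    """Hypothetical event-loop handler: reap every child that has exited."""
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none has exited yet


def run_child_and_wait():
    """Stand-in for a library that forks a helper and waits for it itself."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: exit immediately with a recognizable status
    time.sleep(0.5)  # let the child exit (and any SIGCHLD handler run)
    try:
        _pid, status = os.waitpid(pid, 0)
    except ChildProcessError:
        return None  # someone else already reaped our child
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    print("default handler:", run_child_and_wait())
    signal.signal(signal.SIGCHLD, reap_all)
    print("reaping handler:", run_child_and_wait())
```

With the default disposition the caller can still collect the child's exit status. Once the reaping handler is installed, the status may be gone by the time the caller asks for it. This is only one possible failure mode; the freezes described above may have had other causes.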

A particularly serious problem for Lintian was the interaction with
Path::Tiny. [1]

You may be able to find some details by searching the Git log for
"Heisenbug" (capital H, please).

My current implementation uses MCE [2] which works okay, but does not
yet yield the performance gains you and I are hoping for. That is why
the experimental branch has not been merged.

As far as I can tell, the slowdown relates to the serialization Perl
performs between parent and child processes. It is possible to "close"
over the in-memory file indexes as part of the fork(), but that alone
is not enough to explain the difference. (The indexes are large and
are also being moved to disk for unrelated reasons.) Memory usage is
higher as well.
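
As a rough illustration of the two kinds of data movement involved, here is a Python sketch. It is not Lintian or MCE code, and `build_index` and `count_in_child` are hypothetical names. A forked child can read a large structure it inherited from the parent without any serialization, while anything it sends back must be serialized over a pipe.

```python
import os
import pickle


def build_index(n):
    """Build a toy stand-in for a large in-memory file index."""
    return {f"usr/share/doc/file{i}": {"size": i, "mode": 0o644} for i in range(n)}


def count_in_child(index):
    """Fork a child that reads the inherited index and returns a small result.

    Returns the result and the number of bytes that crossed the pipe.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the index is already in memory, inherited via fork().
        os.close(read_fd)
        result = sum(1 for entry in index.values() if entry["size"] % 2 == 0)
        os.write(write_fd, pickle.dumps(result))  # only the result is serialized
        os._exit(0)
    # Parent: read the serialized result, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as fh:
        data = fh.read()
    os.waitpid(pid, 0)
    return pickle.loads(data), len(data)


if __name__ == "__main__":
    index = build_index(100_000)
    result, wire_bytes = count_in_child(index)
    print("even-sized entries:", result)
    print("bytes sent back over the pipe:", wire_bytes)
    print("bytes if the whole index were serialized:", len(pickle.dumps(index)))
```

Comparing the last two figures gives a feel for how much cheaper it is to return a small result than to ship an entire index between processes. It says nothing about where Lintian's actual overhead comes from.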

I may have to implement better profiling before we make significant
progress. That is because at least half the time is spent generating
the file indexes, which require a different parallelization strategy
than the checks.

One long-term plan could be to have a data interchange format between
the parent and the child processes. It would also allow checks to be
written in other programming languages, such as Haskell, but I would
seek further community input before proceeding with anything like
that.
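
Purely as a sketch of what such an interchange could look like, and not a proposal for an actual format, the following Python example passes one JSON object per line from a parent to a child process and reads JSON results back. The field names (`path`, `hints`, `tag`) and the tag `example-compiled-file` are invented for illustration. The child here happens to be Python, but anything that can read and write JSON lines could take its place.

```python
import json
import subprocess
import sys

# Hypothetical child "check": reads one JSON work item per line on stdin
# and writes one JSON result per line on stdout.
CHILD = r'''
import json, sys
for line in sys.stdin:
    item = json.loads(line)
    hints = []
    if item["path"].endswith(".pyc"):
        hints.append({"tag": "example-compiled-file", "path": item["path"]})
    print(json.dumps({"path": item["path"], "hints": hints}), flush=True)
'''


def run_check(paths):
    """Send the paths to the child as JSON lines and collect its results."""
    payload = "".join(json.dumps({"path": p}) + "\n" for p in paths)
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return [json.loads(line) for line in proc.stdout.splitlines()]


if __name__ == "__main__":
    for result in run_check(["usr/bin/tool.py", "usr/lib/tool.pyc"]):
        print(result)
```

A line-oriented text format keeps the parent independent of the language a check is written in, at the cost of encoding and decoding every item.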

[1] https://github.com/dagolden/Path-Tiny/issues/224
[2] https://metacpan.org/pod/MCE

> Although I don't know how feasible that is with
> lintian+perl.

Perl performs surprisingly well for an interpreted language, but I am
not sure true "threading" works well. Where Lintian runs anything in
parallel at all, it uses multiple processes. That is how I interpreted
your use of the word "threads".

> Note that I didn't go all the way to debugging lintian to confirm it's
> single-threaded

You are right. For the purposes of your analysis, Lintian uses a single process.

Thank you for your valuable suggestions!

Kind regards,
Felix Lechner

