Re: multi threaded support for xz
On 2016-10-07 13:17 +0200, Sebastian Andrzej Siewior wrote:
> On 2016-10-06 23:47:05 [+0200], Guillem Jover wrote:
>
>> On Thu, 2016-10-06 at 08:30:53 +0200, Sebastian Andrzej Siewior wrote:
>> > With one CPU you have one block. With multiple CPUs the default block
>> > size (as of current xz) is three times the dictionary size. So the
>> > output is reproducible as long as you consistently use either one CPU
>> > or multiple CPUs.
>> > In order to have the same compressed archive with one or multiple CPUs
>> > you would need a switch / environment variable to disable the use of
>> > multiple CPUs.
>>
>> Does this depend on the encoder interface being used? dpkg always
>> uses the lzma_stream_encoder_mt() call regardless of the number of
>> online CPUs, whereas xz(1) switches interfaces between single- and
>> multi-threaded mode. In any case I'll be testing the reproducibility
>> of this, and if need be check with xz upstream to get a clearer
>> picture (either that or do some code diving :).
>
> No, not really. You don't specify the block_size parameter in filters so
> xz takes care of this. This means:
> - one CPU -> one block of all input data
> - two or more CPUs -> dict_size times three is the size of each block
Really? I don't think so.
> This is also what `xz -T1' vs `xz -T2' does (and as I said, -T2 vs -T8
> produces byte-identical .xz output). You can see the resulting block
> sizes and number of blocks in "xz -lv".
That's because `xz -T1', unlike dpkg, does not use
lzma_stream_encoder_mt() but rather lzma_stream_encoder(). For testing
purposes, I set the number of threads in dpkg to 1:
--8<---------------cut here---------------start------------->8---
diff --git a/lib/dpkg/compress.c b/lib/dpkg/compress.c
index 2eda658..2b32d8a 100644
--- a/lib/dpkg/compress.c
+++ b/lib/dpkg/compress.c
@@ -531,7 +531,7 @@ filter_xz_init(struct io_lzma *io, lzma_stream *s)
 #ifdef HAVE_LZMA_MT
 	lzma_mt mt_options = {
 		.flags = 0,
-		.threads = sysconf(_SC_NPROCESSORS_ONLN),
+		.threads = 1,
 		.block_size = 0,
 		.timeout = 0,
 		.filters = NULL,
--8<---------------cut here---------------end--------------->8---
And the data.tar.xz files are still divided into 24 MiB blocks,
according to `xz -lv`.
Cheers,
Sven