
Bug#1040716: libc6: crashes with SIGBUS in rrd_write



control: reassign -1 librrd8
control: retitle -1 librrd8: crashes with SIGBUS in rrd_write

Hi,

On 2023-07-11 15:53, Tim McConnell wrote:
> 
> 
> On Tue, 2023-07-11 at 22:34 +0200, Aurelien Jarno wrote:
> > Hi,
> > 
> > On 2023-07-11 15:28, Tim McConnell wrote:
> > > 
> > > 
> > > On Tue, 2023-07-11 at 21:11 +0200, Aurelien Jarno wrote:
> > > > Hi,
> > > > 
> > > > On 2023-07-11 11:21, Tim McConnell wrote:
> > > > > 
> > > > > 
> > > > > On Mon, 2023-07-10 at 23:17 +0200, Aurelien Jarno wrote:
> > > > > > You might want
> > > > > > to upgrade to version 2.37-5 to check if it solves your issue
> > > > > > Okay that's done and it's still doing it. The entry from Journalctl
> > > > > > shows module libudev1 if that's of any use.
> > > > >  
> > > > > > Started systemd-coredump@1785-616863-0.service - Process Core Dump (PID 616863/UID 0).
> > > > > > Jul 11 10:52:06 DebianTim systemd-coredump[616865]: Process 616847 (collectd) of user 0 dumped core.
> > > > > > 
> > > > > >     Module libudev.so.1 from deb systemd-252.11-1.amd64
> > > > > >     Stack trace of thread 616848:
> > > > > >     #0  0x00007fce1335e9f2 __memmove_ssse3 (libc.so.6 + 0x16d9f2)
> > > > > >     #1  0x00007fce131156d9 rrd_write (librrd.so.8 + 0x346d9)
> > > > > >     #2  0x00007fce13120acd n/a (librrd.so.8 + 0x3facd)
> > > > > >     #3  0x00007fce13122962 n/a (librrd.so.8 + 0x41962)
> > > > > >     #4  0x00007fce1317c370 n/a (rrdtool.so + 0x3370)
> > > > > >     #5  0x00007fce132793ec start_thread (libc.so.6 + 0x883ec)
> > > > > >     #6  0x00007fce132f9a1c __clone3 (libc.so.6 + 0x108a1c)
> > > > > > 
> > > > 
> > > > Thanks for the details. This shows that the binary crashing regularly
> > > > is collectd. It is very unlikely that the issue is linked to the
> > > > locales, and your test confirms that.
> > > > 
> > > > It's also not clear that it's a glibc issue; it's more likely an issue
> > > > in collectd or librrd8. It appears that systemd-coredump saved a
> > > > coredump when the process crashed. You should be able to use
> > > > "coredumpctl" to get the list of cores. You can select one coredump
> > > > and examine it with gdb using "coredumpctl debug xxxx". Then, under
> > > > gdb, you should be able to run "thread apply all bt" to get the
> > > > backtrace. That should allow us to better understand the issue.
> > > > 
> > > > Regards
> > > > Aurelien
> > > > 
> > > I'm unsure how helpful this is (I am not a programmer) but:
> > > thread apply all bt
> > 
> > Thanks, that's already much more useful. However I forgot to tell you
> > to install the libc6-dbg package before getting the backtrace, sorry
> > about that. Could you please install it and follow the same procedure
> > again?
> > 
> > Thanks
> > Aurelien
> > 
> It's already there, any others? 

I was expecting to see the arguments of the glibc functions, but anyway
I have been able to understand the code path.

collectd uses the librrd8 library to handle data, which at some point
calls the glibc memmove function. The crash (SIGBUS) happens in the
memmove function when it tries to write to the destination buffer:

|         .p2align 4,, 4
| L(copy_0_15):
|         cmpl    $4, %edx
|         jb      L(copy_0_3)
|         cmpl    $8, %edx
|         jb      L(copy_4_7)
|         movq    0(%rsi), %rcx
|         movq    -8(%rsi, %rdx), %rsi
 =>       movq    %rcx, 0(%rdi)
|         movq    %rsi, -8(%rdi, %rdx)
|         ret

memmove is called from the rrd_write function, more precisely at the
line marked with "=>" below:

| ssize_t rrd_write(
|     rrd_file_t *rrd_file,
|     const void *buf,
|     size_t count)
| {

[...]

| #ifdef HAVE_MMAP
|     size_t    old_size = rrd_file->file_len;
| 
|     if (count == 0)
|         return 0;
|     if (buf == NULL)
|         return -1;      /* EINVAL */
| 
|     if ((rrd_file->pos + count) > old_size) {
|         rrd_set_error
|             ("attempting to write beyond end of file (%ld + %ld > %ld)",
|              rrd_file->pos, count, old_size);
|         return -1;
|     }
|     /* can't use memcpy since the areas overlap when tuning */
=>    memmove(rrd_simple_file->file_start + rrd_file->pos, buf, count);
|     rrd_file->pos += count;
|     return count;       /* mimic write() semantics */

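librrd8 maps the file into memory with mmap, and then does a memmove
into the corresponding memory mapping. The fact that it crashes with
SIGBUS rather than SIGSEGV means that the address is part of the
mapping, but the access falls beyond the end of the underlying file,
for instance because the file is shorter than the mapped length.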
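
To illustrate the SIGBUS behaviour described above, here is a minimal
sketch. It is not librrd code: the function name signal_on_write_to_page
and the temporary file path are made up for the demonstration, and it
assumes Linux with a writable /tmp. It creates a file of exactly one
page, maps two pages of it, and lets a child process memmove into the
chosen page so that the parent can report which signal, if any, killed
the child.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the signal that killed a child writing
 * into page `page_index` (0 or 1) of a two-page shared mapping backed
 * by a one-page file, 0 if the child exited normally, -1 on setup error. */
int signal_on_write_to_page(int page_index)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */

    /* The file is one page long ... */
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }

    /* ... but the mapping covers two pages. */
    size_t map_len = 2 * (size_t) page;
    char *map = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(map, map_len);
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: same kind of access as the memmove in rrd_write. */
        const char buf[8] = "rrddata";
        memmove(map + (size_t) page_index * (size_t) page, buf, sizeof buf);
        _exit(0);
    }

    int status = 0;
    waitpid(pid, &status, 0);
    munmap(map, map_len);
    close(fd);
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux I would expect signal_on_write_to_page(0) to return 0, since
that page is backed by the file, and signal_on_write_to_page(1) to
return SIGBUS, since that page lies entirely beyond the end of the
file. The write is done in a child process only so that the caller
survives the signal.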

Therefore this is not an issue in glibc, but could be due to a corrupted
RRD file or a bug in the librrd8 library. Depending on the data you have
in the RRD files, you might want to recreate them to see if the issue
goes away or not.

In any case, I am reassigning the bug to the librrd8 package. The
librrd8 maintainers might have a better idea how to debug that further.

Regards
Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                     http://aurel32.net

