
SIGSEGV in T.mprotect_P 8



I'm working on LSB compliance for DCC, and am having a weird problem.
Research has indicated that others have encountered the problem as well,
so it sounds like a possible TSD, but I thought I'd get your opinion.

The test is /tset/LSB.os/mprotect/mprotect_P/T.mprotect_P 8, in
lsb-runtime-test.  The test hangs; you have to kill it to let the rest
of the run continue, and it comes up UNREPORTED in the journal.
Attaching strace to the running test after about 30 seconds shows an
endless cascade of SIGSEGVs.

I've collected a few references to the problem occurring occasionally on
Debian, on the LSB Sample Implementation on a few architectures, and in
other scattered places.  In particular, this post by Mats:

http://mail.freestandards.org/pipermail/lsb-test/2004-July/002710.html

led me to figure out something of what's happening.

To see it, I ran the test manually under gdb, and set a breakpoint at
test8().  At that point, /proc/[pid]/maps looks something like this:

0805f000-08084000 rw-p 0805f000 00:00 0
40000000-40015000 r-xp 00000000 03:01 940407     /lib/lsb/ld-2.3.5.so
40015000-40016000 rw-p 00015000 03:01 940407     /lib/lsb/ld-2.3.5.so
40016000-40018000 rw-p 40016000 00:00 0
4001b000-40021000 r-xp 00000000 03:01 940422     /lib/lsb/librt-2.3.5.so
40021000-40022000 rw-p 00005000 03:01 940222     /lib/lsb/librt-2.3.5.so

(This is only a small piece in the middle; entries for the executable
appear before, and more libraries and other stuff appear after.)
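For anyone comparing maps listings by script rather than by eye, the
fields can be pulled apart with a few lines of C.  This is only a rough
sketch: struct map_entry and parse_maps_line are names made up for
illustration, and it assumes the usual Linux layout of
"start-end perms offset dev inode path" with no spaces in the path.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed line of /proc/[pid]/maps (hypothetical helper type). */
struct map_entry {
    unsigned long start, end, offset;
    char perms[5];      /* e.g. "r-xp" */
    char path[256];     /* empty for anonymous mappings */
};

/* Parse one maps line.  Returns 1 on success, 0 if the line does not
 * look like a maps entry.  The device and inode fields are skipped. */
int parse_maps_line(const char *line, struct map_entry *e)
{
    int n;

    assert(line != NULL && e != NULL);
    e->path[0] = '\0';
    n = sscanf(line, "%lx-%lx %4s %lx %*s %*s %255s",
               &e->start, &e->end, e->perms, &e->offset, e->path);
    return n >= 4;   /* the path is optional, the first four are not */
}
```

Running each line of two listings through this and comparing the perms
field per address range makes a change such as r-xp becoming r--p easy
to spot.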

When I continue, I get an immediate SIGSEGV, and gdb grabs control.  The
maps file in /proc now looks like this:

0805f000-08084000 rw-p 0805f000 00:00 0
40000000-40015000 r-xp 00000000 03:01 940407     /lib/lsb/ld-2.3.5.so
40015000-40016000 rw-p 00015000 03:01 940407     /lib/lsb/ld-2.3.5.so
40016000-40018000 rw-p 40016000 00:00 0
40018000-4001b000 r--s 00000000 03:01 356075     /home/tet/test_sets/TESTROOT/tset/LSB.os/mprotect/mprotect_P/d.mprotect_P/vsrt_mprotect
4001b000-40021000 r--p 00000000 03:01 940422     /lib/lsb/librt-2.3.5.so
40021000-40022000 r--p 00005000 03:01 940222     /lib/lsb/librt-2.3.5.so

Note the change of the permissions bits on the memory allocated to
librt.  The same change is apparent for all libraries except for the
dynamic linker.

What seems to be happening is that the test assumes that, when it opens
and mmaps the file, the mapping will land at the end of the process's
address space.  In this case the assumption is incorrect, and the
mapping lands right before the dynamic libraries.  The syscall behind
mprotect thus seems to succeed, in that the permissions described in the
call are set on the range requested, but that range runs past the end of
the mapped file and includes libc's address space.  Since the syscall
has to return to libc code, which is now marked non-executable, the
process immediately segfaults, causing the signal mechanism (also in
libc) to fire, which triggers another segfault, etc.
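
The suspected mechanism can be tried in isolation.  The following is
only a sketch, not the code of T.mprotect_P: it maps two adjacent
anonymous pages, treats the first as "the test's mapping" and the second
as a stand-in for a neighbouring library, and calls mprotect with a
length that is one page too long.  The names perms_at and
overrun_hits_neighbour are invented for illustration, and it assumes
Linux with /proc mounted.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copy the permission string (e.g. "rw-p") of the mapping containing
 * addr into out, by scanning /proc/self/maps.  Returns 0 on success. */
static int perms_at(const void *addr, char out[5])
{
    unsigned long lo, hi, a = (unsigned long)addr;
    char perms[5], line[4352];
    int found = -1;
    FILE *f = fopen("/proc/self/maps", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "%lx-%lx %4s", &lo, &hi, perms) == 3 &&
            a >= lo && a < hi) {
            strcpy(out, perms);
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Returns 1 if an mprotect call that overruns the first page also
 * removed write permission from the page after it, 0 if not, and
 * -1 if the experiment could not be set up. */
int overrun_hits_neighbour(void)
{
    long page = sysconf(_SC_PAGESIZE);
    char before[5], after[5];
    char *base, *neighbour;
    int ok;

    assert(page > 0);
    base = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    neighbour = base + page;          /* stand-in for the next mapping */

    /* The intended target is the first page only; the length passed
     * here is deliberately one page too long. */
    ok = perms_at(neighbour, before) == 0 &&
         mprotect(base, 2 * (size_t)page, PROT_READ) == 0 &&
         perms_at(neighbour, after) == 0;

    munmap(base, 2 * (size_t)page);
    if (!ok)
        return -1;
    return before[1] == 'w' && after[1] != 'w';
}
```

On Linux the call succeeds as long as every page in the range is
mapped, whatever it belongs to, so the function should return 1.  That
is consistent with the library mappings changing permissions in the
listing above, though it does not by itself confirm that this is what
the test does.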

I'm not confident in this analysis, though, since there are holes in it.
In particular, why doesn't it fail until the second mprotect call?

These conditions seem to apply consistently on the current development
version of DCC Core on the i386 platform with the stock Debian 2.6
kernel (package kernel-image-2.6.8-2-686).  This should be identical to
Debian 3.1 sarge on i386, running the above kernel, with the addition of
the dynamic linker hack packages available here:

http://apt-devel.componentizedlinux.org/linux/pool/main/g/glibc-lsb
http://apt-devel.componentizedlinux.org/linux/pool/main/p/pam-lsb

(The pam-lsb packages are probably not relevant, but are listed for
completeness.)

Upgrading to a kernel being developed for DCC, based on 2.6.12.5, causes
the test to succeed without incident.  From the looks of the other
reports, I don't think this reflects a real fix so much as an
environment difference that bumps memory allocation into doing what the
test expects.


