tst-mmap test fails when compiled with GCC 7 #967
Adding SCOPE_LOCK(_mutex) did not help. I added debug statements, and it seems the crash happens while loading the test app itself.
I noticed that tst-mmap.so (like two other tests) has special compilation and linker flags. Could that have anything to do with it?
```
# Tests with special compilation parameters needed...
$(out)/tests/tst-mmap.so: COMMON += -Wl,-z,now
$(out)/tests/tst-elf-permissions.so: COMMON += -Wl,-z,relro
# The following tests use special linker trickery which apparently
# doesn't work as expected with GOLD linker, so we need to choose BFD.
# TODO: figure out why this workaround was needed (the reason may be
# different for each of these tests), and avoid this workaround!
$(out)/tests/tst-mmap.so: COMMON += -fuse-ld=bfd
$(out)/tests/tst-elf-permissions.so: COMMON += -fuse-ld=bfd
$(out)/tests/tst-tls.so: COMMON += -fuse-ld=bfd
```
I also tried to comment out this line:
```
#$(out)/tests/tst-mmap.so: COMMON += -fuse-ld=bfd
```
so that it would use the same linker as the other tests. It did not help.
I wonder whether, in this case, the tst-mmap.so ELF file somehow has a structure that the OSv ELF loader does not handle properly.
Here is extra output from running the tests:
```
Calling init_library for libvdso.so
In init_library
Before callin init_library from main() for /tests/tst-mmap.so
In init_library
--> arch_relocate_jump_slot: 7
page fault outside application, addr: 0x0000100001603f10
[registers]
RIP: 0x000000000039769e <elf::object::arch_relocate_jump_slot(unsigned int, void*, long)+62>
RFL: 0x0000000000010206 CS: 0x0000000000000008 SS: 0x0000000000000010
RAX: 0x00000000004b3ec0 RBX: 0x0000000000000007 RCX: 0x000000000098d7d8 RDX: 0x0000000000000002
RSI: 0xffff8000019dff80 RDI: 0x00002000001ff9a0 RBP: 0x00002000001ff9d0 R8: 0xfffffffffffff8e0
R9: 0x0000000000c70598 R10: 0x0000000000000007 R11: 0x0000000000000009 R12: 0x0000100001603f10
R13: 0x0000000000c70598 R14: 0x0000000000000007 R15: 0x0000000000000009 RSP: 0x00002000001ff9a0
Aborted
```
And here is the output when another test, tst-mmap-file.so, succeeds:
```
Calling init_library for libvdso.so
In init_library
Before callin init_library from main() for /tests/tst-mmap-file.so
In init_library
--> arch_relocate_jump_slot: b
--> arch_relocate_jump_slot: 5
PASS: open
--> arch_relocate_jump_slot: 6
PASS: ftruncate
--> arch_relocate_jump_slot: 4
--> arch_relocate_jump_slot: 9
PASS: write pattern to MAP_SHARED
PASS: verify pattern was written to file
PASS: write pattern to MAP_PRIVATE
PASS: verify pattern didn't change
PASS: write pattern to partial MAP_SHARED
PASS: verify pattern didn't change in unmapped part
PASS: verify pattern changed in mapped part
--> arch_relocate_jump_slot: 1
```
Here are the tst-mmap.so ELF details:
```
Symbol table '.dynsym' contains 43 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND mprotect@GLIBC_2.2.5 (2)
2: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6thread5startEv
3: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched4cpusE
4: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _ZdlPv@GLIBCXX_3.4 (3)
6: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __assert_fail@GLIBC_2.2.5 (2)
7: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _ZNSt8ios_base4InitC1Ev@GLIBCXX_3.4 (3)
8: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND _ZSt4cerr@GLIBCXX_3.4 (3)
9: 0000000000000000 0 FUNC GLOBAL DEFAULT UND malloc@GLIBC_2.2.5 (2)
10: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __cxa_atexit@GLIBC_2.2.5 (2)
11: 0000000000000000 0 FUNC GLOBAL DEFAULT UND aligned_alloc@GLIBC_2.16 (4)
12: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _ZNSt8ios_base4InitD1Ev@GLIBCXX_3.4 (3)
13: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
14: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _ZStlsISt11char_traitsIcE@GLIBCXX_3.4 (3)
15: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6threadC1ESt8fun
16: 0000000000000000 0 FUNC GLOBAL DEFAULT UND free@GLIBC_2.2.5 (2)
17: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
18: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched11preemptableEv
19: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@GLIBC_2.2.5 (2)
20: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND _ZTVN10__cxxabiv117__clas@CXXABI_1.3 (5)
21: 0000000000000000 0 FUNC GLOBAL DEFAULT UND msync@GLIBC_2.2.5 (2)
22: 0000000000000000 0 FUNC GLOBAL DEFAULT UND sigaction@GLIBC_2.2.5 (2)
23: 0000000000000000 0 FUNC GLOBAL DEFAULT UND mincore@GLIBC_2.2.5 (2)
24: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6thread9stop_wai
25: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6thread12prepare
26: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __stack_chk_fail@GLIBC_2.4 (6)
27: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6thread4wakeEv
28: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6thread7currentE
29: 0000000000000000 0 FUNC GLOBAL DEFAULT UND munmap@GLIBC_2.2.5 (2)
30: 0000000000000000 0 FUNC GLOBAL DEFAULT UND sigemptyset@GLIBC_2.2.5 (2)
31: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6thread4waitEv
32: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6thread10stack_i
33: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __gxx_personality_v0@CXXABI_1.3 (5)
34: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _Znwm@GLIBCXX_3.4 (3)
35: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _Unwind_Resume@GCC_3.0 (7)
36: 0000000000000000 0 FUNC GLOBAL DEFAULT UND mmap@GLIBC_2.2.5 (2)
37: 0000000000204018 0 NOTYPE GLOBAL DEFAULT 24 _end
38: 0000000000204010 0 NOTYPE GLOBAL DEFAULT 23 _edata
39: 0000000000204010 0 NOTYPE GLOBAL DEFAULT 24 __bss_start
40: 0000000000001060 5640 FUNC GLOBAL DEFAULT 12 main
41: 0000000000000e70 0 FUNC GLOBAL DEFAULT 9 _init
42: 0000000000002d2c 0 FUNC GLOBAL DEFAULT 13 _fini
Symbol table '.symtab' contains 115 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000000001c8 0 SECTION LOCAL DEFAULT 1
2: 00000000000001f0 0 SECTION LOCAL DEFAULT 2
3: 0000000000000230 0 SECTION LOCAL DEFAULT 3
4: 0000000000000638 0 SECTION LOCAL DEFAULT 4
5: 0000000000000982 0 SECTION LOCAL DEFAULT 5
6: 00000000000009d8 0 SECTION LOCAL DEFAULT 6
7: 0000000000000a68 0 SECTION LOCAL DEFAULT 7
8: 0000000000000be8 0 SECTION LOCAL DEFAULT 8
9: 0000000000000e70 0 SECTION LOCAL DEFAULT 9
10: 0000000000000e90 0 SECTION LOCAL DEFAULT 10
11: 0000000000001050 0 SECTION LOCAL DEFAULT 11
12: 0000000000001060 0 SECTION LOCAL DEFAULT 12
13: 0000000000002d2c 0 SECTION LOCAL DEFAULT 13
14: 0000000000002d38 0 SECTION LOCAL DEFAULT 14
15: 0000000000003430 0 SECTION LOCAL DEFAULT 15
16: 0000000000003498 0 SECTION LOCAL DEFAULT 16
17: 00000000000036a4 0 SECTION LOCAL DEFAULT 17
18: 0000000000203ca0 0 SECTION LOCAL DEFAULT 18
19: 0000000000203cb0 0 SECTION LOCAL DEFAULT 19
20: 0000000000203cb8 0 SECTION LOCAL DEFAULT 20
21: 0000000000203cd8 0 SECTION LOCAL DEFAULT 21
22: 0000000000203ed8 0 SECTION LOCAL DEFAULT 22
23: 0000000000204000 0 SECTION LOCAL DEFAULT 23
24: 0000000000204010 0 SECTION LOCAL DEFAULT 24
25: 0000000000000000 0 SECTION LOCAL DEFAULT 25
26: 0000000000000000 0 SECTION LOCAL DEFAULT 26
27: 0000000000000000 0 SECTION LOCAL DEFAULT 27
28: 0000000000000000 0 SECTION LOCAL DEFAULT 28
29: 0000000000000000 0 SECTION LOCAL DEFAULT 29
30: 0000000000000000 0 SECTION LOCAL DEFAULT 30
31: 0000000000000000 0 SECTION LOCAL DEFAULT 31
32: 0000000000000000 0 SECTION LOCAL DEFAULT 32
33: 0000000000000000 0 FILE LOCAL DEFAULT ABS tst-mmap.cc
34: 0000000000002780 117 FUNC LOCAL DEFAULT 12 _ZL12segv_handleriP7sigin
35: 0000000000204012 1 OBJECT LOCAL DEFAULT 24 _ZL13segv_received
36: 00000000000033f0 13 OBJECT LOCAL DEFAULT 14 _ZZL12segv_handleriP7sigi
37: 0000000000002800 144 FUNC LOCAL DEFAULT 12 _ZL10catch_segvv
38: 0000000000003400 11 OBJECT LOCAL DEFAULT 14 _ZZL10catch_segvvE8__func
39: 0000000000002890 155 FUNC LOCAL DEFAULT 12 _ZL11caught_segvv
40: 00000000000033e0 12 OBJECT LOCAL DEFAULT 14 _ZZL11caught_segvvE8__fun
41: 0000000000002930 511 FUNC LOCAL DEFAULT 12 _ZNSt17_Function_handlerI
42: 00000000000033c0 14 OBJECT LOCAL DEFAULT 14 _ZZN5sched6thread13do_wai
43: 00000000000033d0 11 OBJECT LOCAL DEFAULT 14 _ZZZ4mainENKUlvE_clEvE8__
44: 00000000000033b0 14 OBJECT LOCAL DEFAULT 14 _ZZN5sched6thread13do_wai
45: 0000000000002b30 256 FUNC LOCAL DEFAULT 12 _ZNSt17_Function_handlerI
46: 00000000000033a0 14 OBJECT LOCAL DEFAULT 14 _ZZN5sched6thread13do_wai
47: 0000000000002c30 121 FUNC LOCAL DEFAULT 12 _ZNSt14_Function_base13_B
48: 0000000000203cb8 16 OBJECT LOCAL DEFAULT 20 _ZTIZ4mainEUlvE_
49: 0000000000002cb0 121 FUNC LOCAL DEFAULT 12 _ZNSt14_Function_base13_B
50: 0000000000203cc8 16 OBJECT LOCAL DEFAULT 20 _ZTIZ4mainEUlvE0_
51: 000000000000340b 5 OBJECT LOCAL DEFAULT 14 _ZZ4mainE8__func__
52: 0000000000002670 46 FUNC LOCAL DEFAULT 12 _GLOBAL__sub_I_tst_mmap.c
53: 0000000000204011 1 OBJECT LOCAL DEFAULT 24 _ZStL8__ioinit
54: 0000000000003410 14 OBJECT LOCAL DEFAULT 14 _ZTSZ4mainEUlvE_
55: 0000000000003420 15 OBJECT LOCAL DEFAULT 14 _ZTSZ4mainEUlvE0_
56: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
57: 00000000000026a0 0 FUNC LOCAL DEFAULT 12 deregister_tm_clones
58: 00000000000026e0 0 FUNC LOCAL DEFAULT 12 register_tm_clones
59: 0000000000002730 0 FUNC LOCAL DEFAULT 12 __do_global_dtors_aux
60: 0000000000204010 1 OBJECT LOCAL DEFAULT 24 completed.7696
61: 0000000000203cb0 0 OBJECT LOCAL DEFAULT 19 __do_global_dtors_aux_fin
62: 0000000000002770 0 FUNC LOCAL DEFAULT 12 frame_dummy
63: 0000000000203ca0 0 OBJECT LOCAL DEFAULT 18 __frame_dummy_init_array_
64: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
65: 00000000000036a0 0 OBJECT LOCAL DEFAULT 16 __FRAME_END__
66: 0000000000000000 0 FILE LOCAL DEFAULT ABS
67: 0000000000204008 8 OBJECT LOCAL DEFAULT 23 DW.ref.__gxx_personality_
68: 0000000000003430 0 NOTYPE LOCAL DEFAULT 15 __GNU_EH_FRAME_HDR
69: 0000000000203ed8 0 OBJECT LOCAL DEFAULT 22 _GLOBAL_OFFSET_TABLE_
70: 0000000000204010 0 OBJECT LOCAL DEFAULT 23 __TMC_END__
71: 0000000000204000 0 OBJECT LOCAL DEFAULT 23 __dso_handle
72: 0000000000203cd8 0 OBJECT LOCAL DEFAULT 21 _DYNAMIC
73: 0000000000000000 0 FUNC GLOBAL DEFAULT UND mprotect@@GLIBC_2.2.5
74: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6thread5startEv
75: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched4cpusE
76: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
77: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _ZdlPv@@GLIBCXX_3.4
78: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __assert_fail@@GLIBC_2.2.
79: 0000000000002d2c 0 FUNC GLOBAL DEFAULT 13 _fini
80: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _ZNSt8ios_base4InitC1Ev@@
81: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND _ZSt4cerr@@GLIBCXX_3.4
82: 0000000000000000 0 FUNC GLOBAL DEFAULT UND malloc@@GLIBC_2.2.5
83: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __cxa_atexit@@GLIBC_2.2.5
84: 0000000000000000 0 FUNC GLOBAL DEFAULT UND aligned_alloc@@GLIBC_2.16
85: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _ZNSt8ios_base4InitD1Ev@@
86: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTab
87: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _ZStlsISt11char_traitsIcE
88: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6threadC1ESt8fun
89: 0000000000000000 0 FUNC GLOBAL DEFAULT UND free@@GLIBC_2.2.5
90: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
91: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched11preemptableEv
92: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@@GLIBC_2.2
93: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND _ZTVN10__cxxabiv117__clas
94: 0000000000000000 0 FUNC GLOBAL DEFAULT UND msync@@GLIBC_2.2.5
95: 0000000000000000 0 FUNC GLOBAL DEFAULT UND sigaction@@GLIBC_2.2.5
96: 0000000000000000 0 FUNC GLOBAL DEFAULT UND mincore@@GLIBC_2.2.5
97: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6thread9stop_wai
98: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6thread12prepare
99: 0000000000204010 0 NOTYPE GLOBAL DEFAULT 24 __bss_start
100: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __stack_chk_fail@@GLIBC_2
101: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6thread4wakeEv
102: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6thread7currentE
103: 0000000000000000 0 FUNC GLOBAL DEFAULT UND munmap@@GLIBC_2.2.5
104: 0000000000204018 0 NOTYPE GLOBAL DEFAULT 24 _end
105: 0000000000000000 0 FUNC GLOBAL DEFAULT UND sigemptyset@@GLIBC_2.2.5
106: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6thread4waitEv
107: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZN5sched6thread10stack_i
108: 0000000000204010 0 NOTYPE GLOBAL DEFAULT 23 _edata
109: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __gxx_personality_v0@@cxx
110: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _Znwm@@GLIBCXX_3.4
111: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _Unwind_Resume@@GCC_3.0
112: 0000000000000000 0 FUNC GLOBAL DEFAULT UND mmap@@GLIBC_2.2.5
113: 0000000000001060 5640 FUNC GLOBAL DEFAULT 12 main
114: 0000000000000e70 0 FUNC GLOBAL DEFAULT 9 _init
```
Also, I am using QEMU 2.11.
On Tue, May 8, 2018 at 4:13 AM, nyh wrote:
I cannot reproduce this on my machine, with gcc 7.3.1 and without KVM (scripts/run.py -p qemu -e tests/tst-mmap.so).
If the backtrace is to be believed (one from GDB would have been clearer), the crash seems to have happened while loading the test, in the program::init_library() function, so I doubt that a bug inside tst-mmap is responsible. If you can reproduce it, can you please run the kernel with --verbose and see whether you can get more insight into which library was being loaded when the bug happens?
Since init_library() is a new function (with moved code), it is also possible that this is a regression. If you can easily reproduce this bug, can you please try it with a slightly older version of OSv, from before init_library() was introduced (for the Go support)?
By code inspection, I think I found a potential bug:
In the past, the init_library() code was run by program::get_library() inside a SCOPE_LOCK(_mutex).
Today, it is run later in app.cc, but without this mutex locked. Maybe we have a problem with init_library() racing with other threads that are also adding libraries or doing other ELF work? You can try adding SCOPE_LOCK(_mutex) to the beginning of init_library() and see if it helps (our mutex is recursive, so it's fine for init_library() to take the lock even if its caller already holds it).
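The re-entrancy argument in the last sentence can be illustrated in standard C++. This is a minimal sketch, not OSv code: `std::recursive_mutex` stands in for the recursive mutex behind `SCOPE_LOCK(_mutex)`, and `get_library_locked()` / `init_library_locked()` are hypothetical stand-ins for `program::get_library()` and `program::init_library()`.

```cpp
#include <cassert>
#include <mutex>

// Stand-in for program::_mutex. We assume, as stated above, that it is
// recursive; std::recursive_mutex models that behaviour here.
static std::recursive_mutex program_mutex;
static int libraries_initialized = 0;

// Hypothetical stand-in for init_library(): it always takes the lock
// itself, whether or not its caller already holds it.
void init_library_locked() {
    std::lock_guard<std::recursive_mutex> guard(program_mutex);
    ++libraries_initialized;  // work that must not race with other loaders
}

// Hypothetical caller that already holds the lock, like the old
// get_library() path did before calling the initialization code.
void get_library_locked() {
    std::lock_guard<std::recursive_mutex> guard(program_mutex);
    // Locking a recursive mutex again on the same thread does not deadlock;
    // the mutex is released only after both guards have been destroyed.
    init_library_locked();
    assert(libraries_initialized > 0);
}
```

Calling `get_library_locked()` and then `init_library_locked()` directly both complete. With a plain `std::mutex`, the nested lock would be undefined behaviour (in practice, usually a self-deadlock), which is why the recursive property matters for the suggested experiment.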
|
I just checked out a version from before the init_library() method was added for Go, and I still get the error. The stack trace is different, but I think it fails in the same place:
I also wanted to add that I am running with KVM enabled (QEMU 2.11), inside a Docker container running Ubuntu 18.04. |
On Tue, May 8, 2018 at 3:09 PM, WALDEMAR KOZACZUK wrote:
> Adding SCOPE_LOCK(_mutex) did not help.
> ...
> # Tests with special compilation parameters needed...
> $(out)/tests/tst-mmap.so: COMMON += -Wl,-z,now
> $(out)/tests/tst-elf-permissions.so: COMMON += -Wl,-z,relro
> # The following tests use special linker trickery which apparently
> # doesn't work as expected with GOLD linker, so we need to choose BFD.
> # TODO: figure out why this workaround was needed (the reason may be
> # different for each of these tests), and avoid this workaround!
> $(out)/tests/tst-mmap.so: COMMON += -fuse-ld=bfd
> $(out)/tests/tst-elf-permissions.so: COMMON += -fuse-ld=bfd
> $(out)/tests/tst-tls.so: COMMON += -fuse-ld=bfd
The reason for this use-ld stuff is issue #891.
But something is strange... If we use "-z now" for tst-mmap, all the symbols should have been relocated at load time, *before* running the initialization function.
Basically, object::relocate_pltgot() should have been called (before init_library()), seen bind_now==true, and called arch_relocate_jump_slot(). So why was object::resolve_pltgot(unsigned index) (apparently) called, calling arch_relocate_jump_slot() later? You can put debugging messages in these functions to see in what order they are called...
I guess I'm missing something, but I can't figure out what.
> page fault outside application, addr: 0x0000100001603f10
> [registers]
> RIP: 0x000000000039769e <elf::object::arch_relocate_jump_slot(unsigned int, void*, long)+62>
I wonder if your compiler enables "-z relro" by default; maybe it's a security-hardening feature in your distro that's not in mine, which would explain why I don't see this error.
In that case, the relocated GOT entries would be read-only after loading, which explains why arch_relocate_jump_slot() crashes when it tries to write the relocation.
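What a read-only GOT means at run time can be shown in ordinary Linux userspace. The following is an analogy under stated assumptions, not OSv's loader. An anonymous page stands in for the GOT, a plain write plays the "load-time relocation", and `mprotect(PROT_READ)` plays RELRO. A later write is then detected with a SIGSEGV handler, which is the same general trick tst-mmap.hh's try_write() appears to use. `try_write_byte()` and `demo_relro()` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

// Jump buffer used to escape from the SIGSEGV handler.
static sigjmp_buf fault_env;

static void on_segv(int) {
    siglongjmp(fault_env, 1);  // abandon the faulting write
}

// Hypothetical helper: true if writing one byte to p succeeds,
// false if the write raises SIGSEGV (e.g. the page is read-only).
static bool try_write_byte(volatile char* p) {
    struct sigaction sa {};
    struct sigaction old {};
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    if (sigsetjmp(fault_env, 1) == 0) {
        *p = 42;                            // may fault
        sigaction(SIGSEGV, &old, nullptr);  // restore previous handler
        return true;
    }
    sigaction(SIGSEGV, &old, nullptr);      // we got here via siglongjmp
    return false;
}

// Hypothetical demo: "relocate" into a fresh page, optionally seal it
// read-only the way RELRO does, then attempt a late relocation.
bool demo_relro(bool seal) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* mem = mmap(nullptr, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(mem != MAP_FAILED);
    volatile char* got = static_cast<volatile char*>(mem);

    *got = 1;  // load-time relocation: the page is still writable
    if (seal) {
        int rc = mprotect(mem, page, PROT_READ);  // RELRO-style sealing
        assert(rc == 0);
        (void)rc;
    }
    bool wrote = try_write_byte(got);  // late (lazy) relocation attempt
    munmap(mem, page);
    return wrote;
}
```

On Linux, `demo_relro(false)` returns true and `demo_relro(true)` returns false. In the crash above, the equivalent late write happens inside arch_relocate_jump_slot() and ends in the "page fault outside application" abort.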
But then the question becomes: since we used -z now, why didn't object::relocate_pltgot() do all the relocations earlier, and why did one of them need to happen while the init function was running?
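The sequence being questioned here can be modelled in a few lines. On one side is eager relocation at load time when "-z now" is detected. On the other is lazy resolution on first call. Add a GOT that becomes read-only after loading, and the failure follows. This is a deliberately simplified toy, not OSv's elf::object. `toy_object`, `lookup()`, and `load_and_call()` are hypothetical, the GOT is a `std::vector`, and a thrown exception stands in for the page fault.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy model of one library's PLT GOT. Hypothetical: only the method
// names mirror the OSv functions discussed above.
struct toy_object {
    std::vector<long> got;       // 0 means "not resolved yet" (lazy stub)
    bool bind_now = false;       // what the loader *detected* in the ELF
    bool got_read_only = false;  // models full RELRO after loading

    explicit toy_object(std::size_t n) : got(n, 0) {}

    // Hypothetical symbol lookup: slot index -> target address.
    static long lookup(unsigned index) { return 0x1000L + index; }

    // Analogue of arch_relocate_jump_slot(): writing into a read-only GOT
    // is where the real loader page-faults; here we throw instead.
    void relocate_jump_slot(unsigned index) {
        if (got_read_only) {
            throw std::runtime_error("write to read-only GOT");
        }
        got[index] = lookup(index);
    }

    // Analogue of relocate_pltgot(), run at load time.
    void relocate_pltgot() {
        if (!bind_now) {
            return;  // lazy binding: leave the slots for first use
        }
        for (std::size_t i = 0; i < got.size(); ++i) {
            relocate_jump_slot(static_cast<unsigned>(i));  // eager binding
        }
    }

    // Analogue of a call through the PLT, which ends up in
    // resolve_pltgot(index) if the slot was never resolved.
    long call(unsigned index) {
        assert(index < got.size());
        if (got[index] == 0) {
            relocate_jump_slot(index);  // late (lazy) resolution
        }
        return got[index];
    }
};

// Load, optionally seal the GOT, then call through slot 7.
bool load_and_call(bool bind_now_detected, bool full_relro) {
    toy_object obj(8);
    obj.bind_now = bind_now_detected;
    obj.relocate_pltgot();
    obj.got_read_only = full_relro;
    try {
        obj.call(7);
        return true;
    } catch (const std::runtime_error&) {
        return false;  // stands in for "page fault outside application"
    }
}
```

In this model, only one combination fails: bind_now is *not* detected, and the GOT is sealed after load anyway. That is the combination suspected above.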
By the way, do you see this crash every time? Why did it start only now? Did you previously use a different distro?
|
See my comments below.
On Tue, May 8, 2018 at 10:52 AM, nyh wrote:
> ...
> So why was (apparently) object::resolve_pltgot(unsigned index) called and called arch_relocate_jump_slot() later? You can put debugging messages in these functions to see in what order they are called...
> I guess I'm missing something, but can't figure out what.
I will try to add more debug statements around the places you mentioned.
> ...
> By the way, do you see this crash every time? How did it start now - did you previously use a different distro?
The test fails every single time (so far, at least); I tried around 5 times. I even created a separate OSv build directory to make sure I built from scratch, and the same test fails there as well.
Yes, this is unique to the new setup I am using to test building/testing/publishing OSv artifacts from a Docker container. The Docker container uses Ubuntu 18.04 (a very fresh distribution from last month), which includes GCC 7 and QEMU/KVM 2.11.
I still have my regular (non-Docker) setup, which uses the older Ubuntu 16.04 with GCC 5 and QEMU/KVM 2.5, where this issue does NOT manifest itself with the same OSv code.
And yet only the tst-mmap test fails. All the other tests work just fine.
|
Some extra debug output:
Success with tst-mmap-file.so:
Failure with tst-mmap.so:
Another success with similar linker settings:
Note that tst-mmap.so loads one extra library, libgcc_s.so.1. Could this have anything to do with #687 and the fact that we still take libgcc_s.so.1 from external? |
> Success with tst-mmap-file.so:
> Calling load_object for libvdso.so
> ---> relocate_pltgot()
> Calling init_library for libvdso.so
> In init_library
> Calling load_object for /tests/tst-mmap-file.so
> Calling load_object for libc.so.6
> ---> relocate_pltgot()
> Before callin init_library from main() for /tests/tst-mmap-file.so
> In init_library
> ---> resolve_pltgot()
> --> arch_relocate_jump_slot: b
This is expected: this test is not compiled with "-z now", so we expect relocate_pltgot() to do little, and init_library() to trigger resolve_pltgot() for various functions.
> failure with tst-mmap.so
> Calling load_object for libvdso.so
> ---> relocate_pltgot()
> Calling init_library for libvdso.so
This is what I expect: relocate_pltgot() should be called for a library *before* init_library() is called for it. This is important in case the library was compiled with "-z now".
> In init_library
> Calling load_object for /tests/tst-mmap.so
> Calling load_object for libstdc++.so.6
By the way, libstdc++.so.6 and libc.so.6 should not be in the image; the "load_object" calls for them should silently fail.
> Calling load_object for libgcc_s.so.1
> Calling load_object for libc.so.6
> ---> relocate_pltgot()
> Calling load_object for libc.so.6
> ---> relocate_pltgot()
The two relocate_pltgot() calls should be for /tests/tst-mmap.so and libgcc_s.so.1; I'm not sure why they are printed in this order.
The question here is: why didn't relocate_pltgot() call arch_relocate_jump_slot() and print those messages? It should have, because of the linking with "-z now": relocate_pltgot() should have seen bind_now == true.
Did it? If it didn't, this could explain the bug: the early symbol resolution didn't take place, and the late one can't, because the GOT was made read-only.
> Before callin init_library from main() for /tests/tst-mmap.so
> In init_library
> ---> resolve_pltgot()
> --> arch_relocate_jump_slot: 7
> page fault outside application, addr: 0x0000100001603f10
> Another success with similar linker settings:
> Calling load_object for libvdso.so
> ---> relocate_pltgot()
> Calling init_library for libvdso.so
> In init_library
> Calling load_object for /tests/tst-elf-permissions.so
> Calling load_object for libstdc++.so.6
> Calling load_object for libc.so.6
> ---> relocate_pltgot()
> Before callin init_library from main() for /tests/tst-elf-permissions.so
> In init_library
> ---> resolve_pltgot()
> --> arch_relocate_jump_slot: 4
I don't see "-z now" in tst-elf-permissions's linking, just "-z relro". So it's different: it didn't get so-called "full RELRO", and with it the opportunity for this bug, so the late resolutions make sense.
Another peculiar difference is that libgcc_s.so is not loaded in this case. But I don't see how this difference can be relevant.
> ---> resolve_pltgot()
> --> arch_relocate_jump_slot: 6
> ---> resolve_pltgot()
> --> arch_relocate_jump_slot: 9
> Running elf segments permissions tests
> ---> resolve_pltgot()
> --> arch_relocate_jump_slot: e
> ---> resolve_pltgot()
> --> arch_relocate_jump_slot: c
> ---> resolve_pltgot()
> --> arch_relocate_jump_slot: 1
> elf segments permissions tests succeeded
> Note that tst-mmap.so loads two extra libraries - libstdc++.so.6 and
> libgcc_s.so.1
libstdc++.so.6 should not exist. libgcc_s.so.1 does (see commit be56532), but I can't see how it might be relevant to this problem.
|
Extra detailed debug output for the failing scenario: Calling load_object for libvdso.so ... and relocate() later. I noticed (maybe wrongly) that bind_now is 0 (false). Why is that? When I run ... things look correct. Could it be that our ELF loading/initialization logic relies on a certain order of information, and in this case (the way tst-mmap.so is linked) that order is different? |
Yes, I believe the fact that bind_now==0 for tst-mmap.so is the underlying cause of the bug (together ...). bind_now is set as: bool bind_now = dynamic_exists(DT_BIND_NOW); This does NOT test the "BIND_NOW" flag you noticed, but rather a separate DT_BIND_NOW entry in the dynamic section. On my machine (where this bug doesn't happen), I get this:
It seems that on your machine this entry is missing, and instead you have a flag. I'll investigate and prepare a patch for you to try. |
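The root cause identified here comes down to which dynamic-section encoding the loader checks. Below is a minimal sketch of a more tolerant check, assuming glibc's `<elf.h>` constants. `wants_bind_now()` is a hypothetical helper, not the code of the actual fix. It treats a DT_BIND_NOW entry, DF_BIND_NOW in DT_FLAGS, and DF_1_NOW in DT_FLAGS_1 as equivalent requests for immediate binding.

```cpp
#include <cassert>
#include <elf.h>

// Returns true if a DT_NULL-terminated dynamic section asks for
// immediate ("-z now") binding. Linkers may express this as a separate
// DT_BIND_NOW entry, or as DF_BIND_NOW inside DT_FLAGS (the form produced
// with --enable-new-dtags), and "-z now" also sets DF_1_NOW in DT_FLAGS_1.
bool wants_bind_now(const Elf64_Dyn* dyn) {
    assert(dyn != nullptr);
    for (; dyn->d_tag != DT_NULL; ++dyn) {
        switch (dyn->d_tag) {
        case DT_BIND_NOW:
            return true;                       // old-style entry
        case DT_FLAGS:
            if (dyn->d_un.d_val & DF_BIND_NOW) {
                return true;                   // new-style flag
            }
            break;
        case DT_FLAGS_1:
            if (dyn->d_un.d_val & DF_1_NOW) {
                return true;                   // GNU extension flag
            }
            break;
        default:
            break;
        }
    }
    return false;
}
```

Checking only for the presence of DT_BIND_NOW, as the `dynamic_exists(DT_BIND_NOW)` line above does, misses the second and third encodings. With the old check, an object linked with new dtags would be treated as lazily bound, even though its GOT is later sealed.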
Hi @wkozaczuk, thanks for all the useful debugging information. I just sent a patch to the mailing list that hopefully fixes this bug. Can you please test it? Thanks. |
Yep, that patch works. I tested it with my old non-Docker configuration (gcc 5.4 and ld 2.26.1) and the Docker one (gcc 7.3 and ld 2.30).
Do you think the reason I saw this bug and you did not, with the same gcc, was a different linker version?
Also, does this mean the linker version is typically independent of the gcc version?
|
Excellent, thanks. |
I was curious and looked at ld's code (GNU binutils, bfd/elflink.c). Indeed, if ld is run with "--enable-new-dtags", then a DT_FLAGS entry is created and DT_BIND_NOW is not, as on your machine. |
Indeed, I have ld 2.30 in my Docker container.
Thanks for providing the patch.
On Wed, May 9, 2018 at 12:20 PM, nyh wrote:
> Closed #967 via cc6e8f5.
|
The error is repeatable. Also, I am running it in a Docker container.
TEST tst-mmap.so OSv
eth0: 192.168.122.15
page fault outside application, addr: 0x0000100001603f10
[registers]
RIP: 0x0000000000397822 <elf::object::arch_relocate_jump_slot(unsigned int, void*, long)+34>
RFL: 0x0000000000010202 CS: 0x0000000000000008 SS: 0x0000000000000010
RAX: 0x00000000004b4110 RBX: 0x0000100001603f10 RCX: 0x000000000098d9d8 RDX: 0x0000000000000002
RSI: 0xffff80000250e680 RDI: 0x00002000001ff9b0 RBP: 0x00002000001ff9d0 R8: 0xfffffffffffff8e0
R9: 0x0000000000c70598 R10: 0x0000000000000007 R11: 0x0000000000000009 R12: 0xffff900001444010
R13: 0x0000000000c70598 R14: 0x0000000000000007 R15: 0x0000000000000009 RSP: 0x00002000001ff9b0
Aborted
[backtrace]
0x0000000000330ff5 <???+3346421>
0x000000000033374a <mmu::vm_fault(unsigned long, exception_frame*)+234>
0x000000000039365e <page_fault+142>
0x00000000003924f6 <???+3745014>
0x00000000003475cc <elf::object::resolve_pltgot(unsigned int)+332>
0x00000000003476e6 <elf_resolve_pltgot+70>
0x000000000039212a <???+3744042>
0x00002000001ffecf <???+2096847>
0x0000000000344212 <elf::program::init_library(int, char**)+386>
0x000000000020dc7a osv::application::main()+58
0x0000000000423798 <???+4339608>
0x00000000004562e5 <???+4547301>
0x00000000003f0ba6 <thread_main_c+38>
0x0000000000393472 <???+3748978>
Test tst-mmap.so FAILED
Traceback (most recent call last):
File "./scripts/test.py", line 188, in <module>
main()
File "./scripts/test.py", line 175, in main
run_tests()
File "./scripts/test.py", line 166, in run_tests
run(tests_to_run)
File "./scripts/test.py", line 86, in run
run_test(test)
File "./scripts/test.py", line 61, in run_test
test.run()
File "/git-repos/osv/scripts/tests/testing.py", line 29, in run
run_command_in_guest(self.command).join()
File "/git-repos/osv/scripts/tests/testing.py", line 163, in join
raise Exception('Guest failed')
Exception: Guest failed
Here is the exact gcc version:
gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.
I wonder if this has anything to do with the compilation warnings:
CXX tests/tst-mmap-file.cc
CXX tests/tst-mmap.cc
LD tests/tst-mmap-file.so
In file included from /git-repos/osv/tests/tst-mmap.cc:11:0:
/git-repos/osv/tests/tst-mmap.hh: In function 'bool try_write(int (*)())':
/git-repos/osv/tests/tst-mmap.hh:76:37: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
char byte = *(volatile char*)&func;
^~~~
/git-repos/osv/tests/tst-mmap.hh:77:25: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
*(volatile char*)&func = byte;
^~~~
LD tests/tst-mmap.so