SEGV in __gnu_Unwind_Resume during exception propagation when ASan is used #613

Closed
dxin2015 opened this issue Jan 8, 2018 · 2 comments

dxin2015 commented Jan 8, 2018

I have two libraries, libA.so and libA_wrapper.so.
An exception is thrown in libA and caught in libA_wrapper.
Depending on where in libA the exception is thrown, a SEGV may occur in ASan's internal exception handler.
In this case, the PC is libclang_rt.asan-arm-android.so+0xa7a77,
which is 000a7a70 <__gnu_Unwind_Resume>+0x7 (libclang_rt.asan-arm-android.so is pushed to the device by /ndk-bundle/toolchains/llvm/prebuilt/linux-x86_64/bin/asan_device_setup).
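
The offset arithmetic above can be checked mechanically. The following is a minimal Python sketch, assuming the symbol addresses and sizes that readelf reports for the r16b runtime later in this thread. `resolve` and `SYMBOLS` are hypothetical names introduced for illustration, not part of any NDK tool.

```python
# Symbol entries (name, start offset, size) copied from the readelf output
# for libclang_rt.asan-arm-android.so quoted later in this thread.
SYMBOLS = [
    ("__gnu_Unwind_Resume", 0x000A7A70, 116),
    ("_Unwind_Resume", 0x000A8570, 36),
]


def resolve(offset, symbols=SYMBOLS):
    """Return 'name+0xN' for the symbol whose range contains offset, else None."""
    for name, start, size in symbols:
        if start <= offset < start + size:
            return f"{name}+{offset - start:#x}"
    return None


frame_pc = 0xA68E5A77    # frame #0 address from the ASan report
module_offset = 0xA7A77  # offset within the library, from the same frame
load_base = frame_pc - module_offset

print(hex(load_base))          # 0xa683e000
print(resolve(module_offset))  # __gnu_Unwind_Resume+0x7
```

The computed load base is page-aligned, which is consistent with the frame really lying inside the ASan runtime library.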

Initially I thought this was caused by #289,
but I checked libA and libA_wrapper and I only see:
arm-linux-androideabi-readelf -sW libA.so | grep _Unwind

41: 00000000 0 FUNC GLOBAL DEFAULT UND _Unwind_Resume
10477: 00000000 0 FUNC GLOBAL DEFAULT UND _Unwind_Resume

even after I added "unwind" (libunwind.a) as the first library in target_link_libraries().

Then I suspected that the problem was caused by passing exceptions across a shared library boundary, so I changed libA to a static library. At run time there is then only libA_wrapper.so, which contains both, but I still get the same result.

Log from the application:

01-08 16:03:08.747 10180 10301 I : =================================================================
01-08 16:03:08.748 10180 10301 I :
01-08 16:03:08.748 10180 10301 I :
01-08 16:03:08.748 10180 10301 I : ==10180==ERROR: AddressSanitizer: SEGV on unknown address 0x0000000e (pc 0xa68e5a78 bp 0x987febd8 sp 0x987fc190 T543)
01-08 16:03:08.748 10180 10301 I :
01-08 16:03:08.748 10180 10301 I :
01-08 16:03:08.748 10180 10301 I : ==10180==The signal is caused by a READ memory access.
01-08 16:03:08.748 10180 10301 I :
01-08 16:03:08.748 10180 10301 I : ==10180==Hint: address points to the zero page.
01-08 16:03:08.748 10180 10301 I :
01-08 16:03:08.920 10180 10301 I : #0 0xa68e5a77 (/system/lib/libclang_rt.asan-arm-android.so+0xa7a77)
01-08 16:03:08.920 10180 10301 I :
01-08 16:03:08.921 10180 10301 I :
01-08 16:03:08.921 10180 10301 I :
01-08 16:03:08.922 10180 10301 I : AddressSanitizer can not provide additional info.

I'm positive that this SEGV does not come from user code, which is an early version of this:
https://github.com/glassechidna/zxing-cpp/blob/07e5600e56e5b9e3a5a78ccaea52fb4daf1c70ea/core/src/zxing/MultiFormatReader.cpp#L112

I tried to throw a dummy exception at the beginning of
Ref<Result> MultiFormatReader::decodeInternal(Ref<BinaryBitmap> image)
and I cannot see the logs from the destructor of the image argument, so the crash must have happened during stack unwinding.

This error does not happen when ASan is not installed.

Environment Details


  • NDK Version: 16.1.4479499
  • Build system: cmake triggered by gradle, gradle=4.1, gradle plugin=3.0.0, cmake=3.6.4111459
  • Host OS: Ubuntu 16.04
  • Compiler: clang++
  • ABI: armeabi-v7a
  • STL: libc++_shared
  • NDK API level: 26
  • Device API level: 25

AstralStorm commented Feb 14, 2018

I've hit the identical issue in my own code. It seems that ASan in this NDK is built with the older libc++ exception code, which relies on libunwind, while the new libc++_shared exception unwinding code uses the same method as gnustl.

This makes the unwinder explode whenever it is triggered and asan is enabled.

rprichard self-assigned this Feb 15, 2018

rprichard (Collaborator) commented

TLDR: This is a bug with ASAN in r16b that should be fixed in r17.

> Initially I thought this was caused by #289,
> but I checked my libA and libA_wrapper and I only see:
> arm-linux-androideabi-readelf -sW libA.so | grep _Unwind
>
> 41: 00000000 0 FUNC GLOBAL DEFAULT UND _Unwind_Resume
> 10477: 00000000 0 FUNC GLOBAL DEFAULT UND _Unwind_Resume

Yeah, this is a problem. When building for ARM32, using the libc++ STL, application binaries need to use their own statically-linked copy of LLVM's libunwind. If the _Unwind_Resume symbol is undefined, the runtime linker will search for one somewhere. It's likely to find _Unwind_Resume in /system/lib/libc.so, which is (if I understand correctly) the incompatible libgcc unwinder. libc.so exposes _Unwind_Resume, __gnu_Unwind_Resume, etc.
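
To make the check described above repeatable, the readelf output can be classified by script. This is a rough Python sketch, assuming the usual `readelf -sW` column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name). The helper name `unwinder_symbols` and the prefix list are assumptions made for illustration. An unwinder symbol in the undefined list means the runtime linker, not the app binary, decides which unwinder is used.

```python
UNWINDER_PREFIXES = ("_Unwind_", "__gnu_Unwind_", "___Unwind_")


def unwinder_symbols(readelf_output):
    """Split unwinder symbols in `readelf -sW` text into (undefined, defined)."""
    undefined, defined = set(), set()
    for line in readelf_output.splitlines():
        fields = line.split()
        # Symbol rows look like: Num: Value Size Type Bind Vis Ndx Name
        if len(fields) < 8 or not fields[0].rstrip(":").isdigit():
            continue
        ndx, name = fields[6], fields[7].split("@")[0]
        if not name.startswith(UNWINDER_PREFIXES):
            continue
        (undefined if ndx == "UND" else defined).add(name)
    return sorted(undefined), sorted(defined)


# Rows taken from the outputs quoted in this thread.
app_lib = """
    41: 00000000     0 FUNC    GLOBAL DEFAULT  UND _Unwind_Resume
 10477: 00000000     0 FUNC    GLOBAL DEFAULT  UND _Unwind_Resume
"""
asan_rt = """
   692: 000a7a70   116 FUNC    GLOBAL DEFAULT   11 __gnu_Unwind_Resume
   792: 000a8570    36 FUNC    GLOBAL DEFAULT   11 _Unwind_Resume
  1399: 000a8570    36 FUNC    GLOBAL DEFAULT   11 ___Unwind_Resume
"""

print(unwinder_symbols(app_lib))  # (['_Unwind_Resume'], [])
print(unwinder_symbols(asan_rt))
```

In practice the text would come from running readelf on the built library. The strings here only reproduce the rows already shown in the thread.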

The root problem is that the libclang_rt.asan-arm-android.so bundled with the NDK exposes libgcc's unwinder:

$ readelf -W --dyn-syms ~/android-ndk-r16b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/5.0/lib/linux/libclang_rt.asan-arm-android.so | grep '_Unwind_Resume\b'
   692: 000a7a70   116 FUNC    GLOBAL DEFAULT   11 __gnu_Unwind_Resume
   792: 000a8570    36 FUNC    GLOBAL DEFAULT   11 _Unwind_Resume
  1399: 000a8570    36 FUNC    GLOBAL DEFAULT   11 ___Unwind_Resume

The NDK binaries are linked with LLVM's libunwind.a, but libclang_rt.asan-arm-android.so appears earlier on the linker command-line, so its definition is preferred. The app binary then uses _Unwind_Resume from either libc.so or libclang_rt.asan-arm-android.so and probably finds the incompatible libgcc one.

libclang_rt.asan-arm-android.so seems to have been fixed between ab/4053586 and ab/4393122. In ab/4393122, there is a script lib/sanitizer_common/scripts/gen_dynamic_list.py that Modules/SanitizerUtils.cmake uses to generate a version script according to a whitelist. ab/4053586, OTOH, appears to build LLVM using Soong, and I don't see a reference to gen_dynamic_list.py (or --exclude-libs, another way to hide the libgcc unwinder.)
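
For readers unfamiliar with the whitelist approach, here is a hedged Python sketch of the general idea. It is not the actual gen_dynamic_list.py, whose inputs and output format are not shown in this thread. The whitelist patterns and both function names are illustrative assumptions. The output follows the anonymous version script syntax accepted by GNU ld, where everything not listed under `global` falls under `local: *` and is hidden.

```python
from fnmatch import fnmatchcase

# Illustrative whitelist only; the real list used by the sanitizer build is
# not shown in this thread.
WHITELIST = ["__asan_*", "__sanitizer_*"]


def make_version_script(patterns):
    """Render an anonymous GNU ld version script exporting only `patterns`."""
    lines = ["{", "  global:"]
    lines += [f"    {p};" for p in patterns]
    lines += ["  local:", "    *;", "};"]
    return "\n".join(lines) + "\n"


def is_exported(symbol, patterns):
    """True if `symbol` matches a whitelist pattern and would stay visible."""
    return any(fnmatchcase(symbol, p) for p in patterns)


print(make_version_script(WHITELIST))
for sym in ("__asan_init", "_Unwind_Resume", "__gnu_Unwind_Resume"):
    print(sym, "exported" if is_exported(sym, WHITELIST) else "hidden")
```

With a whitelist like this, the unwinder entry points never match a pattern, so they would no longer appear in the dynamic symbol table of the runtime. Python's `fnmatchcase` only approximates the linker's glob matching, which is enough for simple `*` patterns.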

The unwinder symbols are hidden in r17:

$ readelf -W --dyn-syms /ssd/ndk-release-r17/out/android-ndk-r17-beta1/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/5.0/lib/linux/libclang_rt.asan-arm-android.so | grep '_Unwind_Resume\b'
<nothing>

The test case in #615 passes with r17, too.
