
Possible libstdc++ incompatibility on Linux #388

Closed
gqmelo opened this issue May 6, 2017 · 7 comments

@gqmelo
Contributor

gqmelo commented May 6, 2017

Recently at work I ran into a libstdc++ binary-compatibility issue on Linux, and I would like to share it: I believe the toolchain currently used by conda-forge could cause the same problem, and at the very least others trying to build portable binaries may find this useful.

The problem

It showed up when an OpenGL application built on CentOS 6 was run on CentOS 7.

These are the relevant characteristics:

  • the application was built with gcc 5.4.0 on CentOS 6 using -D_GLIBCXX_USE_CXX11_ABI=0
  • it ships 5.4.0's libstdc++.so.6, but that library is only loaded if the system one is older
  • when running on CentOS 7, Mesa's llvmpipe driver was being used
  • Mesa uses dlopen to load the llvmpipe driver
  • the llvmpipe driver links to LLVM
  • LLVM is statically linked against libstdc++ (from gcc 4.8.5)

This combination resulted in a free(): invalid pointer crash, as described in this bug report.

The cause is that the static member std::string::_Rep::_S_empty_rep_storage is defined both in LLVM and in the libstdc++ shipped with the application, so two distinct objects exist in memory. _S_empty_rep_storage represents the empty string: when disposing of a string, the library checks whether the string's representation is _S_empty_rep_storage and, if so, does not free it. In this case the empty string pointed to the _S_empty_rep_storage defined in LLVM, while the code was comparing it against the _S_empty_rep_storage defined in the shipped libstdc++. The check therefore failed and a static object was passed to free().
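To make the failure mode concrete, here is a minimal C++ sketch of why two copies of the sentinel break the dispose check. It is a hypothetical, heavily simplified model, not libstdc++'s real code: StringRuntime, Rep and dispose_would_free are invented names, and the refcount convention (stored as "extra owners", released when the value before decrementing is <= 0) is only loosely modelled on the old libstdc++ _Rep.

```cpp
#include <cassert>

// Hypothetical model of the old (pre-C++11 ABI) COW string representation.
struct Rep {
    int refcount;  // number of *extra* owners, as in the old _M_refcount
};

// One StringRuntime stands for one copy of libstdc++ in the process, e.g. the
// copy statically linked into LLVM or the one shipped with the application.
struct StringRuntime {
    Rep empty_rep{0};  // stands in for _S_empty_rep_storage

    // Every default-constructed string from this runtime points here.
    Rep* empty_string_rep() { return &empty_rep; }

    // Mimics the dispose check: the sentinel is recognised purely by its
    // address. Returns true when the rep would be handed to free().
    bool dispose_would_free(Rep* r) {
        if (r == &empty_rep) {
            return false;  // our own sentinel: never released
        }
        const int previous = r->refcount--;
        return previous <= 0;  // last owner: storage would be released
    }
};
```

With two runtimes, LLVM's copy correctly skips its own sentinel, but the shipped copy sees LLVM's sentinel as an ordinary heap block; in the real library that is the point where free() receives a pointer to static storage.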

STB_GNU_UNIQUE

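The STB_GNU_UNIQUE symbol binding was invented to solve exactly this problem: the dynamic linker makes sure that the whole process uses only one definition of a symbol with this binding. A closer look at the symbols of the libraries involved (objdump marks unique symbols with u) confirms that this is the problem: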

$ objdump -C -T /usr/lib64/libLLVM-3.8-mesa.so | grep _S_empty_rep_storage
000000000405be20 u    DO .bss   0000000000000020  Base std::string::_Rep::_S_empty_rep_storage
000000000405bde0 u    DO .bss   0000000000000020  Base std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::_Rep::_S_empty_rep_storage

The LLVM library defines these standard library static variables as unique (u), but the shipped libstdc++.so defines them as ordinary globals (g):

$ objdump -C -T $LIBSTDCXX5 | grep _S_empty_rep_storage
000000000038c300 g    DO .bss   0000000000000020  GLIBCXX_3.4 std::string::_Rep::_S_empty_rep_storage
000000000038c320 g    DO .bss   0000000000000020  GLIBCXX_3.4 std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::_Rep::_S_empty_rep_storage

That is why there were two different _S_empty_rep_storage objects in memory. The solution was simply to recompile gcc/libstdc++ 5.4.0, making sure the static symbols are defined as unique, which required building gcc with a more recent binutils.
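The u/g distinction can also be checked mechanically. The sketch below assumes the GNU objdump -T line layout shown above: an address, one space, then seven flag characters, where the first is l, g or u (blank otherwise) and the second is w for weak symbols. binding_of and Binding are hypothetical names, not part of any tool.

```cpp
#include <cassert>
#include <string>

// Symbol binding as reported in the flags column of `objdump -T` output.
enum class Binding { Local, Global, Unique, Weak, Other };

// Sketch: classify one line of `objdump -T` output, assuming the layout
// "<address> <7 flag chars> <section> <size> <version> <name>".
Binding binding_of(const std::string& line) {
    const std::string::size_type sp = line.find(' ');  // end of the address
    if (sp == std::string::npos || line.size() < sp + 3) {
        return Binding::Other;
    }
    const char scope = line[sp + 1];  // 'l', 'g', 'u' or ' '
    const char weak = line[sp + 2];   // 'w' for weak symbols, else ' '
    if (weak == 'w') {
        return Binding::Weak;
    }
    switch (scope) {
        case 'l': return Binding::Local;
        case 'g': return Binding::Global;
        case 'u': return Binding::Unique;  // STB_GNU_UNIQUE
        default:  return Binding::Other;
    }
}
```

Each line of objdump -C -T <lib> | grep _S_empty_rep_storage can be passed to binding_of; a definition that comes back as Global or Weak rather than Unique is one that can end up duplicated across libraries in the same process.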

conda-forge

I'm still not sure whether this affects binaries built by conda-forge, but AFAIK the devtoolset-2 used in the Docker image links against the default /usr/lib64/libstdc++.so (gcc 4.4) and statically links in the newer symbols present in 4.8.5.

/usr/lib64/libstdc++.so does not define its static variables as unique, so I see two cases where the same problem could occur when building something on conda-forge:

  • someone builds a binary with the gcc flag -static-libstdc++, which statically links the symbols from /usr/lib64/libstdc++.so
  • the produced binary includes, as a non-unique symbol, one of the newer static variables present in 4.8.5

The first situation should be rare, but it is definitely something we should be careful about. I intend to test the second situation more thoroughly when I get some spare time.

References

https://bugzilla.redhat.com/show_bug.cgi?id=1417663
https://gcc.gnu.org/ml/gcc-help/2017-04/msg00062.html
https://gcc.gnu.org/ml/gcc-help/2017-05/msg00011.html
https://www.redhat.com/archives/posix-c++-wg/2009-August/msg00002.html

@gqmelo
Contributor Author

gqmelo commented May 6, 2017

cc: @jakirkham

@jakirkham
Member

Thanks @gqmelo. I think I missed some of the details in our chat earlier, so it is nice to see them laid out here. When building with gcc 5.x, did you set -D_GLIBCXX_USE_CXX11_ABI=0? If not, I would be curious to know whether you still encounter problems with that change.

@gqmelo
Contributor Author

gqmelo commented May 7, 2017

When building with gcc 5.x, did you set -D_GLIBCXX_USE_CXX11_ABI=0

I forgot this very important detail. Yes, the code compiled with gcc 5.x used -D_GLIBCXX_USE_CXX11_ABI=0. In fact, when building gcc I set the old ABI as the default. You can see the recipe here: https://github.com/gqmelo/gcc-recipes . It is not exactly the recipe we use at work, but it is almost the same.

But this does not actually matter, since it is not my C++ code that triggers the crash; it is Mesa's and LLVM's. In fact, I can reproduce the crash with a pure C program by LD_PRELOAD'ing libstdc++.so, as described in the bug report.
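When reproducing this kind of crash, it helps to confirm which libstdc++.so.6 the loader actually mapped: the system one, the shipped one, or an LD_PRELOAD'ed one. A minimal sketch, assuming Linux with glibc's dl_iterate_phdr from <link.h>; loaded_objects_matching is a hypothetical helper name. Note that it only sees shared objects: a libstdc++ statically linked into another library (as in LLVM here) does not show up in the list.

```cpp
#include <cassert>
#include <cstddef>
#include <link.h>  // dl_iterate_phdr, struct dl_phdr_info (glibc, ELF only)
#include <string>
#include <vector>

namespace {

// State shared with the dl_iterate_phdr callback.
struct Query {
    std::string needle;
    std::vector<std::string> hits;
};

// Called once per loaded ELF object. The main executable usually has an
// empty dlpi_name, so empty names are skipped.
int collect_matches(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
    Query* query = static_cast<Query*>(data);
    const char* name = info->dlpi_name;
    if (name != nullptr && name[0] != '\0') {
        const std::string path(name);
        if (path.find(query->needle) != std::string::npos) {
            query->hits.push_back(path);
        }
    }
    return 0;  // a non-zero value would stop the iteration early
}

}  // namespace

// Hypothetical helper: paths of the loaded shared objects whose name contains
// `needle`, e.g. "libstdc++" to see which copy the loader picked.
std::vector<std::string> loaded_objects_matching(const std::string& needle) {
    Query query{needle, {}};
    dl_iterate_phdr(collect_matches, &query);
    return query.hits;
}
```

Calling loaded_objects_matching("libstdc++") early in main (or from a small test program run with the same LD_PRELOAD/LD_LIBRARY_PATH) prints nothing by itself; the caller decides how to report the returned paths.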

Maybe a diagram makes it easier to understand:

                 OpenGL C API
 +------------+            +--------+        +---------------+      +----------------------------------+
 |program in C| +--------> |libGL.so| +----> |llvmpipe driver| +--> |libLLVM-3.8.1-mesa.so             |
 +------------+            +--------+        |(swrast.so)    |      |(w/ statically linked c++ symbols)|
                                             +----------+--+-+      +-+--+--+------+-------------------+
                                                        |  |          ^  |  |      ^
                                                        |  |          |  |  |      |
+---------------+               C++ API calls           |  +----------+  |  +------+
|libstdc++.so.6 | <-------------------------------------+  C++ API calls |   C++ API calls
|(LD_PRELOAD'ed)|                                                        |
+---------------+ <------------------------------------------------------+
                                C++ API calls

What matters in the diagram above is that C++ API calls can execute either in libstdc++.so or in libLLVM-3.8.1-mesa.so (which was statically linked against libstdc++). In this crash specifically, an empty std::string was initialized by the std::string constructor inside libLLVM-3.8.1-mesa.so. Later, when the same string was assigned a different value, the code that ran was in libstdc++.so.6, which has its own copy of the static member _S_empty_rep_storage, so it failed to recognize that the string referred to the empty-string representation.

Looking at the libstdc++ source code, _S_empty_rep_storage is only used by the old ABI. So if Mesa and LLVM were compiled against the new ABI, this specific problem would not happen, as every empty string would be a separate instance. However, there are other static variables throughout the standard library that could cause similar, seemingly random problems.
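The difference between the two ABIs can be observed directly. A minimal sketch, assuming GCC 5 or newer headers (where _GLIBCXX_USE_CXX11_ABI is always defined) and the default _GLIBCXX_FULLY_DYNAMIC_STRING=0; the function names are hypothetical. Under -D_GLIBCXX_USE_CXX11_ABI=0 two empty strings should point into the same shared empty representation, while under the new ABI each points into its own object.

```cpp
#include <cassert>
#include <string>

// True if this translation unit uses libstdc++'s old (COW) std::string,
// i.e. it was compiled with -D_GLIBCXX_USE_CXX11_ABI=0. The macro is not
// defined by libc++ or by GCC 4.x headers, so those report false here.
bool using_old_string_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI == 0
    return true;
#else
    return false;
#endif
}

// True if two independently default-constructed empty strings point at the
// same character buffer. With the old ABI both point into the shared empty
// representation; with the new ABI each uses its own in-object buffer.
bool empty_strings_share_storage() {
    std::string a;
    std::string b;
    return a.data() == b.data();
}
```

Compiling the same file with and without -D_GLIBCXX_USE_CXX11_ABI=0 should flip both functions together.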

@sagara28

sagara28 commented Dec 19, 2018

@gqmelo
I am also seeing a similar issue. If the user sets LD_PRELOAD to a library that is dynamically linked to a libstdc++.so in which “_S_empty_rep_storage” has symbol scope “u”, and our executable (nvprof) is compiled and statically linked with a libstdc++ in which “_S_empty_rep_storage” has scope “w”, I get a crash. Note that I don't see the issue if I build my executable with a libstdc++ in which “_S_empty_rep_storage” has scope “u”, or if I use the same libstdc++ but do not link it statically.
I also don't see the issue if the user doesn't set LD_PRELOAD.

Hence, as suggested in your post, I tried to build libstdc++ with the newer binutils 2.31.1 (https://ftp.gnu.org/gnu/binutils/binutils-2.31.tar.bz2), but the scope of “_S_empty_rep_storage” still remains “w”.

Can you tell me which binutils you use to compile libstdc++? Or can you suggest anything else to resolve my issue?

FYI, I am using gcc-4.7.3 and building it against glibc-2.2.5. The attached bash script shows how I am compiling gcc.

#0 0x00007ffff7611428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff761302a in __GI_abort () at abort.c:89
#2 0x00007ffff76537ea in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff776ced8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007ffff765c37a in malloc_printerr (ar_ptr=<optimized out>, ptr=<optimized out>, str=0x7ffff7769caf "free(): invalid pointer", action=3) at malloc.c:5006
#4 _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3867
#5 0x00007ffff766053c in __GI___libc_free (mem=<optimized out>) at malloc.c:2968
#6 0x000000000040d6f2 in std::locale::_Impl::_M_install_facet(std::locale::id const*, std::locale::facet const*) ()
#7 0x0000000000423fe3 in std::locale::_Impl::_Impl(unsigned long) ()
#8 0x0000000000424f55 in std::locale::_S_initialize_once() ()
#9 0x00007ffff664fa99 in __pthread_once_slow (once_control=0x691248 std::locale::_S_once, init_routine=0x424f40 std::locale::_S_initialize_once()) at pthread_once.c:116
#10 0x0000000000424fa1 in std::locale::_S_initialize() ()
#11 0x0000000000425022 in std::locale::locale() ()
#12 0x00000000004091ac in std::ios_base::Init::Init() ()
#13 0x00000000004090be in __static_initialization_and_destruction_0(int, int) ()
#14 0x00000000004090e7 in _GLOBAL__sub_I_main ()
#15 0x000000000045cccd in __libc_csu_init ()
#16 0x00007ffff75fc7bf in __libc_start_main (main=0x40908c <main>, argc=1, argv=0x7fffffffe428, init=0x45cc70 <__libc_csu_init>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe418)
at ../csu/libc-start.c:247
#17 0x0000000000408fa9 in _start ()

build (2).txt

@scopatz
Member

scopatz commented Dec 19, 2018

My general understanding is that conda cannot guarantee ABI compatibility if users set LD_* variables.

@gqmelo
Contributor Author

gqmelo commented Dec 25, 2018

I replied by email, as that doesn't seem to be directly related to conda-forge, but I want to reinforce here that statically linking libstdc++ is a bad idea.

The bug I reported about two years ago made them stop building Mesa with static libstdc++.

And as @scopatz said, if the end user is using LD_* variables there is no way to guarantee that it won't break.
For example, the user can replace libstdc++ with a very old version (missing a lot of symbols).

@jakirkham
Member

I think this was resolved a long time ago by moving to the Conda compilers, so I'm going to close this out. Please let us know if that is not the case and we can reopen it. Thanks all! :)

Labels
None yet
Development

No branches or pull requests

4 participants