Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix static destruction ordering issue on macOS #16

Conversation

meyerj
Copy link
Contributor

@meyerj meyerj commented Jul 27, 2018

This is a follow-up of #10 (fix: on destruction OutputHandlerROS should restore the previous handler).

Problem Description

While for Linux the proposed fix works nicely and avoids the pure virtual method call demonstrated by the cleanup unit test, on macOS the added call to console_bridge::restorePreviousOutputHandler() in the destructor of OutputHandlerROS triggers an exception because this mutex in console_bridge's static DefaultHandler DOH struct has already been destructed when the call to console_bridge::restorePreviousOutputHandler() tries to lock it during the static destruction of RegisterOutputHandlerProxy.

libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
Process 13839 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x00007fffb664fd42 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
->  0x7fffb664fd42 <+10>: jae    0x7fffb664fd4c            ; <+20>
    0x7fffb664fd44 <+12>: movq   %rax, %rdi
    0x7fffb664fd47 <+15>: jmp    0x7fffb6648caf            ; cerror_nocancel
    0x7fffb664fd4c <+20>: retq
Target 0: (rviz) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fffb664fd42 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fffb673d457 libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fffb65b5420 libsystem_c.dylib`abort + 129
    frame #3: 0x00007fffb510994a libc++abi.dylib`abort_message + 266
    frame #4: 0x00007fffb512ec17 libc++abi.dylib`default_terminate_handler() + 243
    frame #5: 0x00007fffb5c3d713 libobjc.A.dylib`_objc_terminate() + 124
    frame #6: 0x00007fffb512bd49 libc++abi.dylib`std::__terminate(void (*)()) + 8
    frame #7: 0x00007fffb512bdc3 libc++abi.dylib`std::terminate() + 51
    frame #8: 0x000000010229a11b librosconsole_bridge.dylib`__clang_call_terminate + 11
    frame #9: 0x000000010229a13d librosconsole_bridge.dylib`rosconsole_bridge::OutputHandlerROS::~OutputHandlerROS() + 29
    frame #10: 0x00007fffb65b6178 libsystem_c.dylib`__cxa_finalize_ranges + 332
    frame #11: 0x00007fffb65b64b2 libsystem_c.dylib`exit + 55
    frame #12: 0x00007fffb652123c libdyld.dylib`start + 8
(lldb)

(after quitting rviz with ROS kinetic built from source on macOS Sierra 10.12, with system dependencies installed from HomeBrew, with console_bridge 0.4.0 and rosconsole_bridge version 0.5.1)

I could not reproduce this problem for "normal" executables that only link statically to dynamic libraries, but the exception is triggered consistently if the application loaded libraries dynamically at run-time (dlopen(), e.g. through pluginlib) that directly or indirectly link to librosconsole_bridge.dylib.

The root cause is likely the different ordering of Linux and Darwin during the static initialization and destruction phase. While Linux initializes all global symbols right after a dynamic library has been loaded, Darwin only initializes them on first use. Apparently that is what is called "deferred dynamic initialization" in [1], which is allowed by the C++ standard [2]. The destruction happens in strictly reverse order of initialization in both cases, where the completion of the constructor or dynamic initialization is the synchronization point [3].

If the construction of OutputHandlerROS called during static initialization of oh_ros from the constructor of RegisterOutputHandlerProxy finishes before the first call to getDOH(), which in turn initializes the mutex as part of the static initialization of DefaultOutputHandler DOH, the mutex also gets destroyed before the destruction of oh_ros, which calls into console_bridge::restorePreviousOutputHandler and tries to lock it.

Proposed patch

The solution is to make sure that the output handler is restored during the destruction of exactly the same object that installed a new one, namely RegisterOutputHandlerProxy. If it's not the constructor of OutputHandlerROS that calls console_bridge::useOutputHandler(this), it should also not be the destructors responsibility to call console_bridge::restorePreviousOutputHandler().

Note that there might be more than one instance of RegisterOutputHandlerProxy in a single process because it can be instantiated once for every compilation unit that included the REGISTER_ROSCONSOLE_BRIDGE macro that defines the symbol rosconsole_bridge::RegisterOutputHandlerProxy __register_rosconsole_output_handler_proxy, depending on whether it is loaded with RTLD_GLOBAL or not, and they refer to either the same or a different instance of static OutputHandlerROS oh_ros.

ABI compatibility (tested on Xenial only)

I think the same reasoning about ABI-compatibility as in #10 (comment) applies to this patch, too:

$ abi-compliance-checker -l rosconsole_bridge -old rosconsole_bridge-0.5.1.dump -new rosconsole_bridge-fix.dump 
preparation, please wait ...
WARNING: version number #1 is not set (use --v1=NUM option)
WARNING: version number #2 is not set (use --v2=NUM option)
comparing ABIs ...
comparing APIs ...
creating compatibility report ...
result: COMPATIBLE
total "Binary" compatibility problems: 0, warnings: 0
total "Source" compatibility problems: 0, warnings: 0
see detailed report:
  compat_reports/rosconsole_bridge/X_to_Y/compat_report.html

But as the definition of RegisterOutputHandlerProxy has been changed, which is actually instantiated in another compilation unit (e.g. liburdf.so), we also have to test this one:

$ abi-compliance-checker -l urdf -old urdf-0.5.1.dump -new urdf-fix.dump 
preparation, please wait ...
WARNING: version number #1 is not set (use --v1=NUM option)
WARNING: version number #2 is not set (use --v2=NUM option)
comparing ABIs ...
comparing APIs ...
creating compatibility report ...
result: COMPATIBLE
total "Binary" compatibility problems: 0, warnings: 0
total "Source" compatibility problems: 0, warnings: 0
see detailed report:
  compat_reports/urdf/X_to_Y/compat_report.html

That being said, even if ABI compatible, if only rosconsole_bridge would be recompiled/upgraded, but not the compilation unit that expands the REGISTER_ROSCONSOLE_BRIDGE macro and instantiates RegisterOutputHandlerProxy, none of the two destructors would be called which might result in the pure virtual method call during static destruction again.

Unit test

The add_test() call in test/CMakeLists.txt:4 was missing a NAME keyword, which is why it always failed for me when running through the target test generated by cmake, because it tried to execute a command called COMMAND. Also note that the unit test is not using catkin's test infrastructure which defines the targets tests and run_tests and only provides macros for nosetests and gtest. It will not run automatically on make run_tests or on the ROS build farm. In order to fix that catkin should provide a macro like catkin_add_test() that is not specific to a certain testing library and can execute arbitrary commands, but which integrates with its run_tests target.

References

[1] https://en.cppreference.com/w/cpp/language/initialization#Deferred_dynamic_initialization
[2] http://eel.is/c++draft/basic.start.dynamic#4
[3] http://eel.is/c++draft/basic.start#term-3

@dirk-thomas dirk-thomas added the bug label Aug 3, 2018
@dirk-thomas
Copy link
Member

Thank you for the detailed description. I completely agree with the rational to move the restoring into the destructor of RegisterOutputHandlerProxy. 👍

In order to fix that catkin should provide a macro like catkin_add_test() that is not specific to a certain testing library and can execute arbitrary commands, but which integrates with its run_tests target.

Does catkin_run_tests_target satisfy this? That function is being used when registering a gtest or nosetests test.

@dirk-thomas dirk-thomas merged commit a0d3831 into ros:kinetic-devel Aug 3, 2018
@meyerj
Copy link
Contributor Author

meyerj commented Aug 4, 2018

In order to fix that catkin should provide a macro like catkin_add_test() that is not specific to a certain testing library and can execute arbitrary commands, but which integrates with its run_tests target.

Does catkin_run_tests_target satisfy this? That function is being used when registering a gtest or nosetests test.

In principle yes, but it would still require some glue like in catkin/gtest.cmake:81-89 that has to be repeated for each test executable and that could be wrapped in its own catkin function:

function(catkin_add_test name_or_target
  cmake_parse_arguments(_testing "" "" "COMMAND" ${ARGN})
  set(target)
  set(name "${name_or_target}")
  # unparsed arguments are forwarded to catkin_run_tests_target()

  # for compatibility with add_test(), which accepts a target name after COMMAND
  if(_testing_COMMAND AND TARGET "${_testing_COMMAND}")
    set(target "${_testing_COMMAND}")
    set(_testing_COMMAND)
  elseif(TARGET "${name_or_target}")
    set(target "${name_or_target}")
  endif()

  if(TARGET ${target})
    # make sure the target is built before running tests
    add_dependencies(tests ${target})

    # get command from executable output path
    if(NOT _testing_COMMAND)
      # get_target_property() may be replaced by a generator expression in modern CMake?
      get_target_property(_target_path ${target} RUNTIME_OUTPUT_DIRECTORY)
      get_target_property(_target_executable ${target} RUNTIME_OUTPUT_NAME)
      set(_testing_COMMAND "${_target_path}/${_target_executable }")
    endif()
  endif()

  # The test executable does not produce a junit result file plain-${name}.xml, but
  # we have to add one in the catkin_run_tests_target() call...
  catkin_run_tests_target("plain" ${name} "plain-${name}.xml"
      COMMAND "${_testing_COMMAND}"
      DEPENDENCIES ${target}
      ${_testing_UNPARSED_ARGUMENTS})
endfunction()

# in test/CMakeLists.txt
add_executable(cleanup cleanup.cpp)
catkin_add_test(cleanup)

This function would create a custom run_tests_${PROJECT_NAME}_plain_cleanup target that runs the test executable through the run_tests.py wrapper and that gets added to the run_tests target, but because it does not produce a junit test result file on its own, the wrapper returns with a failure. I guess what would be needed is another wrapper (or a patch to run_tests.py) that generates a simple result file for plain tests which only records the exit code, without tracking individual test cases, instead of checking that the test executable created a result file.

An alternative would be to make xunit_filename an optional argument to catkin_run_tests_target() and call the executable directly if it is not specified. But I am not sure how this would interact with catkin_test_results and report generation at the build farm.

@meyerj meyerj deleted the fix-static-destruction-ordering-issue-on-macos branch August 4, 2018 00:20
@k-okada
Copy link

k-okada commented Aug 6, 2018

just for note, and I do not know much about abi-compliance-checker , but we have trouble on this change.

I have environment with

ros-kinetic-rosconsole-bridge amd64 0.5.1-0xenial-20180222-205946-0800
ros-kinetic-urdf amd64 1.12.12-0xenial-20180803-180744-0800

which we have installed rosconsole-bridge a few month ago, and just install ros-kinetic-urdf yesterday. And this fail to compile our packages . Ofcourse if we install rosconsole-bridge or run dist-upgrade , we can fix this, but I'm afraid this solution is very hard to find for most of people.

https://api.travis-ci.org/v3/job/412049716/log.txt


'/workspace/ros/ws_jsk_recognition/src/jsk_recognition/jsk_perception/trained_data/ObjectnessTrainedModel'

In file included from /opt/ros/kinetic/include/robot_self_filter/self_mask.h:44:0,

                 from /workspace/ros/ws_jsk_recognition/src/jsk_recognition/jsk_perception/include/jsk_perception/robot_to_mask_image.h:40,

                 from /workspace/ros/ws_jsk_recognition/src/jsk_recognition/jsk_perception/src/robot_to_mask_image.cpp:36:

/opt/ros/kinetic/include/resource_retriever/retriever.h:81:38: warning: defaulted and deleted functions only available with -std=c++11 or -std=gnu++11

   Retriever(const Retriever & ret) = delete;

                                      ^

/opt/ros/kinetic/lib/liburdf.so: undefined reference to `rosconsole_bridge::RegisterOutputHandlerProxy::~RegisterOutputHandlerProxy()'

collect2: error: ld returned 1 exit status

make[2]: *** 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants