Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Transfered to geometry2] potential bugs perhaps casued by transformReadyCallback of tf2_ros::MessageFilter #4414

Closed
GoesM opened this issue Jun 11, 2024 · 8 comments

Comments

@GoesM
Copy link
Contributor

GoesM commented Jun 11, 2024

Bug report

Required Info:

  • Operating System:
    • Ubuntu22.04
  • ROS2 Version:
    • humble
  • Version or commit hash:
    • the latest
  • DDS implementation:
    • defaulted

Steps to reproduce issue

Launch the navigation2 normally, as following steps:

#!/bin/bash
export ASAN_OPTIONS=halt_on_error=0:new_delete_type_mismatch=0:detect_leaks=0:log_pah=asan
source install/setup.bash
export TURTLEBOT3_MODEL=waffle
export GAZEBO_MODEL_PATH=$GAZEBO_MODEL_PATH:/opt/ros/humble/share/turtlebot3_gazebo/models
ros2 launch nav2_bringup tb3_simulation_launch.py headless:=True use_rviz:=False use_composition:=False 

Keep sending following command , until amcl crashed.

ros2 param set /amcl beam_skip_threshold 3.14

Expected behavior

no crash anymore

Actual behavior

sometimes crash, sometimes no crash.

Additional information


This may be a related issue of #4379 and #4397 , still about the UAF-bug occurring after request for dynamic parameter, BUT there's something different.

Before the work of #4397 , if we send a ros2 param set /amcl beam_skip_threshold 3.14 command , the amcl node will immediately crash. So #4397 found ONE OF reasons and fixed it : laser_scan_filter_.reset() is ignored.

After the work of #4397, we could find that ros2 param set /amcl beam_skip_threshold 3.14 would not immediately let amcl crashed, and dynamic-parameter would work correctly MOST OF times.

BUT, just MOST OF, there're still some special times in which amcl crashed. And following is the Asan-report in these times.

=================================================================
==76105==ERROR: AddressSanitizer: heap-use-after-free on address 0x6170000a5c90 at pc 0x79ff0b559bba bp 0x79ff01bf6ab0 sp 0x79ff01bf6aa8
READ of size 8 at 0x6170000a5c90 thread T12
    #0 0x79ff0b559bb9 in tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>::transformReadyCallback(tf2_ros::TransformStampedFuture const&, unsigned long) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x559bb9) (BuildId: f870739ccc7779266347919193e71b4cf78860c9)
    #1 0x79ff0b55db4e in std::_Function_handler<void (tf2_ros::TransformStampedFuture const&), std::_Bind<void (tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>::* (tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>*, std::_Placeholder<1>, unsigned long))(tf2_ros::TransformStampedFuture const&, unsigned long)> >::_M_invoke(std::_Any_data const&, tf2_ros::TransformStampedFuture const&) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x55db4e) (BuildId: f870739ccc7779266347919193e71b4cf78860c9)
    #2 0x79ff0c85130b in std::_Function_handler<void (unsigned long const&), std::_Bind<void (tf2_ros::Buffer::* (tf2_ros::Buffer*, std::_Placeholder<1>, std::shared_ptr<std::promise<geometry_msgs::msg::TransformStamped_<std::allocator<void> > > >, tf2_ros::TransformStampedFuture, std::function<void (tf2_ros::TransformStampedFuture const&)>))(unsigned long const&, std::shared_ptr<std::promise<geometry_msgs::msg::TransformStamped_<std::allocator<void> > > >, tf2_ros::TransformStampedFuture, std::function<void (tf2_ros::TransformStampedFuture const&)>)> >::_M_invoke(std::_Any_data const&, unsigned long const&) (/opt/ros/humble/lib/libtf2_ros.so+0x5130b) (BuildId: 4ee0e77870b47da6b41a530a2970e853eafce773)
    #3 0x79ff0c84eb35 in std::_Function_handler<void (), std::_Bind<void (tf2_ros::CreateTimerROS::* (tf2_ros::CreateTimerROS*, unsigned long, std::function<void (unsigned long const&)>))(unsigned long const&, std::function<void (unsigned long const&)>)> >::_M_invoke(std::_Any_data const&) (/opt/ros/humble/lib/libtf2_ros.so+0x4eb35) (BuildId: 4ee0e77870b47da6b41a530a2970e853eafce773)
    #4 0x79ff0c84dd97  (/opt/ros/humble/lib/libtf2_ros.so+0x4dd97) (BuildId: 4ee0e77870b47da6b41a530a2970e853eafce773)
    #5 0x79ff0c407030 in rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&) (/opt/ros/humble/lib/librclcpp.so+0xe7030) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #6 0x79ff0c40e97f in rclcpp::executors::SingleThreadedExecutor::spin() (/opt/ros/humble/lib/librclcpp.so+0xee97f) (BuildId: bd69d2a87470f6b9fdc0534ef05dc334083a0b6c)
    #7 0x79ff0aadc252  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc252) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)
    #8 0x79ff0a694ac2 in start_thread nptl/./nptl/pthread_create.c:442:8
    #9 0x79ff0a72684f  misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

0x6170000a5c90 is located 656 bytes inside of 704-byte region [0x6170000a5a00,0x6170000a5cc0)
freed by thread T0 here:
    #0 0x5f969785c98d in operator delete(void*) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/nav2_amcl/amcl+0xe998d) (BuildId: 8210e9eb79b766128e96cd73bfae12d14cdc1560)
    #1 0x79ff0b3ec48c in std::__uniq_ptr_impl<tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>, std::default_delete<tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer> > >::reset(tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>*) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x3ec48c) (BuildId: f870739ccc7779266347919193e71b4cf78860c9)
    #2 0x79ff0b45a3ce in std::__invoke_result<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&>::type std::__invoke<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&>(rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::*&)(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >), nav2_amcl::AmclNode*&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x45a3ce) (BuildId: f870739ccc7779266347919193e71b4cf78860c9)
    #3 0x79ff0b45a231 in rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > std::__invoke_impl<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> >, std::_Bind<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::* (nav2_amcl::AmclNode*, std::_Placeholder<1>))(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >)>&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&>(std::__invoke_other, std::_Bind<rcl_interfaces::msg::SetParametersResult_<std::allocator<void> > (nav2_amcl::AmclNode::* (nav2_amcl::AmclNode*, std::_Placeholder<1>))(std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> >)>&, std::vector<rclcpp::Parameter, std::allocator<rclcpp::Parameter> > const&) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x45a231) (BuildId: f870739ccc7779266347919193e71b4cf78860c9)

previously allocated by thread T0 here:
    #0 0x5f969785c12d in operator new(unsigned long) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/nav2_amcl/amcl+0xe912d) (BuildId: 8210e9eb79b766128e96cd73bfae12d14cdc1560)
    #1 0x79ff0b3e42df in std::_MakeUniq<tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer> >::__single_object std::make_unique<tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>, message_filters::Subscriber<sensor_msgs::msg::LaserScan_<std::allocator<void> >, rclcpp_lifecycle::LifecycleNode>&, tf2_ros::Buffer&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int, std::shared_ptr<rclcpp::node_interfaces::NodeLoggingInterface>, std::shared_ptr<rclcpp::node_interfaces::NodeClockInterface>, std::chrono::duration<long, std::ratio<1l, 1000000000l> >&>(message_filters::Subscriber<sensor_msgs::msg::LaserScan_<std::allocator<void> >, rclcpp_lifecycle::LifecycleNode>&, tf2_ros::Buffer&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&&, std::shared_ptr<rclcpp::node_interfaces::NodeLoggingInterface>&&, std::shared_ptr<rclcpp::node_interfaces::NodeClockInterface>&&, std::chrono::duration<long, std::ratio<1l, 1000000000l> >&) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x3e42df) (BuildId: f870739ccc7779266347919193e71b4cf78860c9)

Thread T12 created by T0 here:
    #0 0x5f969780a7dc in __interceptor_pthread_create (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/nav2_amcl/amcl+0x977dc) (BuildId: 8210e9eb79b766128e96cd73bfae12d14cdc1560)
    #1 0x79ff0aadc328 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc328) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)
    #2 0x79ff0b3772f0 in nav2_amcl::AmclNode::on_configure(rclcpp_lifecycle::State const&) (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x3772f0) (BuildId: f870739ccc7779266347919193e71b4cf78860c9)
    #3 0x79ff0c3038ec  (/opt/ros/humble/lib/librclcpp_lifecycle.so+0x288ec) (BuildId: 97f6428dc1ee45fd402b522b3b8e6b4fcfeabe76)

SUMMARY: AddressSanitizer: heap-use-after-free (/home/goes/ROS_Workstation/humble_navigation2/install/nav2_amcl/lib/libamcl_core.so+0x559bb9) (BuildId: f870739ccc7779266347919193e71b4cf78860c9) in tf2_ros::MessageFilter<sensor_msgs::msg::LaserScan_<std::allocator<void> >, tf2_ros::Buffer>::transformReadyCallback(tf2_ros::TransformStampedFuture const&, unsigned long)
Shadow bytes around the buggy address:
  0x0c2e8000cb40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000cb50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000cb60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000cb70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c2e8000cb80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c2e8000cb90: fd fd[fd]fd fd fd fd fd fa fa fa fa fa fa fa fa
  0x0c2e8000cba0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c2e8000cbb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e8000cbc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e8000cbd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c2e8000cbe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==76105==ABORTING

the report shows that the bug occurs during transformReadyCallback of tf2_ros::MessageFilter works, so it should be a DIFFERENT reason from #4379 causing the similar bug.

And in the function tf2_ros::MessageFilter, it seems have no memory provided by amcl, so I think :

**Is it a bug caused by the mechanism defect of tf2_ros ? **

but this function is not provided by navigation2 and I'm not very fimiliar with it , so I'm not very sure about my opinion ......

@SteveMacenski
Copy link
Member

Not sure - I think you should compile with debug flags and get a backtrace so we can see specifically where it fails https://docs.nav2.org/tutorials/docs/get_backtrace.html

It does seem likely like a tf2_ros problem though.

@GoesM
Copy link
Contributor Author

GoesM commented Jun 11, 2024

It's an interesting way ! thanks for your share, I'd try on it in my later work.

@SteveMacenski
Copy link
Member

It seems like somethings are being triggered after reset, so it would be good to know what is being called / triggered and thus see how to resolve it or if we made a mistake on not resetting everything required in the right order

@SteveMacenski
Copy link
Member

Any word here @GoesM ?

@GoesM
Copy link
Contributor Author

GoesM commented Jun 25, 2024

I conducted several experiments and discovered that the callback function (transformReadyCallback()) of tf2_ros::Message_filter can continue to execute even after the Message_filter has been destroyed.

I believe this is an issue stemming from a flaw in the destruction mechanism of tf2_ros::Message_filter. Therefore, I have opened a new issue for the geometry2 stack: ros/geometry2#565.

@SteveMacenski
Copy link
Member

Got it, think you have a potential solution? I'm not sure I can get anyone to fix it for us, but if you have a patch, I can get someone to review it for us!

@GoesM
Copy link
Contributor Author

GoesM commented Jun 26, 2024

Uh... I'm very glad to submit a PR to contribute. After all, this issue may occur frequently if users utilize dynamic parameters during navigation2 operations.

However, before submitting the PR, I think I need more time to find the most suitable fix, learning about the details of how the tf2_ros::Message_filter is designed to manage its own threads .

@SteveMacenski
Copy link
Member

SteveMacenski commented Jun 26, 2024

OK. I'd want to transfer this conversation over there though so that this doesn't sit in Nav2's issue tracker since its not actionable to Nav2 as a bug with a dependency. The linking between these tickets I think does a fine job documenting it, but I'm going to go ahead and close this and we can follow up with future discussions in that ticket in geometry2 - I have notifications set to 'on'.

@SteveMacenski SteveMacenski changed the title potential bugs perhaps casued by transformReadyCallback of tf2_ros::MessageFilter [Transfered to geometry2] potential bugs perhaps casued by transformReadyCallback of tf2_ros::MessageFilter Jun 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants