Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate messages in subscription callbacks for critical error in population #4177

Closed
GoesM opened this issue Mar 12, 2024 · 6 comments
Closed

Comments

@GoesM
Copy link
Contributor

GoesM commented Mar 12, 2024

Bug report

Required Info:

  • Operating System:
    • Ubuntu22.04
  • ROS2 Version:
    • humble
  • Version or commit hash:
    • the latest
  • DDS implementation:
    • defaulted

Steps to reproduce issue

Launch the navigation2 normally, as following steps:

#!/bin/bash
export ASAN_OPTIONS=halt_on_error=0:new_delete_type_mismatch=0:detect_leaks=0:log_pah=asan
source install/setup.bash
export TURTLEBOT3_MODEL=waffle
export GAZEBO_MODEL_PATH=$GAZEBO_MODEL_PATH:/opt/ros/humble/share/turtlebot3_gazebo/models
ros2 launch nav2_bringup tb3_simulation_launch.py headless:=True use_rviz:=False use_composition:=False 

Curious about how nav2 face to topic-interception, i keep sending demaged /map msg onto topic /map, which is like this:

ros2 topic pub /map nav_msgs/msg/OccupancyGrid "
header:
  stamp:
    sec: 0
    nanosec: 0
  frame_id: map
info:
  map_load_time:
    sec: 0
    nanosec: 0
  resolution: 0.05000000074505806
  width: 384
  height: 384
  origin:
    position:
      x: -10.0
      y: -10.0
      z: 0.0
    orientation:
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0
data: [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1] "

[notice] the real length of data is less than width*height

then, the asan report would occur.

Expected behavior

Actual behavior

The ASAN reporting a buffer-overflow bug to me, as following:

=================================================================
==14154==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6210000667f5 at pc 0x7b380c985c4a bp 0x7ffdb96543c0 sp 0x7ffdb96543b8
READ of size 1 at 0x6210000667f5 thread T0
    #0 0x7b380c985c49 in nav2_amcl::AmclNode::convertMap(nav_msgs::msg::OccupancyGrid_<std::allocator<void> > const&) (/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x385c49) (BuildId: 093c184d6356ee01ad4bd31576c7ab18775c096d)
    #1 0x7b380c984f77 in nav2_amcl::AmclNode::handleMapMessage(nav_msgs::msg::OccupancyGrid_<std::allocator<void> > const&) (/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x384f77) (BuildId: 093c184d6356ee01ad4bd31576c7ab18775c096d)
    #2 0x7b380c984477 in nav2_amcl::AmclNode::mapReceived(std::shared_ptr<nav_msgs::msg::OccupancyGrid_<std::allocator<void> > >) (/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x384477) (BuildId: 093c184d6356ee01ad4bd31576c7ab18775c096d)
    #3 0x7b380ca7ea3e in void std::__invoke_impl<void, void (nav2_amcl::AmclNode::*&)(std::shared_ptr<nav_msgs::msg::OccupancyGrid_<std::allocator<void> > >), nav2_amcl::AmclNode*&, std::shared_ptr<nav_msgs::msg::OccupancyGrid_<std::allocator<void> > > >(std::__invoke_memfun_deref, void (nav2_amcl::AmclNode::*&)(std::shared_ptr<nav_msgs::msg::OccupancyGrid_<std::allocator<void> > >), nav2_amcl::AmclNode*&, std::shared_ptr<nav_msgs::msg::OccupancyGrid_<std::allocator<void> > >&&) (/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x47ea3e) (BuildId: 093c184d6356ee01ad4bd31576c7ab18775c096d)
    #4 0x7b380cac79b2 in auto rclcpp::AnySubscriptionCallback<nav_msgs::msg::OccupancyGrid_<std::allocator<void> >, std::allocator<void> >::dispatch(std::shared_ptr<nav_msgs::msg::OccupancyGrid_<std::allocator<void> > >, rclcpp::MessageInfo const&)::'lambda'(auto&&)::operator()<std::function<void (std::shared_ptr<nav_msgs::msg::OccupancyGrid_<std::allocator<void> > >)>&>(auto&&) const (/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x4c79b2) (BuildId: 093c184d6356ee01ad4bd31576c7ab18775c096d)
    #5 0x7b380cac489b in rclcpp::AnySubscriptionCallback<nav_msgs::msg::OccupancyGrid_<std::allocator<void> >, std::allocator<void> >::dispatch(std::shared_ptr<nav_msgs::msg::OccupancyGrid_<std::allocator<void> > >, rclcpp::MessageInfo const&) (/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x4c489b) (BuildId: 093c184d6356ee01ad4bd31576c7ab18775c096d)
    #6 0x7b380ca8acac in rclcpp::Subscription<nav_msgs::msg::OccupancyGrid_<std::allocator<void> >, std::allocator<void>, nav_msgs::msg::OccupancyGrid_<std::allocator<void> >, nav_msgs::msg::OccupancyGrid_<std::allocator<void> >, rclcpp::message_memory_strategy::MessageMemoryStrategy<nav_msgs::msg::OccupancyGrid_<std::allocator<void> >, std::allocator<void> > >::handle_message(std::shared_ptr<void>&, rclcpp::MessageInfo const&) (/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x48acac) (BuildId: 093c184d6356ee01ad4bd31576c7ab18775c096d)
    #7 0x7b380d97d7bb in rclcpp::Executor::execute_subscription(std::shared_ptr<rclcpp::SubscriptionBase>) (/opt/ros/humble/lib/librclcpp.so+0xe77bb) (BuildId: 4cca8a387f3c93d38a0567a8efc7cba9106f5d9a)
    #8 0x7b380d97dfbe in rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&) (/opt/ros/humble/lib/librclcpp.so+0xe7fbe) (BuildId: 4cca8a387f3c93d38a0567a8efc7cba9106f5d9a)
    #9 0x7b380d9858af in rclcpp::executors::SingleThreadedExecutor::spin() (/opt/ros/humble/lib/librclcpp.so+0xef8af) (BuildId: 4cca8a387f3c93d38a0567a8efc7cba9106f5d9a)
    #10 0x7b380d985ac4 in rclcpp::spin(std::shared_ptr<rclcpp::node_interfaces::NodeBaseInterface>) (/opt/ros/humble/lib/librclcpp.so+0xefac4) (BuildId: 4cca8a387f3c93d38a0567a8efc7cba9106f5d9a)
    #11 0x5cf6d6474975 in main (/***/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0xeb975) (BuildId: 9cab2dfb4fd0f7edff9ad0c4896458037daa0009)
    #12 0x7b380bc29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #13 0x7b380bc29e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #14 0x5cf6d63b4514 in _start (/***/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0x2b514) (BuildId: 9cab2dfb4fd0f7edff9ad0c4896458037daa0009)

0x6210000667f5 is located 0 bytes to the right of 4853-byte region [0x621000065500,0x6210000667f5)
allocated by thread T0 here:
    #0 0x5cf6d647212d in operator new(unsigned long) (/***/nav2_humble/install/nav2_amcl/lib/nav2_amcl/amcl+0xe912d) (BuildId: 9cab2dfb4fd0f7edff9ad0c4896458037daa0009)
    #1 0x7b380dc73fb9 in nav_msgs::msg::typesupport_fastrtps_cpp::cdr_deserialize(eprosima::fastcdr::Cdr&, nav_msgs::msg::OccupancyGrid_<std::allocator<void> >&) (/opt/ros/humble/lib/libnav_msgs__rosidl_typesupport_fastrtps_cpp.so+0x8fb9) (BuildId: d9989f748e1c94c608a4bedd7dfd0f7db1970877)
    #2 0x7f0000000000  (<unknown module>)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/***/nav2_humble/install/nav2_amcl/lib/libamcl_core.so+0x385c49) (BuildId: 093c184d6356ee01ad4bd31576c7ab18775c096d) in nav2_amcl::AmclNode::convertMap(nav_msgs::msg::OccupancyGrid_<std::allocator<void> > const&)
Shadow bytes around the buggy address:
  0x0c4280004ca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4280004cb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4280004cc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4280004cd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4280004ce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c4280004cf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[05]fa
  0x0c4280004d00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4280004d10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4280004d20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4280004d30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4280004d40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==14154==ABORTING

Additional information


code analysis:
in function AmclNode::convertMap() the for use the size_x*size_y as the boundary of the vector map_msg.data

  // Convert to player format
  for (int i = 0; i < map->size_x * map->size_y; i++) {
    if (map_msg.data[i] == 0) {
      map->cells[i].occ_state = -1;
    } else if (map_msg.data[i] == 100) {
      map->cells[i].occ_state = +1;
    } else {
      map->cells[i].occ_state = 0;
    }
  }

however, the real length of the vector map_msg.data is up to the user's command which may be less than size_x*size_y

thus, it would bring a buffer-overflow because out of the boundary of the vector.

@GoesM
Copy link
Contributor Author

GoesM commented Mar 12, 2024

some suggestions to avoid such buffer-overflow:

  • just use the real size of vector as the boundary
  • design nsg-recovery mechanism for /map ?

@SteveMacenski
Copy link
Member

SteveMacenski commented Mar 12, 2024

It is usually expected that inputs are well formed - but even in writing this answer I find myself disagreeing with myself that maybe we should do some basic message validation to prevent and/or expose potential problems before users run into them.

My suggestion would be to audit the input information to Nav2 (occupancy grid messages, pointclouds, laser scans, etc) to get a list of all message types that come in. Then, we can look at them to see what places we need to do validation checks to make sure the message won't bring down the system (ie map size and actual data don't match; pointcloud strides and its data size don't match; unimplemented or invalid values in LaserScan angles, etc). At that point, we can make a new nav2_util/validate_messages.hpp that contains these validation methods and use those methods in all the callbacks to reject updates if they're invalid.

Basically change all callbacks to first thing do

{
  if (!nav2_util::validateMsg(msg)) {
    RCLCPP_ERROR(logger_, "Message came in at time X of type Y that is malformed. Rejecting.");
    return;
  }
}

Or, even more interestingly, we could use the topic tools in ROS 2 to create a message filter that the filter does this action so that all callbacks when triggered have already been pre-validated. There's examples of this using TF transforms, but we can make arbitrary filters.

@GoesM
Copy link
Contributor Author

GoesM commented Mar 13, 2024

Interesting suggestion! ^_^

I agree that basic message validation is in need, especially considering the exposure of ros2-topics within the local network, where they are susceptible to both intentional and unintentional interference (in my opinion.

Your proposed solution indeed sounds great, while I must admit that direct implementation of nav2_util/validate_messages.hpp might be challenging for me. However, if there's already a basic framework of nav2_util::validateMsg(msg) in place, I'd be more than willing to contribute to refining its implementation details.

@SteveMacenski
Copy link
Member

It should just be a set of header files

bool validateMessage(const MyMessageType & msg) 
{
  // check the msg fields to make sure reasonably implemented
}

It should be very straight forward. I won't have time for this and since this is something important to you, I suggest taking the lead on it. The checks are simple: sizes match necessary sizes, no INF or NAN in key fields. It doesn't need to check quality, just anything that would cause a segfault or NAN where it isn't expected

@GoesM
Copy link
Contributor Author

GoesM commented Mar 14, 2024

May I open a long-term PR after implementing a basic framework, to continually update it whenever I come across new tickets?

Alternatively, should I open a PR dedicated solely to fixing this issue after implementing the framework, and then open new PRs to address any new input validation issues that arise?

I want to choose one of them that minimizes disruption to you while still being effective. ^_^

@SteveMacenski
Copy link
Member

That sounds like a plan:

  • Add the new nav2_util header with the map validator
  • Use that map validator in a subscription filter or at the start of the callback where relevant
  • Over time, as you find issues in your fuzzing experiments, add additional validation utils in the appropriate spots

The only thing I'd ask is that you make sure to setup for all subscriptions of a given type when you add it in one place (ie all occ grid subscribers) so that we don't have some where its done and some where its not. But, I think that perfectly aligns with your intent / need so I don't think you'd have an issue with that :-)

@SteveMacenski SteveMacenski changed the title Damaged map message will result in a heap-buffer-overflow bug in nav2_amcl Validate messages in subscription callbacks for critical error in population Mar 27, 2024
@GoesM GoesM closed this as completed May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants