Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner_data access outside mutex lock inside behavior_path_planner_node #2495

Closed
3 tasks done
VRichardJP opened this issue Dec 13, 2022 · 8 comments
Closed
3 tasks done
Assignees
Labels
type:bug Software flaws or errors.

Comments

@VRichardJP
Copy link
Contributor

Checklist

  • I've read the contribution guidelines.
  • I've searched other issues and no duplicate issues were found.
  • I'm convinced that this is not my fault but a bug.

Description

Found the exact same issue than #2422 here:

util::getDrivableAreaForAllSharedLinestringLanelets(planner_data));

The data is accessed outside the mutex_pd_ lock guard

While working on my PR, I reset the route_handler internal data after a new route has just arrived. The data is only left uninitialized while the new route is being checked/loaded (which happens in the same callback). However I got the following segfault, showing that another thread tried to access the data at the same time:

Thread 14 "component_conta" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe27d4640 (LWP 79951)]
0x00007fffe0175c8b in lanelet::utils::query::getClosestLanelet(std::vector<lanelet::ConstLanelet, std::allocator<lanelet::ConstLanelet> > const&, geometry_msgs::msg::Pose_<std::allocator<void> > const&, lanelet::ConstLanelet*) () from /home/sig/autoware/install/lanelet2_extension/lib/liblanelet2_extension_lib.so
(gdb) up
#1  0x00007fffd115efe6 in route_handler::LaneletRoute::getClosestLaneletPointWithinRoute (this=0x0, pose=...) at /home/sig/autoware/src/universe/autoware.universe/planning/route_handler/src/lanelet_route.cpp:394
394	  if (!lanelet::utils::query::getClosestLanelet(route_lanelets_, pose, &closest_lanelet)) {
(gdb) p route_lanelets_
Cannot access memory at address 0x20
(gdb) p this
$1 = (const route_handler::LaneletRoute * const) 0x0
(gdb) up
#2  0x00007fffd29e05f0 in behavior_path_planner::util::getDrivableAreaForAllSharedLinestringLanelets (planner_data=std::shared_ptr<const behavior_path_planner::PlannerData> (use count 10, weak count 0) = {...}) at /home/sig/autoware/src/universe/autoware.universe/planning/behavior_path_planner/src/utilities.cpp:1647
1647	  const auto current_point = lanelet_route_ptr->getClosestLaneletPointWithinRoute(ego_pose);
(gdb) 
#3  0x00007fffd27a7d78 in behavior_path_planner::BehaviorPathPlannerNode::run (this=0x5555558383d0) at /home/sig/autoware/src/universe/autoware.universe/planning/behavior_path_planner/src/behavior_path_planner_node.cpp:658
658	      util::getDrivableAreaForAllSharedLinestringLanelets(planner_data));
(gdb) 


Expected behavior

no concurrent access

Actual behavior

unprotected access

Steps to reproduce

N/A

Versions

No response

Possible causes

No response

Additional context

No response

@BonoloAWF BonoloAWF added the type:bug Software flaws or errors. label Dec 13, 2022
@purewater0901
Copy link
Contributor

@VRichardJP
Thanks for your issue. I submitted a new PR. I think this solves all of the problems related to planner data.

@VRichardJP
Copy link
Contributor Author

VRichardJP commented Dec 14, 2022

Thank you, I will test it.

However, I am not convinced copying the planner_data_ is a good idea.
For example, imagine you have any pointer inside the structure:

struct PlannerData {
  DataPtr data_ptr;
  // some other things... 
};

If you do this:

mutex_pd_.lock();
const auto planner_data = std::make_shared<PlannerData>(*planner_data_);
mutex_pd_.unlock();

planner_data.data_ptr->getInternalState();

Then, planner_data->data_ptr and planner_data_->data_ptr points to the same thing. So if someone else does this:

// some other thread
mutex_pd_.lock();
planner_data_.data_ptr->changeInternalState();
mutex_pd_.unlock();

You may have the planner_data.data_ptr->getInternalState() reading something while planner_data_.data_ptr->changeInternalState() is writing it.

@purewater0901
Copy link
Contributor

@VRichardJP
Thank you for your suggestions and comments. I made a new PR, what do you think about this change?

#2510

@xmfcx
Copy link
Contributor

xmfcx commented Jan 10, 2023

@VRichardJP does this solve this bug report? #2510

@VRichardJP
Copy link
Contributor Author

VRichardJP commented Jan 11, 2023

I am sorry, I still think there is a concurrency issue.

The BehaviorPathPlannerNode::createLatestPlannerData() function in behavior_path_planner_node.cpp returns a simple copy of the planner_data_. As a consequence, you have planner_data_->route_handler == createLatestPlannerData()->route_handler. In other words, when createLatestPlannerData() result is accessed here:

const auto output = bt_manager_->run(planner_data);

Nothing prevents another function from modifying planner_data_->route_handler. For example:

void BehaviorPathPlannerNode::onVelocity(const Odometry::ConstSharedPtr msg)
{
const std::lock_guard<std::mutex> lock(mutex_pd_);
planner_data_->self_odometry = msg;
}

There are only 3 ways to solve the issue:

  • Add lock guard very every single read/write of planner_data_, or
  • Put every single callback using planner_data_ in the same rclcpp::CallbackGroupType::MutuallyExclusive callback group, or
  • Just make the node single threaded

I think the second or third option are simpler. Actually, most callbacks are already in the same group. I think only the timer callback to BehaviorPathPlannerNode::run is not in the group:

timer_ = rclcpp::create_timer(
this, get_clock(), period_ns, std::bind(&BehaviorPathPlannerNode::run, this));

@xmfcx
Copy link
Contributor

xmfcx commented Feb 21, 2023

@VRichardJP is this issue solved by these two PRs above?

@VRichardJP
Copy link
Contributor Author

@xmfcx
No.

In the current state, the code has no concurrent memory access. This was also true before the few changes. But this is just an illusion, as the inherent problem has yet to be solved. My comment above still holds: the inner data of planner_data_ is accessed outside the mutex lock (through planner_data).

It can be verified with this small example:

diff --git a/launch/tier4_planning_launch/launch/scenario_planning/lane_driving/behavior_planning/behavior_planning.launch.py b/launch/tier4_planning_launch/launch/scenario_planning/lane_driving/behavior_planning/behavior_planning.launch.py
index 293d8bebf2..7fc7eb8299 100644
--- a/launch/tier4_planning_launch/launch/scenario_planning/lane_driving/behavior_planning/behavior_planning.launch.py
+++ b/launch/tier4_planning_launch/launch/scenario_planning/lane_driving/behavior_planning/behavior_planning.launch.py
@@ -225,6 +225,7 @@ def launch_setup(context, *args, **kwargs):
             behavior_velocity_planner_component,
         ],
         output="screen",
+        prefix="gnome-terminal --",
     )
 
     # This condition is true if run_out module is enabled and its detection method is Points
diff --git a/planning/behavior_path_planner/include/behavior_path_planner/behavior_path_planner_node.hpp b/planning/behavior_path_planner/include/behavior_path_planner/behavior_path_planner_node.hpp
index 45eab08040..60d25fc023 100644
--- a/planning/behavior_path_planner/include/behavior_path_planner/behavior_path_planner_node.hpp
+++ b/planning/behavior_path_planner/include/behavior_path_planner/behavior_path_planner_node.hpp
@@ -103,6 +103,8 @@ private:
   rclcpp::Publisher<MarkerArray>::SharedPtr bound_publisher_;
   rclcpp::Publisher<PoseWithUuidStamped>::SharedPtr modified_goal_publisher_;
   rclcpp::TimerBase::SharedPtr timer_;
+  rclcpp::TimerBase::SharedPtr timer2_;
+  rclcpp::CallbackGroup::SharedPtr timer2_cb_group_;
 
   std::map<std::string, rclcpp::Publisher<Path>::SharedPtr> path_candidate_publishers_;
 
diff --git a/planning/behavior_path_planner/src/behavior_path_planner_node.cpp b/planning/behavior_path_planner/src/behavior_path_planner_node.cpp
index 11f89c06e0..feb5d78026 100644
--- a/planning/behavior_path_planner/src/behavior_path_planner_node.cpp
+++ b/planning/behavior_path_planner/src/behavior_path_planner_node.cpp
@@ -26,6 +26,7 @@
 
 #include <memory>
 #include <string>
+#include <thread>
 #include <utility>
 #include <vector>
 
@@ -191,6 +192,21 @@ BehaviorPathPlannerNode::BehaviorPathPlannerNode(const rclcpp::NodeOptions & nod
     const auto period_ns = rclcpp::Rate(planning_hz).period();
     timer_ = rclcpp::create_timer(
       this, get_clock(), period_ns, std::bind(&BehaviorPathPlannerNode::run, this));
+    timer2_cb_group_ = create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);
+    timer2_ = rclcpp::create_timer(
+      this, get_clock(), rclcpp::Rate(1.0).period(), [&]() {
+        const std::lock_guard<std::mutex> lock(mutex_pd_);
+        if (has_received_map_) {
+          std::cout << "Updating Map" << std::endl;
+          planner_data_->route_handler->setMap(*map_ptr_);
+          has_received_map_ = false;
+        }
+        if (has_received_route_) {
+          std::cout << "Updating Route" << std::endl;
+          planner_data_->route_handler->setRoute(*route_ptr_);
+          has_received_route_ = false;
+        }
+      }, timer2_cb_group_);
   }
 }
 
@@ -783,6 +799,15 @@ void BehaviorPathPlannerNode::run()
   // create latest planner data
   const auto planner_data = createLatestPlannerData();
 
+  for (;;) {
+    // planner_data route size should never change.
+    printf("planner_data:%zu planner_data_:%zu\n", 
+      planner_data->route_handler->getRouteLanelets().size(),
+      planner_data_->route_handler->getRouteLanelets().size());
+    using namespace std::chrono_literals;
+    std::this_thread::sleep_for(1s);
+  }
+
   // run behavior planner
   const auto output = bt_manager_->run(planner_data);
 
diff --git a/planning/route_handler/include/route_handler/route_handler.hpp b/planning/route_handler/include/route_handler/route_handler.hpp
index 591b436151..588c560b9e 100644
--- a/planning/route_handler/include/route_handler/route_handler.hpp
+++ b/planning/route_handler/include/route_handler/route_handler.hpp
@@ -292,6 +292,7 @@ public:
   double getLaneChangeableDistance(
     const Pose & current_pose, const LaneChangeDirection & direction) const;
   lanelet::ConstPolygon3d getIntersectionAreaById(const lanelet::Id id) const;
+  lanelet::ConstLanelets getRouteLanelets() const;
 
 private:
   // MUST
@@ -336,7 +337,6 @@ private:
     const lanelet::ConstLanelet & lanelet, lanelet::ConstLanelet * right_lanelet) const;
   bool getLeftLaneletWithinRoute(
     const lanelet::ConstLanelet & lanelet, lanelet::ConstLanelet * left_lanelet) const;
-  lanelet::ConstLanelets getRouteLanelets() const;
   lanelet::ConstLanelets getLaneletSequenceUpTo(
     const lanelet::ConstLanelet & lanelet,
     const double min_length = std::numeric_limits<double>::max()) const;

With these changes:

  • run() is stuck in an endless loop, printing the size of the route lanelet held by planner_data and planner_data_ every second.
  • A loop timer check if a new map/route has been received, and update planner_data_ when necessary. Everything happens while mutex_pd_ is locked, so this code snippet is safe.

If everything is safe, the printing loop should always display the same value for planner_data, even if planner_data_ is modified.

Compile, source, then run autoware in the planning simulator. For example:

$ ros2 launch autoware_launch planning_simulator.launch.xml map_path:=../data/autoware_data_universe/sample-map-planning/ vehicle_model:=sample_vehicle sensor_model:=sample_sensor_kit

Set a start and goal points. In the terminal window, you can see the current route size displayed every second. If you change the goal point so the route has more/less lanelets, you can see the both planner_data and planner_data_ numbers are modified. In other words, the printing loop is accessing planner_data_ inner data through planner_data, despite being outside mutex lock.

image

VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Feb 22, 2023
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 2, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 3, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 6, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 6, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 6, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 7, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 23, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 23, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 23, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 23, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 23, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 23, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 23, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 24, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix
VRichardJP added a commit to VRichardJP/autoware.universe that referenced this issue Mar 24, 2023
solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix
purewater0901 pushed a commit that referenced this issue Mar 29, 2023
* fix(behavior_path_planner): concurrency issue

solves #2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix

* style(pre-commit): autofix

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@VRichardJP
Copy link
Contributor Author

Fixed by #2933

ktro2828 pushed a commit to ktro2828/autoware.universe that referenced this issue Apr 7, 2023
* fix(behavior_path_planner): concurrency issue

solves autowarefoundation#2495

Signed-off-by: Vincent Richard <[email protected]>
Signed-off-by: Vincent Richard <[email protected]>

style(pre-commit): autofix

* style(pre-commit): autofix

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Software flaws or errors.
Projects
None yet
Development

No branches or pull requests

4 participants