Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix hardware component unconfiguration #1755

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

sahand-ghaffari-ocado
Copy link

Summary
Unconfiguring a hardware component using the ~/set_hardware_component_state service results in code failures. This occurs because the on_cleanup() function removes resources while the read and write functions of the hardware component are still attempting to access those resources. The issue arises because the primary state is not set to UNCONFIGURED until after the on_cleanup function is executed, rather than before it is called.

Problem Description

  1. After ~/set_hardware_component_state service call is invoked, the set_hardware_component_state_srv_cb() function is executed. This function then calls set_component_state() function. In this function, the cleanup_hardware() function is called when the target state is PRIMARY_STATE_UNCONFIGURED.
  2. The cleanup_hardware function uses the bind method to call the cleanup method of either the SystemInterface, SensorInterface, or ActuatorInterface class, depending on the type of hardware component. The cleanup() function, in turn, calls the on_cleanup function of the hardware component class. In this context, the on_cleanup function is called in the URPositionHardwareInterface class, which is defined in the Universal_Robots_ROS2_Driver repository and inherits from hardware_interface::SystemInterface.
  3. The on_cleanup function removes and unassigns pointers, as well as cleans up threads, while the read and write functions of the ControllerManager class continue running. The read and write functions in the ControllerManager class call the corresponding read and write functions in the ResourceManager class, which then invoke the read and write functions of the hardware components from the SystemInterface, SensorInterface, or ActuatorInterface classes. These functions first check if the state is PRIMARY_STATE_INACTIVE or PRIMARY_STATE_ACTIVE before executing the read and write operations on the hardware component.
  4. If the on_cleanup function is called and removes some of the resources while the state of the robot has not yet been set to UNCONFIGURED, the read and write functions of the hardware component can still be called. Since these functions attempt to access resources that have already been removed, this can result in code crashes.

Environment:

  • OS: Ubuntu 20.04
  • Version: Humble

Proposed Solution
To prevent such crashes, it's suggested to ensure that the state is properly set to UNCONFIGURED before any resources are cleaned up. This way, the read and write functions will not be invoked after resources have been removed, avoiding access to invalid or dangling pointers. Therefore, it is suggested to modify the cleanup() function in SystemInterface, SensorInterface or ActuatorInterface

Copy link
Contributor

mergify bot commented Sep 16, 2024

@sahand-ghaffari-ocado, all pull requests must be targeted towards the master development branch.
Once merged into master, it is possible to backport to humble, but it must be in master
to have these changes reflected into new distributions.

Copy link
Contributor

mergify bot commented Sep 16, 2024

This pull request is in conflict. Could you fix it @sahand-ghaffari-ocado?

@bmagyar
Copy link
Member

bmagyar commented Sep 17, 2024

Can the issue be reproduced via a unit test? Could you please add one if possible?

Copy link
Member

@saikishor saikishor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sahand-ghaffari-ocado regarding your issues, it might happen in Humble, but I don't think this is the case with the Rolling version.

Could you try to check with your setup by compiling from sources (Development version), this way you can run the setup of the Rolling on Humble without any issues.

vcs import --input https://raw.githubusercontent.com/ros-controls/ros2_control_ci/master/ros_controls.rolling-on-$ROS_DISTRO.repos src

image

If you confirm this is working, then we just have to do some backports to have a proper fix on Humble. For instance, #1646 should kind of fix this issue.

@saikishor
Copy link
Member

@sahand-ghaffari-ocado any update on this?

@sahand-ghaffari-ocado
Copy link
Author

@sahand-ghaffari-ocado any update on this?

@saikishor thank you for your response. I haven't had the chance to test the Rolling version yet, but I plan to give it a try soon.

Copy link

codecov bot commented Oct 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.78%. Comparing base (eb4c19d) to head (ed2952b).
Report is 13 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1755   +/-   ##
=======================================
  Coverage   86.77%   86.78%           
=======================================
  Files         116      116           
  Lines       10703    10703           
  Branches      981      967   -14     
=======================================
+ Hits         9288     9289    +1     
  Misses       1062     1062           
+ Partials      353      352    -1     
Flag Coverage Δ
unittests 86.78% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
hardware_interface/src/actuator.cpp 72.18% <100.00%> (ø)
hardware_interface/src/sensor.cpp 70.53% <100.00%> (ø)
hardware_interface/src/system.cpp 72.18% <100.00%> (ø)

... and 1 file with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants