Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race conditions in MessageFilter #539

Open
wants to merge 9 commits into
base: noetic-devel
Choose a base branch
from

Conversation

rhaschke
Copy link
Contributor

@rhaschke rhaschke commented Jul 2, 2022

This is a port of #538 to Noetic.

... revealing that propagating messages via a ROS callback queue is inherently unsafe.
If the filter is going to be destroyed, a CBQueueCallback::call() might still be in action,
locking the underlying Signal1's mutex_.
This will cause as assertion failure during destruction of the mutex:
boost::mutex::~mutex(): Assertion `!posix::pthread_mutex_destroy(&m)' failed.
When MessageFilter::signalMessage() is called asynchronouyls via a ROS callback queue,
this call might still be in progress while another (holding Signal1::mutex_), while
another thread might attempt to remove the MessageFilter.
This results in a failed assertion as the MessageFilter destructor attempts to destroy a locked mutex.

Here, by introducing another mutex in MessageFilter, we ensure that CBQueueCallback::call() is finished
before destroying the MessageFilter.
PR ros#144 attempted to resolve a deadlock by postponing calls to callbacks.
However, this has the drawback of using references to TransformableRequests -
outside the protection by transformable_requests_mutex_.
Thus, these requests might get deleted by another thread, resulting in segfaults:

terminate called after throwing an instance of 'boost::wrapexcept<boost::lock_error>'
  what():  boost: mutex lock failed in pthread_mutex_lock: Invalid argument

This reverts commit bfb8038.
As pointed out in ros#144 the deadlock arises due to an inversion of locking order:
- MessageFilter::add(), when removing the oldest message, locks:
  - MessageFilter::messages_mutex_
  - BufferCore::transformable_requests_mutex_ (via cancelTransformableRequest())
- BufferCore::testTransformableRequests() locks:
  - transformable_requests_mutex_
  - MessageFilter::message_mutex_ (via MessageFilter::transformable() callback)

PR ros#144 attempted to resolve the issue by postponing callback calls.
However, this has the drawback of using references to TransformableRequests -
outside the protection by transformable_requests_mutex_.
These requests might got deleted by another thread meanwhile!

Here the deadlock is resolved in MessageFilter::add, cancelling the requests
outside the messages_mutex_ protection.
While MessageFilter::clear() removes pending messages from the callback queue,
there might be still callbacks active when attempting to remove the MessageFilter.
Depending on what goes first, either the already destroyed mutex is tried to lock
or a locked mutex is tried to destroy, both resulting in an assertion failure.

This race condition cannot completely avoided.
Using a shared mutex for all callers and moving the unique lock to the end
of the destructor hopefully reduces the probability of such a race condition.
@rhaschke rhaschke requested a review from tfoote as a code owner July 2, 2022 07:08
@@ -1455,12 +1449,8 @@ void BufferCore::testTransformableRequests()
M_TransformableCallback::iterator it = transformable_callbacks_.find(req.cb_handle);
if (it != transformable_callbacks_.end())
{
transformables.push_back(boost::make_tuple(boost::ref(it->second),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The objects referenced here might get destroyed before they are used below, resulting in a segfault.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to look a little deeper that this isn't going to restore the potential deadlock of #91

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Fortunately, unit tests nicely cover this deadlock. While 23e6295 restores the deadlock (as expected), resulting in a RLTestTimeoutException, 9fdf547 fixes it again using the method described in #538.

MessageFilter::clear();
bc_.removeTransformableCallback(callback_handle_);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The callback_handle_ should be deregistered with BufferCore.

Copy link
Member

@tfoote tfoote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks like a good fix for the MessageFilter destruction which was not clearly accounted for. And the unit tests for it are very helpful. I want to give the feedback here but I want to read/analyze a little bit deeper in the BufferCore changes, but I don't have time right now to do so.

@@ -1455,12 +1449,8 @@ void BufferCore::testTransformableRequests()
M_TransformableCallback::iterator it = transformable_callbacks_.find(req.cb_handle);
if (it != transformable_callbacks_.end())
{
transformables.push_back(boost::make_tuple(boost::ref(it->second),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to look a little deeper that this isn't going to restore the potential deadlock of #91

@tfoote tfoote self-requested a review August 15, 2022 23:57
@rhaschke
Copy link
Contributor Author

Ping @tfoote.

@rhaschke
Copy link
Contributor Author

@tfoote, would be great if you could have the deeper look you promised nearly a year ago...

@tfoote tfoote self-assigned this May 31, 2023
@ahcorde ahcorde added the noetic label Feb 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants