Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix thread safety issues in UsdSkel_SkelDefinition. #2369

Merged

Conversation

cameronwhite
Copy link
Contributor

Description of Change(s)

  • In methods like _ComputeJointWorldInverseBindTransforms(), check the compute flag again after acquiring the lock to avoid potentially recomputing the result again if multiple threads were waiting on the mutex. Although the computed values would not change, it is not safe to call mutable member functions of the VtArray (which can cause a copy-on-write detach) while other threads may be in the middle of making a copy of it. An example sequence that could lead to a crash:

    • Thread A finishes the critical section, and continues on to make a copy of the array in _GetJointWorldInverseBindTransforms() (refcount == 2)
    • Thread B was waiting on the mutex and starts to redo the computation. This calls non-const member functions and enters the body of _DetachIfNotUnique() since _IsUnique() is false
    • Thread C enters _GetJointWorldInverseBindTransforms() and observes that the compute flag is set, and starts to make its own copy of the array, but hasn't bumped the refcount yet.
    • Thread A finishes with its copy and decrements the refcount (refcount == 1)
    • Thread B decrements the refcount, taking it to zero and destroying the data, before switching to its new copy of the data
    • Thread C now has an array with a reference to the deleted data block
  • Prefer using operator|= to atomically set the flag rather than doing a read -> bitwise OR -> atomic store sequence which could cause flags to be lost if there are concurrent writes. Currently the writes are all guarded by the same mutex so the previous approach was not actually problematic, but this is safer if e.g. in the future there are separate locks for each cached array.

This isn't the easiest to reproduce (e.g. Storm will not encounter this, since it computes the skinning transforms during the single-threaded sprim sync), but I had a customer file that reproduced this very reliably in the Houdini GL delegate with a 32-core machine, and verified that this fix resolved the issue.

Fixes Issue(s)

  • I have verified that all unit tests pass with the proposed changes
  • I have submitted a signed Contributor License Agreement

- In methods like _ComputeJointWorldInverseBindTransforms(), check the
  compute flag again after acquiring the lock to avoid potentially
  recomputing the result again if multiple threads were waiting on the
  mutex. Although the computed result would not change, it is
  not safe to call mutable member functions of the VtArray (which can
  cause a copy-on-write detach) while other threads may be in the middle
  of making a copy of it.

- Prefer using operator|= to atomically set the flag rather than
  doing a read -> bitwise OR -> atomic store sequence which could cause
  flags to be lost if there are concurrent writes.
  Currently the writes are all guarded by the same mutex so the previous
  approach was not problematic, but the new approach is safer if e.g. in the
  future there are separate locks for each cached array.

Bug: PixarAnimationStudios#1742
@sunyab
Copy link
Contributor

sunyab commented Mar 31, 2023

Filed as internal issue #USD-8176

@pixar-oss pixar-oss merged commit ab33461 into PixarAnimationStudios:dev Jul 3, 2023
pixar-oss added a commit that referenced this pull request Jul 3, 2023
Fix thread safety issues in UsdSkel_SkelDefinition.

(Internal change: 2282798)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants