Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] fix PinExistingReturnObject segfault by passing owner_address #46973

Merged
merged 1 commit into from
Aug 6, 2024

Conversation

hongchaodeng
Copy link
Member

@hongchaodeng hongchaodeng commented Aug 5, 2024

Why are these changes needed?

When worker crash-restarts, there is a chance that it would call PinExistingReturnObject() to pin existing objects. But PinExistingReturnObject() calls GetThreadContext() which doesn't work for async actor methods.

The owner_address is already there before calling PinExistingReturnObject(). This fix try to avoid the segfault by passing owner_address to PinExistingReturnObject().

Related issue number

fix #46489

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

When worker crash-restarts, there is a chance that it would call
PinExistingReturnObject() to pin existing objects.
But PinExistingReturnObject() calls GetThreadContext() which doesn't work
for async actor methods.
The owner_address is already there before calling PinExistingReturnObject().
This fix try to avoid the segfault by passing owner_address to PinExistingReturnObject().

Signed-off-by: hongchaodeng <[email protected]>
@hongchaodeng hongchaodeng added the go add ONLY when ready to merge, run all tests label Aug 6, 2024
Copy link
Contributor

@rynewang rynewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a comment on worker_context_.GetCurrentTask() that we can't use it in fiber context?

@hongchaodeng
Copy link
Member Author

can we add a comment on worker_context_.GetCurrentTask() that we can't use it in fiber context?

Let me create another PR for it. I have a couple of ideas to refactor it.

@rynewang rynewang merged commit 7731574 into ray-project:master Aug 6, 2024
6 checks passed
@hongchaodeng hongchaodeng deleted the fix-threadlocal branch August 7, 2024 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[core] Fiber vs thread_local SIGSEGV in Actor async methods
2 participants