May hang when submitting actor task #17700

Closed
2 tasks
keyile opened this issue Aug 10, 2021 · 4 comments
Labels
  • bug: Something that is supposed to be working; but isn't
  • stale: The issue is stale. It will be closed within 7 days unless there are further conversation
  • triage: Needs triage (eg: priority, bug/not-bug, and owning component)


keyile commented Aug 10, 2021

What is the problem?

This looks like a deadlock: the main thread (holding the GIL) is waiting for a mutex in the core worker, while the heartbeat thread (holding that mutex) is waiting for the GIL. The corresponding stacks are shown below:

image
image

PR #12803 may be related to this issue.
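
The lock-order inversion described above can be illustrated with a minimal Python sketch. It is not Ray code: `gil` and `worker_mutex` are ordinary `threading.Lock` objects standing in for the real GIL and the core worker mutex, and the function name, barriers, and timeouts are assumptions added only so the example terminates instead of hanging.

```python
import threading


def simulate_lock_inversion(timeout=0.2):
    """Model two threads that take the same two locks in opposite order."""
    gil = threading.Lock()            # stand-in for the Python GIL (hypothetical)
    worker_mutex = threading.Lock()   # stand-in for the core worker mutex (hypothetical)
    both_hold = threading.Barrier(2)      # both threads hold their first lock
    attempts_done = threading.Barrier(2)  # both timed attempts have finished
    results = {}

    def main_thread():
        with gil:                     # main thread already holds the "GIL"
            both_hold.wait()
            # Try to take the mutex that the heartbeat thread is holding.
            got = worker_mutex.acquire(timeout=timeout)
            results["main"] = got
            attempts_done.wait()      # keep holding the "GIL" until both attempts end
            if got:
                worker_mutex.release()

    def heartbeat_thread():
        with worker_mutex:            # heartbeat thread already holds the mutex
            both_hold.wait()
            # Try to take the "GIL" that the main thread is holding.
            got = gil.acquire(timeout=timeout)
            results["heartbeat"] = got
            attempts_done.wait()      # keep holding the mutex until both attempts end
            if got:
                gil.release()

    threads = [
        threading.Thread(target=main_thread),
        threading.Thread(target=heartbeat_thread),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(simulate_lock_inversion())
```

Both entries in the returned dictionary are `False`: neither thread can obtain its second lock while the other still holds it. Without the timeouts, both `acquire` calls would block forever, which would match the hang reported here.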

Ray version and other system information (Python version, TensorFlow version, OS):
Ray: 1.5.0
Python: 3.7.9
OS: Mac OS 11.4

Reproduction (REQUIRED)

The issue is very difficult to reproduce, so no code can be provided here.

Please provide a short code snippet (less than 50 lines if possible) that can be copy-pasted to reproduce the issue. The snippet should have no external library dependencies (i.e., use fake or mock data / environments):

If the code snippet cannot be run by itself, the issue will be closed with "needs-repro-script".

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.
keyile added the bug and triage labels on Aug 10, 2021
kfstorm (Member) commented Aug 17, 2021

It seems that this issue has been addressed by #17396. cc @ericl @scv119 @rkooo567 to confirm.

kfstorm (Member) commented Aug 17, 2021

Also cc @fyrestone


stale bot commented Dec 15, 2021

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

stale bot added the stale label on Dec 15, 2021

stale bot commented Dec 29, 2021

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

stale bot closed this as completed on Dec 29, 2021