[core] removing placement groups sometimes throws a SystemExit error #13487
Comments
Set P1 until identifying the root cause. |
cc @oliverhu Do you have some time to take a look at this? |
a bit hectic recently, is this more important than the "possible unhandled error" one? |
I don't think so (probably similar). If you are busy, it is totally fine! I can take a look at it later. |
❤️ thanks! |
Hi @rkooo567, is this in progress? I found that this issue still exists. Can I take over this issue? |
Hmm, possibly? I am not 100% sure if it is the same issue. |
I think similar, currently |
This is actually expected behavior, at least right now. We basically kill the actors that are associated with the placement group if the PG is removed. @krfricke what's the expected behavior you'd like to see here? Would you rather (1) keep the actor alive, or (2) raise a different error, like PlacementGroupRemovedError? |
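For illustration only, option 2 could look roughly like the toy model below. It is plain Python with no Ray involved: `ToyActor`, `call_actor`, and the `PlacementGroupRemovedError` class are hypothetical stand-ins rather than Ray APIs, and the sketch only assumes that the kill surfaces as a `SystemExit` inside the worker.

```python
class PlacementGroupRemovedError(RuntimeError):
    """Hypothetical error name from the discussion above; not a Ray class."""


class ToyActor:
    """Stand-in for an actor whose resources come from a placement group."""

    def __init__(self):
        self.pg_removed = False

    def step(self):
        if self.pg_removed:
            # Models the worker being told to exit once its PG is gone.
            raise SystemExit(1)
        return "ok"


def call_actor(actor, removal_was_requested):
    """Run one actor step, translating an expected exit into a clearer error."""
    try:
        return actor.step()
    except SystemExit as exc:
        if removal_was_requested:
            # The caller removed the PG on purpose, so report that
            # explicitly instead of surfacing a bare SystemExit.
            raise PlacementGroupRemovedError(
                "actor exited because its placement group was removed"
            ) from exc
        # The exit was not expected: let the original SystemExit propagate.
        raise
```

With something along these lines, a caller that removed the placement group deliberately could catch `PlacementGroupRemovedError` and carry on quietly, while an unexpected exit would still show up as before.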
Duplicate #10232 |
You can try by deleting the folder
|
What is the problem?
Latest master.
When removing placement groups that are used by an actor, the actor sometimes fails with a SystemExit error. This occurs after introducing PGs to Ray Tune (#13370). I'm not sure if this is a bug or a usage error. It only comes up sometimes, not all the time.
Generally, it would be great to be able to disable SystemExit error messages when removing placement groups. The same might be true for deliberately terminating actors.
Reproduction (REQUIRED)
The repro script is non-deterministic. In the last 10 runs, it failed 5 times (and did not throw an error in the other 5 runs).
The repro script contains a much simplified version of the tune training loop.
cc @rkooo567