Raise an Exception when a task, actor, or placement group is permanently infeasible #18835
Comments
@ericl do you also plan to mark "placement groups that are already deleted" as infeasible?
Yes we should do that too.
On Mon, Sep 27, 2021, 5:49 AM SangBin Cho wrote:
pg = ray.placement_group([{"GPU": 999}])
ray.get(pg.ready())
# -> raises UnschedulableError('The cluster configuration cannot fulfill [{"GPU": 999}].')
Btw, when do we plan to do this task?
@rkooo567 you're currently working on a design for this, right? cc @richardliaw
Yes. I've been putting it off a bit (focusing on other tasks first) because of the feature freeze, but I plan to have a concrete proposal by the end of this sprint and work on it next sprint.
Hi @rkooo567, wondering if there is an update on this ticket? There is some feedback that the insufficient-resource messages can be confusing when the autoscaler is enabled.
I think this is blocked on pending work from @wuisawesome refactoring the autoscaler interfaces.
After #18724, we should be able to raise exceptions when a task, actor, or placement group becomes permanently infeasible. To do this, the autoscaler can periodically publish a list of "permanently infeasible" resource demands to the GCS via an RPC, and this list can be distributed across the cluster in resource poll requests from the GCS.
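The flow described above could be sketched roughly as follows. This is a plain-Python simulation, not Ray code: `FakeGcs`, `autoscaler_tick`, `check_schedulable`, and `is_feasible` are hypothetical names invented for illustration, and `UnschedulableError` is a local stand-in for the exception proposed in this issue. It also assumes a demand is a single resource dict and that "permanently infeasible" means no configured node type can hold it; real placement groups carry a list of bundles.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ResourceDemand = Dict[str, float]


class UnschedulableError(Exception):
    """Hypothetical stand-in for the exception proposed in this issue."""


def is_feasible(demand: ResourceDemand, node_types: List[ResourceDemand]) -> bool:
    # Assumption: a demand is feasible if at least one node type the
    # autoscaler could launch has enough of every requested resource.
    return any(
        all(node.get(name, 0) >= amount for name, amount in demand.items())
        for node in node_types
    )


@dataclass
class FakeGcs:
    """In-memory stand-in for the GCS; keeps the latest published list."""

    infeasible: List[ResourceDemand] = field(default_factory=list)

    def report_infeasible(self, demands: List[ResourceDemand]) -> None:
        # Stands in for the autoscaler -> GCS RPC.
        self.infeasible = list(demands)

    def poll(self) -> List[ResourceDemand]:
        # Stands in for the list being distributed in resource poll requests.
        return list(self.infeasible)


def autoscaler_tick(
    gcs: FakeGcs,
    pending: List[ResourceDemand],
    node_types: List[ResourceDemand],
) -> None:
    # Periodically publish every pending demand that no node type can satisfy.
    gcs.report_infeasible([d for d in pending if not is_feasible(d, node_types)])


def check_schedulable(demand: ResourceDemand, gcs: FakeGcs) -> None:
    # Raise instead of waiting forever when the demand is known to be infeasible.
    if demand in gcs.poll():
        raise UnschedulableError(
            f"The cluster configuration cannot fulfill {demand}."
        )
```

With node types `[{"CPU": 8}, {"CPU": 16, "GPU": 4}]` and pending demands `[{"GPU": 999}, {"CPU": 4}]`, one `autoscaler_tick` publishes only `{"GPU": 999}`, so `check_schedulable({"GPU": 999}, gcs)` raises while `check_schedulable({"CPU": 4}, gcs)` returns normally.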
The errors raised would be as follows:
PRD doc: https://docs.google.com/document/d/1OT6m4xQDN8UtsBgnAMpX6nhXpNAfdeHJVve-iGhw1WI/edit
cc @edoakes @scv119 @richardliaw @stephanie-wang @rkooo567