-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
simplify function calls and add option for custom resources #531
simplify function calls and add option for custom resources #531
Conversation
c1ee4a1
to
52ffe63
Compare
52ffe63
to
2173f28
Compare
2173f28
to
fb85bf8
Compare
fb85bf8
to
ea21870
Compare
ea21870
to
069c43d
Compare
49bb0f1
to
f071d20
Compare
0ced913
to
8cc3f20
Compare
a8d3d77
to
536f0c8
Compare
536f0c8
to
e92f5b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The work surrounding num_worker_gpus
and num_head_gpus
should be removed in favour of head_extended_resource_requests
and worker_extended_resource_requests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested and Job completed successfully with gpus.
Something to note, for some reason the ray dashboard believes we are using Tesla T4 GPUs. Does anyone know why?
I also noticed that while the pods are initialising, the status is SUSPENDED, eventhough pods are actually coming up. Moving from SUSPENDED->READY. Should I open an issue for both of these?
0689008
to
80b1aad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
80b1aad
to
80e4a49
Compare
80e4a49
to
b104e7c
Compare
Signed-off-by: Kevin <[email protected]>
b104e7c
to
7a3d5f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Bobbins228 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Requested changes resolved
5ce0b2c
into
project-codeflare:main
Issue link
What changes have been made
I changed some of the huge function calls that were getting out of hand to take a reference to the cluster object. It might be worthwhile to make these into class methods at a future date
I have also added the ability for the user to specify custom resource requirements for Ray
Verification steps
Checks