
Split head memory and cpu requests/limits #579

Merged

Conversation

Contributor

@Bobbins228 Bobbins228 commented Jul 2, 2024

Issue link

Closes: RHOAIENG-9259

What changes have been made

  • Split the head CPU and memory resources into requests/limits, similar to update SDK args #547
  • Added deprecation warnings for the old variables head_cpus and head_memory
  • Updated head/worker_extended_resource_request to include string values due to a failing get_cluster method
  • Updated notebook WF tests to reflect the new parameters
  • Updated existing e2e tests with the new parameters
  • Added documentation for the deprecated variables

Verification steps

Setup

Notebook server ODH/RHOAI/Local

  • Clone this repository with git clone https://github.com/project-codeflare/codeflare-sdk.git
  • Check out this PR's branch
  • Run poetry build (install Poetry first if needed: pip install poetry)
  • Run pip install --force-reinstall dist/codeflare_sdk-0.0.0.dev0-py3-none-any.whl
  • Restart your notebook kernel

Testing

Testing the deprecated args head_cpus and head_memory

Follow the basic Ray demo. Set the head_cpus and head_memory parameters to values of your choosing.
You should get a warning that the parameters are deprecated and that the new ones should be used instead.

The head CPU requests and limits should both match the values you entered above.
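The deprecation mapping described above can be sketched roughly like this. This is a minimal, stdlib-only illustration; resolve_head_cpu is a hypothetical helper, not the SDK's actual code:

```python
import warnings

def resolve_head_cpu(head_cpus=None, head_cpu_requests=None, head_cpu_limits=None):
    """Illustrative sketch: map the deprecated head_cpus arg onto the
    new requests/limits pair, warning when the old arg is used."""
    if head_cpus is not None:
        warnings.warn(
            "head_cpus is deprecated, use head_cpu_requests and head_cpu_limits",
            DeprecationWarning,
            stacklevel=2,
        )
        # The old single value becomes both the request and the limit,
        # unless the new args were given explicitly.
        if head_cpu_requests is None:
            head_cpu_requests = head_cpus
        if head_cpu_limits is None:
            head_cpu_limits = head_cpus
    return head_cpu_requests, head_cpu_limits
```

The same pattern would apply to head_memory versus head_memory_requests/head_memory_limits.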

Testing the new requests/limits args

In the ClusterConfiguration add the parameters

  • head_cpu_requests
  • head_cpu_limits
  • head_memory_requests
  • head_memory_limits

Set them to values of your choosing and the head pod of the Ray Cluster should reflect these values.
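With these four parameters, the head pod's resource block ends up shaped like a standard Kubernetes requests/limits spec. A minimal illustrative sketch (head_resources_spec and its default values are assumptions for illustration, not the SDK's API):

```python
def head_resources_spec(head_cpu_requests="2", head_cpu_limits="2",
                        head_memory_requests="8G", head_memory_limits="8G"):
    """Build a Kubernetes-style resources block for the Ray head pod
    from the four new ClusterConfiguration parameters.
    Hypothetical helper; defaults here are made up for the example."""
    return {
        "requests": {"cpu": head_cpu_requests, "memory": head_memory_requests},
        "limits": {"cpu": head_cpu_limits, "memory": head_memory_limits},
    }
```

Checking the generated RayCluster YAML (or the running head pod) should show the requests and limits sections populated independently, rather than both being derived from a single head_cpus/head_memory value.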

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

Collaborator

@ChristianZaccaria ChristianZaccaria left a comment


There is one thing to note, perhaps unrelated to this PR: a user can input essentially ANY value in the ClusterConfiguration parameters.

For example, I can set head_cpu_requests=True, or even head_gpus=True, and that is reflected in the YAML as a bool. I believe this is not the expected behaviour. Note that this was tested on KinD as my OpenShift cluster isn't working at the moment.

@Bobbins228
Contributor Author

@ChristianZaccaria
This is not expected behaviour at all :(
I can have a look at adding some validation to ensure that the head/worker requests/limits are of the correct type.
Good catch!

@ChristianZaccaria
Collaborator

@Bobbins228 I couldn't get further, but I suppose maybe cluster.up() will already capture that and throw an error for using the wrong datatypes. However, you're right, there seems to be no validation when creating the yaml file.

@Bobbins228
Contributor Author

@ChristianZaccaria This is insane! It seems you can pretty much set any of the variables to whatever type you like.
I will create a Jira for fixing the validation on all ClusterConfiguration parameters.
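The kind of type validation discussed here could look roughly like the following. validate_resource is a hypothetical helper, not part of the SDK; note that bool needs an explicit check because Python treats it as a subclass of int, which is exactly how a value like head_cpu_requests=True can slip past a naive int check:

```python
def validate_resource(name, value):
    """Illustrative sketch: accept only int or str resource values.

    isinstance(True, int) is True in Python, so bool must be rejected
    explicitly before the int/str check.
    """
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(
            f"{name} must be an int or str, got {type(value).__name__}"
        )
    return value
```

Applied across the ClusterConfiguration parameters, a check like this would fail fast at configuration time instead of emitting an invalid YAML that only errors (if at all) once the cluster is created.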

@Bobbins228 Bobbins228 added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 9, 2024
@Bobbins228
Contributor Author

Applied do not merge label until RHOAIENG-9259 is a priority again.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2024
@Bobbins228 Bobbins228 removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 9, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 9, 2024
Collaborator

@KPostOffice KPostOffice left a comment


This looks good to me, just some docs changes

Review comments on docs/cluster-configuration.md (outdated, resolved)
@Bobbins228
Contributor Author

/retest

Collaborator

@KPostOffice KPostOffice left a comment


/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 19, 2024
Contributor

openshift-ci bot commented Sep 19, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: KPostOffice

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 19, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 1235fc8 into project-codeflare:main Sep 19, 2024
10 checks passed