Refactor ray creation #751
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #751      +/-   ##
==========================================
- Coverage   94.12%   92.91%   -1.21%
==========================================
  Files          36       36
  Lines        2417     2400      -17
==========================================
- Hits         2275     2230      -45
- Misses        142      170      +28
This is a great PR, awesome work Mark! :)
I left some minor nitpicks. I'll now give this a run in a cluster.
I noticed when comparing the YAMLs between this PR and the main branch that, on this PR, the generated RayCluster YAML explicitly contains:
imagePullPolicy: Always
In main, this is unset. By default, if the image tag is :latest, the imagePullPolicy is set to Always; otherwise, the default is IfNotPresent. IfNotPresent may be preferred here so that the image is pulled only if it is not already cached on the node.
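The defaulting rule described in this comment can be sketched as a small function. This is only an illustration of how Kubernetes fills in imagePullPolicy when the field is omitted, not code from this PR; the function name and the image names used with it are hypothetical.

```python
def default_image_pull_policy(image: str) -> str:
    """Return the policy Kubernetes applies when imagePullPolicy is omitted."""
    # An image pinned by digest (name@sha256:...) defaults to IfNotPresent.
    if "@" in image:
        return "IfNotPresent"
    # Only the last path segment can carry a tag; a colon earlier in the
    # string may be a registry port (e.g. localhost:5000/ray).
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return "Always"  # a missing tag is treated like :latest
    tag = last_segment.rsplit(":", 1)[1]
    return "Always" if tag == "latest" else "IfNotPresent"
```

With a pinned tag such as the hypothetical quay.io/example/ray:2.35.0, the sketch returns IfNotPresent, which is the behaviour main had before the field was set explicitly.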
- YAMLs look as expected when comparing main and this PR. I tested the following scenarios:
  - LocalQueue set, no image (defaults to py3.9 image on OpenShift).
  - LocalQueue not set, no image (defaults to py3.9 image on OpenShift).
  - Python3.11 environment: the image changes to Ray image for py3.11.
  - Testing most parameters: Set envs, AppWrapper true, no LQ, set custom image, and set gpus.
  - Testing most parameters: Set envs, AppWrapper true, set LQ, set custom image, and set gpus.
- AppWrappers and RayClusters work as expected.
- get_cluster() works well!
/lgtm thanks! Great work!
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ChristianZaccaria, KPostOffice
Merged 6ec44c5 into project-codeflare:main
Issue link
Closes: RHOAIENG-10385 and RHOAIENG-8846
What changes have been made
- ray_version a variable for potential future automation
- create_resource
- get_cluster method to generate a new ClusterConfiguration with just the name and namespace of the cluster and retrieved yaml.
- env config param now actually works
Verification steps
Setup
- Notebook server ODH/RHOAI/Local
- git clone https://github.com/project-codeflare/codeflare-sdk.git
- poetry build - install if needed (pip install poetry)
- pip install --force-reinstall dist/codeflare_sdk-0.0.0.dev0-py3-none-any.whl
Testing
All ClusterConfiguration parameters must be tested with the new cluster creation method.
Keep a special eye out for the following as they were the most complex to implement:
Recommendation
Have two separate virtual envs, one with the main SDK and one with this PR's SDK, and compare the created Ray Clusters.
Small things like blank image pull secrets and some example metadata labels are removed from every generated Ray cluster, so they won't be an exact match. The important thing to look out for is whether the configurations match 👍
Automated Notebook testing should cover the changed functionality, but I still suggest that all parameters be human verified.
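One way to do the comparison without eyeballing two files is to flatten both configurations and diff them while skipping the fields that are expected to differ. The following is a minimal sketch under assumptions: the ignored key names (imagePullSecrets, labels) and the helper names are hypothetical and not part of the SDK, and the inputs are plain dicts such as those produced by loading the two generated YAML files.

```python
from typing import Any, Dict, Tuple

# Hypothetical keys to skip; adjust to whatever differs harmlessly in your YAMLs.
IGNORED_KEYS: Tuple[str, ...] = ("imagePullSecrets", "labels")
MISSING = "<missing>"  # placeholder shown when a path exists on one side only


def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b[0].c": value}, skipping ignored keys."""
    flat: Dict[str, Any] = {}
    if isinstance(node, dict):
        for key, value in node.items():
            if key in IGNORED_KEYS:
                continue
            flat.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}[{index}]"))
    else:
        flat[prefix] = node
    return flat


def diff_configs(main: dict, pr: dict) -> Dict[str, Tuple[Any, Any]]:
    """Return {path: (main_value, pr_value)} for every path whose values differ."""
    a, b = flatten(main), flatten(pr)
    return {
        path: (a.get(path, MISSING), b.get(path, MISSING))
        for path in sorted(a.keys() | b.keys())
        if a.get(path, MISSING) != b.get(path, MISSING)
    }
```

The two dicts could come from yaml.safe_load (PyYAML) on the files written by each SDK version. Empty containers, such as a blank list of pull secrets, flatten to nothing and therefore never show up as a difference.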
Test the new and improved get_cluster() function.
NOTE: You can compare the original & retrieved clusters by setting write_to_file=True on ClusterConfiguration and get_cluster()
NOTE 2: get_cluster() will also retrieve the mtls/oauth containers. This has no impact on the ability to create the cluster after deleting it through get_cluster() -> cluster.down() -> cluster.up()
- cluster = get_cluster(cluster_name=<name>, namespace=<namespace>, write_to_file=True)
- cluster. methods
- cluster.down() then cluster.up()
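The retrieve, delete, and recreate round trip can be wrapped in a small helper for repeated testing. This is a sketch, assuming the SDK exposes get_cluster at the package top level and that it is run against a live cluster you are logged in to; the helper name recreate_cluster is hypothetical.

```python
def recreate_cluster(name: str, namespace: str):
    """Retrieve an existing Ray cluster, tear it down, and bring it back up."""
    # Imported lazily so this module loads even where the SDK is not installed.
    # Assumption: get_cluster is importable from the top-level package.
    from codeflare_sdk import get_cluster

    # write_to_file=True keeps the retrieved YAML on disk for comparison.
    cluster = get_cluster(cluster_name=name, namespace=namespace, write_to_file=True)
    cluster.down()
    cluster.up()
    return cluster
```

Calling it with the name and namespace of a cluster created earlier exercises the same path as the manual steps above.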
Checks