[Feature] Test sample RayCluster YAMLs to catch invalid or out of date ones #678
Will lgtm once you get the new test to pass consistently.
Co-authored-by: Dmitri Gekhtman <[email protected]> Signed-off-by: Kai-Hsun Chen <[email protected]>
@DmitriGekhtman Four tests hit the CPU constraint of GitHub Actions ("a standard Linux runner has 2-core CPU (x86_64), 7GB of RAM, and 14GB of SSD space"), as we discussed in the Design: KubeRay E2E Configuration tests. I used
Ok, it seems that to automate test execution, we will have to run these tests in the Ray CI. Here's a proposed sequence of actions.
1. For this PR, make sure the tests are passing manually when you run them. You can remove the build step.
2. After merging this PR, the main priority is manual release testing: the Ray 2.1.0 release is in progress and a more-or-less final rayproject/ray:2.1.0 image is available. Prior to the KubeRay 0.4.0 release, run the tests manually again with a KubeRay 0.4.0 candidate and Ray 2.1.0.
3. The next priority is establishing automated pipelines for running these tests in the Ray CI. It's important to track this and do it eventually, but it's fine if we don't put a deadline on it right now.
If there's a way to fit some subset of the tests into CI, perhaps with modifications to the configs, that would also be good.
This PR runs 3 tests on KubeRay CI, and I will open issue #695 to track the progress of running tests on Ray CI. Although each standard Linux runner has 2 cores, Pods always failed to schedule when CPU usage was higher than 800m (0.8 CPU). Maybe there are some CPU fragmentation problems.
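One way to decide which sample configs can run on a small runner is to add up their CPU requests before submitting them. The following is a minimal sketch, assuming the 800m ceiling applies to the sum of container CPU requests across the Pods in a config (the thread does not say whether the limit is per Pod or total). The function names are hypothetical and not part of the KubeRay test framework, and the Pod specs are plain dictionaries as a YAML parser would produce them.

```python
from decimal import Decimal


def cpu_to_millicores(quantity) -> int:
    """Convert a Kubernetes CPU quantity such as "800m", "1", or 0.5 to millicores."""
    text = str(quantity).strip()
    if text.endswith("m"):
        return int(text[:-1])
    # Plain numbers are whole or fractional cores; Decimal avoids float rounding.
    return int(Decimal(text) * 1000)


def total_cpu_requests_millicores(pod_specs) -> int:
    """Sum resources.requests.cpu over all containers in the given Pod specs."""
    total = 0
    for spec in pod_specs:
        for container in spec.get("containers", []):
            requests = container.get("resources", {}).get("requests", {})
            total += cpu_to_millicores(requests.get("cpu", 0))
    return total


def fits_cpu_budget(pod_specs, budget_millicores=800) -> bool:
    """Return True if the summed CPU requests stay within the assumed budget."""
    return total_cpu_requests_millicores(pod_specs) <= budget_millicores


# Hypothetical head and worker Pod specs, reduced to the fields used here.
head = {"containers": [{"name": "ray-head", "resources": {"requests": {"cpu": "500m"}}}]}
worker = {"containers": [{"name": "ray-worker", "resources": {"requests": {"cpu": "0.5"}}}]}

print(total_cpu_requests_millicores([head, worker]))
print(fits_cpu_budget([head, worker]))
```

This only looks at explicit requests; containers that set limits without requests would need extra handling.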
Looks good!
raycluster.complete.yaml and raycluster.autoscaler.yaml are important ones to test in the Ray CI.
Thanks, just wondering if this would've caught the bug fixed by #501?
Probably not in its current form, but it wouldn't be too hard to extend the framework to validate log volume mounts.
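As an illustration of what such an extension might look like, here is a minimal sketch of one possible check: every `volumeMounts` entry in a container should refer to a volume declared in the same Pod spec. Whether this matches the bug fixed by #501 is an assumption, and the function name is hypothetical rather than part of the existing framework.

```python
def dangling_volume_mounts(pod_spec):
    """Return (container name, mount name) pairs whose volume is not declared in the Pod spec."""
    declared = {volume["name"] for volume in pod_spec.get("volumes", [])}
    problems = []
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount["name"] not in declared:
                problems.append((container.get("name", "<unnamed>"), mount["name"]))
    return problems


# Hypothetical Pod spec: the mount name does not match the declared volume.
broken_spec = {
    "volumes": [{"name": "logs", "emptyDir": {}}],
    "containers": [
        {
            "name": "ray-head",
            "volumeMounts": [{"name": "ray-logs", "mountPath": "/tmp/ray"}],
        }
    ],
}

print(dangling_volume_mounts(broken_spec))
```

A test could run this over the head and worker Pod templates of each sample YAML and fail if the returned list is non-empty.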
Why are these changes needed?
Use #605 to test sample RayCluster YAMLs. This PR found:
Related issue number
Closes #642
Checks