Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Define a general-purpose cleanup method for CREvent #849

Merged
merged 2 commits into from
Dec 29, 2022

Conversation

kevin85421
Copy link
Member

@kevin85421 kevin85421 commented Dec 27, 2022

Why are these changes needed?

Previously, we did not test ray-cluster.external-redis.yaml in test_sample_raycluster_yamls.py because it installs multiple Kubernetes resources and cannot clean up by clean_up(). This PR defines a general-purpose cleanup method to remove related Kubernetes resources.

Related issue number

Closes #772

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests (KubeRay CI: "Sample YAML Config Test")
    • Manual tests
    • This PR is not tested :(
RAY_IMAGE=rayproject/ray:2.1.0 OPERATOR_IMAGE=kuberay/operator:nightly python3 tests/test_sample_raycluster_yamls.py

Screen Shot 2022-12-27 at 8 25 45 AM

Copy link
Contributor

@architkulkarni architkulkarni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. For my understanding, what's the reason for running the test manually if it's also running in CI? Is it just to be extra sure, or is there some test that can only be run manually currently?

Also, which part is still WIP? (Is it the part for Ray Job?)

@kevin85421 kevin85421 changed the title (WIP) [Feature] Define a general-purpose cleanup method for CREvent [Feature] Define a general-purpose cleanup method for CREvent Dec 27, 2022
@kevin85421 kevin85421 marked this pull request as ready for review December 27, 2022 23:35
@kevin85421
Copy link
Member Author

kevin85421 commented Dec 27, 2022

Looks good. For my understanding, what's the reason for running the test manually if it's also running in CI? Is it just to be extra sure, or is there some test that can only be run manually currently?

Also, which part is still WIP? (Is it the part for Ray Job?)

Yes, some tests can only be run manually currently due to the CPU limitation of GitHub Actions ("a standard Linux runner has 2-core CPU (x86_64), 7GB of RAM, and 14GB of SSD space."). That's why we need to integrate with Ray CI's Buildkite infra (#695) in the future. See #678 (comment) for more details.

There are 10 sample RayCluster YAML files in ray-operator/config/samples.

ray-cluster.autoscaler.large.yaml
ray-cluster.autoscaler.yaml
ray-cluster.complete.large.yaml
ray-cluster.complete.yaml
ray-cluster.external-redis.yaml
ray-cluster.getting-started.yaml
ray-cluster.heterogeneous.yaml
ray-cluster.ingress.yaml
ray-cluster.mini.yaml
ray-cluster.separate-ingress.yaml

However, due to the CPU constraint, only four of them (after this PR) can be run on KubeRay CI (GitHub Actions). See test_sample_raycluster_yamls.py for more details.

# The free plan of GitHub Actions (i.e. KubeRay CI) only supports 2-core CPU runners. Most
# sample YAMLs cannot schedule all pods on Kubernetes nodes due to insufficient CPUs. We
# decided to just run some tests on KubeRay CI and run all tests in the Ray CI.
# See https://github.com/ray-project/kuberay/issues/695.
GITHUB_ACTIONS = os.getenv("GITHUB_ACTIONS", default="False").lower() == "true"
github_action_tests = {
    "ray-cluster.getting-started.yaml",
    "ray-cluster.ingress.yaml",
    "ray-cluster.mini.yaml",
    "ray-cluster.external-redis.yaml"
}

If we run the following command locally, we can test 8 sample RayCluster YAML files. ray-cluster.autoscaler.large.yaml and ray-cluster.complete.large.yaml require too many computing resources, and thus we cannot test them locally.

RAY_IMAGE=rayproject/ray:2.1.0 OPERATOR_IMAGE=kuberay/operator:nightly python3 tests/test_sample_raycluster_yamls.py

In addition, currently, only four RayCluster sample YAML files are tested in KubeRay CI. There is no test for RayService (We can test it locally by test_sample_rayservice_yamls.py) and RayJob now.

@DmitriGekhtman DmitriGekhtman merged commit 633ff63 into ray-project:master Dec 29, 2022
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
…oject#849)

Previously, we did not test ray-cluster.external-redis.yaml in test_sample_raycluster_yamls.py because it installs multiple Kubernetes resources and cannot clean up by clean_up(). This PR defines a general-purpose cleanup method to remove related Kubernetes resources.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Define a general-purpose cleanup method for CREvent
3 participants