Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RayService: Dev RayService CR and Controller logic #287

Merged
merged 21 commits into from
Jun 9, 2022

Conversation

brucez-anyscale
Copy link
Contributor

@brucez-anyscale brucez-anyscale commented Jun 4, 2022

Why are these changes needed?

This is the implementation of RayService and its controller.
Includes:

  1. Maintain Raycluster
  2. Connect Ray Dashboard
  3. Maintain Serve Deployment
  4. Health check and restart

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@Jeffwan
Copy link
Collaborator

Jeffwan commented Jun 4, 2022

Great. I will have a check tomorrow

@brucez-anyscale
Copy link
Contributor Author

Great. I will have a check tomorrow

Thanks! I would like to collect review comments first and then at the same time add tests.

@simon-mo
Copy link
Collaborator

simon-mo commented Jun 6, 2022

@pcmoritz @sriram-anyscale @yiranwang52 we would love to get your review here as well! Thanks

@simon-mo
Copy link
Collaborator

simon-mo commented Jun 6, 2022

@shrekris-anyscale please review the Serve schema portion

ray-operator/Makefile Show resolved Hide resolved
ray-operator/Makefile Outdated Show resolved Hide resolved
num_cpus: 0.1
runtime_env:
py_modules:
- "https://github.com/shrekris-anyscale/test_deploy_group/archive/HEAD.zip"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shrekris-anyscale can you please transfer the repo to ray-project?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or make a copy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a new repo and transferred the code. @brucez-anyscale Could you try using this repo instead? The py_modules link should be https://github.com/ray-project/test_deploy_group/archive/67971777e225600720f91f618cdfe71fc47f60ee.zip (the hash pins the code to the current commit).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about test_module ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can someone help me understand how these zip are generated and why there're two zips needed? does it support S3 or GCS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shrekris-anyscale could you help to improve the runtime_env config to make them general enough that we can merge it in the kuberay repo? Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jeffwan We're using two zip files in this case for testing purposes. They're essentially two different codebases. See this documentation for how these zip files are generated. You can also use S3 and GCS, but GitHub is convenient when writing tests because people can pull or push new code and update the URL more easily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I synced with @brucez-anyscale offline. The runtime_env can use S3 and Google Storage URLs, but GitHub is more convenient for testing.

ray-operator/controllers/ray/rayservice_controller.go Outdated Show resolved Hide resolved
ray-operator/controllers/ray/rayservice_controller.go Outdated Show resolved Hide resolved
@DmitriGekhtman
Copy link
Collaborator

A couple of spelling nits, but lgtm! I think we can add some tests now.

@brucez-anyscale brucez-anyscale marked this pull request as ready for review June 9, 2022 17:34
@DmitriGekhtman
Copy link
Collaborator

DmitriGekhtman commented Jun 9, 2022

Thanks for adding the unit tests!
Do we have a longer term plan for integration / e2e tests? It will be important to test things like health check failures and restarts.

@brucez-anyscale
Copy link
Contributor Author

Thanks for adding the unit tests! Do we have a longer term plan for integration / e2e tests? It will be important to test things like health check failures and restarts.

It will be followup prs.

Copy link
Collaborator

@DmitriGekhtman DmitriGekhtman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!!!

@brucez-anyscale brucez-anyscale merged commit 20c5bfe into master Jun 9, 2022
@Jeffwan
Copy link
Collaborator

Jeffwan commented Jun 10, 2022

Overall looks good. Please cut follow up issues and we can track the progress. @brucez-anyscale excellent work!

@Jeffwan Jeffwan deleted the brucez/rayService branch June 10, 2022 02:29
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
* Implement RayService CRD and controller

* Update config and controller

* update

* update

* refactor

* goimports

* goimports

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments

* add unit tests

* fix ut

* Update yaml

* update ut

* update
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants