[Doc] [KubeRay] Add end-to-end tutorial for real-world RayJob workload (batch inference) #38857

Merged

@architkulkarni commented Aug 24, 2023

Why are these changes needed?

This PR adds a tutorial for running a batch inference workload on KubeRay using the RayJob CRD.

It also updates the GPU/GKE doc (which this tutorial uses as a prerequisite step) to remove the instructions for taints and tolerations and for GPU driver installation, both of which GKE currently handles automatically.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

kevin85421 and others added 30 commits August 20, 2023 16:58
…nchmark.md

Co-authored-by: angelinalg
Signed-off-by: Kai-Hsun Chen
@angelinalg
Contributor

Do you want to add this example to the Examples Gallery? Instructions are at go/example-gallery.

@architkulkarni
Contributor Author

Thanks for the review!

> Do you want to add this example to the Examples Gallery? Instructions are at go/example-gallery.

@angelinalg I'm not sure how to decide. Should all examples be in the gallery, or are there criteria for inclusion? Also, would you tag this with code-example or tutorial?

architkulkarni and others added 18 commits August 31, 2023 12:44
…example.md

Co-authored-by: angelinalg
Signed-off-by: Archit Kulkarni
@architkulkarni
Contributor Author

The test failure in tests:test_object_assign_owner_client_mode is unrelated to this PR.

@architkulkarni added the tests-ok label ("The tagger certifies test failures are unrelated and assumes personal liability.") Aug 31, 2023
@architkulkarni merged commit 399da4f into ray-project:master Aug 31, 2023
1 of 2 checks passed
architkulkarni added a commit to architkulkarni/ray that referenced this pull request Aug 31, 2023
…d (batch inference) (ray-project#38857)

This PR adds a tutorial for running a batch inference workload on KubeRay using the RayJob CRD.

It also updates the GPU/GKE doc (which is used as a subroutine in this tutorial) to remove the instructions related to taints and tolerations and GPU driver installation, both of which are currently handled automatically by GKE.

---------

Signed-off-by: Kai-Hsun Chen
Signed-off-by: Archit Kulkarni
Co-authored-by: Kai-Hsun Chen
Co-authored-by: Kai-Hsun Chen
Co-authored-by: angelinalg
Signed-off-by: Archit Kulkarni
GeneDer pushed a commit that referenced this pull request Sep 1, 2023
…38857) (#39186)

* [Doc] [KubeRay] Add tutorial for connecting to google cloud storage bucket from GKE RayCluster (#38858)

This PR adds a self-contained tutorial for connecting to a Google Cloud Storage bucket. (Mostly self-contained; we do link out to the Google Cloud docs for creating a bucket.)

---------

Signed-off-by: Kai-Hsun Chen
Signed-off-by: Archit Kulkarni
Co-authored-by: Kai-Hsun Chen
Co-authored-by: Kai-Hsun Chen
Co-authored-by: angelinalg

* [Doc] [KubeRay] Add end-to-end tutorial for real-world RayJob workload (batch inference) (#38857)

---------

Signed-off-by: Kai-Hsun Chen
Signed-off-by: Archit Kulkarni
Co-authored-by: Kai-Hsun Chen
Co-authored-by: Kai-Hsun Chen
Co-authored-by: angelinalg
LeonLuttenberger pushed a commit to jaidisido/ray that referenced this pull request Sep 5, 2023
…d (batch inference) (ray-project#38857)

jimthompson5802 pushed a commit to jimthompson5802/ray that referenced this pull request Sep 12, 2023
…d (batch inference) (ray-project#38857)

Signed-off-by: Jim Thompson
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
…d (batch inference) (ray-project#38857)

Signed-off-by: Victor
3 participants