-
Notifications
You must be signed in to change notification settings - Fork 402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RayJob] [Doc] Add real-world Ray Job use case tutorial for KubeRay #1361
[RayJob] [Doc] Add real-world Ray Job use case tutorial for KubeRay #1361
Conversation
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Running: 61.0/64.0 CPU, 4.0/4.0 GPU, 999.41 MiB/12.83 GiB object_store_memory: 0%| | 0/200 [00:05<?, ?it/s] | ||
Running: 61.0/64.0 CPU, 4.0/4.0 GPU, 999.41 MiB/12.83 GiB object_store_memory: 0%| | 1/200 [00:05<17:04, 5.15s/it] | ||
Running: 61.0/64.0 CPU, 4.0/4.0 GPU, 1008.68 MiB/12.83 GiB object_store_memory: 0%| | 1/200 [00:05<17:04, 5.15s/it] | ||
Running: 61.0/64.0 CPU, 4.0/4.0 GPU, 1008.68 MiB/12.83 GiB object_store_memory: 100%|██████████| 1/1 [00:05<00:00, 5.15s/it] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This output is a little suspicious.
Label: tench, Tinca tinca | ||
<PIL.Image.Image image mode=RGB size=500x375 at 0x7B37546AE430> | ||
Label: tench, Tinca tinca | ||
<PIL.Image.Image image mode=RGB size=500x375 at 0x7B37546CF7F0> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also suspicious, why are there only two distinct memory locations across the five lines?
…batch-inference-example Signed-off-by: Archit Kulkarni <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can remove the doc from this PR and transfer it to the Ray repo. This PR should only contain the YAML file. In addition, would you mind renaming the YAML file to ray-job.batch-inference.yaml
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will review the doc in the Ray repo.
|
||
|
||
######################Ray code sample################################# | ||
# this sample is from https://docs.ray.io/en/latest/cluster/job-submission.html#quick-start-example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot find this example in https://docs.ray.io/en/latest/cluster/job-submission.html#quick-start-example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forgot to remove these comments copied from the original sample RayJob YAML. It's from https://docs.ray.io/en/latest/data/examples/huggingface_vit_batch_prediction.html
name: dashboard | ||
- containerPort: 10001 | ||
name: client | ||
resources: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resource config is pretty weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update it to make limits=requests
name: rayjob-sample | ||
spec: | ||
entrypoint: python /home/ray/samples/sample_code.py | ||
# shutdownAfterJobFinishes specifies whether the RayCluster should be deleted after the RayJob finishes. Default is false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest eliminating extraneous comments to minimize distractions. If these fields are not used / mentioned in the doc, it would be better to remove it.
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
All comments have been addressed, @kevin85421 please take another look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Wait for CI passes.
…ay-project#1361) Adds a sample YAML file for RayJob batch inference --------- Signed-off-by: Archit Kulkarni <[email protected]>
Why are these changes needed?
Adds a tutorial for running a GPU workload using the
RayJob
custom resource. We use the Ray batch inference tutorial.We're moving the docs to the Ray repo shortly, but the PR which moves the existing docs hasn't been merged yet. We can review this new doc in this PR and then I'll make a separate PR to copy it to the Ray repo.
Related issue number
Checks