
Add a document for profiling #1299

Merged: 8 commits merged into ray-project:master on Aug 15, 2023.


@Yicheng-Lu-llll (Contributor) commented Aug 7, 2023

Why are these changes needed?

See #982. Users have had difficulty getting py-spy to work with KubeRay. This PR adds documentation that addresses the issue.

Related issue number

Closes #982

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Signed-off-by: Yicheng-Lu-llll <[email protected]>

Review thread on ray-operator/config/samples/ray-cluster.profiling.yaml (outdated, resolved):

    @ray.remote
    def long_running_task():
        while True:

Reviewer (Member): Could you add this to the doc?
@kevin85421 (Member):

cc @rkooo567 @scottsun94, would you mind reviewing this PR? Thanks!

@scottsun94:

Can I preview the doc somewhere, or do I have to build it locally to see it?
Also, is there a way to see the docs for master? Does https://ray-project.github.io/kuberay/ only show the docs for the latest version?

@scottsun94:

We should add a profiling page (covering all kinds of profiling-related topics) to the Ray docs too, and either cross-reference this KubeRay doc or simply duplicate it. cc: @rkooo567

@Yicheng-Lu-llll (Contributor, Author):

> Can I preview the doc somewhere, or do I have to build it locally to see it?

Hi @scottsun94, you can preview the document here: https://github.com/ray-project/kuberay/blob/4abf746a7f80f1075b1ffba30751eda9c2a03089/docs/guidance/profiling.md.

@scottsun94 left a comment:

Sorry for the late reply. I've left my comments.

@@ -0,0 +1,65 @@
# Profiling with KubeRay

[py-spy](https://github.com/benfred/py-spy/tree/master) is a sampling profiler for Python programs. It lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way.

Suggested change, replacing:
[py-spy](https://github.com/benfred/py-spy/tree/master) is a sampling profiler for Python programs. It lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way.
with:
## Stack trace and CPU profiling
[py-spy](https://github.com/benfred/py-spy/tree/master) is a sampling profiler for Python programs. It lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way.

Can we add an H2 title here so that:

  • it's clear that profiling != py-spy
  • we can add more profiling-related info down the road beyond py-spy, such as memory profiling, GPU profiling, etc.


This document describes how to configure RayCluster YAML file to enable py-spy and see Stack Trace and CPU Flame Graph via Ray dashboard.

### **Theory**

Suggested change, replacing:
### **Theory**
with:
### **Prerequisite**

This document describes how to configure RayCluster YAML file to enable py-spy and see Stack Trace and CPU Flame Graph via Ray dashboard.

### **Theory**
py-spy requires the `SYS_PTRACE` capability to read process memory. However, Kubernetes omits this capability by default. To enable profiling, add the following to the `template.spec.containers` for both the head and workers.

Suggested change, replacing:
py-spy requires the `SYS_PTRACE` capability to read process memory. However, Kubernetes omits this capability by default. To enable profiling, add the following to the `template.spec.containers` for both the head and workers.
with:
py-spy requires the `SYS_PTRACE` capability to read process memory. However, Kubernetes omits this capability by default. To enable profiling, add the following to the `template.spec.containers` for both the head and worker pods.
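The suggested wording asks for `SYS_PTRACE` on both head and worker pods. If your RayCluster manifests are generated or patched from Python, a minimal sketch of that change could look like the following. It assumes the manifest is already loaded into a plain dict (for example with a YAML parser) and follows KubeRay's `spec.headGroupSpec.template` and `spec.workerGroupSpecs[].template` layout; `add_sys_ptrace` is a hypothetical helper, not part of KubeRay.

```python
import copy


def add_sys_ptrace(raycluster: dict) -> dict:
    """Return a copy of a RayCluster manifest (as a dict) in which every head and
    worker container requests the SYS_PTRACE capability. Hypothetical helper."""
    manifest = copy.deepcopy(raycluster)  # leave the caller's dict untouched
    spec = manifest["spec"]
    templates = [spec["headGroupSpec"]["template"]]
    templates += [group["template"] for group in spec.get("workerGroupSpecs", [])]
    for template in templates:
        for container in template["spec"]["containers"]:
            # Equivalent YAML: securityContext: {capabilities: {add: ["SYS_PTRACE"]}}
            added = (
                container.setdefault("securityContext", {})
                .setdefault("capabilities", {})
                .setdefault("add", [])
            )
            if "SYS_PTRACE" not in added:  # avoid duplicate entries
                added.append("SYS_PTRACE")
    return manifest


if __name__ == "__main__":
    # Minimal, illustrative manifest; real RayCluster specs have many more fields.
    cluster = {
        "kind": "RayCluster",
        "spec": {
            "headGroupSpec": {"template": {"spec": {"containers": [{"name": "ray-head"}]}}},
            "workerGroupSpecs": [
                {"groupName": "small-group",
                 "template": {"spec": {"containers": [{"name": "ray-worker"}]}}}
            ],
        },
    }
    patched = add_sys_ptrace(cluster)
    print(patched["spec"]["headGroupSpec"]["template"]["spec"]["containers"][0])
```

Copying the dict first keeps the input unchanged, and the membership check makes the helper safe to run more than once on the same manifest.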

**Notes:**
- If you're running your own examples and encounter the error `Failed to write flamegraph: I/O error: No stack counts found` when viewing CPU Flame Graph, it might be due to the process being idle. Notably, using the `sleep` function can lead to this state. In such situations, py-spy filters out the idle stack traces. Refer to this [issue](https://github.com/benfred/py-spy/issues/321#issuecomment-731848950) for more information.
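The note above says py-spy filters out idle stack traces. When running py-spy by hand instead of through Ray Dashboard (for example after `kubectl exec` into a Pod that has py-spy installed), its `--idle` flag includes idle threads in the samples. The sketch below only builds the command line; `py_spy_record_cmd` is a hypothetical helper, the PID is a placeholder, and nothing here describes how Ray Dashboard itself invokes py-spy.

```python
import shlex
from typing import List


def py_spy_record_cmd(pid: int, output: str = "profile.svg",
                      duration_s: int = 10, include_idle: bool = False) -> List[str]:
    """Build a `py-spy record` command that samples one process and writes a flame graph."""
    cmd = ["py-spy", "record",
           "--pid", str(pid),              # process to attach to (needs SYS_PTRACE in a container)
           "--output", output,             # flame graph SVG (py-spy's default format)
           "--duration", str(duration_s)]  # seconds to sample
    if include_idle:
        # py-spy drops samples from idle threads by default; --idle keeps them,
        # so a process that mostly sleeps should still produce stack counts.
        cmd.append("--idle")
    return cmd


if __name__ == "__main__":
    # 12345 is a placeholder PID; inside a Pod you might find the real one with `ps aux`.
    print(shlex.join(py_spy_record_cmd(12345, include_idle=True)))
```

To actually run it, pass the list to `subprocess.run(...)` in an environment where py-spy is on the `PATH`.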

6. **Profile using the Ray dashboard**:

Suggested change, replacing:
6. **Profile using the Ray dashboard**:
with:
6. **Profile using Ray Dashboard**:


[py-spy](https://github.com/benfred/py-spy/tree/master) is a sampling profiler for Python programs. It lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way.

This document describes how to configure RayCluster YAML file to enable py-spy and see Stack Trace and CPU Flame Graph via Ray dashboard.
@scottsun94 commented Aug 14, 2023. Suggested change, replacing:
This document describes how to configure RayCluster YAML file to enable py-spy and see Stack Trace and CPU Flame Graph via Ray dashboard.
with:
This section describes how to configure RayCluster YAML file to enable py-spy and see Stack Trace and CPU Flame Graph via Ray Dashboard.


# (Head Pod) Run a sample job in the Pod
# `long_running_task` includes a `while True` loop to ensure the task remains actively running indefinitely.
# This allows you ample time to view the Stack Trace and CPU Flame Graph via the Ray dashboard.

Suggested change, replacing:
# This allows you ample time to view the Stack Trace and CPU Flame Graph via the Ray dashboard.
with:
# This allows you ample time to view the Stack Trace and CPU Flame Graph via Ray Dashboard.
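Keeping the task busy matters for the flame graph, not only for keeping it alive: per the note about `No stack counts found`, py-spy filters out idle stack traces, and a thread blocked in `sleep` is idle. The stdlib-only sketch below (it uses neither Ray nor py-spy) contrasts the CPU time consumed by a bounded busy loop with that of a sleep of the same length; `busy_for`, `idle_for`, and `cpu_seconds` are hypothetical names used only for illustration.

```python
import time


def busy_for(seconds: float) -> None:
    """Spin on the CPU, like the `while True` loop in `long_running_task`,
    but bounded so the example terminates."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


def idle_for(seconds: float) -> None:
    """Block in sleep; the thread is idle for almost the whole interval."""
    time.sleep(seconds)


def cpu_seconds(fn, seconds: float) -> float:
    """CPU time (not wall-clock time) this process spends running fn(seconds)."""
    start = time.process_time()
    fn(seconds)
    return time.process_time() - start


if __name__ == "__main__":
    print(f"busy_for(0.5): {cpu_seconds(busy_for, 0.5):.2f} s of CPU")
    print(f"idle_for(0.5): {cpu_seconds(idle_for, 0.5):.2f} s of CPU")
```

On a lightly loaded machine the busy loop should report close to its full wall-clock duration in CPU time, while the sleep should report close to zero.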

ports:
- containerPort: 6379
  name: gcs-server
- containerPort: 8265 # Ray dashboard

Suggested change, replacing:
- containerPort: 8265 # Ray dashboard
with:
- containerPort: 8265 # Ray Dashboard
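The head container exposes port 8265 for Ray Dashboard. To open the dashboard from your workstation, one common approach is `kubectl port-forward` to the head Service. The sketch below only builds that command; it assumes the head Service follows the `<raycluster-name>-head-svc` naming used by KubeRay, and `raycluster-profiling` is a placeholder cluster name. Check `kubectl get svc` for the actual Service name in your cluster.

```python
import shlex
from typing import List


def dashboard_port_forward_cmd(raycluster_name: str, namespace: str = "default",
                               port: int = 8265) -> List[str]:
    """Build a kubectl port-forward command for the Ray Dashboard port.
    Assumes the head Service is named "<raycluster-name>-head-svc"."""
    service = f"svc/{raycluster_name}-head-svc"
    # Forward the same port number locally and on the Service.
    return ["kubectl", "port-forward", "--namespace", namespace, service, f"{port}:{port}"]


if __name__ == "__main__":
    # "raycluster-profiling" is a placeholder; substitute your RayCluster's name.
    print(shlex.join(dashboard_port_forward_cmd("raycluster-profiling")))
```

With the forward running, the dashboard should be reachable at http://localhost:8265.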

cpu: 500m
memory: 2Gi
# `py-spy` is a sampling profiler that requires `SYS_PTRACE` to read process memory effectively.
# Once enabled, you can profile Ray worker processes through the Ray dashboard.
@scottsun94 commented Aug 14, 2023. Suggested change, replacing:
# Once enabled, you can profile Ray worker processes through the Ray dashboard.
with:
# Once enabled, you can profile Ray worker processes through Ray Dashboard.
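Because the comment above ties profiling through Ray Dashboard to `SYS_PTRACE` being enabled, it can help to check a manifest before applying it. The hypothetical `containers_missing_sys_ptrace` helper below inspects a RayCluster manifest loaded as a dict and reports head or worker containers that do not add the capability. It checks only the manifest, not whether admission policies in your cluster allow the capability.

```python
from typing import List


def containers_missing_sys_ptrace(raycluster: dict) -> List[str]:
    """List "<group>/<container>" for each head or worker container that does not
    request SYS_PTRACE. Hypothetical helper; it only inspects the manifest dict."""
    spec = raycluster["spec"]
    groups = [("head", spec["headGroupSpec"]["template"])]
    for i, group in enumerate(spec.get("workerGroupSpecs", [])):
        groups.append((group.get("groupName", f"worker-{i}"), group["template"]))
    missing = []
    for group_name, template in groups:
        for container in template["spec"]["containers"]:
            added = (container.get("securityContext", {})
                     .get("capabilities", {})
                     .get("add", []))
            if "SYS_PTRACE" not in added:
                missing.append(f"{group_name}/{container.get('name', '<unnamed>')}")
    return missing


if __name__ == "__main__":
    cluster = {"spec": {
        "headGroupSpec": {"template": {"spec": {"containers": [
            {"name": "ray-head",
             "securityContext": {"capabilities": {"add": ["SYS_PTRACE"]}}}]}}},
        "workerGroupSpecs": [{"groupName": "small-group", "template": {"spec": {
            "containers": [{"name": "ray-worker"}]}}}],
    }}
    print(containers_missing_sys_ptrace(cluster))
```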

Yicheng-Lu-llll and others added 4 commits August 15, 2023 05:45
@Yicheng-Lu-llll (Contributor, Author):

Hi @scottsun94, thank you for the review! I've updated the PR following your suggestions. I have also rerun the instructions from the doc, and everything works as expected.

@scottsun94:

One more question: do we plan to expose this page in the KubeRay docs at https://ray-project.github.io/kuberay/?

@Yicheng-Lu-llll (Contributor, Author):

Hi @scottsun94, I have added profiling.md to the KubeRay docs site (https://ray-project.github.io/kuberay/).

You can preview it by following the instructions here: https://github.com/ray-project/kuberay/blob/master/docs/development/development.md#deploying-documentation-locally.

Here is a screenshot from my local setup:
[screenshot]

@scottsun94 left a comment:

LGTM. Thanks!

@kevin85421 (Member) left a comment.

@kevin85421 merged commit 87407ac into ray-project:master on Aug 15, 2023
18 checks passed
blublinsky pushed a commit to blublinsky/kuberay that referenced this pull request Aug 22, 2023
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
Development: successfully merging this pull request may close these issues:
[Feature] Add a document for profiling

3 participants