Add a document for profiling #1299
Signed-off-by: Yicheng-Lu-llll <[email protected]>
@ray.remote
def long_running_task():
    while True:
Could you add this to the doc?
cc @rkooo567 @scottsun94 would you mind reviewing this PR? Thanks!
Can I preview the doc somewhere, or do I have to build it locally to see it?
Hi @scottsun94, you can preview the document here: https://github.com/ray-project/kuberay/blob/4abf746a7f80f1075b1ffba30751eda9c2a03089/docs/guidance/profiling.md.
Sorry for the late reply. I left my comments below.
docs/guidance/profiling.md
@@ -0,0 +1,65 @@
# Profiling with KubeRay

[py-spy](https://github.com/benfred/py-spy/tree/master) is a sampling profiler for Python programs. It lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way.
Suggested change:
- [py-spy](https://github.com/benfred/py-spy/tree/master) is a sampling profiler for Python programs. It lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way.
+ ## Stack trace and CPU profiling
+ [py-spy](https://github.com/benfred/py-spy/tree/master) is a sampling profiler for Python programs. It lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way.
Can we add an h2 title here so that:
- it is clear that profiling != py-spy
- we can add more profiling-related info down the road than just py-spy, such as memory profiling, GPU profiling, etc.
docs/guidance/profiling.md
This document describes how to configure RayCluster YAML file to enable py-spy and see Stack Trace and CPU Flame Graph via Ray dashboard.

### **Theory**
Suggested change:
- ### **Theory**
+ ### **Prerequisite**
docs/guidance/profiling.md
This document describes how to configure RayCluster YAML file to enable py-spy and see Stack Trace and CPU Flame Graph via Ray dashboard.

### **Theory**
py-spy requires the `SYS_PTRACE` capability to read process memory. However, Kubernetes omits this capability by default. To enable profiling, add the following to the `template.spec.containers` for both the head and workers.
Suggested change:
- py-spy requires the `SYS_PTRACE` capability to read process memory. However, Kubernetes omits this capability by default. To enable profiling, add the following to the `template.spec.containers` for both the head and workers.
+ py-spy requires the `SYS_PTRACE` capability to read process memory. However, Kubernetes omits this capability by default. To enable profiling, add the following to the `template.spec.containers` for both the head and worker pods.
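For readers following the thread, the change that paragraph describes can be sketched in Python as a patch over a RayCluster manifest loaded as a dictionary. This is a minimal sketch and not code from the PR: the `add_sys_ptrace` helper, the trimmed `cluster` manifest, and the container names are hypothetical. It assumes the standard Kubernetes container field `securityContext.capabilities.add` and the RayCluster fields `spec.headGroupSpec` and `spec.workerGroupSpecs`.

```python
import copy
import json


def add_sys_ptrace(ray_cluster):
    """Return a copy of a RayCluster manifest with SYS_PTRACE added to every container."""
    patched = copy.deepcopy(ray_cluster)
    spec = patched["spec"]
    # The head group is a single spec; worker groups are a list of specs.
    group_specs = [spec["headGroupSpec"], *spec.get("workerGroupSpecs", [])]
    for group in group_specs:
        for container in group["template"]["spec"]["containers"]:
            security_context = container.setdefault("securityContext", {})
            capabilities = security_context.setdefault("capabilities", {})
            added = capabilities.setdefault("add", [])
            if "SYS_PTRACE" not in added:
                added.append("SYS_PTRACE")
    return patched


# Hypothetical, heavily trimmed manifest used only for illustration.
cluster = {
    "kind": "RayCluster",
    "spec": {
        "headGroupSpec": {
            "template": {"spec": {"containers": [{"name": "ray-head"}]}}
        },
        "workerGroupSpecs": [
            {"template": {"spec": {"containers": [{"name": "ray-worker"}]}}}
        ],
    },
}

patched = add_sys_ptrace(cluster)
print(json.dumps(patched["spec"]["headGroupSpec"], indent=2))
```

The helper works on a deep copy so the input manifest is left untouched, and it skips containers that already list the capability, so applying it twice gives the same result.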
docs/guidance/profiling.md
**Notes:**
- If you're running your own examples and encounter the error `Failed to write flamegraph: I/O error: No stack counts found` when viewing CPU Flame Graph, it might be due to the process being idle. Notably, using the `sleep` function can lead to this state. In such situations, py-spy filters out the idle stack traces. Refer to this [issue](https://github.com/benfred/py-spy/issues/321#issuecomment-731848950) for more information.

6. **Profile using the Ray dashboard**:
Suggested change:
- 6. **Profile using the Ray dashboard**:
+ 6. **Profile using Ray Dashboard**:
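The note quoted above says that a process sitting in `sleep` counts as idle, which is why a sample task needs to keep the CPU busy to produce a flame graph. A rough way to see the difference without py-spy is to compare the CPU time consumed by a busy loop and by a sleep over the same wall-clock interval. This is only an illustrative sketch using the standard library: the function names and the 0.2 second interval are assumptions, and it does not reproduce how py-spy itself decides that a stack is idle.

```python
import time


def cpu_seconds_used(work, wall_seconds=0.2):
    """Run `work(deadline)` and return the CPU time this process consumed meanwhile."""
    start_cpu = time.process_time()
    work(time.monotonic() + wall_seconds)
    return time.process_time() - start_cpu


def busy_loop(deadline):
    # Spins on the CPU until the deadline, like the body of a `while True` task.
    counter = 0
    while time.monotonic() < deadline:
        counter += 1


def idle_sleep(deadline):
    # Blocks in the OS without using the CPU, like a task built around `sleep`.
    time.sleep(max(0.0, deadline - time.monotonic()))


busy_cpu = cpu_seconds_used(busy_loop)
idle_cpu = cpu_seconds_used(idle_sleep)
print(f"busy loop CPU seconds: {busy_cpu:.3f}")
print(f"sleep CPU seconds:     {idle_cpu:.3f}")
```

The busy loop should report CPU time close to the wall-clock interval, while the sleeping version should report almost none, which matches the idle state the note describes.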
docs/guidance/profiling.md
[py-spy](https://github.com/benfred/py-spy/tree/master) is a sampling profiler for Python programs. It lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way.

This document describes how to configure RayCluster YAML file to enable py-spy and see Stack Trace and CPU Flame Graph via Ray dashboard.
Suggested change:
- This document describes how to configure RayCluster YAML file to enable py-spy and see Stack Trace and CPU Flame Graph via Ray dashboard.
+ This section describes how to configure RayCluster YAML file to enable py-spy and see Stack Trace and CPU Flame Graph via Ray Dashboard.
docs/guidance/profiling.md
# (Head Pod) Run a sample job in the Pod
# `long_running_task` includes a `while True` loop to ensure the task remains actively running indefinitely.
# This allows you ample time to view the Stack Trace and CPU Flame Graph via the Ray dashboard.
Suggested change:
- # This allows you ample time to view the Stack Trace and CPU Flame Graph via the Ray dashboard.
+ # This allows you ample time to view the Stack Trace and CPU Flame Graph via Ray Dashboard.
ports:
- containerPort: 6379
  name: gcs-server
- containerPort: 8265 # Ray dashboard
Suggested change:
- - containerPort: 8265 # Ray dashboard
+ - containerPort: 8265 # Ray Dashboard
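Since the profiling views are reached through the dashboard port in this fragment, a quick reachability check can save some debugging before opening the browser. The sketch below is a generic TCP check using the standard library. It assumes port 8265 has already been made reachable on `127.0.0.1`, for example through a port-forward, which is an assumption and not something shown in this thread. The `is_port_open` helper is hypothetical.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


DASHBOARD_PORT = 8265  # containerPort used for the dashboard in the fragment above

if is_port_open("127.0.0.1", DASHBOARD_PORT):
    print(f"Something is listening on 127.0.0.1:{DASHBOARD_PORT}")
else:
    print(f"Nothing is listening on 127.0.0.1:{DASHBOARD_PORT}")
```

A successful connection only shows that some process accepts connections on that port. It does not confirm that the listener is the dashboard or that profiling is enabled.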
cpu: 500m
memory: 2Gi
# `py-spy` is a sampling profiler that requires `SYS_PTRACE` to read process memory effectively.
# Once enabled, you can profile Ray worker processes through the Ray dashboard.
Suggested change:
- # Once enabled, you can profile Ray worker processes through the Ray dashboard.
+ # Once enabled, you can profile Ray worker processes through Ray Dashboard.
Hi @scottsun94, thank you for the review! I've updated the PR following your suggestions. I have also rerun the instructions from the doc, and everything works as expected.
One more question: do we plan to expose this page in the KubeRay docs at https://ray-project.github.io/kuberay/?
Hi @scottsun94, I have added profiling.md to https://ray-project.github.io/kuberay/. You can preview it by following the instructions here: https://github.com/ray-project/kuberay/blob/master/docs/development/development.md#deploying-documentation-locally.
LGTM. Thanks!
Thanks, @Yicheng-Lu-llll and @scottsun94!
Add a document for profiling
Why are these changes needed?
See #982. Users have difficulty getting `py-spy` to work with KubeRay. This PR introduces documentation to address the issue.
Related issue number
Closes #982
Checks