config/prometheus: add metrics exporter for workers #469
@ulfox This is great. BTW, how do you use pod-level (Ray worker) metrics in your case? We considered monitoring workers in our downstream but felt there was not much value in it. I would like to learn how you leverage those metrics. /cc @scarlet25151
We currently use the workers' metrics for observability through Grafana panels. We check
For example with the following query
We can detect waiting for resources or plasma memory spikes and then check
Some additional examples of worker metrics we observe
Ratio metrics
Availability metrics
@ulfox This is awesome guidance! We export the control plane Grafana dashboard here: https://github.com/ray-project/kuberay/tree/master/config/grafana. If the one for workers can be open-sourced on your side, I think people would love it.
@Jeffwan I will provide a workers Grafana panel as well!
* config/prometheus: add metrics exporter for workers
* Also update and rename serviceMonitor example
* config/prometheus/rules: add custom rules example
* update: docs/guidance/observability
Why are these changes needed?
Sample configuration for exporting metrics from Ray cluster workers. This works with autoscaling: newly created worker pods are picked up as scrape targets, and destroyed worker pods are removed.
The podMonitor CRD resource works much like serviceMonitor, but it targets pods instead of services.
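For readers unfamiliar with the resource, a minimal PodMonitor might look roughly like the sketch below. It is an illustration and not necessarily the manifest added in this PR. The resource name, both namespaces, the `release: prometheus` label, and the `metrics` port name are assumptions to adjust for your cluster. The `ray.io/node-type: worker` selector assumes worker pods carry that label.

```yaml
# Illustrative sketch only; names, namespaces, and labels are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: ray-workers-monitor        # hypothetical name
  namespace: prometheus-system     # assumed namespace of the Prometheus Operator
  labels:
    release: prometheus            # assumed; must match your Prometheus podMonitorSelector
spec:
  # Namespaces in which to look for worker pods (assumed: default).
  namespaceSelector:
    matchNames:
      - default
  # Select pods directly by label, so no Service is required.
  selector:
    matchLabels:
      ray.io/node-type: worker     # assumed label on Ray worker pods
  podMetricsEndpoints:
    - port: metrics                # assumed name of the container port exposing metrics
```

Because selection is label-based, pods added or removed by the autoscaler enter and leave the target list without changes to the manifest.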
Prometheus example after applying this manifest