Adding prometheus dashboard support #1
+123
−15
This adds an option to periodically scrape the Ray metrics actor and publish the results to the appropriate Perspective tables.
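For readers who want a feel for the scrape-and-publish loop, here is a minimal sketch. It is not the code in this PR: it assumes the metrics arrive in Prometheus text exposition format, and `fetch`, `publish`, `parse_metrics`, `scrape_periodically`, and the metric names are all hypothetical placeholders. It uses only the standard library and calls no Ray or Perspective APIs.

```python
import re
import time
from collections import defaultdict
from typing import Callable, Dict, List, Optional

# Matches sample lines such as: metric_name{label="value"} 1.5
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> Dict[str, List[dict]]:
    """Group Prometheus text-format samples into one list of rows per metric."""
    tables: Dict[str, List[dict]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:  # ignore anything that is not a sample line
            continue
        name, labels, value = match.groups()
        row = dict(_LABEL.findall(labels or ""))  # one column per label
        row["value"] = float(value)
        tables[name].append(row)
    return dict(tables)


def scrape_periodically(
    fetch: Callable[[], str],                  # hypothetical: returns raw metrics text
    publish: Callable[[str, List[dict]], None],  # hypothetical: pushes rows to a table
    interval_s: float = 5.0,
    max_iterations: Optional[int] = None,      # None means run until interrupted
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Fetch, parse, and publish on a fixed interval."""
    count = 0
    while max_iterations is None or count < max_iterations:
        for table, rows in parse_metrics(fetch()).items():
            publish(table, rows)
        count += 1
        sleep(interval_s)
```

In this sketch each metric name maps to its own table and each label becomes a column, which is one plausible layout and may differ from the tables this PR actually creates. Restricting which metrics get published would amount to filtering the keys returned by `parse_metrics` before calling `publish`.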
I tested this mostly at low scale (a 2-5 second scrape interval, running for a few hours, adding a unit test, that kind of thing). We may want to treat this as more of an "experimental" feature, and/or restrict the tables and metrics we actually scrape and publish, but I think it is cool enough as it stands that we should ship it as an option.
After I left it running for a few hours, my local cluster's CPU and memory usage stayed largely the same throughout, and looked like this:
This also seems to closely match our cluster under the default option, which looks like this:
(Note that this is all running on my local head node.)
With minimal effort, we suddenly get access to all these (and more!) Perspective metrics:
and the metrics are pretty expansive,