[core] Add metrics for gcs jobs (ray-project#47793)
This PR adds metrics for job states within the job manager. Specifically, a gauge is exported via the OpenCensus exporter, so that running Ray jobs can be tracked and alerts can be created later on.

Fault tolerance is not considered: according to the [doc](https://docs.ray.io/en/latest/ray-core/fault_tolerance/gcs.html), state is reconstructed at restart.

On testing: the best way is to observe the metric via an OpenCensus backend (e.g. a Google monitoring dashboard), but that is not easy for open-source contributors. The alternative is a mock / fake exporter implementation, which I could not find in the codebase.

Signed-off-by: dentiny <[email protected]>
Co-authored-by: Ruiyang Wang <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
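
For illustration only, the gauge-per-job-state idea and the fake-exporter testing option described above could be sketched as follows. This is not the code in this PR and does not use Ray's or OpenCensus's actual APIs: `JobState`, `FakeExporter`, `JobManager`, and the metric name `jobs_by_state` are all hypothetical names chosen for the sketch.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Tuple


class JobState(Enum):
    """Hypothetical job states; the real set of states may differ."""
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"


class FakeExporter:
    """In-memory stand-in for a metrics exporter (assumption, for tests only).

    Keeps the most recent gauge value per (metric name, state tag).
    """

    def __init__(self) -> None:
        self.values: Dict[Tuple[str, str], int] = {}

    def record(self, name: str, value: int, tags: Dict[str, str]) -> None:
        # A gauge overwrites the previous value instead of accumulating.
        self.values[(name, tags["State"])] = value


class JobManager:
    """Toy job manager that re-reports per-state job counts on every change."""

    METRIC_NAME = "jobs_by_state"  # hypothetical metric name

    def __init__(self, exporter: FakeExporter) -> None:
        self._exporter = exporter
        self._jobs: Dict[str, JobState] = {}

    def add_job(self, job_id: str) -> None:
        self._jobs[job_id] = JobState.RUNNING
        self._report()

    def mark_finished(self, job_id: str) -> None:
        self._jobs[job_id] = JobState.FINISHED
        self._report()

    def _report(self) -> None:
        counts = Counter(self._jobs.values())
        # Report every state, including zero counts, so a state that
        # becomes empty does not keep showing a stale gauge value.
        for state in JobState:
            self._exporter.record(
                self.METRIC_NAME, counts[state], {"State": state.value}
            )


if __name__ == "__main__":
    exporter = FakeExporter()
    manager = JobManager(exporter)
    manager.add_job("job-1")
    manager.add_job("job-2")
    manager.mark_finished("job-1")
    print(exporter.values)
```

Because the counts are derived from in-memory state each time, a sketch like this needs no persistence of its own: once job state is reconstructed after a restart, the next report repopulates the gauge.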