Discussions about which metrics we need to stat #4603

Closed
jovany-wang opened this issue Apr 11, 2019 · 5 comments
Labels
enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks

Comments

@jovany-wang
Contributor

jovany-wang commented Apr 11, 2019

We have integrated the OpenCensus library into the raylet and added two example metrics: CurrentWorker and TaskReceivedCount.

I think this issue can be used to collect our requirements for metrics. Here are the ones I have come up with so far:

  • RedisLatency: The latency of a Redis operation. (Done in "Integrate metric items into raylet" #4602.)
  • CompletedTaskCount: The number of completed tasks.
  • TaskElapsedTime: The time a task spends in the backend.
  • ObjectCount: The number of objects we are holding.
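
To make the difference between these kinds of metrics concrete, here is a minimal plain-Python sketch. It does not use the OpenCensus API, and the `Counter`, `Gauge`, and `LatencyRecorder` classes are hypothetical stand-ins made up for illustration: CompletedTaskCount behaves like a counter that only increases, ObjectCount like a gauge that can rise and fall, and RedisLatency like a series of recorded durations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Counter:
    """Monotonically increasing count, e.g. CompletedTaskCount (hypothetical class)."""
    value: int = 0

    def inc(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a counter can only increase")
        self.value += n


@dataclass
class Gauge:
    """Point-in-time value that can rise or fall, e.g. ObjectCount (hypothetical class)."""
    value: int = 0

    def set(self, v: int) -> None:
        self.value = v


@dataclass
class LatencyRecorder:
    """Recorded durations in milliseconds, e.g. RedisLatency (hypothetical class)."""
    samples_ms: List[float] = field(default_factory=list)

    def record(self, ms: float) -> None:
        self.samples_ms.append(ms)

    def mean(self) -> float:
        # Return 0.0 when nothing has been recorded yet.
        if not self.samples_ms:
            return 0.0
        return sum(self.samples_ms) / len(self.samples_ms)


completed_task_count = Counter()
object_count = Gauge()
redis_latency = LatencyRecorder()

completed_task_count.inc()   # one task finished
completed_task_count.inc()   # another task finished
object_count.set(5)          # five objects held
object_count.set(3)          # two were released
redis_latency.record(1.5)
redis_latency.record(2.5)
```

In a real integration these would map onto whatever measure and view types the metrics library provides; the sketch only shows which update pattern each metric needs.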

I discussed TaskElapsedTime with @raulchen offline, and we concluded that it is unnecessary to add this metric for now. The change would be fairly large, since we would need to add a start_time to TaskInfo, and this is better implemented by a tracing system that can trace the lifecycle of tasks, actors, etc.
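
For reference, the start_time idea could look roughly like the sketch below. This is plain Python with a hypothetical `TaskInfo` stand-in, not the real raylet structure; `task_id`, `end_time`, `mark_completed`, and `elapsed_seconds` are assumptions added only for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskInfo:
    """Hypothetical stand-in for the backend's task record."""
    task_id: str
    # Recorded when the backend first sees the task.
    start_time: float = field(default_factory=time.monotonic)
    # Assumed field: filled in once the task completes.
    end_time: Optional[float] = None

    def mark_completed(self) -> None:
        self.end_time = time.monotonic()

    def elapsed_seconds(self) -> float:
        """TaskElapsedTime: time between first sight and completion."""
        if self.end_time is None:
            raise RuntimeError("task has not completed yet")
        return self.end_time - self.start_time


task = TaskInfo(task_id="example-task")
task.mark_completed()
elapsed = task.elapsed_seconds()
```

A monotonic clock is used so that wall-clock adjustments cannot produce negative durations. Every task record would have to carry the extra timestamp, which is part of why the change is larger than it first appears.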

So, what other metrics should we collect?

@ericl
Contributor

ericl commented Apr 11, 2019

I would probably also add the metrics emitted in debug_state.txt, which should cover internal data structure sizes and things like that.

Some metrics on what the raylet is spending its time doing would also be useful, e.g., time spent making scheduling decisions.

@robertnishihara
Collaborator

@jovany-wang One thing that would be incredibly useful is to know how long the raylet spends in each event handler. E.g., what is the maximum amount of time (and ideally the whole distribution) it spends responding to a task submission or to an object becoming available?
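
A minimal sketch of what per-handler timing could look like, in plain Python rather than the raylet's own code. The `timed_handler` and `summarize` helpers and the handler name `task_submission` are hypothetical; the percentiles use the nearest-rank method, which is one choice among several.

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, List

# Handler name -> observed durations in seconds.
handler_durations: Dict[str, List[float]] = defaultdict(list)


@contextmanager
def timed_handler(name: str):
    """Record how long the wrapped block ran under the given handler name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the handler raised.
        handler_durations[name].append(time.perf_counter() - start)


def summarize(samples: List[float]) -> Dict[str, float]:
    """Return the maximum and nearest-rank p50/p99 of the samples."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)

    def percentile(p: float) -> float:
        # Nearest-rank: smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {"max": ordered[-1], "p50": percentile(50), "p99": percentile(99)}


# Time one invocation of a (hypothetical) task submission handler.
with timed_handler("task_submission"):
    sum(range(1000))  # placeholder for the handler's real work

stats = summarize(handler_durations["task_submission"])
```

Keeping every sample is fine for a sketch, but a long-running process would more likely bucket durations into a fixed-size histogram so that memory stays bounded.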

@jovany-wang jovany-wang reopened this Apr 27, 2019
@jovany-wang
Contributor Author

@robertnishihara Thank you for reminding me of this.

@ericl ericl removed the help wanted label Jun 20, 2019
@rkooo567 rkooo567 self-assigned this Oct 12, 2020
@rkooo567 rkooo567 added P1 Issue that should be fixed within a few weeks enhancement Request for new feature and/or capability labels Oct 12, 2020
@rkooo567
Contributor

Added new stats.
