Proposed list of Metrics to Stabilize #6546

Open
2 of 25 tasks
rcoh opened this issue May 8, 2024 · 8 comments
Labels
A-tokio Area: The main tokio crate C-feature-request Category: A feature request. M-metrics Module: tokio/runtime/metrics

Comments

@rcoh
Contributor

rcoh commented May 8, 2024

Is your feature request related to a problem? Please describe.
Given the impact, hassle, and perceived "risk" of compiling with tokio_unstable, I'd like to propose we stabilize some of the existing metrics.

Describe the solution you'd like

  1. Runtime::metrics is stabilized. Documentation is added to this method (currently missing)
  2. RuntimeMetrics is stabilized
  3. We stabilize individual metrics on a case-by-case basis.
  4. A blog post or other high-quality, long-form material, a la https://tokio.rs/tokio/topics/shutdown, explaining best practices for alarming on, monitoring, and using the metrics published by Tokio.

Based on existing usage I've identified, I propose the following metrics for stabilization. I've selected metrics that could plausibly have an actual alarm threshold.

Proposed Metrics for Stabilization

  • num_workers: Used for asserting that the runtime is configured as expected
  • active_tasks_count: Used for ensuring that the runtime is behaving as expected (e.g. no accidental spawn leakages). Suggested alarms: high-water mark, 0. Note: do not stabilize until "feat: add task counter pairs" (#6114) is merged; stabilize under the name num_active_tasks.
  • injection_queue_depth: Used for ensuring that the runtime is making forward progress and is not in a pathological state. Note: this metric would be more useful with either a total counter or some concept of epoch/duration. Suggested alarm: high-water mark.
  • worker_local_queue_depth: Similar to injection_queue_depth, would also be more useful with a total insertion count.
  • worker_total_busy_duration: Can be used to determine the overall load of the worker. A high ratio of busy duration to total time suggests that the worker is performing a lot of CPU-bound work. Suggested alarm: in combination with total time and poll count, high CPU usage per poll.
  • worker_poll_count: Can be combined with busy_duration to estimate time per poll.
  • worker_overflow_count: General health metric for a worker. If rapidly increasing, indicates that a worker is falling behind. Alarms: increasing at high rate.
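
To make the busy-duration and poll-count bullets concrete, here is a minimal sketch of how the two counters could be combined. It deliberately does not call Tokio: `WorkerSample`, `WorkerLoad`, and `worker_load` are hypothetical names invented for illustration, and the sample is assumed to hold the cumulative values you would read from worker_total_busy_duration and worker_poll_count for one worker.

```rust
use std::time::Duration;

/// Hypothetical snapshot of one worker's cumulative counters (not a Tokio type).
#[derive(Clone, Copy, Debug)]
pub struct WorkerSample {
    /// Cumulative busy time, as reported by a busy-duration metric.
    pub total_busy: Duration,
    /// Cumulative number of polls, as reported by a poll-count metric.
    pub poll_count: u64,
}

/// Load figures derived from two samples of the same worker.
#[derive(Debug)]
pub struct WorkerLoad {
    /// Busy time divided by wall-clock time between the samples.
    pub busy_ratio: f64,
    /// Mean busy time per poll, or `None` if no polls happened in the window.
    pub time_per_poll: Option<Duration>,
}

/// Computes load for the window between `prev` and `curr`, taken `elapsed` apart.
pub fn worker_load(prev: WorkerSample, curr: WorkerSample, elapsed: Duration) -> WorkerLoad {
    // Both counters are cumulative, so only the deltas describe this window.
    let busy = curr.total_busy.saturating_sub(prev.total_busy);
    let polls = curr.poll_count.saturating_sub(prev.poll_count);

    let busy_ratio = if elapsed.is_zero() {
        0.0
    } else {
        busy.as_secs_f64() / elapsed.as_secs_f64()
    };

    // Integer nanosecond division avoids float rounding in the per-poll figure.
    let time_per_poll = if polls == 0 {
        None
    } else {
        Some(Duration::from_nanos((busy.as_nanos() / polls as u128) as u64))
    };

    WorkerLoad { busy_ratio, time_per_poll }
}
```

An alarm along the lines suggested above would then compare `busy_ratio` and `time_per_poll` against thresholds chosen for the service; the thresholds themselves are not something this issue specifies.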

Proposed longer term work:

  • I recommend we stabilize queue metrics as-is and add injection_queue_metrics() -> QueueMetrics { ... } for queues in the future.
  • In usage, I observe multiple people only considering worker metrics for the 0th worker. I would recommend stabilizing an iterator version of these APIs to encourage customers to actually report metrics from all workers, e.g. workers_overflow_count(&self) -> impl Iterator<Item=(usize, usize)>
  • Creation of a 0.x tokio-runtime-monitor crate that takes an opinionated set of metrics to report and includes alarms. Perhaps this crate could publish directly to metrics.rs? This crate would compile on stable Tokio.
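
As a rough illustration of the iterator-style API suggested above, the sketch below uses a mock type rather than Tokio itself. `MockMetrics` and `workers_falling_behind` are hypothetical; only the `workers_overflow_count` name and its `(usize, usize)` item type are taken from the signature in the proposal, and the threshold is an assumed parameter.

```rust
/// Mock stand-in for a runtime metrics handle; NOT the real `RuntimeMetrics`.
pub struct MockMetrics {
    /// Overflow count per worker, indexed by worker id.
    overflow_counts: Vec<usize>,
}

impl MockMetrics {
    pub fn new(overflow_counts: Vec<usize>) -> Self {
        Self { overflow_counts }
    }

    pub fn num_workers(&self) -> usize {
        self.overflow_counts.len()
    }

    /// Per-index accessor, in the style of the existing per-worker metrics.
    pub fn worker_overflow_count(&self, worker: usize) -> usize {
        self.overflow_counts[worker]
    }

    /// Iterator form in the shape proposed: yields `(worker index, count)`
    /// for every worker, so callers cannot forget workers other than 0.
    pub fn workers_overflow_count(&self) -> impl Iterator<Item = (usize, usize)> + '_ {
        (0..self.num_workers()).map(move |i| (i, self.worker_overflow_count(i)))
    }
}

/// Returns the indices of workers whose overflow count grew by more than
/// `threshold` between two snapshots (a simple "increasing at high rate" check).
pub fn workers_falling_behind(
    prev: &MockMetrics,
    curr: &MockMetrics,
    threshold: usize,
) -> Vec<usize> {
    prev.workers_overflow_count()
        .zip(curr.workers_overflow_count())
        .filter(|((_, before), (_, after))| after.saturating_sub(*before) > threshold)
        .map(|((worker, _), _)| worker)
        .collect()
}
```

The design point is that the iterator makes "report every worker" the default path, whereas an index-based accessor makes reporting only worker 0 the path of least resistance.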

Appendix: All Metrics

Additional context
#4073

@rcoh rcoh added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. labels May 8, 2024
@Darksonn Darksonn added the M-metrics Module: tokio/runtime/metrics label May 8, 2024
@Darksonn
Contributor

Please see #6114, which renames some metrics.

@rcoh
Contributor Author

rcoh commented May 13, 2024

👍🏻 , it renames active_tasks_count to num_active_tasks. I called that out in the ticket above to delay stabilization of that metric until the CR lands

@Darksonn
Contributor

As a start, do you want to submit a PR that stabilizes just the overall metrics interface and num_workers?

rcoh added a commit to rcoh/tokio that referenced this issue May 13, 2024
…ilization

This PR also introduces a `metrics` feature.

Refs: tokio-rs#6546
rcoh added a commit to rcoh/tokio that referenced this issue May 13, 2024
This PR stabilizes a single metric API to start the process of stabilizing metrics.
Future work will continue to stabilize more metrics.

This PR also introduces a `metrics` feature.

Refs: tokio-rs#6546
rcoh added a commit to rcoh/tokio that referenced this issue May 13, 2024
This PR stabilizes a single metric API to start the process of stabilizing metrics.
Future work will continue to stabilize more metrics.

Refs: tokio-rs#6546
@dswij

dswij commented May 16, 2024

We'd love to see this stabilized, especially these metrics that are the most important for us:

  1. num_workers
  2. active_tasks_count
  3. worker_total_busy_duration

rcoh added a commit to rcoh/tokio that referenced this issue May 16, 2024
This PR stabilizes a single metric API to start the process of stabilizing metrics.
Future work will continue to stabilize more metrics.

Refs: tokio-rs#6546
@Darksonn
Contributor

As of today, the first metric has been stabilized in #6556.

rcoh added a commit to rcoh/tokio that referenced this issue Jun 6, 2024
This stabilizes active_task_count. I also updated the metrics
integration test to split unstable/vs. stable metrics so that we
correctly test stable metrics in all cases.

Refs: tokio-rs#6546
@Owen-CH-Leung
Contributor

Do we have more metrics that are ready to be stabilised? Right now I think only num_alive_tasks, injection_queue_depth, and num_workers are stabilised. I wonder if we can add more. Happy to file a PR.

@rcoh
Contributor Author

rcoh commented Oct 10, 2024

yeah there are a few others in the ticket above that are ready to go. worker_total_busy_duration seems like one that has active customers

@Darksonn
Contributor

I don't have a good overview of which metrics are important, but I'm happy to stabilize more.

4 participants