Proposed list of Metrics to Stabilize #6546

rcoh · 2024-05-08T14:55:52Z

Is your feature request related to a problem? Please describe.
Given the impact, hassle, and perceived "risk" of compiling with tokio_unstable, I'd like to propose we stabilize some of the existing metrics.

Describe the solution you'd like

Runtime::metrics is stabilized, and documentation (currently missing) is added to this method.
RuntimeMetrics is stabilized.
Individual metrics are stabilized on a case-by-case basis.
A blog post or other high-quality, long-form material, à la https://tokio.rs/tokio/topics/shutdown, is published explaining best practices for alarming on, monitoring, and using the metrics published by Tokio.

Based on existing usage I've identified, I propose the following metrics for stabilization. I've selected metrics that could plausibly have an actual alarm threshold.

Proposed Metrics for Stabilization

num_workers: Used for asserting that the runtime is configured as expected
active_tasks_count: Used for ensuring that the runtime is behaving as expected (e.g. no accidental spawn leakages). Suggested alarms: high-water mark, 0. Note: do not stabilize until "feat: add task counter pairs" (#6114) is merged; stabilize under the name num_active_tasks.
injection_queue_depth: Used for ensuring that the runtime is making forward progress and is not in a pathological state. Note: this metric would be more useful with either a total counter or some concept of epoch/duration. Suggested alarm: high-water mark.
worker_local_queue_depth: Similar to injection_queue_depth; would also be more useful with a total insertion count.
worker_total_busy_duration: Can be used to determine the overall load of the worker. A high ratio of busy duration to total time suggests that the worker is performing a lot of CPU-bound work. Suggested alarm: in combination with total time and poll count, high CPU usage per poll.
worker_poll_count: Can be combined with worker_total_busy_duration to estimate time per poll.
worker_overflow_count: General health metric for a worker. If rapidly increasing, indicates that a worker is falling behind. Suggested alarm: increasing at a high rate.
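
The alarms suggested in this list can be sketched without depending on Tokio at all. The following is a minimal illustration, not part of the proposal: `HighWaterMark`, `WorkerSnapshot`, `WorkerLoad`, and `worker_load` are hypothetical names, and the snapshot fields are assumed to be filled by the caller from cumulative values such as worker_total_busy_duration, worker_poll_count, and worker_overflow_count, sampled at two points in time.

```rust
use std::time::Duration;

/// Tracks the largest sample seen for a gauge-style metric
/// (for example a queue depth) and flags samples above a threshold.
#[derive(Debug, Default)]
pub struct HighWaterMark {
    max_seen: usize,
}

impl HighWaterMark {
    /// Records a sample; returns true when the sample exceeds `threshold`.
    pub fn observe(&mut self, sample: usize, threshold: usize) -> bool {
        self.max_seen = self.max_seen.max(sample);
        sample > threshold
    }

    /// Largest sample recorded so far.
    pub fn max_seen(&self) -> usize {
        self.max_seen
    }
}

/// Hypothetical snapshot of one worker's cumulative counters.
#[derive(Clone, Copy, Debug)]
pub struct WorkerSnapshot {
    pub busy: Duration, // cumulative busy time
    pub polls: u64,     // cumulative number of polls
    pub overflows: u64, // cumulative number of local queue overflows
}

/// Values derived from two snapshots taken `elapsed` apart.
#[derive(Debug, PartialEq)]
pub struct WorkerLoad {
    pub busy_ratio: f64,             // busy time / wall-clock time
    pub mean_poll: Option<Duration>, // busy time per poll, None if no polls
    pub overflows_per_sec: f64,      // growth rate of the overflow counter
}

pub fn worker_load(prev: WorkerSnapshot, curr: WorkerSnapshot, elapsed: Duration) -> WorkerLoad {
    // Counters are cumulative, so only the difference between samples matters.
    let busy = curr.busy.saturating_sub(prev.busy);
    let polls = curr.polls.saturating_sub(prev.polls);
    let overflows = curr.overflows.saturating_sub(prev.overflows);

    let secs = elapsed.as_secs_f64();
    let (busy_ratio, overflows_per_sec) = if secs > 0.0 {
        (busy.as_secs_f64() / secs, overflows as f64 / secs)
    } else {
        (0.0, 0.0)
    };

    let mean_poll = if polls == 0 {
        None
    } else {
        Some(Duration::from_nanos((busy.as_nanos() / u128::from(polls)) as u64))
    };

    WorkerLoad { busy_ratio, mean_poll, overflows_per_sec }
}
```

A monitoring loop would call `worker_load` once per reporting interval for each worker and compare the results against thresholds chosen for the service; which thresholds are appropriate is left open here.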

Proposed longer term work:

I recommend we stabilize queue metrics as-is and add injection_queue_metrics() -> QueueMetrics { ... } for queues in the future.
In usage, I observe multiple people only considering worker metrics for the 0th worker. I would recommend stabilizing an iterator version of these APIs to encourage customers to actually report metrics from all workers, e.g. workers_overflow_count(&self) -> impl Iterator<Item=(usize, usize)>
Creation of a 0.x tokio-runtime-monitor crate that takes an opinionated set of metrics to report and includes alarms. Perhaps this crate could publish directly to metrics.rs? This crate would compile on stable Tokio.
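
To illustrate the iterator idea from the list above, here is a rough sketch of a helper that visits every worker instead of only worker 0. `per_worker` and `busiest_worker` are hypothetical names, not Tokio APIs; the `read` closure stands in for whatever per-worker accessor is used (for example, a closure that forwards the worker index to worker_overflow_count on a RuntimeMetrics handle).

```rust
/// Pairs every worker index in `0..num_workers` with the value returned
/// by `read` for that worker.
pub fn per_worker<T>(
    num_workers: usize,
    read: impl Fn(usize) -> T,
) -> impl Iterator<Item = (usize, T)> {
    (0..num_workers).map(move |worker| (worker, read(worker)))
}

/// Example consumer: the worker with the largest counter value,
/// or None when there are no workers.
pub fn busiest_worker(
    num_workers: usize,
    read: impl Fn(usize) -> u64,
) -> Option<(usize, u64)> {
    per_worker(num_workers, read).max_by_key(|&(_, value)| value)
}
```

Reporting code can then loop over `per_worker(...)` and emit one data point per worker, which avoids silently dropping workers 1 and up.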

Appendix: All Metrics

Additional context
#4073

Darksonn · 2024-05-11T16:50:00Z

Please see #6114, which renames some metrics.

rcoh · 2024-05-13T14:44:52Z

👍🏻 It renames active_tasks_count to num_active_tasks. I called that out in the ticket above to delay stabilization of that metric until the CR lands.

Darksonn · 2024-05-13T14:51:37Z

As a start, do you want to submit a PR that stabilizes just the overall metrics interface and num_workers?

This PR stabilizes a single metric API to start the process of stabilizing metrics. Future work will continue to stabilize more metrics. This PR also introduces a `metrics` feature. Refs: tokio-rs#6546

dswij · 2024-05-16T09:15:04Z

We'd love to see this stabilized, especially these metrics that are the most important for us:

num_workers
active_tasks_count
worker_total_busy_duration

Darksonn · 2024-05-28T19:57:01Z

As of today, the first metric has been stabilized in #6556.

This stabilizes active_task_count. I also updated the metrics integration test to split unstable/vs. stable metrics so that we correctly test stable metrics in all cases. Refs: tokio-rs#6546

Owen-CH-Leung · 2024-10-05T10:11:36Z

Do we have more metrics that are ready to be stabilised? Right now I think only num_alive_tasks, injection_queue_depth, and num_workers are stabilised. I wonder if we can add more. Happy to file a PR.

rcoh · 2024-10-10T17:24:42Z

yeah there are a few others in the ticket above that are ready to go. worker_total_busy_duration seems like one that has active customers

Darksonn · 2024-10-11T02:26:01Z

I don't have a good overview of which metrics are important, but I'm happy to stabilize more.

rcoh added the A-tokio (Area: The main tokio crate) and C-feature-request (Category: A feature request) labels on May 8, 2024

Darksonn added the M-metrics (Module: tokio/runtime/metrics) label on May 8, 2024

rcoh added a commit to rcoh/tokio that referenced this issue May 13, 2024

metrics: stabilize worker_count to start the process of metric stabilization

bce4ac3

This PR also introduces a `metrics` feature. Refs: tokio-rs#6546

rcoh mentioned this issue May 13, 2024

metrics: stabilize RuntimeMetrics::worker_count #6556

Merged

rcoh added a commit to rcoh/tokio that referenced this issue May 13, 2024

metrics: stabilize RuntimeMetrics::worker_count

5f2070b

This PR stabilizes a single metric API to start the process of stabilizing metrics. Future work will continue to stabilize more metrics. Refs: tokio-rs#6546

rcoh added 12 more commits to rcoh/tokio that referenced this issue, each titled "metrics: stabilize RuntimeMetrics::worker_count" and carrying the same message:

May 13, 2024: 68daec0
May 14, 2024: 1df9b3d, 394dc4e, 59202e0, 2ec8720, c437716, 6af9960
May 15, 2024: c0d906f, 519cd54
May 16, 2024: 19c54bb, 78890bb
May 17, 2024: 6f1a593

rcoh mentioned this issue Jun 6, 2024

metrics: stabilize num_alive_tasks #6619

Merged

mox692 mentioned this issue Jun 24, 2024

Expose All Metrics in tokio::runtime::RuntimeMetrics? tokio-rs/tokio-metrics#61

Open

Owen-CH-Leung mentioned this issue Sep 21, 2024

metrics: Stabilize injection_queue_depth API #6854

Merged

Owen-CH-Leung mentioned this issue Oct 11, 2024

stabilize worker_total_busy_duration #6899

Open
