feat: add task counter pairs #6114

conradludgate · 2023-10-27T17:02:04Z

Motivation

Metrics like active_tasks_count or injection_queue_depth are fast-moving gauges, and even taking a snapshot every few seconds doesn't say much about what's going on inside Tokio. It would be better to use two counters: one for additions, one for removals.

We're hoping to add a Prometheus exporter for the Tokio metrics information, but a sample rate of 15 seconds will likely miss a lot of task spikes. I could implement some level of eager aggregation, but as the linked comment says, you can still miss some spikes even with a sample rate of 500ms.
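To illustrate the sampling problem described above, here is a minimal sketch. `TaskStats` and `burst_between_scrapes` are hypothetical names invented for this example and are not Tokio types. A burst of short-lived tasks that starts and finishes between two scrapes is invisible to a gauge but still shows up as a delta on a monotonic counter.

```rust
/// Hypothetical stand-in for a runtime's task bookkeeping (not a Tokio type).
#[derive(Default)]
struct TaskStats {
    spawned: u64,   // monotonic: only ever incremented
    completed: u64, // monotonic: only ever incremented
}

impl TaskStats {
    fn spawn(&mut self) {
        self.spawned += 1;
    }

    fn complete(&mut self) {
        self.completed += 1;
    }

    /// The gauge view: how many tasks are alive right now.
    fn active(&self) -> u64 {
        self.spawned - self.completed
    }
}

/// Simulates `n` tasks that all start and finish between two scrapes.
/// Returns (gauge at first scrape, gauge at second scrape, spawned delta).
fn burst_between_scrapes(n: u64) -> (u64, u64, u64) {
    let mut stats = TaskStats::default();

    // First scrape.
    let gauge_before = stats.active();
    let spawned_before = stats.spawned;

    // The burst happens entirely between the two scrapes.
    for _ in 0..n {
        stats.spawn();
    }
    for _ in 0..n {
        stats.complete();
    }

    // Second scrape.
    (gauge_before, stats.active(), stats.spawned - spawned_before)
}
```

Both gauge readings are identical, so an exporter that only sees the gauge cannot tell that anything happened, whereas the counter delta records the whole burst regardless of the scrape interval.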

Solution

In CountedLinkedList, replace the count: usize with a pair of u64s that can only be incremented. One u64 for added items and one for removed items.

Open to bikeshedding on the terminology
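A rough sketch of the idea, under stated assumptions: `CountedList` below is a hypothetical type backed by a `VecDeque` for brevity, not the actual `CountedLinkedList` in tokio/src/util/linked_list.rs. It only shows how a single `count` field can be replaced by two counters that never decrease.

```rust
use std::collections::VecDeque;

/// Hypothetical list that tracks additions and removals separately.
struct CountedList<T> {
    items: VecDeque<T>,
    added: u64,   // total items ever pushed; never decremented
    removed: u64, // total items ever popped; never decremented
}

impl<T> CountedList<T> {
    fn new() -> Self {
        Self { items: VecDeque::new(), added: 0, removed: 0 }
    }

    fn push_front(&mut self, value: T) {
        self.items.push_front(value);
        self.added += 1;
    }

    fn pop_back(&mut self) -> Option<T> {
        let value = self.items.pop_back();
        // Only count a removal if something was actually removed.
        if value.is_some() {
            self.removed += 1;
        }
        value
    }

    fn added(&self) -> u64 {
        self.added
    }

    fn removed(&self) -> u64 {
        self.removed
    }

    /// The old gauge is still recoverable as the difference.
    fn len(&self) -> u64 {
        self.added - self.removed
    }
}
```

The previous gauge value stays available through `len()`, so callers that only want the current size lose nothing.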

Open questions

- Should the active task API return all 3 values in one call, rather than requiring 3 separate lock calls?
- Which other APIs are currently gauges and should be counters?

conradludgate · 2023-10-30T09:17:48Z

Gauge-like metrics:

- num_workers (a constant quantity, doesn't count)
- num_blocking_threads
- num_idle_blocking_threads
- injection_queue_depth
- worker_local_queue_depth
- blocking_queue_depth

`num_blocking_threads`

Can be treated as blocking_threads_created - blocking_threads_released. Would require 2 atomics, unless it is acceptable to make this a u64 which encodes 2 u32s (how many apps will create 4 billion blocking threads?!)
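The "one u64 encoding two u32s" option could look roughly like the sketch below. `PackedPair` and its method names are hypothetical and chosen for illustration only. It assumes neither half ever exceeds `u32::MAX`; if the low half overflowed, the carry would corrupt the high half.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical packed counter pair: `created` lives in the high 32 bits,
/// `released` in the low 32 bits of a single atomic.
struct PackedPair(AtomicU64);

impl PackedPair {
    /// Adding this value increments the high half by one.
    const CREATED_ONE: u64 = 1 << 32;

    fn new() -> Self {
        Self(AtomicU64::new(0))
    }

    fn thread_created(&self) {
        self.0.fetch_add(Self::CREATED_ONE, Ordering::Relaxed);
    }

    fn thread_released(&self) {
        // Assumption: fewer than 2^32 releases, otherwise this carries
        // into the `created` half.
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    /// Reads both halves with a single atomic load, so the pair is
    /// always mutually consistent. Returns (created, released).
    fn load(&self) -> (u32, u32) {
        let packed = self.0.load(Ordering::Relaxed);
        ((packed >> 32) as u32, packed as u32)
    }

    /// The gauge value derived from the pair.
    fn live(&self) -> u32 {
        let (created, released) = self.load();
        created.wrapping_sub(released)
    }
}
```

One thing to keep in mind with this approach is that `std::sync::atomic::AtomicU64` is only available on targets that support 64-bit atomics, so a real implementation would need a fallback on other platforms.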

`num_idle_blocking_threads`

Same as above, although it will likely need 2 u64 counters: blocking_active_total - blocking_idle_total.

`injection_queue_depth`

injection_pushed - injection_popped. Requires 2 u64 atomic counters.

`worker_local_queue_depth`

Requires no additional counters, since we already have head and tail. They are u32 quantities, though, and will likely overflow, which makes this tricky. I appreciate that adding extra atomics to this path might introduce a noticeable latency spike, so I am fine with ignoring this one.
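As a small illustration of why u32 head and tail values are awkward here: the depth itself survives wraparound if it is computed with wrapping arithmetic, but the raw values stop being monotonic once they wrap, so they cannot be exported directly as ever-increasing counters. The function name `queue_depth` is hypothetical, and the sketch assumes the real depth is always below 2^32.

```rust
/// Depth of a queue whose `head` (pop position) and `tail` (push position)
/// are free-running u32 indices that may wrap around.
/// Assumes the true depth always fits in a u32.
fn queue_depth(head: u32, tail: u32) -> u32 {
    // Wrapping subtraction gives the right answer even after `tail`
    // has wrapped past u32::MAX while `head` has not yet.
    tail.wrapping_sub(head)
}
```

A consumer that treats `tail` as a counter would see it drop from near `u32::MAX` to a small value after a wrap, which most metrics backends interpret as a counter reset.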

`blocking_queue_depth`

Same as the other blocking gauges.

hawkw · 2023-10-30T16:11:15Z

IMO using two counters rather than a gauge is definitely more correct for these metrics, so I'm 👍 on this change.

tokio/src/util/linked_list.rs

tokio/tests/rt_metrics.rs

Darksonn · 2023-11-25T14:27:23Z

Any status update on this?

conradludgate · 2023-11-26T16:57:41Z

I'll try and fix up the flaky tests tomorrow.

Any opinions on the API? Since the pair will likely be accessed together rather than separately, taking 2 locks instead of just 1 is a bit unfortunate. This should probably return a tuple instead of having 2 functions.

Darksonn · 2023-11-27T12:51:36Z

Returning a tuple makes sense to me. You could even define a struct with two fields to give better names than .0 and .1 to the two properties.
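A sketch of what that suggestion might look like. `TaskCounts`, `OwnedTasksSketch`, and the method names are all hypothetical and are not the API that was merged. The point is only that a named struct reads better than `.0` and `.1`, and that both values come from a single lock acquisition.

```rust
use std::sync::Mutex;

/// Hypothetical named pair, instead of a bare `(u64, u64)` tuple.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TaskCounts {
    spawned: u64,
    completed: u64,
}

impl TaskCounts {
    /// The gauge value derived from the pair.
    fn active(&self) -> u64 {
        self.spawned - self.completed
    }
}

/// Hypothetical owner of the counters, guarded by one mutex.
struct OwnedTasksSketch {
    counts: Mutex<TaskCounts>,
}

impl OwnedTasksSketch {
    fn new() -> Self {
        Self { counts: Mutex::new(TaskCounts { spawned: 0, completed: 0 }) }
    }

    fn record_spawn(&self) {
        self.counts.lock().unwrap().spawned += 1;
    }

    fn record_completion(&self) {
        self.counts.lock().unwrap().completed += 1;
    }

    /// Returns both counters from a single lock acquisition, so the
    /// two values are consistent with each other.
    fn task_counts(&self) -> TaskCounts {
        *self.counts.lock().unwrap()
    }
}
```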

tokio/src/runtime/metrics/runtime.rs

Darksonn · 2024-01-30T10:11:23Z

Hi, it looks like the conflicting PR has been merged now. Sorry that it took so long to get back to you after that. Are you still interested in working on this?

conradludgate · 2024-01-30T10:41:50Z

> Are you still interested in working on this?

Yes, I will rebase accordingly. Are there any other changes you think should be included?

Darksonn · 2024-01-30T11:19:09Z

Hmm, overall it looks good, but I don't love the naming of CounterPair and CounterPair::len.

conradludgate · 2024-02-12T14:56:04Z

Since the sharded list makes use of atomics, I've moved from added/removed to added/count so that is_empty() only needs 1 atomic access.
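The trade-off described above can be sketched as follows. `ShardedCounters` and its methods are hypothetical names, and this is not the sharded list from the PR. Storing `added` plus the current `count` means `is_empty()` needs one load, while `removed` becomes a derived value.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Hypothetical counters stored as added/count rather than added/removed.
struct ShardedCounters {
    added: AtomicU64,   // monotonic total of insertions
    count: AtomicUsize, // current number of items
}

impl ShardedCounters {
    fn new() -> Self {
        Self { added: AtomicU64::new(0), count: AtomicUsize::new(0) }
    }

    fn on_insert(&self) {
        self.added.fetch_add(1, Ordering::Relaxed);
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    fn on_remove(&self) {
        self.count.fetch_sub(1, Ordering::Relaxed);
    }

    /// A single atomic load. With added/removed this would need two.
    fn is_empty(&self) -> bool {
        self.count.load(Ordering::Relaxed) == 0
    }

    fn added(&self) -> u64 {
        self.added.load(Ordering::Relaxed)
    }

    fn len(&self) -> usize {
        self.count.load(Ordering::Relaxed)
    }

    /// Derived from two separate loads, so under concurrent updates the
    /// result is only approximate; saturate rather than underflow.
    fn removed(&self) -> u64 {
        let count = self.count.load(Ordering::Relaxed) as u64;
        self.added.load(Ordering::Relaxed).saturating_sub(count)
    }
}
```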

> Hmm, overall it looks good, but I don't love the naming of CounterPair and CounterPair::len.

I'm tempted to remove it then and we can stick with start_task_count and active_task_count functions.

conradludgate · 2024-02-12T15:04:49Z

Also renamed start_tasks to spawned_tasks, as it is likely more intuitive.

tokio/tests/rt_metrics.rs

tokio/src/runtime/metrics/runtime.rs

tokio/tests/rt_metrics.rs

Darksonn · 2024-05-03T09:53:33Z

There's a CI failure:

FAIL [   0.386s] tokio::rt_metrics num_active_tasks

--- STDOUT:              tokio::rt_metrics num_active_tasks ---

running 1 test
test num_active_tasks ... FAILED

failures:

failures:
    num_active_tasks

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 22 filtered out; finished in 0.33s


--- STDERR:              tokio::rt_metrics num_active_tasks ---
thread 'num_active_tasks' panicked at tokio/tests/rt_metrics.rs:104:5:
assertion `left == right` failed
  left: 0
 right: 1
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14
   2: core::panicking::assert_failed_inner
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:343:17
   3: core::panicking::assert_failed
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:298:5
   4: rt_metrics::num_active_tasks
             at ./tests/rt_metrics.rs:104:5
   5: rt_metrics::num_active_tasks::{{closure}}
             at ./tests/rt_metrics.rs:85:22
   6: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

conradludgate · 2024-05-09T08:03:53Z

> There's a CI failure

Seems to affect only 32-bit ARM in the multi-threaded case. My guess is that the self.count.fetch_sub(1, Ordering::Relaxed) in the worker thread is not being synchronised before the self.count.load(Ordering::Relaxed) in the test thread when reading the metrics.

I don't think it makes too much sense to use a stronger ordering. Maybe in cfg(test) we could go with SeqCst, but I don't like that idea very much. I think I would rather remove the assertion for the multi-threaded test, or at least only include it on x86_64/aarch64, where it does seem to always work.
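For context, a generic illustration using plain `std` threads rather than the Tokio test: a `Relaxed` load is only guaranteed to observe a `Relaxed` write from another thread when some other operation orders the two. In this sketch the ordering comes from `JoinHandle::join`. The function name is hypothetical, and whether the flaky test lacks a comparable ordering is the guess stated above, not something this example demonstrates.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Decrements a counter on a worker thread with `Relaxed`, then reads it
/// on the calling thread with `Relaxed` after joining the worker.
fn decrement_on_worker_then_read() -> usize {
    let count = Arc::new(AtomicUsize::new(1));

    let worker = {
        let count = Arc::clone(&count);
        thread::spawn(move || {
            count.fetch_sub(1, Ordering::Relaxed);
        })
    };

    // Joining the thread orders everything the worker did before this
    // point, so the load below is guaranteed to see the decrement even
    // though both atomic operations are `Relaxed`.
    worker.join().expect("worker thread panicked");

    count.load(Ordering::Relaxed)
}
```

Without an ordering like the join, a reader on another thread may legitimately run before the decrement happens, which is why asserting on the exact value is hard to make reliable.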

conradludgate · 2024-05-11T08:10:56Z

Opted to remove the flaky assert; it always works locally for me. I can't figure out a reliable construction to guarantee the test passes with only Relaxed ordering. I think this is good enough.

tokio/tests/rt_metrics.rs

Darksonn · 2024-05-30T10:43:23Z

Any updates on this?

Darksonn

Thank you.

github-actions bot added R-loom-current-thread Run loom current-thread tests on this PR R-loom-multi-thread Run loom multi-thread tests on this PR R-loom-multi-thread-alt Run loom multi-thread alt tests on this PR labels Oct 27, 2023

conradludgate force-pushed the metrics-counter-pairs branch from 458813d to 5fec88d Compare October 29, 2023 15:10

Darksonn added A-tokio Area: The main tokio crate M-metrics Module: tokio/runtime/metrics labels Nov 5, 2023

Darksonn reviewed Nov 5, 2023

Darksonn requested a review from hawkw November 5, 2023 14:13

ghost reviewed Nov 27, 2023

conradludgate force-pushed the metrics-counter-pairs branch 8 times, most recently from 4ad7dd4 to 5355563 Compare November 28, 2023 11:08

conradludgate force-pushed the metrics-counter-pairs branch from 5355563 to 176d74b Compare February 12, 2024 14:52

Darksonn reviewed Feb 13, 2024

conradludgate force-pushed the metrics-counter-pairs branch from 755e551 to 6497d0c Compare April 25, 2024 15:24

conradludgate requested a review from Darksonn April 30, 2024 14:29

Darksonn reviewed May 1, 2024

Darksonn reviewed May 11, 2024

Darksonn mentioned this pull request May 11, 2024

Proposed list of Metrics to Stabilize #6546

Open

25 tasks

feat: add task counter pairs

bb82406

conradludgate force-pushed the metrics-counter-pairs branch from 0542b98 to bb82406 Compare June 9, 2024 08:43

Darksonn approved these changes Jun 9, 2024

Darksonn merged commit 341b5da into tokio-rs:master Jun 9, 2024
83 checks passed

Darksonn mentioned this pull request Jun 9, 2024

metrics: use MetricAtomic* for task counters #6624

Merged

mox692 mentioned this pull request Jun 10, 2024

metrics: stabilize num_alive_tasks #6619

Merged

This was referenced Jul 22, 2024

chore: release Tokio v1.39.0 #6708

Closed

Release Tokio v1.39.0 #6711

Merged
