Background
The vast majority of performance metrics we measure at runtime are based on samples; however, samples may not be a reliable source of information if there are too few of them in a given location. We should provide some kind of "reliability" metric alongside every value (and thus deter "overreactions" to metric values that are far too coarse for actual use).
Pending further statistical investigation, such a "reliability" metric will most likely be (or be based on) the number of samples recorded in a given calling context; thus, the first batch of ideas focuses on this.
Design Ideas
The Viewer should present the "reliability" metric in a tooltip on the values themselves, so they appear on mouse hover.
Maybe a "warning" icon could be placed in the cell if the reliability metric falls below a given threshold (e.g., fewer than 20 samples)?
The "reliability" metric can simply be another metric in the *.db formats, connected to its base metric in metrics.yaml.
This approach requires no change to the *.db formats, so this is not a blocker for most (if any) work.
Options include another formula: sub-key, or another presentation: value (with keys to connect to appropriate metrics).
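To make the two options concrete, a hypothetical metrics.yaml fragment might look like the sketch below. The key names (samples:, for:) and the overall layout are illustrative assumptions, not the actual metrics.yaml schema:

```yaml
metrics:
  - name: cycles
    formula:
      sum: $cycles              # existing value formula
      samples: $cycles.count    # option A: an extra formula: sub-key (hypothetical)
  - name: cycles (reliability)
    presentation:
      value: $cycles.count      # option B: a separate presentation: value...
      for: cycles               # ...keyed back to the metric it qualifies (hypothetical)
```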
The extra sample counts need to be output by hpcrun.
Internally this could be implemented as a second (similarly named) metric, or by expanding the metric values into (value, sample count) pairs.
Should samples be recorded as uint64_t instead of double? (Are 12 extra value bits worth the awkwardness in encoding/decoding?)
Arose out of a discussion with @jmellorcrummey and @laksono.