Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add process.signals_pending metric to semantic conventions #2846

Conversation

andrzej-stencel
Copy link
Member

Fixes #2826

Changes

Adds a new metric process.signals_pending to the semantic conventions.

Context

Here's a related proposal to implement this metric in the OT collector's hostmetrics receiver: open-telemetry/opentelemetry-collector-contrib#14084.

@andrzej-stencel andrzej-stencel marked this pull request as ready for review September 28, 2022 11:29
@andrzej-stencel andrzej-stencel requested review from a team September 28, 2022 11:29
@@ -42,6 +42,7 @@ Below is a table of Process metric instruments.
| `process.threads` | UpDownCounter | {threads} | Process threads count. | |
| `process.open_file_descriptors` | UpDownCounter | {count} | Number of file descriptors in use by the process. | |
| `process.context_switches` | Counter | {count} | Number of times the process has been context switched. | `type` SHOULD be one of: `involuntary`, `voluntary` |
| `process.signals_pending` | UpDownCounter | {signals} | Number of pending signals for the process. | |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signals are generally POSIX only. Can you call out how this applies to non-POSIX systems (should they adapt signal-like things here?

Specifically:

  • Is this required for POSIX systems or just recommended
  • What does this mean for non-POSIX systems (or it shouldn't be filled out on those)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, signals are a POSIX-only concept. I assume non-POSIX systems won't report this metric. I wouldn't say the metric is either required or recommended for any system (POSIX or not).

As I understand it, the OT specification only establishes a name for a specific information (a metric, an attribute, etc.), but does not specify weather a piece of information is required or recommended to be delivered by a system.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsuereth does this answer your questions? Do you think it makes sense?

@andrzej-stencel andrzej-stencel force-pushed the add-process-signals-pending-metric branch from f1d074b to 70deae4 Compare October 5, 2022 09:12
@arminru arminru added area:semantic-conventions Related to semantic conventions spec:metrics Related to the specification/metrics directory labels Oct 10, 2022
@andrzej-stencel andrzej-stencel force-pushed the add-process-signals-pending-metric branch 2 times, most recently from 7d61c33 to 67909ff Compare October 13, 2022 13:00
@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 21, 2022
@andrzej-stencel andrzej-stencel force-pushed the add-process-signals-pending-metric branch 2 times, most recently from bba6451 to 49beec1 Compare October 24, 2022 09:18
@andrzej-stencel andrzej-stencel force-pushed the add-process-signals-pending-metric branch from 49beec1 to 8d73bfd Compare October 31, 2022 13:58
@@ -43,6 +43,7 @@ Below is a table of Process metric instruments.
| `process.open_file_descriptors` | UpDownCounter | {count} | Number of file descriptors in use by the process. | |
| `process.context_switches` | Counter | {count} | Number of times the process has been context switched. | `type` SHOULD be one of: `involuntary`, `voluntary` |
| `process.paging.faults` | Counter | {faults} | Number of page faults the process has made. | `type`, if specified, SHOULD be one of: `major` (for major, or hard, page faults), `minor` (for minor, or soft, page faults). |
| `process.signals_pending` | UpDownCounter | {signals} | Number of pending POSIX signals for the process. | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about non-POSIX systems? Should they use this same metrics or they should invent a different name?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't expect non-POSIX systems to use the same metric (and I don't know what they would use). That said, if there is a valid use-case down the road, we can change the description to accommodate. We are considering description changes to be a stable change https://docs.google.com/document/d/1Nvcf1wio7nDUVcrXxVUN_f8MNmcs0OzVAZLvlth1lYY/edit#heading=h.e71xzqqz0b5t

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't expect non-POSIX systems to use the same metric (and I don't know what they would use).

Would it make sense to put it as process.posix.signals_pending?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsuereth WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't expect non-POSIX systems to use the same metric (and I don't know what they would use).

Would it make sense to put it as process.posix.signals_pending?

I think I understand where this comes from. As described in the metrics semantic conventions' General Guidelines:

Associated metrics SHOULD be nested together in a hierarchy based on their usage. Define a top-level hierarchy for common metric categories: for OS metrics, like CPU and network; for app runtimes, like GC internals. Libraries and frameworks should nest their metrics into a hierarchy as well. This aids in discovery and adhoc comparison. This allows a user to find similar metrics given a certain metric.

and further below:

Semantic ambiguity SHOULD be avoided. Use prefixed metric names in cases where similar metrics have significantly different implementations across the breadth of all existing metrics. For example, every garbage collected runtime has slightly different strategies and measures. Using a single set of metric names for GC, not divided by the runtime, could create dissimilar comparisons and confusion for end users. (For example, prefer process.runtime.java.gc* over process.runtime.gc.*.) Measures of many operating system metrics are similarly ambiguous.

If I understand correctly, your point Reiley is that if a non-POSIX system has a different understanding of what a "signal" is, the metric process.signals_pending might have a different meaning, thus supporting the division into a process.posix.signals_pending metric a a process.something-else-that-is-not-posix.signals_pending or similar metric. This is similar to the process.runtime.java.gc* example in the documentation.

I have two concerns with this:

  1. The .posix. namespace would be based on a feature of a system, not on a specific component, like the Java runtime, which in my view makes it less discoverable and harder to understand by users.
  2. Most of the systems we run (definitely in the server world, given Linux dominance) are POSIX, which makes it the default setting. I'd be more in favor to give a separate namespace to a non-POSIX-specific concept, while keeping the POSIX-specific concepts - at least the most popular ones - in the default namespace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two concerns with this:

  1. The .posix. namespace would be based on a feature of a system, not on a specific component, like the Java runtime, which in my view makes it less discoverable and harder to understand by users.
  2. Most of the systems we run (definitely in the server world, given Linux dominance) are POSIX, which makes it the default setting. I'd be more in favor to give a separate namespace to a non-POSIX-specific concept, while keeping the POSIX-specific concepts - at least the most popular ones - in the default namespace.

I think this requires more discussion, it's not a specific to this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Unfortunately I wasn't able to join today's Specification SIG to put this under discussion.

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrzej-stencel andrzej-stencel force-pushed the add-process-signals-pending-metric branch from 8d73bfd to 9c40d7c Compare November 7, 2022 10:03
@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions spec:metrics Related to the specification/metrics directory Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add process.signals_pending metric to semantic conventions
6 participants