Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename metrics for Resource Monitoring #5309

Conversation

evgenyfedorov2
Copy link
Contributor

@evgenyfedorov2 evgenyfedorov2 commented Jul 25, 2024

Fixes #5113

  1. Renamed metrics
  2. Introduced more specific metrics for container related measurements to distinguish process vs. container (cgroup).
Microsoft Reviewers: Open in CodeFlow

@evgenyfedorov2 evgenyfedorov2 self-assigned this Jul 25, 2024
Copy link
Member

@RussKie RussKie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks like it potentially may be a breaking change. Metrics are kind part of the public surface - i.e., documentations, dashboards and other monitoring systems depend on existing/published metrics.

Unfortunately the original issue doesn't provide great level of detailed information on the underlying issue(s), scenarios where and how it manifests, how it affects customers, etc.

Could you please provide more information and assessment of the change in the following form:

  • Customer Imparct - how would this change affect existing customers?
  • Is this a regression? If yes, when was it introduced?
  • How risky is this change?

Thank you.

I'm blocking the merge to understand the risks and impacts and stop an accidental merge.

@dotnet-policy-service dotnet-policy-service bot added the waiting-author-feedback 📭 The author of this issue needs to respond in order for us to continue investigating this issue. label Jul 26, 2024
@evgenyfedorov2
Copy link
Contributor Author

evgenyfedorov2 commented Jul 26, 2024

@RussKie, here you are:

  • Customer Impact - Only monitoring dashboards will break, it requires customers to fix the dashboards.
  • Is this a regression? - No
  • How risky is this change? - Not sure how many customers we have for those two ObservableGaugemetrics (I understand those are pretty new and not widely adopted), but I suggest making the change ASAP to avoid onboarding even more customers to incorrect metric names.

@dotnet-policy-service dotnet-policy-service bot removed the waiting-author-feedback 📭 The author of this issue needs to respond in order for us to continue investigating this issue. label Jul 26, 2024
@evgenyfedorov2
Copy link
Contributor Author

Unfortunately the original issue doesn't provide great level of detailed information on the underlying issue(s), scenarios where and how it manifests, how it affects customers, etc.

It does not affect customers as such, they can live with it forever and be happy :) However, it is just semantically incorrect to have process.* names for metrics measuring CPU/Memory utilization of the whole container or control group.

@RussKie
Copy link
Member

RussKie commented Jul 29, 2024

@JamesNK @joperezr any thoughts on this?

@JamesNK
Copy link
Member

JamesNK commented Jul 29, 2024

It's definitely a breaking change. It needs to be documented as a breaking change and the mitigations required to work around it be in release notes.

You could require people to update whatever consumes the metrics to make a fix. If there are people who can't do that then you could add an app context switch that someone can set in their app to keep old behavior.

@joperezr
Copy link
Member

@evgenyfedorov2 can you please go into a bit more detail as to what is the motivation behind this change? Is it mostly just that the naming is inaccurate so we want to fix it? We should get the full motivation so we can evaluate if this is worth the breaking change.

Comment on lines +18 to +31
/// Gets the CPU consumption share of the running process in range <c>[0, 1]</c>.
/// </summary>
/// <remarks>
/// The type of an instrument is <see cref="System.Diagnostics.Metrics.ObservableGauge{T}"/>.
/// </remarks>
public const string ProcessCpuUtilization = "process.cpu.utilization";

/// <summary>
/// Gets the memory consumption share of the running process in range <c>[0, 1]</c>.
/// </summary>
/// <remarks>
/// The type of an instrument is <see cref="System.Diagnostics.Metrics.ObservableGauge{T}"/>.
/// </remarks>
public const string ProcessMemoryUtilization = "dotnet.process.memory.virtual.utilization";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we decorate these as [Obsolete] to indicate they shouldn't be used?
image

RussKie pushed a commit that referenced this pull request Aug 12, 2024
Fixes #5113 
Previous art: #5309 

Add new metrics with correct names. Old metrics will continue to be enabled by default.

### Existing metric setup

**Windows Snapshot provider class**
    `process.cpu.utilization`
    `dotnet.process.memory.virtual.utilization`

**Windows Container Snapshot provider class**
    `process.cpu.utilization`
    `dotnet.process.memory.virtual.utilization`

**Linix Utilization Provider class**
    `process.cpu.utilization`
    `dotnet.process.memory.virtual.utilization`


### New metric setup

**Windows Snapshot provider class**
    `process.cpu.utilization` - no changes
    `dotnet.process.memory.virtual.utilization` - no changes

**Windows Container Snapshot provider class**
    `process.cpu.utilization` - no changes
    `dotnet.process.memory.virtual.utilization` - calculates memory for the dotnet process only (instead of all processes)
    `container.cpu.limit.utilization` - new metric, same value as `process.cpu.utilization`
    `container.memory.limit.utilization` - new metric, calculates memory for all processes in the container

**Linux Utilization Provider class**
    `process.cpu.utilization` - fixed incorrect scale calculation, instead of `host CPUs / CPU limit / CPU request`, it is now `host CPUs / CPU request`
    `dotnet.process.memory.virtual.utilization` - no changes
    `container.cpu.limit.utilization` - new metric, value is relative to CPU resource limit (aka maximum CPU units)
    `container.memory.limit.utilization` - new metric, calculates memory for all processes in the container
    `container.cpu.request.utilization` - new metric, same value as `process.cpu.utilization`
joperezr added a commit to joperezr/extensions that referenced this pull request Aug 15, 2024
…ng (dotnet#5341)

Add metrics with correct names for Resource Monitoring (dotnet#5341)

Fixes dotnet#5113
Previous art: dotnet#5309

Add new metrics with correct names. Old metrics will continue to be enabled by default.

### Existing metric setup

**Windows Snapshot provider class**
    `process.cpu.utilization`
    `dotnet.process.memory.virtual.utilization`

**Windows Container Snapshot provider class**
    `process.cpu.utilization`
    `dotnet.process.memory.virtual.utilization`

**Linix Utilization Provider class**
    `process.cpu.utilization`
    `dotnet.process.memory.virtual.utilization`

### New metric setup

**Windows Snapshot provider class**
    `process.cpu.utilization` - no changes
    `dotnet.process.memory.virtual.utilization` - no changes

**Windows Container Snapshot provider class**
    `process.cpu.utilization` - no changes
    `dotnet.process.memory.virtual.utilization` - calculates memory for the dotnet process only (instead of all processes)
    `container.cpu.limit.utilization` - new metric, same value as `process.cpu.utilization`
    `container.memory.limit.utilization` - new metric, calculates memory for all processes in the container

**Linux Utilization Provider class**
    `process.cpu.utilization` - fixed incorrect scale calculation, instead of `host CPUs / CPU limit / CPU request`, it is now `host CPUs / CPU request`
    `dotnet.process.memory.virtual.utilization` - no changes
    `container.cpu.limit.utilization` - new metric, value is relative to CPU resource limit (aka maximum CPU units)
    `container.memory.limit.utilization` - new metric, calculates memory for all processes in the container
    `container.cpu.request.utilization` - new metric, same value as `process.cpu.utilization`

----
#### AI description  (iteration 1)
#### PR Classification
New feature: Added metrics with correct names for resource monitoring.

#### PR Summary
This pull request introduces new metrics for resource monitoring with correct naming conventions and updates the related tests and implementation.
- `LinuxUtilizationProvider.cs`: Added new metrics for container CPU and memory utilization, and updated existing metrics.
- `AcceptanceTest.cs`: Added new tests for verifying the new metrics and updated existing tests for better coverage.
- `ResourceUtilizationInstruments.cs`: Defined new constants for the new metrics.
- Removed `WindowsCounters.cs` as it is no longer needed.
@github-actions github-actions bot locked and limited conversation to collaborators Sep 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect metric names in ResourceUtilizationInstruments
4 participants