[LiveMetrics] report process metrics CPU Total and Committed Memory #42213
Conversation
API change check: API changes are not detected in this pull request.
This reverts commit 38bf3f3.
@cijothomas @rajkumar-rangaraj
The screenshot in the description has one with CPU showing above 100%. That should never happen if you are sending normalized values.
I suggest that we evaluate what .NET Extensions did and align on the algorithm. At a minimum, if we want to introduce our own implementation, I want the algorithm to be carefully designed, reviewed (and backed by math), and documented (e.g. this PR sets a good example). https://github.com/dotnet/extensions/blob/main/src/Libraries/Microsoft.Extensions.Diagnostics.ResourceMonitoring/Linux/LinuxUtilizationProvider.cs
Agree mostly. My thoughts are:
Providing a summary of the issue before exploring the specifics: In the Application Insights SDK, for the Windows environment, CPU metrics are collected for the entire operating system. In contrast, for Azure App Service and other non-Windows environments, process-level CPU metrics are used. For memory metrics, Windows uses the performance counter \Memory\Committed Bytes, while other environments rely on process-level private bytes. Languages other than .NET typically read the OS-level counter to send these metrics, leading to inconsistencies in the data sent across different environments and languages. Mothra has initiated a collaboration with the services team to establish clear guidelines on what exactly should be sent. Our objective is to ensure that the same metrics are sent from all environments and languages. The services team has requested additional time to formalize this process. As an interim solution, both the service and SDK teams have agreed to use process-level metrics for both CPU and memory. This temporary solution will be implemented in beta versions, and we will not release a Release Candidate (RC) or General Availability (GA) version until the standardization is finalized. This approach allows us to continue our work on filters in the meantime.
This is correct, except for standard metrics. For live metrics, we don't want to take a dependency on the OpenTelemetry Metric SDK, as it would make the implementation very complex.
Correct, it is not worth the effort.
I have to agree with Cijo on this part. The algorithm used is very straightforward, and I don't see any issues with it. The LinuxUtilizationProvider uses the same approach.
This is not true. Mothra is trying to explain that using … Finally, I need to acknowledge that we need to accurately capture the information in the changelogs.
Thanks! Yes, I can see the description is pretty clear.
Changes
Measured as:
((change in ticks / period) / number of processors)
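A minimal sketch of that calculation, assuming the sample comes from Process.TotalProcessorTime and is normalized by Environment.ProcessorCount (the class and member names are illustrative, not the PR's actual code):

```csharp
using System;
using System.Diagnostics;

internal static class CpuSampler
{
    private static TimeSpan s_previousCpuTime = Process.GetCurrentProcess().TotalProcessorTime;
    private static DateTimeOffset s_previousSampleTime = DateTimeOffset.UtcNow;

    // CPU % = ((change in CPU ticks / elapsed wall-clock ticks) / processor count) * 100
    public static double GetNormalizedCpuPercentage()
    {
        var now = DateTimeOffset.UtcNow;
        var cpuTime = Process.GetCurrentProcess().TotalProcessorTime;

        double cpuDeltaTicks = (cpuTime - s_previousCpuTime).Ticks;
        double periodTicks = (now - s_previousSampleTime).Ticks;

        s_previousCpuTime = cpuTime;
        s_previousSampleTime = now;

        if (periodTicks <= 0)
        {
            return 0;
        }

        // Dividing by the processor count keeps the value in the 0-100 range on multi-core machines.
        return (cpuDeltaTicks / periodTicks) / Environment.ProcessorCount * 100;
    }
}
```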
Testing
I've collected screenshots and detailed description for each counter.
For these metrics, I'm trying to match what Visual Studio's Debugger shows.
Memory
To test, I debugged an app in Visual Studio and compared it to the Process Memory.
Visual Studio Debugger
This shows roughly 50 MB of Process Memory.
OTel uses:
https://learn.microsoft.com/dotnet/api/system.diagnostics.process.workingset64
This gives a value almost double what the Visual Studio Debugger shows:
AI SDK uses:
https://learn.microsoft.com/dotnet/api/system.diagnostics.process.privatememorysize64
This value closely matches Visual Studio's Diagnostic Tools:
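As a quick way to reproduce this comparison, here is a short snippet (illustrative only) that prints both counters for the current process:

```csharp
using System;
using System.Diagnostics;

var process = Process.GetCurrentProcess();

// What the OTel instrumentation reads (working set):
Console.WriteLine($"WorkingSet64:        {process.WorkingSet64 / (1024 * 1024)} MB");

// What the AI SDK reads (private bytes):
Console.WriteLine($"PrivateMemorySize64: {process.PrivateMemorySize64 / (1024 * 1024)} MB");
```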
CPU
To test CPU, I compute square roots in a loop for 20 seconds.
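A rough sketch of that test load (not the exact code used for the screenshots): a single-threaded busy loop computing square roots for about 20 seconds.

```csharp
using System;
using System.Diagnostics;

var stopwatch = Stopwatch.StartNew();
double sink = 0;

// Burn CPU on one thread for ~20 seconds.
while (stopwatch.Elapsed < TimeSpan.FromSeconds(20))
{
    sink += Math.Sqrt(stopwatch.ElapsedTicks);
}

// Print the accumulated value so the loop isn't optimized away.
Console.WriteLine(sink);
```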
Visual Studio's Debugger:
Here, CPU jumps to 100% and stays high for a period before dropping back to 0%.
OTel uses:
"A TimeSpan that indicates the amount of time that the associated process has spent running code inside the application portion of the process (not inside the operating system core)."
https://learn.microsoft.com/dotnet/api/system.diagnostics.process.userprocessortime
"A TimeSpan that indicates the amount of time that the process has spent running code inside the operating system core."
https://learn.microsoft.com/dotnet/api/system.diagnostics.process.privilegedprocessortime
Note that this grows and never resets to zero. That flat line at the end is when the compute finished.
AI SDK uses:
"A TimeSpan that indicates the amount of time that the associated process has spent utilizing the CPU. This value is the sum of the UserProcessorTime and the PrivilegedProcessorTime."
https://learn.microsoft.com/dotnet/api/system.diagnostics.process.totalprocessortime
TotalProcessorTime is defined as the sum of UserProcessorTime and PrivilegedProcessorTime, so these are the same metrics used by OTel; the AI SDK, however, has a specific algorithm to calculate the difference between samples.
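A quick, illustrative check of that relationship (not SDK code):

```csharp
using System;
using System.Diagnostics;

var process = Process.GetCurrentProcess();

var total = process.TotalProcessorTime;
var sum = process.UserProcessorTime + process.PrivilegedProcessorTime;

// The two values should match, modulo any CPU time consumed between the property reads.
Console.WriteLine($"TotalProcessorTime:     {total}");
Console.WriteLine($"User + Privileged time: {sum}");
```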
Note that this more closely matches Visual Studio.