-
Notifications
You must be signed in to change notification settings - Fork 282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Do Not Merge] Add description doc for DotNet Runtime metrics #404
Conversation
|
||
| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values | | ||
|-----------------------------------------------|--------------------------|-----------|-------------------|------------|------------------|------------------| | ||
| process.runtime.dotnet.**gc.totalmemorysize** | Total allocated bytes | `By` | ObservableGauge | `Int64` | | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think many of the ObservableGauge
here should be ObservableUpDownCounter
once .NET has the support for up-down counters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll probably need to to get more precise on this one. Some different possible interpretations:
- Total amount of memory ever allocated from the GC since process start
- Total amount of memory commited by the GC currently
- Total amount of memory for live .NET objects + memory for free objects (objects that existed in the past and were collected but the memory hasn't yet been reused)
- Total amount of memory for live .NET objects only
- Total amount of memory for rooted .NET objects (objects that would still be live after a full GC)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some raw notes from our discussion:
process.runtime.dotnet.gc.heapsize: live object (not necessarily hold by the user objects/threads, could be hold by the GC itself) + fragmentation
all the gen adding up should <= committed (since GC commits at the granularity of page, and it normally would ask for multiple pages instead of just one for caching purpose)
neither of the above includes GC's own memory usage
github.com/Maoni0/mem-doc/blob/master/doc/.NETMemoryPerformanceAnalysis.md
GC.GetGenerationSize(n) is not providing the transactional behavior since we have to call it multiple times for different gen, it is okay as long as we mention it in the description/doc
remove gc.totalmemorysize since it can be covered by adding up different gens
Gets the number of timers that are currently active. An active timer is registered | ||
to tick at some point in the future, and has not yet been canceled. | ||
|
||
## Process related metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a general question - do we want these to be in the OpenTelemetry.Instrumentation.Runtime
package, or we want them to be in a different package (and if yes, what should be the name of that package)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd vote separate package and OpenTelemetry.Instrumentation.Process as a potential name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reverts commit 2c10b72.
|
||
| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values | | ||
|-------------------------------------------|-----------------------------|----------------|-------------------|------------|------------------|------------------| | ||
| process.runtime.dotnet.**assembly.count** | Number of Assemblies Loaded | `{assemblies}` | ObservableCounter | `Int64` | | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.NET runtime has the ability to unload assemblies, so AppDomain.GetAssemblies().Length
could decrease in some occasions. The instrument type should be ObservableUpDownCounter
(before ObservableUpDownCounter
is ready, I'll use ObservableGauge
in implementation). I'll update the doc later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your comment is for .NET Framework and not for .NET Core as .NET Core doesn't allow more than one appomain in the process.
AppDomain.GetAssemblies().Length could decrease in some occasions.
I am not sure if this is accurate either. You may unload the whole appdomain and not individual assembly. So, technically the AppDomain.GetAssemblies().Length
doesn't decrease. but the total assemblies in the process can decrease.
In .NET Core, uses AssemblyLoadContext for unloading all assemblies in that context by unloading the context.
CC @noahfalk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested it, but I would assume that assemblies in any AssemblyLoadContext, including an unloadable one, appear in the AppDomain.GetAssemblies() list. In that case AppDomain.GetAssemblies().Length could decrease because a load context unloaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will such a counter have really a useful meaning in .NET core apps? And how can users handle such a counter in .NET Framework apps? or we don't care much about .NET Framework cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My answer is YES. I can think of a valid use case - certain online service / background job run out of memory after hours and the engineers can tell from the "assembly loaded" metrics that the process kept loading plugins without unloading them.
This happened in my previous team, where a backend processing job detects new XML schema and compiles it to assembly (for sake of perf) and load it dynamically, the backend job ended up loading tens of thousands of assemblies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good then!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Tarek and Noah! I'm getting the assembly count for the "current domain" AppDomain.CurrentDomain.GetAssemblies().Length
, and the metric has to work for both .NET Framework and .NET Core. If the value could decrease in any case, I couldn't use ObservableCounter
because a metric data point could be dropped.
I found one remark about a special case when the assembly is in the default application domain:
If an assembly is loaded into the default application domain, it cannot be unloaded from memory while the process is running.
If Noah's right about assemblies in AssemblyLoadContext would appear in AppDomain.CurrentDomain.GetAssemblies()
, then I should use ObservableGauge
(and use ObservableUpDownCounter
when available). Otherwise it might lead to loss of data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting the assembly count for the "current domain" AppDomain.CurrentDomain.GetAssemblies().Length
If we are reporting the number from the current appdomain only, I'll question the @reyang's scenario certain online service / background job run out of memory after hours and the engineers can tell from the "assembly loaded" metrics that the process kept loading plugins without unloading them.
when running on .NET Framework. Looks what we report wouldn't be enough for that scenario.
To clarify, .NET Framework can have multiple appdomains. I guess it will be more useful reporting the total count of assemblies in all appdomains and not necessary the current appdomain.
In .NET Core, there is only one appdomain, so reporting the number from the current appdomain would work, I guess. We need to confirm loading and unloading AssemblyLoadContext is going to be reflected to the number we report. When I get some time, I can try to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have confirmed in .NET Core apps when unloading the AssemblyLoadContext
, the unloaded assemblies inside that context will be reflected in the AppDomain.CurrentDomain.GetAssemblies()
. In other word, the number of assemblies in the appsomain can decrease.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Name | Description | Units | Instrument Type | Value Type | Attribute Key(s) | Attribute Values | | ||
|-----------------------------------------------|--------------------------|-----------|-------------------|------------|------------------|------------------| | ||
| process.runtime.dotnet.**gc.totalmemorysize** | Total allocated bytes | `By` | ObservableGauge | `Int64` | | | | ||
| process.runtime.dotnet.**gc.count** | Garbage Collection count | `{times}` | ObservableCounter | `Int64` | gen | gen0, gen1, gen2 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is suggested by @noahfalk - for folks who have been using the EventCounters, we might consider something like "legacy time in GC" and make it an explicit opt-in (with a clear API name indicating that it has an unclear definition, and we don't recommend it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw we should be able to give a precise definition for the legacy counter too, it just isn't always useful or intuitive. I'm pretty sure the existing counter is computed like this:
GC_n = most recent GC that has ended prior to computing the value
GC_n-1 = GC that ended prior to GC_n
T_end_n = time when GC_n ended
T_end_n-1 = time when GC_n-1 ended
T_start_n = time when GC_n started
inter_gc_time = T_end_n - T_end_n-1
gc_duration = T_end_n - T_start_n
counter value = 100 * gc_duration / inter_gc_time
As an example if GC_n-1 ran from 2.1 sec to 2.2 sec and GC_n ran from 2.5 sec to 3.0 sec then
inter_gc_time = 3.0 - 2.2 = 0.8
gc_duration = 3.0 - 2.5 = 0.5
counter_value = 100 * 0.5 / 0.8 = 62.5%
People usually assume that the value measures the fraction of GC time during a fixed sampling interval (say every 5 minutes) but really it is measuring the fraction of GC time during a variable interval between the most recent GC and the one before that. If the most recent two GCs happen to be close together the counter may appear unexpectedly high and this often causes confusion.
…use it can decrease
@@ -0,0 +1,175 @@ | |||
# Runtime metrics description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "Runtime metrics overview" or "Runtime metric details" here instead of "description"? Also we should link to this from README.
Metrics name are prefixed with the `process.runtime.dotnet.` namespace, following | ||
the general guidance for runtime metrics in the | ||
[specs](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/runtime-environment-metrics.md#runtime-environment-specific-metrics---processruntimeenvironment). | ||
Instrument Units [should follow](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/README.md#instrument-units) | ||
the Unified Code for Units of Measure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-worded this a bit:
Metric names are prefixed with the process.runtime.dotnet
namespace following
the general guidance in the OpenTelemetry Specification. Instrument units follow the Unified Code for Units of Measure
as outlined in the OpenTelemetry specification.
|
||
[JitInfo.GetCompiledMethodCount](https://docs.microsoft.com/dotnet/api/system.runtime.jitinfo.getcompiledmethodcount?view=net-6.0#system-runtime-jitinfo-getcompiledmethodcount(system-boolean)): | ||
Gets the number of methods that have been compiled. | ||
The scope of this value is global. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to state The scope of this value is global.
on all of these? Could probably just drop that since it seems obvious (IMO) or move it to the heading so we only state it once? 🤷
The metrics in this section can be enabled by setting the | ||
`RuntimeMetricsOptions.IsJitEnabled` switch. | ||
|
||
These metrics are only available for NET6_0_OR_GREATER. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the preprocessor directives in this doc like NET6_0_OR_GREATER
is very technical. Maybe just say like...
These metrics are available when targeting .NET6 or later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is opened to just solicit feedback only.
Closed in favor of #475. |
For issue #335.
Creating a doc for a description for each runtime metrics. This doc will both serve as a design doc and as a formal descriptive doc for users in future.
I'd like to ask for clarification on a few metrics that I'm not sure whether they are monotonically increasing or not.
Can I assume the metrics in the EventCounter implementation using IncrementingPollingCounter are all monotonically increasing? More specifically on this one:
I think the following metrics are monotonically increasing but the EventCounter implementation used PollingCounter rather than IncrementingPollingCounter:
I think these are monotonically increasing, but would like to confirm: