Bringing .NET Runtime metrics instrumentation to Stable #335

Closed
xiang17 opened this issue Apr 29, 2022 · 4 comments · Fixed by #412
Labels
comp:instrumentation.runtime Things related to OpenTelemetry.Instrumentation.Runtime

Comments

@xiang17
Contributor

xiang17 commented Apr 29, 2022

Opening this issue to gather feedback and make improvements to the .NET Runtime metrics instrumentation: which metrics to expose, what each metric means exactly, and how to calculate or fetch each metric, along with clear documentation for each one.

The goal is to have well-defined, documented metrics that give .NET Runtime users a better, user-friendly view of their systems, so that the Runtime metrics instrumentation library can reach a stable release.

The Runtime metrics instrumentation already works since issue #204 was completed, but it is still in the alpha stage. The instruments are created in RuntimeMetrics.cs. Most of the metrics come from the EventCounters defined in RuntimeEventSource.cs: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/RuntimeEventSource.cs

UI

The primary goal is to enable users to gain insight into .NET Runtime behavior through a user-friendly UI. Here is a simple demo of the concept. The Grafana dashboard graphs I set up on my laptop look like this (see the appendix for how to set them up):
[Image: Dotnet-Grafana]

Or this:
[Image]

Metrics

TODO: List of metrics grouped into CPU, Memory, Garbage Collection, ThreadPool, Process, etc.

Documentation

A clear description of what each metric means. Because developers and support engineers may rely on these metrics to debug or diagnose issues, each description should give reasonable insight into the relevant internal details.

Help from the Runtime team is especially appreciated for this part.

Appendix

Instructions for setting up the graphs:

  1. Set up Prometheus and Grafana using this guide.
  2. Configure AddRuntimeMetrics in Program.cs as described in the Runtime instrumentation library README:
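using OpenTelemetry;
using OpenTelemetry.Metrics;

// "MyCompany.MyProduct.MyLibrary" is a placeholder for the application's own Meter name.
using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("MyCompany.MyProduct.MyLibrary")
    .AddRuntimeMetrics() // .NET Runtime metrics from OpenTelemetry.Instrumentation.Runtime
    .AddPrometheusExporter(options => { options.StartHttpListener = true; }) // built-in HTTP listener for Prometheus to scrape
    .Build();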
@xiang17 xiang17 added the comp:instrumentation.runtime Things related to OpenTelemetry.Instrumentation.Runtime label Apr 29, 2022
@xiang17
Contributor Author

xiang17 commented May 18, 2022

Here is a document describing the existing metrics, along with a number of TODO items. (I'm not sure whether it's better to put it in a markdown file in the folder, since it's very long, or to keep it in the issue, since there are many TODO items.)

Runtime metrics description

Metric names are prefixed with the process.runtime.dotnet. namespace, following the general guidance for runtime metrics in the specification. Instrument units should follow the Unified Code for Units of Measure (UCUM).

RuntimeMetricsOptions.IsGcEnabled

| Name | Description | Instrument | Unit (UCUM) | Observed Value | Value Type |
|---|---|---|---|---|---|
| process.runtime.dotnet.gc.heap | GC Heap Size | ObservableGauge | By | GC.GetTotalMemory(false) | Int64 |
| process.runtime.dotnet.gen_0-gc.count | Gen 0 GC Count | ObservableGauge | {times} | GC.CollectionCount(0) | Int32 |
| process.runtime.dotnet.gen_1-gc.count | Gen 1 GC Count | ObservableGauge | {times} | GC.CollectionCount(1) | Int32 |
| process.runtime.dotnet.gen_2-gc.count | Gen 2 GC Count | ObservableGauge | {times} | GC.CollectionCount(2) | Int32 |

  • GC.GetTotalMemory:
    The number of bytes currently thought to be allocated.
    It does not wait for garbage collection to occur before returning.

  • GC.CollectionCount:
    The number of times garbage collection has occurred for the specified generation
    of objects.
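A minimal sketch, assuming .NET 6+ and the BCL System.Diagnostics.Metrics API, of how the GC values in the table above map onto observable instruments, using the instrument types the table lists. The meter name Example.Runtime.Gc is hypothetical, and the MeterListener at the end only stands in for an exporter so the callbacks can be exercised locally.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical meter name, used only for this sketch.
var meter = new Meter("Example.Runtime.Gc");

// Bytes currently thought to be allocated; 'false' means "do not wait for a GC first".
meter.CreateObservableGauge("process.runtime.dotnet.gc.heap",
    () => GC.GetTotalMemory(false), unit: "By", description: "GC Heap Size");

for (int gen = 0; gen <= 2; gen++)
{
    int g = gen; // copy the loop variable so each callback reads its own generation
    meter.CreateObservableGauge($"process.runtime.dotnet.gen_{g}-gc.count",
        () => GC.CollectionCount(g), unit: "{times}", description: $"Gen {g} GC Count");
}

// Stand-in for an exporter: invoke every observable callback once and record the values.
var observed = new Dictionary<string, long>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter == meter) l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) => observed[instrument.Name] = value);
listener.SetMeasurementEventCallback<int>((instrument, value, tags, state) => observed[instrument.Name] = value);
listener.Start();
listener.RecordObservableInstruments();

foreach (var (name, value) in observed)
{
    Console.WriteLine($"{name} = {value}");
}
```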

Additional GC metrics only available for NETCOREAPP3_1_OR_GREATER

| Name | Description | Instrument | Unit (UCUM) | Observed Value | Value Type |
|---|---|---|---|---|---|
| process.runtime.dotnet.alloc.rate | Allocation Rate | ObservableCounter | By | GC.GetTotalAllocatedBytes() | Int64 |
| process.runtime.dotnet.gc.fragmentation | GC Fragmentation | ObservableCounter (TODO: change to ObservableGauge?) | 1 | this.GetFragmentation | Double |

  • GC.GetTotalAllocatedBytes:
    Gets a count of the bytes allocated over the lifetime of the process. The returned
    value does not include any native allocations. The value is an approximate count.

  • this.GetFragmentation: if GC.GetGCMemoryInfo().HeapSizeBytes != 0, the value is
    GC.GetGCMemoryInfo().FragmentedBytes * 100d / GC.GetGCMemoryInfo().HeapSizeBytes;
    otherwise it is 0.

    GCMemoryInfo.FragmentedBytes, the total fragmentation when the last garbage
    collection occurred, is divided by GCMemoryInfo.HeapSizeBytes, the total heap
    size when the last garbage collection occurred.

TODO: Change 100d to 1.0d, so the value is a ratio consistent with the unit 1 rather than a percentage.

TODO: Fragmentation should be an ObservableGauge (see issue #383).
A garbage collection resets the value, so it is not monotonically increasing,
as ObservableCounter values must be.
See Asynchronous Counter vs. Asynchronous Gauge.
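A sketch, assuming .NET 6+, that applies both fragmentation TODOs above: the ratio uses 1.0d instead of 100d, and it is registered as an ObservableGauge. It also reads GC.GetGCMemoryInfo() once per observation, because the current expression calls it separately for the condition, the numerator, and the denominator, which could in principle mix values from two different collections. GetFragmentationRatio and the meter name are hypothetical.

```csharp
using System;
using System.Diagnostics.Metrics;

// Hypothetical meter name, used only for this sketch.
var meter = new Meter("Example.Runtime.GcFragmentation");

// Fragmentation can drop after a GC, so it is registered as a gauge rather than a counter.
meter.CreateObservableGauge("process.runtime.dotnet.gc.fragmentation",
    () => GetFragmentationRatio(GC.GetGCMemoryInfo()),
    unit: "1",
    description: "GC Fragmentation");

GC.Collect(); // ensure at least one collection has completed so the snapshot is populated
Console.WriteLine($"fragmentation ratio: {GetFragmentationRatio(GC.GetGCMemoryInfo()):F4}");

// Takes one GCMemoryInfo snapshot so FragmentedBytes and HeapSizeBytes come from the same
// collection, and multiplies by 1.0d (a ratio in [0, 1]) instead of 100d (a percentage).
static double GetFragmentationRatio(GCMemoryInfo info) =>
    info.HeapSizeBytes != 0 ? info.FragmentedBytes * 1.0d / info.HeapSizeBytes : 0d;
```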

TODO: [Question] Change the names to gc.allocated.bytes and gc.fragmentation.ratio?
I couldn't find metrics named this way in use, except in an issue description.

Additional GC metrics only available for NET6_0_OR_GREATER

| Name | Description | Instrument | Unit (UCUM) | Observed Value | Value Type |
|---|---|---|---|---|---|
| process.runtime.dotnet.gc.committed | GC Committed Bytes | ObservableCounter | Mi | (double)(GC.GetGCMemoryInfo().TotalCommittedBytes / 1_000_000) | (return type is Int64) |

GCMemoryInfo.TotalCommittedBytes:
Gets the total committed bytes of the managed heap.

TODO: Use units consistently.
According to the "Prefixes and units used in Information Technology" section
of UCUM, Mi is a prefix (mebi) meaning 1048576 rather than 1000000, and it is not a unit on its own.
So this metric should use MiBy (mebibyte) and divide by 1048576, consistent with the
case-sensitive ("c/s") form used in the JVM metrics.
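The conversion issue above can be checked with a small sketch. Besides the unit, the current expression performs integer division before the cast to double, so the fractional part is already lost. CurrentExpression and ToMebibytes are hypothetical helper names; reporting TotalCommittedBytes directly with unit By would avoid the conversion altogether.

```csharp
using System;

long committed = GC.GetGCMemoryInfo().TotalCommittedBytes;
Console.WriteLine($"current expression: {CurrentExpression(committed)}");
Console.WriteLine($"as MiBy:            {ToMebibytes(committed):F3}");

// Mirrors the current code: the integer division runs before the cast, so the
// fractional part is dropped, and 1_000_000 is a megabyte (MBy), not a mebibyte.
static double CurrentExpression(long bytes) => (double)(bytes / 1_000_000);

// UCUM "MiBy": the Mi prefix is 2^20 = 1_048_576; dividing by a double keeps the fraction.
static double ToMebibytes(long bytes) => bytes / 1_048_576d;
```

For example, 1,500,000 bytes gives 1 with the current expression but about 1.43 with ToMebibytes.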

RuntimeMetricsOptions.IsJitEnabled (only available for NET6_0_OR_GREATER)

| Name | Description | Instrument | Unit (UCUM) | Observed Value | Value Type |
|---|---|---|---|---|---|
| process.runtime.dotnet.il.bytes.jitted | IL Bytes Jitted | ObservableCounter | By | System.Runtime.JitInfo.GetCompiledILBytes() | Int64 |
| process.runtime.dotnet.methods.jitted.count | Number of Methods Jitted | ObservableCounter | {methods} | System.Runtime.JitInfo.GetCompiledMethodCount() | Int64 |
| process.runtime.dotnet.time.in.jit | Time spent in JIT | ObservableGauge | ms | System.Runtime.JitInfo.GetCompilationTime().TotalMilliseconds | Double |

JitInfo.GetCompiledILBytes:
Gets the number of bytes of intermediate language that have been compiled.
The scope of this value is global.

JitInfo.GetCompiledMethodCount:
Gets the number of methods that have been compiled.
The scope of this value is global.

JitInfo.GetCompilationTime:
Gets the amount of time the JIT Compiler has spent compiling methods.
The scope of this value is global.

TODO: [Question] Why is the last one ObservableGauge? I think it should be ObservableCounter
like the other two.
See Asynchronous Counter
vs Asynchronous Gauge.

TODO: Change the unit from ms to ns and derive the value from TimeSpan.Ticks,
which is an Int64.
The tick, the smallest unit of time in TimeSpan, equals 100 nanoseconds (one
ten-millionth of a second), so the value in nanoseconds is Ticks * 100.
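A sketch, assuming .NET 6+ (where System.Runtime.JitInfo exists), of what the two JIT TODOs above look like together: all three values registered as ObservableCounter, and compilation time reported in ns from TimeSpan.Ticks. The meter name and the ToNanoseconds helper are hypothetical.

```csharp
using System;
using System.Diagnostics.Metrics;
using System.Runtime; // JitInfo, .NET 6+

// Hypothetical meter name, used only for this sketch.
var meter = new Meter("Example.Runtime.Jit");

// All three values are cumulative for the process, so each is an ObservableCounter.
meter.CreateObservableCounter("process.runtime.dotnet.il.bytes.jitted",
    () => JitInfo.GetCompiledILBytes(), unit: "By", description: "IL Bytes Jitted");
meter.CreateObservableCounter("process.runtime.dotnet.methods.jitted.count",
    () => JitInfo.GetCompiledMethodCount(), unit: "{methods}", description: "Number of Methods Jitted");
meter.CreateObservableCounter("process.runtime.dotnet.time.in.jit",
    () => ToNanoseconds(JitInfo.GetCompilationTime()), unit: "ns", description: "Time spent in JIT");

Console.WriteLine($"time in JIT: {ToNanoseconds(JitInfo.GetCompilationTime())} ns");

// One tick is 100 ns, so the result stays an exact Int64 with no floating-point rounding.
static long ToNanoseconds(TimeSpan time) => time.Ticks * 100;
```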

RuntimeMetricsOptions.IsThreadingEnabled (only available for NETCOREAPP3_1_OR_GREATER)

| Name | Description | Instrument | Unit (UCUM) | Observed Value | Value Type |
|---|---|---|---|---|---|
| process.runtime.dotnet.monitor.lock.contention.count | Monitor Lock Contention Count | ObservableGauge | {times} | Monitor.LockContentionCount | Int64 |
| process.runtime.dotnet.threadpool.thread.count | ThreadPool Thread Count | ObservableCounter | {threads} | ThreadPool.ThreadCount | Int32 |
| process.runtime.dotnet.threadpool.completed.items.count | ThreadPool Completed Work Item Count | ObservableGauge | {items} | ThreadPool.CompletedWorkItemCount | Int64 |
| process.runtime.dotnet.threadpool.queue.length | ThreadPool Queue Length | ObservableCounter | {items} | ThreadPool.PendingWorkItemCount | Int64 |
| process.runtime.dotnet.active.timer.count | Number of Active Timers | ObservableCounter | {timers} | Timer.ActiveCount | Int64 |

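TODO: Thread count, queue length, and active timer count are listed as ObservableCounter, but none of them is monotonically increasing, so none of them should be an ObservableCounter.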

TODO: [Question] Is there any difference in instrument type between Int32 and Int64 values?
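One way to apply the TODOs above, assuming .NET 6+ and System.Diagnostics.Metrics: values that can go up and down (thread count, queue length, active timers) become gauges, while Monitor.LockContentionCount and ThreadPool.CompletedWorkItemCount, which are totals accumulated since process start, become counters. This is a suggested classification, not the library's current code, and the meter name is hypothetical. On the Int32 vs Int64 question, in System.Diagnostics.Metrics the value type is only the instrument's generic argument (ObservableGauge<int> vs ObservableGauge<long>); the instrument kind is the same either way.

```csharp
using System;
using System.Diagnostics.Metrics;
using System.Threading;

// Hypothetical meter name, used only for this sketch.
var meter = new Meter("Example.Runtime.Threading");

// Point-in-time values that can go up and down: gauges.
var threadCount = meter.CreateObservableGauge("process.runtime.dotnet.threadpool.thread.count",
    () => ThreadPool.ThreadCount, unit: "{threads}", description: "ThreadPool Thread Count"); // ObservableGauge<int>
meter.CreateObservableGauge("process.runtime.dotnet.threadpool.queue.length",
    () => ThreadPool.PendingWorkItemCount, unit: "{items}", description: "ThreadPool Queue Length"); // ObservableGauge<long>
meter.CreateObservableGauge("process.runtime.dotnet.active.timer.count",
    () => Timer.ActiveCount, unit: "{timers}", description: "Number of Active Timers"); // ObservableGauge<long>

// Totals since process start that never decrease: counters.
meter.CreateObservableCounter("process.runtime.dotnet.monitor.lock.contention.count",
    () => Monitor.LockContentionCount, unit: "{times}", description: "Monitor Lock Contention Count");
var completedItems = meter.CreateObservableCounter("process.runtime.dotnet.threadpool.completed.items.count",
    () => ThreadPool.CompletedWorkItemCount, unit: "{items}", description: "ThreadPool Completed Work Item Count");

Console.WriteLine($"{threadCount.Name}: {ThreadPool.ThreadCount}");
Console.WriteLine($"{completedItems.Name}: {ThreadPool.CompletedWorkItemCount}");
```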

RuntimeMetricsOptions.IsProcessEnabled

| Name | Description | Instrument | Unit (UCUM) | Observed Value | Value Type |
|---|---|---|---|---|---|
| process.cpu.time | Processor time of this process | ObservableCounter | s | this.GetProcessorTimes | IEnumerable<Measurement<double>> |
| process.cpu.count | The number of available logical CPUs | ObservableGauge | {processors} | Environment.ProcessorCount | Int32 |
| process.memory.usage | The amount of physical memory in use | ObservableGauge | By | Process.GetCurrentProcess().WorkingSet64 | Int64 |
| process.memory.virtual | The amount of committed virtual memory | ObservableGauge | By | Process.GetCurrentProcess().VirtualMemorySize64 | Int64 |

  • this.GetProcessorTimes: IEnumerable<Measurement<double>> of
    Process.GetCurrentProcess().UserProcessorTime.TotalSeconds and
    Process.GetCurrentProcess().PrivilegedProcessorTime.TotalSeconds.

  • Process.UserProcessorTime:
    Gets the user processor time for this process.

  • Process.PrivilegedProcessorTime:
    Gets the privileged processor time for this process.

TODO: We might as well use nanoseconds for this metric, since Ticks is an Int64.
The TimeSpan.TotalSeconds property
converts the value of this instance from ticks to seconds, so the result may
include whole and fractional seconds.
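A sketch, assuming .NET 6+, of how a GetProcessorTimes-style callback can report user and privileged time as two measurements of one ObservableCounter. The tag key state and the values user and system are assumptions for illustration, as is the meter name. Moving to nanoseconds, as the TODO suggests, would only change the conversion (TimeSpan.Ticks * 100 instead of TotalSeconds) and the unit.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical meter name, used only for this sketch.
var meter = new Meter("Example.Process");

// CPU time only accumulates, so this is an ObservableCounter; a single callback
// reports one measurement per CPU state, distinguished by a tag.
meter.CreateObservableCounter<double>("process.cpu.time", GetProcessorTimes,
    unit: "s", description: "Processor time of this process");

foreach (var m in GetProcessorTimes())
{
    Console.WriteLine($"{m.Tags[0].Value}: {m.Value:F3} s");
}

static IEnumerable<Measurement<double>> GetProcessorTimes()
{
    using var process = Process.GetCurrentProcess(); // release the process handle after reading
    // The "state" tag key and its values are assumptions for this sketch.
    return new[]
    {
        new Measurement<double>(process.UserProcessorTime.TotalSeconds,
            new KeyValuePair<string, object?>("state", "user")),
        new Measurement<double>(process.PrivilegedProcessorTime.TotalSeconds,
            new KeyValuePair<string, object?>("state", "system")),
    };
}
```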

RuntimeMetricsOptions.IsAssembliesEnabled

| Name | Description | Instrument | Unit (UCUM) | Observed Value | Value Type |
|---|---|---|---|---|---|
| process.runtime.dotnet.assembly.count | Number of Assemblies Loaded | ObservableCounter | {assemblies} | AppDomain.CurrentDomain.GetAssemblies().Length | Int32 |

  • AppDomain.GetAssemblies:
    Gets the assemblies that have been loaded into the execution context
    of this application domain; the metric reports the length of the returned array.

List of the metrics in RuntimeEventSource

As noted in the issue, .NET exposes several runtime metrics via EventCounters: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/RuntimeEventSource.cs

| Name | Description | Status |
|---|---|---|
| cpu-usage | RuntimeEventSourceHelper.GetCpuUsage() | TODO |
| working-set | ((double)Environment.WorkingSet / 1_000_000) | TODO |
| gc-heap-size | ((double)GC.GetTotalMemory(false) / 1_000_000) | Included |
| gen-0-gc-count | GC.CollectionCount(0) | Included |
| gen-1-gc-count | GC.CollectionCount(1) | Included |
| gen-2-gc-count | GC.CollectionCount(2) | Included |
| threadpool-thread-count | ThreadPool.ThreadCount | Included |
| monitor-lock-contention-count | Monitor.LockContentionCount | Included |
| threadpool-queue-length | ThreadPool.PendingWorkItemCount | Included |
| threadpool-completed-items-count | ThreadPool.CompletedWorkItemCount | Included |
| alloc-rate | GC.GetTotalAllocatedBytes() | Included |
| active-timer-count | Timer.ActiveCount | Included |
| gc-fragmentation | this.fragmentation() | Included |
| gc-committed | ((double)GC.GetGCMemoryInfo().TotalCommittedBytes / 1_000_000) | Included |
| exception-count | Exception.GetExceptionCount() | TODO |
| time-in-gc | GC.GetLastGCPercentTimeInGC() | TODO |
| gen-0-size | GC.GetGenerationSize(0) | TODO |
| gen-1-size | GC.GetGenerationSize(1) | TODO |
| gen-2-size | GC.GetGenerationSize(2) | TODO |
| loh-size | GC.GetGenerationSize(3) | TODO |
| poh-size | GC.GetGenerationSize(4) | TODO |
| assembly-count | System.Reflection.Assembly.GetAssemblyCount() | TODO |
| il-bytes-jitted | System.Runtime.JitInfo.GetCompiledILBytes() | Included |
| methods-jitted-count | System.Runtime.JitInfo.GetCompiledMethodCount() | Included |
| time-in-jit | System.Runtime.JitInfo.GetCompilationTime().TotalMilliseconds | Included |

New Metrics

cpu-usage

Proposal to calculate CPU usage percentage with delta since initialization:
#207 (comment)

The RuntimeEventSourceHelper.GetCpuUsage method in RuntimeEventSourceHelper.Windows.cs
shows how to retrieve CPU times.
The Linux version in RuntimeEventSourceHelper.Unix.cs
makes a system call through Interop.Sys.GetCpuUtilization to get CPU utilization.
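A minimal sketch of the delta-based CPU usage proposal, assuming the cross-platform Process.TotalProcessorTime is an acceptable source. This is a swapped-in technique, not the OS-specific code in RuntimeEventSourceHelper. The baseline is captured at initialization, and each sample reports usage over the interval since the previous sample: CPU time delta / (wall-clock delta × logical CPU count) × 100. SampleCpuUsagePercent is a hypothetical name.

```csharp
using System;
using System.Diagnostics;

// Baseline captured at initialization; each sample reports usage since the previous sample.
var process = Process.GetCurrentProcess();
TimeSpan previousCpu = process.TotalProcessorTime;
var wallClock = Stopwatch.StartNew();

// Burn some CPU so the first sample has something to measure.
var spin = Stopwatch.StartNew();
while (spin.ElapsedMilliseconds < 200) { }

double usage = SampleCpuUsagePercent();
Console.WriteLine($"CPU usage since initialization: {usage:F1}%");

double SampleCpuUsagePercent()
{
    process.Refresh(); // discard cached process information before reading again
    TimeSpan currentCpu = process.TotalProcessorTime;
    double elapsedMs = wallClock.Elapsed.TotalMilliseconds;
    double cpuMs = (currentCpu - previousCpu).TotalMilliseconds;

    // Move the baseline forward for the next sample.
    previousCpu = currentCpu;
    wallClock.Restart();

    if (elapsedMs <= 0) return 0;

    // Normalize by the logical CPU count so a fully busy machine reads 100%; clamp
    // because coarse OS timer resolution can push a single sample slightly out of range.
    double percent = cpuMs / (elapsedMs * Environment.ProcessorCount) * 100;
    return Math.Clamp(percent, 0, 100);
}
```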

gen-0-size

According to the discussion,
GC.GetGenerationSize() could be replaced with GC.GetGCMemoryInfo().GenerationInfo[i].SizeAfterBytes.
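A sketch of reading the per-generation sizes through GCMemoryInfo.GenerationInfo (available in .NET 5+), as suggested in the discussion. The labels reuse the RuntimeEventSource counter names from the table above, with indices matching GC.GetGenerationSize(0) through GC.GetGenerationSize(4); the loop itself is only illustrative.

```csharp
using System;

GC.Collect(); // make sure at least one collection has completed

GCMemoryInfo info = GC.GetGCMemoryInfo();
ReadOnlySpan<GCGenerationInfo> generations = info.GenerationInfo;

// Index order: gen 0, gen 1, gen 2, LOH, then POH (.NET 5+).
string[] names = { "gen-0-size", "gen-1-size", "gen-2-size", "loh-size", "poh-size" };
long[] sizes = new long[names.Length];

for (int i = 0; i < names.Length && i < generations.Length; i++)
{
    // Size of the generation at the end of the last collection.
    sizes[i] = generations[i].SizeAfterBytes;
    Console.WriteLine($"{names[i]}: {sizes[i]} bytes");
}
```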

@Kielek
Contributor

Kielek commented Jun 9, 2022

@xiang17, once there is agreement here, it would be great to put the specification under https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/runtime-environment-metrics.md#runtime-environment-specific-metrics---processruntimeenvironment

Java is already documented here.

@cijothomas cijothomas reopened this Jun 16, 2022
@xiang17 xiang17 closed this as completed Aug 4, 2022
@xiang17
Contributor Author

xiang17 commented Aug 4, 2022

The 1.0.0 version has been released: https://www.nuget.org/packages/OpenTelemetry.Instrumentation.Runtime

@dmitriy-shleht

@xiang17 Please share this template

[Image]
