-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add runtime metrics in System.Diagnostics.DiagnosticSource
#104680
Conversation
...braries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Metrics/RuntimeMetrics.cs
Show resolved
Hide resolved
@stevejgordon thanks a lot for doing this! we synced offline and made some common suggestions on open-telemetry/semantic-conventions#1035 (comment). Hope it'd help you focus on this PR and get it merged 🤞 |
@lmolkova, happy to contribute. Thanks for clarifying the suggestions in the semantic conventions. I've made those changes and aligned this PR with the updated naming. |
@carlossanlop @adamsitnik @ericstj I am seeing the cyclic errors |
I talked offline with @ericstj and looks we introduced the dependency on System.Diagnostics.Process which caused the cyclic dependency. @noahfalk @samsp-msft @lmolkova @reyang do you see any issue if we move this feature to
|
No issue from what I can see, thanks! |
Unfortunately, there might be an issue with this. The implementation relies on the initialisation of the static meter from the ctor of |
What is the impact of the dependency closure of this new meter on the size of trimmed or native-aot compiled applications? If this is going to be enabled by default, does it mean that this is introducing significant binary size regression for trimmer or native-aot compiled apps? I think we either need to avoid taking large dependency closure in the implementation and then it is ok to enable it by default; or it needs to opt-in and then the large dependency closure is ok (not ideal though). |
@jkotas we have the switch
Yes, I am trying to see if there are any other concerns that can prevent us moving it to the other library. For alternative way to create the Meter, I am thinking in the direction if we can keep creating the Meter in System.Diagnostics.DiagnosticSource but having instruments that are bringing the new dependencies be created from the extensions library. Of course we'll need to have a way to pick the meter instance from System.Diagnostics.DiagnosticSource. I am still thinking of other alternatives or viable solutions. |
I'm hopping between meetings and haven't had a chance to look more closely, but could we break the dependency by p/invoking to the relevant OS APIs directly? I believe on Windows its a call to GetProcessTimes() and on Linux a call to getrusage(). It doesn't seem like it would take much code to do that. |
I would aim for the former - avoid taking large dependency closure. In theory runtime metrics should be nothing more than some calls into low-level runtime APIs (GC, JIT, assemblies, threadpool) + a tiny bit of data from the OS (some cpu and memory usage info) |
@noahfalk there are more dependencies and not just Process one. Yes, process one is what is causing cycles, but I am not sure if we are ok for the other dependencies. <Reference Include="System.Diagnostics.Process" />
<Reference Include="System.Threading.ThreadPool" />
<Reference Include="System.ComponentModel.Primitives" /> |
Not dismissing the idea but want to point out that we have different implementations on Linux, Windows, Bsd, Mac and maybe more. @ericstj had another idea if this functionality can be exposed from Environment class (of course this will need a new API). |
Pinvoking could be done via some runtime checks - you wouldn't want to introduce runtime specific builds to this component. We try not to do that in our nuget packages. Especially those that are used in older frameworks. New API is blocker since that wouldn't be available on NetFx and older .NET versions. |
I forgot about that 😢 |
Is the meter something that can be registered with DI and then fetched later, or does that create a problem with then taking a DI dependency? |
You cannot access the DI from DiagnosticSource library level. In addition to that we need the scenario work outside containers (like console apps for instance). But you are thinking in the same direction I was thinking #104680 (comment). The Meter and instruments not having new dependencies can be created in DiagnosticSource. And in Extensions library we can start a listener which will give us the created instruments and from the instrument we can get the Meter and create more instruments that need the dependency. I know this is complicated, but it might work. The only issue I am seeing with that is we need a startup code to execute when the extensions library gets loaded. |
Yes, I agree. The immediate problem is caused by System.Diagnostics.Process that is a heavy weight component (last time I checked, any use of it brought in several MBs of code into native aot apps).
It is fine to have a solution with good performance characteristics that uses new APIs for current .NET, and a solution with worse performance characteristics that depends on pre-existing APIs for downlevel platforms and hack around the static dependency cycles using reflection. |
I also think it would be fine if this feature doesn't exist on downlevel platforms at all, or it doesn't show up on downlevel platforms in this release. If it weren't for the dependency on the Meter API I would have happily implemented this feature in System.Private.CoreLib where there would never be any downlevel support.
If we get rid of the System.Diagnostics.Process dependency that should also eliminate the System.ComponentModel.Primitives dependency. For threadpool I don't expect that is problematic as it is just a facade on S.P.C APIs. Downlevel we don't have to support the functionality. So I think Process is the only dependency that is problematic.
The implementation inside of System.Diagnostics.Process is more complex because it has to handle the case where the process being inspected is not the current one. When we limit to current process only the complexity is much lower. The existing cpu-usage counter calls to RuntimeEventSourceHelper.GetCpuUsage which only has two platform specific implementations: As best I can tell GetProcessTimes() and getrusage() covers all the cases where we'd need to make the data available, and then we've got cases like browser or WASI where the metric isn't going to be available. I'm certainly not opposed to adding a new public API, but I am guessing the quickest and easiest implementation would be a conditional check at runtime to determine the platform and then p/invoke to one of those two OS APIs. @stevejgordon - do you want to give the p/invoke approach a try for cpu.time? I believe you'd want something like: if(OperatingSystem.IsWindows())
{
// pinvoke to GetProcessTimes()
}
else if(!OperatingSystem.IsWasi() && !OperatingSystem.IsBrowser())
{
// pinvoke to getrusage()
}
else
{
// no support for cpu.time on this platform
} |
@noahfalk, sure! I'll have a look at this today. |
Ensure that each test only observes the expected Meter
@noahfalk, I have tweaked the tests to accommodate some of the platforms not running a forced GC. I also tweaked the existing tests that were failing to ensure they only observe their expected |
@stevejgordon I have changed the PR to be not draft. I think we can merge once we get the green CI and no more comments need to be addressed. I can enable the cpu time metrics with my changes later as I need more time to finish exposing the new Environment API. |
@stevejgordon there are some tests failing on WASM
Maybe we need to exclude running the metric tests on WASM. something like |
@tarekgh, sure. Will try that first thing tomorrow. I was expecting something like that might be needed somewhere. |
Just a heads up, I lost internet connectivity at my house. I'm doing what I can from my cell phone but it's a subpar UI experience to say the least :) |
Thanks @stevejgordon for your contribution. |
Happy to help! Thanks for the support as we got this over the line! |
Thanks Steve! |
Fixes #85372
This adds new runtime metrics to align with the proposed semantic conventions. I am opening this as a draft until the naming is finalised in the OTel semantic conventions.
The metrics are essentially a port of the existing OTel contrib runtime metrics. Small changes have been made to optimise the performance and align the naming with those being proposed and discussed over in the semantic conventions repo.
cc @tarekgh and @noahfalk