Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dotnet8 metrics in otel format #1

Closed
wants to merge 73 commits into from
Closed

Dotnet8 metrics in otel format #1

wants to merge 73 commits into from

Conversation

lmolkova
Copy link
Owner

No description provided.

@lmolkova lmolkova changed the title Dotnet8 metrics Dotnet8 metrics in otel format Jul 21, 2023
@lmolkova lmolkova changed the title Dotnet8 metrics in otel format Dotnet8 metrics in otel format: current version Jul 22, 2023
Copy link
Collaborator

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manually copying over the active comment threads from our word doc discussions. I used my judgement on which discussions were resolved and which were still active, but folks should feel free to add anything they think is relevant.

<!-- semconv metric.dotnet.http.client.connections.usage(metric_table) -->
| Name | Instrument Type | Unit (UCUM) | Description |
| -------- | --------------- | ----------- | -------------- |
| `http.client.connections.usage` | UpDownCounter | `{connection}` | Number of outbound HTTP connections that are currently active or idle on the client [1] |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Porting active comment threads from our word doc:

Sam Spencer

I am assuming this is a counter - it feels really strange with usage in the name.

Also with the dimensions, it feels like we are trying to do logging via metrics. As a connection is used it will rapidly flip between active and idle states. Are we getting too dimension happy?
July 18, 2023 at 10:02 AM

Sam Spencer

I'd prefer to have 2 different counters, one for connections, the other for idle connections.
July 18, 2023 at 10:38 AM

Noah Falk

I'll see if we get any feedback from OTel folks. This 'usage' convention is something their naming guidelines recommended and you can see very similar looking prior art for a connection pool in the database metrics: semantic-conventions/docs/database/database-metrics.md at e531541025992b68177a68b87628c5dc75c4f7d9 · open-telemetry/semantic-conventions (github.com)

The *.usage convention is documented here:

semantic-conventions/docs/general/metrics.md at main · open-telemetry/semantic-conventions · GitHub
July 18, 2023 at 2:44 PM

Miha Zupan

I second Sam's dislike of "usage". Would that force us to essentially log twice as much stuff (-1 idle, +1 active, request, -1 active, +1 idle)?

July 19, 2023 at 5:44 AM
Sam Spencer
Proposal: remove the state flag and usage from the name. Tracking idle connections shouldn't be that applicable, and is probably better done from logs.

July 19, 2023 at 10:50 AM
Miha Zupan
I personally don't mind having the idle connections info in metrics, I'd just prefer it be a separate counter.

July 19, 2023 at 12:39 PM

Noah Falk

@Liudmila Molkova - Do you have any feedback on this one? Personally I agree with Sam and Miha that I have an aethestic preference for multiple metrics rather than one 'usage' metric, but I did it this way because it appeared to be OTel's desired pattern and I'd prioritize consistency over my aesthetic preference. However if OTel conventions don't have a strong preference and either form is acceptable I'll split it again.

@miha Zupan - Ah I see now. The description of the current-connections made it sound like it was mutually exclusive with idle connections, but reviewing the code I see it includes both idle and non-idle connections. So yes, this scheme would have double the number of calls to instrument.Add() whenever a connection transitions between idle and non-idle. I'm assuming these transitions don't occur that often (<1K transitions/sec?) in which case perf overheads should be effectively zero. A call to Add() should be double digit ns.

July 19, 2023 at 11:44 PM

Liudmila Molkova

Problems with multiple counters is that if we ever need to support more states, we'll need to add new metrics.

Also, users will less less metric names and can discover available attributes and values as they go.

With multiple metrics, user will need to learn name of each of them. We also need to pay attention to keep attributes, implementation, and documentation in sync.

I think our aethestic preferences are not as important comparing to users ones. One metric allows them to split it and visualize it however they want Two different metric names don't let them do it.

(my preference is strongly on less instruments because of the consistency and flexibility).

July 21, 2023 at 6:11 PM

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the usage aspect will not be that useful in practice. Its going to generate a lot of noise.

With the connection duration counter below, I'd suggest removing this counter altogether.

Copy link
Owner Author

@lmolkova lmolkova Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, it seems the usage here represents a number of open connections - the connection.duration histogram does not provide the number of active/idle connections, it's only reported after the connection is closed, we can derive connection creation and close rate from it.

I.e. if we don't introduce it now:

  • we won't have a way to find how many connections are open at each moment
  • it seems common and useful to measure the number of open connections
  • we can still add it in .NET 9 if we realize it's useful

If we add it now, we can rename it to http.client.open_connections.

I believe knowing the number of idle connections from metrics is important - alerting on logs is a questionable and expensive thing to do. Retrieving logs when dealing with incidents in production is slow and unreliable - logs can be files on a file system.

Copy link
Owner Author

@lmolkova lmolkova Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I think usage is used incorrectly here - usage assumes there is a limit value reported and the sum of all state dimensions is equal to the limit. This is not the case here.

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/README.md#instrument-naming

Copy link
Owner Author

@lmolkova lmolkova Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this PR with http.client.open_connections - PTAL. We should also consider prefixing connection.state with something - I assume different connections can have different set of states and we should not mix them - e.g. there is no idle or active states here - https://learn.microsoft.com/dotnet/api/system.data.connectionstate

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happy not to use it :) Any of these seems fine to me:
http.client.open_connections
http.client.connections.count
http.client.connections

If I was picking my preference I'd probably go with the last one but its splitting hairs for me. I think its very reasonable to still have a state attribute that marks connections as idle or not.

@samsp-msft and @antonfirsov can of course say otherwise, but for me (and I am guessing for them), the aesthetic dislike really came from "usage" not having a clear and intuitive meaning. If I told someone "the value of the connection.usage metric is 26" they will probably give me weird confused looks. If I tell them "the value of the open_connections metric is 26" they probably have an intuitive understanding what that means.

Copy link
Owner Author

@lmolkova lmolkova Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we had a chat with otel semconv WG members today and thinking about the following choice:

  1. *.connection_pool.usage + *.conneciton_pool.limit - which tells e.g. that there are 5 active and 25 idle connections in the connection pool of size 30

OR

  1. *.connections.count, or open_connection.count if there is no pool or limit is not known

I guess the dislike from usage came from not having a limit reported and not talking about the connection pool. To connection pool usage seems intuitive.

Let me know what your thoughts are and what would be a preferred option here

docs/dotnet/dotnet-http-metrics.md Outdated Show resolved Hide resolved
docs/dotnet/dotnet-http-metrics.md Outdated Show resolved Hide resolved
| `_OTHER` | Any HTTP method that the instrumentation has no prior knowledge of. |
<!-- endsemconv -->

### Metric: `http.client.failed_requests`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Porting active comment threads from the word doc:

Anton Firszov

We just implemented a feature that reports various error reasons using an enum in our exceptions:https://github.com/dotnet/runtime/blob/c430570a01c103bc7f117be573f37d8ce8a129b8/src/libraries/System.Net.Http/src/System/Net/Http/HttpRequestError.cs#L9Would it make sense to include this as an attribute?

July 18, 2023 at 5:36 PM

Noah Falk

That seems reasonable to me. @Liudmila Molkova any thoughts on that one?

July 19, 2023 at 3:31 AM

Liudmila Molkova

Added dotnet.http.request.error in https://github.com/lmolkova/semantic-conventions/blob/dotnet8-metrics-proposed-diff/model/metrics/dotnet-common.yaml#L24

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented in more detail on PR #2. In short I think it is good for HttpClient but I am not in favor of extending it outside that.

docs/dotnet/dotnet-http-metrics.md Outdated Show resolved Hide resolved
docs/dotnet/dotnet-signalr-metrics.md Outdated Show resolved Hide resolved
**[2]:** **TODO: why not network.transport?**
<!-- endsemconv -->

## Metric: `signalr_http_transport.active_connections`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Porting active comment threads from the word doc:

Trask Stalnaker

I think existing otel sem conv conventions probably lean towards this being signalr_http_transport.connections.count (or possibly just signalr_http_transport.connections, see open-telemetry#50 https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#use-count-instead-of-pluralization

Otel does have db.client.connections.usage, but I think(?) this should be db.client.connections.count, and I just opened open-telemetry#201

Comparing this to otel's http.server.active_requests is also reasonable, but I think the above modeling is probably going to be more consistent long-term.

July 21, 2023 at 8:36 AM

Trask Stalnaker

I'm having second thoughts about this one already 🤔

July 21, 2023 at 8:45 AM

Trask Stalnaker

Opened open-telemetry#202

July 21, 2023 at 9:18 AM

Noah Falk

Yeah, I modeled it originally on http.server.active_requests, but I'd be just fine with current_connections. connections.count and connections are a little more ambiguous because there are two different counts that might refer to, the count of all connections ever made or the count of connections currently open.

July 21, 2023 at 5:36 PM

docs/dotnet/dotnet-signalr-metrics.md Outdated Show resolved Hide resolved
docs/dotnet/dotnet-aspnet-metrics.md Outdated Show resolved Hide resolved
docs/dotnet/dotnet-aspnet-metrics.md Outdated Show resolved Hide resolved
model/metrics/dotnet-http.yaml Outdated Show resolved Hide resolved
<!-- semconv metric.dotnet.http.client.connections.usage(metric_table) -->
| Name | Instrument Type | Unit (UCUM) | Description |
| -------- | --------------- | ----------- | -------------- |
| `http.client.connections.usage` | UpDownCounter | `{connection}` | Number of outbound HTTP connections that are currently active or idle on the client [1] |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the usage aspect will not be that useful in practice. Its going to generate a lot of noise.

With the connection duration counter below, I'd suggest removing this counter altogether.

docs/dotnet/dotnet-http-metrics.md Outdated Show resolved Hide resolved
docs/dotnet/dotnet-http-metrics.md Outdated Show resolved Hide resolved
docs/dotnet/dotnet-http-metrics.md Outdated Show resolved Hide resolved
docs/dotnet/dotnet-http-metrics.md Show resolved Hide resolved
@noahfalk
Copy link
Collaborator

I reviewed all the changes you were proposing in #2. I think there are still a number of conversations that need to be responded to here too and that is in my todo list for today.

@lmolkova lmolkova changed the title Dotnet8 metrics in otel format: current version Dotnet8 metrics in otel format Jul 27, 2023
@lmolkova lmolkova force-pushed the dotnet8-metrics branch 4 times, most recently from 0fd3372 to f85bfa8 Compare July 27, 2023 23:52
docs/dotnet/dotnet-aspnet-metrics.md Outdated Show resolved Hide resolved
docs/dotnet/dotnet-aspnet-metrics.md Outdated Show resolved Hide resolved
docs/dotnet/dotnet-http-metrics.md Outdated Show resolved Hide resolved
<!-- toc -->

- [DNS metrics](#dns-metrics)
* [Metric: `dns.lookups.duration`](#metric-dnslookupsduration)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix pluralization :)


## DNS metrics

### Metric: `dns.lookups.duration`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### Metric: `dns.lookups.duration`
### Metric: `dns.lookup.duration`

lmolkova and others added 20 commits October 3, 2023 09:37
…h `network.(peer|local).(address|port)`. (open-telemetry#342)

Co-authored-by: Liudmila Molkova <[email protected]>
@lmolkova
Copy link
Owner Author

closing this one in favor of open-telemetry#283

@lmolkova lmolkova closed this Oct 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.