[FEATURE] Rename metrics name to comply with feature flag semconv #712

Kavindu-Dodan · 2023-06-19T17:27:39Z

Requirements

Background

flagd comes with out of the box metrics support [1]. It currently expose the following metric names,

http_request_duration_seconds
http_response_size_bytes
impressions
reasons

While these metrics serve a good purpose, the names do not comply with semantic conventions. For comparison, see how metrics are named for the metrics hook [2] (ex:- feature_flag.evaluation_requests_total, feature_flag.evaluation_success_total)

Proposal (updated 27/06/2023)

This issue focuses on renaming flagd metrics to comply with semantic conventions.

Common rules

Instrumentation Scope: flagd
Version: current flagd version
Feature flag specific metric name must start with feature_flag prefix so that monitoring systems can aggregate feature flag related metrics
Feature flag specific metric name must contain flagd phrase so that it is clear the metric belongs to flagd system

With this background, consider the following proposals,

Old	New	Unit	Refer
http_request_duration_seconds	http.server.duration	S	[3]
http_response_size_bytes	http.server.response.size	By	[4]
impressions	feature_flag.flagd.impression	{impression}	[5]
reasons	feature_flag.flagd.evaluation.reason	{reason}	[5]

[1] - https://github.com/open-feature/flagd/blob/main/core/pkg/telemetry/metrics.go
[2] - https://github.com/open-feature/go-sdk-contrib/blob/main/hooks/open-telemetry/README.md#metric-hook
[3] - https://github.com/open-telemetry/semantic-conventions/blob/bef5a68c2f580e48334bfb2a118345808b2e2125/specification/metrics/semantic_conventions/http-metrics.md#metric-httpserverduration
[4] - https://github.com/open-telemetry/semantic-conventions/blob/bef5a68c2f580e48334bfb2a118345808b2e2125/specification/metrics/semantic_conventions/http-metrics.md#metric-httpserverrequestsize
[5] - https://github.com/open-telemetry/semantic-conventions/tree/bef5a68c2f580e48334bfb2a118345808b2e2125/specification/metrics/semantic_conventions#pluralization

The text was updated successfully, but these errors were encountered:

Kavindu-Dodan · 2023-06-19T17:28:12Z

@thisthat @toddbaert @beeme1mr appreciate feedback on this proposal

thisthat · 2023-06-19T20:01:38Z

Great summary @Kavindu-Dodan - the proposal makes a lot of sense. However, OTel suggests adding count to the namespace for pluralization, e.g., feature_flag.flagd.impression.count. This would require also to adapt the hooks tho 🤔

pirgeo · 2023-06-20T13:20:26Z

Hi! Here are a few thoughts, mainly looking at the spec you linked above:

This is not a "hard" requirement, but personal taste: I think I prefer featureflag. over feature_flag., but I couldn't find any guidance in the Otel spec. All the other top-level namespaces are a single word (or even abbreviations, like db.).
For HTTP, the semantic conventions use the dot instead of the underscore, e.g., http.server.response.size. Maybe moving to feature_flag.flagd.http.response.size would be more in line with existing semantic conventions.
Here is a PR that intends to only add .count for UpDownCounters. Not sure if the counters in question are UpDownCounters or "just up" Counters, but it might be worth following that guidance there. The fact that the type of the metric is "Count" already lets us determine that this metric is a counter, we don't necessarily need to pack it into the name. The whole point of this discussion was Prometheus compatibility, if you add .count out of the box you might end up with _count_total for the Prometheus exporter, which is... not ideal.

beeme1mr · 2023-06-20T16:07:21Z

Thanks @pirgeo! I would prefer to use feature_flag as a prefix to match what has been defined in the semantic convention for traces but I agree with the other points.

pirgeo · 2023-06-21T06:18:38Z

Sounds good! I neither found any guidance on the spec nor do I have a strong opinion on this one.

joaopgrassi · 2023-06-21T08:01:30Z

Apart from what @pirgeo already mentioned, here are my suggestions:

About the units: It's a bit hidden but there's some extra guidance here: https://github.com/open-telemetry/semantic-conventions/blob/main/specification/metrics/semantic_conventions/README.md#instrument-units

feature_flag.flagd.http_request_time: Should be s
feature_flag.flagd.impression_count and feature_flag.flagd.evaluation_reason_count: I think those are under the "default" unit 1:

Instruments for utilization metrics (that measure the fraction out of a total) are dimensionless and SHOULD use the default unit 1 (the unity).

If you want to capture more meaning, you could use {impression} and {reason} as stated in the spec:

Instruments that measure an integer count of something SHOULD only use annotations with curly braces to give additional meaning without the leading default unit (1). For example, use {packet}, {error}, {fault}, etc.

feature_flag.flagd.http_request_time

Is there a reason why the metric changed to "request_time"? I think the old name actually aligned better with the current HTTP semconv, such as http.server.duration.

Kavindu-Dodan · 2023-06-21T20:18:32Z

@thisthat @pirgeo @beeme1mr & @joaopgrassi for the feedback Thank you very much for taking the time to give your opinions.

The fact that the type of the metric is "Count" already lets us determine that this metric is a counter, we don't necessarily need to pack it into the name

Thank you for this input

Is there a reason why the metric changed to "request_time"? I think the old name actually aligned better with the current HTTP semconv, such as http.server.duration.

Should be an oversight and the original description is meant to be changed.

If you want to capture more meaning, you could use {impression} and {reason} as stated in the spec:

Good point. I assume this allows us to remove the count suffix

To summarise, can we agree on the following updated proposal?

A new addition to the common rule,

Adopt dot(.) notation instead of underscore (_)

And updated metrics chart,

Old	New	Unit
http_request_duration_seconds	feature_flag.flagd.http.request.duration	s
http_response_size_bytes	feature_flag.flagd.http.response.size	By
impressions	feature_flag.flagd.impression	{impression}
reasons	feature_flag.flagd.evaluation.reason	{reason}

Let me know your thoughts and if we agree, I will update the issue description and continue with the implementation.

beeme1mr · 2023-06-21T20:32:30Z

Are the metrics feature_flag.flagd.http.request.duration and feature_flag.flagd.http.response.size only captured during flag evaluation? If so, then this looks good to me.

pirgeo · 2023-06-22T06:38:14Z

Just for clarification: Do feature_flag.flagd.http.request.duration and feature_flag.flagd.http.response.size refer to the requests that flagd sends out in order to get feature flag settings?

Will the feature_flag.flagd.evaluation.reason have an attribute that states the actual reason? Like: feature_flag.flagd.evaluation.reason with an attribute reason=user-changed-flag or something? In that case, I wonder if dropping the reason from the metric name would be a good idea, since you are basically counting evaluations, and can then split by reason. With how it is currently, you are counting reasons. This might be intended, but I know too little about the underlying data. If the reason is something that is highly changeable and for a user to set, keeping it as is would be good. If there is a predefined set of valid reasons, going with the dimension might be a better idea (but again: take it with a pinch of salt, my feature flag knowledge is shallow).

Finally: What are the Otel datatypes of these? I assume histograms for the first two, and counts for the last two? Both of these should only ever go up, I assume.

joaopgrassi · 2023-06-22T07:01:51Z

feature_flag.flagd.impression -> I think this can now be pluralized feature_flag.flagd.impressions, since you have the unit {impression} From the spec:

Metric names SHOULD NOT be pluralized, unless the value being recorded represents discrete instances of a countable quantity. Generally, the name SHOULD be pluralized only if the unit of the metric in question is a non-unit (like {fault} or {operation}).

Same for the other (depending on how you address @pirgeo comment above)

About the request.duration and response.size I'm not sure if we need to redefine them under the feature_flag namespace. Couldn't Flagd simply adopt the HTTP semantic conventions for these, since they are basically the same? That's what HTTP client instrumentations for example do.

Kavindu-Dodan · 2023-06-22T22:14:56Z

Just for clarification: Do feature_flag.flagd.http.request.duration and feature_flag.flagd.http.response.size refer to the requests that flagd sends out in order to get feature flag settings?

No, this correlates to the flag evaluation time of flagd & the size of the payload of the evaluation

And regarding reason vs impression, I think they have value on their own. impression represents successful falg evaluations where reason represent anything (both success or error) situations. Besides, OpenFeature define a set of reasons to correlate with this metric [1].

What are the Otel datatypes of these? I assume histograms for the first two, and counts for the last two? Both of these should only ever go up, I assume.

Yes, this is correct. http dureation & response size use histograms and reason & impression use counters

feature_flag.flagd.impression -> I think this can now be pluralized feature_flag.flagd.impressions, since you have the unit {impression} From the spec:

Thank you, I will update my proposal

About the request.duration and response.size I'm not sure if we need to redefine them under the feature_flag namespace. Couldn't Flagd simply adopt the HTTP semantic conventions for these, since they are basically the same? That's what HTTP client instrumentations for example do.

Good suggesion, are you proposing to use flagd namespace and use something like flagd.http.server.duration ? If this is semconv compliant, I think we can choose this naming convension.

[1] - https://github.com/open-feature/spec/blob/main/specification/types.md#resolution-details

joaopgrassi · 2023-06-23T09:41:50Z

Good suggesion, are you proposing to use flagd namespace and use something like flagd.http.server.duration ? If this is semconv compliant, I think we can choose this naming convension.

Ah no, I'm proposing that Flagd re-uses the HTTP metrics defined in https://github.com/open-telemetry/semantic-conventions/blob/main/specification/metrics/semantic_conventions/http-metrics.md#semantic-conventions-for-http-metrics.

Not sure if that would be client or server but the goal is to use these metrics for all HTTP communications. So, flagd would emit http.server.duration and http.server.response.size. That is what all other http servers will be reporting in their own implementations. For ex, web frameworks (Spring, ASP.NET, Gin, Laravel, etc) would report these.

Essentially then you end up with only flagd specifc metrics such as impressions and reasons

Kavindu-Dodan · 2023-06-23T15:10:56Z

Not sure if that would be client or server but the goal is to use these metrics for all HTTP communications. So, flagd would emit http.server.duration and http.server.response.size

Wouldn't this create metric name conflicts at OTel backend (ex:- Prometheus) 🤔 Or do they differentiate metrics based on service naming and other attributes ?

joaopgrassi · 2023-06-26T14:06:14Z

When converting from OTLP to Prometheus, the spec offers guidance that should help in this case. Instrumentation scope and version become labels while resource attributes (e.g., service.name) become this target_info metric

So it should work just as fine - they will split by say instrumentation scope = flagd and get the metrics they are after. @pirgeo maybe you have more input here, in case I missed something?

For Dynatrace for example, instrumentation scope+version and a few resource attributes (like service.name) are also added automatically to all ingested metrics as dimensions.

Kavindu-Dodan · 2023-06-27T16:26:29Z

@joaopgrassi thank you for the explanation. I think my doubt is clear and I learned a few more OTel details thanks to this discussion.

For the reference of others, Instrumentation Scope [1] is already set to flagd and this is done at the meter setup phase [2]. We can further add the version as part of this improvement.

With this agreement and understanding, let's agree on the following metric naming & unit changes,

Instrumentation Scope: flagd
Version: current flagd version

Old	New	Unit
http_request_duration_seconds	http.server.duration	s
http_response_size_bytes	http.server.response.size	By
impressions	feature_flag.flagd.impression	{impression}
reasons	feature_flag.flagd.evaluation.reason	{reason}

[1] - https://opentelemetry.io/docs/specs/otel/glossary/#instrumentation-scope
[2] - https://github.com/open-feature/flagd/blob/main/core/pkg/telemetry/metrics.go#L126

## This PR Fixes #712 Refer the following table for metrics renames and their units, | Old | New | Unit | |---|---|---| | http_request_duration_seconds | http.server.duration | s | | http_response_size_bytes | http.server.response.size | By | | http_requests_inflight |http.server.active_requests | {request} | | impressions | feature_flag.flagd.impression| {impression} | | reasons | feature_flag.flagd.evaluation.reason| {reason} | ### Validation This change was validated against an OTel setup. Consider following screen captures, ![Screenshot 2023-07-04 at 1 16 02 PM](https://github.com/open-feature/flagd/assets/8186721/5aba3103-e3fc-42a7-a396-c78999d22b05) ![Screenshot 2023-07-04 at 1 15 41 PM](https://github.com/open-feature/flagd/assets/8186721/0ea196f8-fcb4-4e8c-86eb-c6774de3e2f3) ![Screenshot 2023-07-04 at 1 14 53 PM](https://github.com/open-feature/flagd/assets/8186721/c285d78d-525a-4e40-bfe2-895a85d96cd6) ![Screenshot 2023-07-04 at 1 14 40 PM](https://github.com/open-feature/flagd/assets/8186721/e826ed8c-6056-4fd4-b206-d76592d8d484) ![Screenshot 2023-07-04 at 1 14 18 PM](https://github.com/open-feature/flagd/assets/8186721/30c959af-197b-405c-b303-c4dd59807311) Signed-off-by: Kavindu Dodanduwa <[email protected]>

Kavindu-Dodan added enhancement New feature or request Needs Triage This issue needs to be investigated by a maintainer labels Jun 19, 2023

Kavindu-Dodan self-assigned this Jun 29, 2023

Kavindu-Dodan removed the Needs Triage This issue needs to be investigated by a maintainer label Jun 29, 2023

Kavindu-Dodan mentioned this issue Jul 4, 2023

feat!: rename metrics and service #730

Merged

Kavindu-Dodan closed this as completed in #730 Jul 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] Rename metrics name to comply with feature flag semconv #712

[FEATURE] Rename metrics name to comply with feature flag semconv #712

Kavindu-Dodan commented Jun 19, 2023 •

edited

Loading

Kavindu-Dodan commented Jun 19, 2023

thisthat commented Jun 19, 2023

pirgeo commented Jun 20, 2023

beeme1mr commented Jun 20, 2023

pirgeo commented Jun 21, 2023

joaopgrassi commented Jun 21, 2023

Kavindu-Dodan commented Jun 21, 2023 •

edited

Loading

beeme1mr commented Jun 21, 2023

pirgeo commented Jun 22, 2023

joaopgrassi commented Jun 22, 2023 •

edited

Loading

Kavindu-Dodan commented Jun 22, 2023

joaopgrassi commented Jun 23, 2023 •

edited

Loading

Kavindu-Dodan commented Jun 23, 2023

joaopgrassi commented Jun 26, 2023 •

edited

Loading

Kavindu-Dodan commented Jun 27, 2023 •

edited

Loading

[FEATURE] Rename metrics name to comply with feature flag semconv #712

[FEATURE] Rename metrics name to comply with feature flag semconv #712

Comments

Kavindu-Dodan commented Jun 19, 2023 • edited Loading

Requirements

Kavindu-Dodan commented Jun 19, 2023

thisthat commented Jun 19, 2023

pirgeo commented Jun 20, 2023

beeme1mr commented Jun 20, 2023

pirgeo commented Jun 21, 2023

joaopgrassi commented Jun 21, 2023

Kavindu-Dodan commented Jun 21, 2023 • edited Loading

beeme1mr commented Jun 21, 2023

pirgeo commented Jun 22, 2023

joaopgrassi commented Jun 22, 2023 • edited Loading

Kavindu-Dodan commented Jun 22, 2023

joaopgrassi commented Jun 23, 2023 • edited Loading

Kavindu-Dodan commented Jun 23, 2023

joaopgrassi commented Jun 26, 2023 • edited Loading

Kavindu-Dodan commented Jun 27, 2023 • edited Loading

Kavindu-Dodan commented Jun 19, 2023 •

edited

Loading

Kavindu-Dodan commented Jun 21, 2023 •

edited

Loading

joaopgrassi commented Jun 22, 2023 •

edited

Loading

joaopgrassi commented Jun 23, 2023 •

edited

Loading

joaopgrassi commented Jun 26, 2023 •

edited

Loading

Kavindu-Dodan commented Jun 27, 2023 •

edited

Loading