Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prometheusreceiver: possible incorrect metric name for cumulative metrics #13117

Closed
newly12 opened this issue Aug 9, 2022 · 7 comments · Fixed by #13213
Closed

prometheusreceiver: possible incorrect metric name for cumulative metrics #13117

newly12 opened this issue Aug 9, 2022 · 7 comments · Fixed by #13213
Labels
bug Something isn't working exporter/prometheus priority:p2 Medium

Comments

@newly12
Copy link
Contributor

newly12 commented Aug 9, 2022

Describe the bug
given metrics like the following, exported metrics will only be envoy_cluster_upstream_rq

# TYPE envoy_cluster_upstream_rq counter
envoy_cluster_upstream_rq{envoy_response_code="200",envoy_cluster_name="local_envoy_admin"} 8130
envoy_cluster_upstream_rq{envoy_response_code="200",envoy_cluster_name="local_service83"} 8265
envoy_cluster_upstream_rq{envoy_response_code="200",envoy_cluster_name="sds_server_uds"} 1
envoy_cluster_upstream_rq{envoy_response_code="503",envoy_cluster_name="sds_server_uds"} 2

# TYPE envoy_cluster_upstream_rq_total counter
envoy_cluster_upstream_rq_total{envoy_cluster_name="local_envoy_admin"} 8131
envoy_cluster_upstream_rq_total{envoy_cluster_name="local_service80"} 0
envoy_cluster_upstream_rq_total{envoy_cluster_name="local_service81"} 0
envoy_cluster_upstream_rq_total{envoy_cluster_name="local_service82"} 0
envoy_cluster_upstream_rq_total{envoy_cluster_name="local_service83"} 8265
envoy_cluster_upstream_rq_total{envoy_cluster_name="sds_server_uds"} 1

when envoy_cluster_upstream_rq appears first, the family name will be envoy_cluster_upstream_rq, normalizeMetricName returns the same metric name, later when envoy_cluster_upstream_rq_total comes, after normalized, it will be envoy_cluster_upstream_rq, thus they will be grouped into the same metric family

Steps to reproduce
test prometheus receiver with above metrics

What did you expect to see?
4 envoy_cluster_upstream_rq metrics and 6 envoy_cluster_upstream_rq_total metrics

What did you see instead?
10 envoy_cluster_upstream_rq metrics.

What version did you use?
Version: (e.g., v0.4.0, 1eb551b, etc)

What config did you use?
Config: (e.g. the yaml config file)

Environment
OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

Additional context
Add any other context about the problem here.

@newly12 newly12 added the bug Something isn't working label Aug 9, 2022
@newly12
Copy link
Contributor Author

newly12 commented Aug 10, 2022

by looking at the code, I understand the need to trim necessary suffix so to group metrics together for histogram & summary, not quite sure why sum/counter type is needed?

@github-actions
Copy link
Contributor

Pinging code owners: @Aneurysm9. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dmitryax dmitryax added the priority:p2 Medium label Aug 12, 2022
@bogdandrutu
Copy link
Member

@dashpole is this a correct OpenMetrics input? Or a Prometheus? Based on https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#counter-1 we should have something like:

# TYPE envoy_cluster_upstream_rq counter
envoy_cluster_upstream_rq_total{envoy_response_code="200",envoy_cluster_name="local_envoy_admin"} 8130

@dashpole
Copy link
Contributor

dashpole commented Sep 7, 2022

@bogdandrutu what you pasted is valid for OpenMetrics, but not valid for prometheus. I'm not aware of a way for us to know which format (OM vs prom) the scrape was in, so we have to choose which to handle correctly:

OpenMetrics (only one valid option):

# TYPE envoy_cluster_upstream_rq counter
envoy_cluster_upstream_rq_total{envoy_response_code="200",envoy_cluster_name="local_envoy_admin"} 8130

Vs Prometheus:

# TYPE envoy_cluster_upstream_rq counter
envoy_cluster_upstream_rq{envoy_response_code="200",envoy_cluster_name="local_envoy_admin"} 8130
# TYPE envoy_cluster_upstream_rq_total counter
envoy_cluster_upstream_rq_total{envoy_response_code="200",envoy_cluster_name="local_envoy_admin"} 8130

Note that # TYPE envoy_cluster_upstream_rq counter is valid in prometheus, but is strongly discouraged, as it doesn't abide by Prometheus naming conventions (counters should end in _total). Since we had to choose between supporting # TYPE envoy_cluster_upstream_rq counter in Prometheus and supporting the OpenMetrics naming case, we chose to support the OM naming convention.

@bogdandrutu
Copy link
Member

Thanks for the explanation, is anyone in the Prometheus ecosystem that can tell us their opinion?

@dashpole
Copy link
Contributor

dashpole commented Sep 8, 2022

I'll raise it for discussion at the prometheus working group

@dashpole
Copy link
Contributor

Discussed at the prometheus WG today:

  • We can ask for OM vs Prometheus to be passed to appenders. Maybe in the append context? (tag RichiH, codesome)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working exporter/prometheus priority:p2 Medium
Projects
None yet
5 participants