Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename metrics to be in line with OTel + Prometheus conventions #21

Open
6 of 9 tasks
emschwartz opened this issue May 23, 2023 · 2 comments
Open
6 of 9 tasks

Rename metrics to be in line with OTel + Prometheus conventions #21

emschwartz opened this issue May 23, 2023 · 2 comments

Comments

@emschwartz
Copy link
Contributor

emschwartz commented May 23, 2023

As discussed here, OpenTelemetry is changing its semantic conventions to remove the .count suffix. Also, as discussed here, the histogram should include the units in the name for Prometheus.

We should update all of the libraries such that:

  • The counter is renamed from function.calls.count to function.calls
  • The histogram should include the unit as seconds (in the OpenTelemetry metadata or in the name for Prometheus)
  • When the metrics are exported to Prometheus, we should ensure that the counter is renamed to function_calls_total and the histogram is renamed to function_calls_duration_seconds
  • All queries should be updated to use the function_calls_total and function_calls_duration_seconds name. We can give people time to update things by making the alerting rules and dashboards support the new and old names using regular expressions.

TODO:

@emschwartz emschwartz changed the title Rename metric to function.calls Rename metrics Jun 16, 2023
@emschwartz emschwartz changed the title Rename metrics Rename metrics to be in line with OTel + Prometheus conventions Jun 16, 2023
@gagbo
Copy link
Member

gagbo commented Jul 24, 2023

The error rate SLO expression still expects _count to be part of the metric name

(sum by (objective_name, objective_percentile) (rate({__name__=~"function_calls_count(?:_total)?",objective_percentile="90",result="error"}[5m])))

An extra change is needed in the generated rules file to make the alerts work once libraries make the change to support function_calls_total naming.

Also, the histogram queries for the latency SLO haven't been renamed either

(sum by (objective_name, objective_percentile) (rate(function_calls_duration_count{objective_percentile="90"}[3d])) - (sum by (objective_name, objective_percentile) (

So function_calls_duration_seconds is unsupported as well for now

@brettimus
Copy link
Contributor

maybe we should fix the queries in https://github.com/autometrics-dev/autometrics-shared/pull/60/files and merge it? wasn't sure if that PR was unmerged for a reason

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

No branches or pull requests

3 participants