Rename metrics to match OpenTelemetry/OpenMetrics naming conventions #60

Merged 4 commits on Jul 26, 2023
2 changes: 1 addition & 1 deletion README.md
@@ -33,7 +33,7 @@ rule_files:

### How it works

This file sets up a number of recording and alerting rules that are dormant by default and are only enabled when the autometrics libraries produce metrics with special labels: `function_calls_count{objective_name="", objective_percentile=""}` or `function_calls_duration_bucket{objective_name="", objective_latency_threshold="", objective_percentile=""}`.
This file sets up a number of recording and alerting rules that are dormant by default and are only enabled when the autometrics libraries produce metrics with special labels: `function_calls_total{objective_name="", objective_percentile=""}` or `function_calls_duration_seconds_bucket{objective_name="", objective_latency_threshold="", objective_percentile=""}`.

To read more details about the label tricks we use to make these rules work across autometrics-instrumented projects, see [An adventure with SLOs, generic Prometheus alerting rules, and complex PromQL queries](https://fiberplane.com/blog/an-adventure-with-slos-generic-prometheus-alerting-rules-and-complex-promql-queries).

24 changes: 17 additions & 7 deletions SPEC.md
@@ -9,7 +9,7 @@ It aims to describe the full feature set of the Autometrics libraries, but it ma
- [Metric Collection Libraries](#metric-collection-libraries)
- [Exemplars (Optional)](#exemplars-optional)
- [Metrics](#metrics)
- [`function.calls.count`](#functioncallscount)
- [`function.calls`](#functioncalls)
- [`function.calls.duration`](#functioncallsduration)
- [`build_info`](#build_info)
- [`function.calls.concurrent`](#functioncallsconcurrent)
@@ -45,7 +45,7 @@ Libraries SHOULD expose functionality to create objectives within the source cod

Objectives can relate to functions' success rate and/or latencies.

Success rate objectives add the [`objective.name`](#objectivename) and [`objective.percentile`](#objectivepercentile) labels to the [`function.calls.count`](#functioncallscount) metric.
Success rate objectives add the [`objective.name`](#objectivename) and [`objective.percentile`](#objectivepercentile) labels to the [`function.calls`](#functioncalls) metric.

Latency objectives add the [`objective.name`](#objectivename), [`objective.percentile`](#objectivepercentile), and [`objective.latency_threshold`](#objectivelatency_threshold) labels to the [`function.calls.duration`](#functioncallsduration) metric.
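
As a rough illustration of the two label sets described above, the sketch below builds the Prometheus-side labels for each kind of objective. The helper name `objective_labels` and the example values (`"api"`, `"99.9"`, `"0.25"`) are hypothetical and are not taken from any Autometrics library; only the label names come from the spec and README.

```python
from typing import Dict, Optional


def objective_labels(
    name: str,
    percentile: str,
    latency_threshold: Optional[str] = None,
) -> Dict[str, str]:
    """Build objective labels using the Prometheus (underscore) label names.

    Hypothetical helper for illustration only.
    """
    labels = {
        "objective_name": name,
        "objective_percentile": percentile,
    }
    # Only latency objectives carry the threshold label.
    if latency_threshold is not None:
        labels["objective_latency_threshold"] = latency_threshold
    return labels


# Success rate objective: attached to the function.calls counter.
success_labels = objective_labels("api", "99.9")

# Latency objective: attached to the function.calls.duration histogram.
latency_labels = objective_labels("api", "99", latency_threshold="0.25")
```

Keeping the threshold optional mirrors the spec: the same objective name and percentile appear on both metrics, and the threshold is added only for latency objectives.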

@@ -67,34 +67,42 @@ Libraries SHOULD support extracting the `trace_id` field and attaching it as an

Autometrics uses the [OpenTelemetry Metric Semantic Conventions](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/README.md) for naming metrics, including using `.`'s as separators.

When the metrics are exported to Prometheus, all dot (`.`) separators are replaced by underscores (`_`).
When the metrics are exported to Prometheus, all dot (`.`) separators are replaced by underscores (`_`). Suffixes are appended where required by Prometheus/OpenMetrics.
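
The renaming rule can be sketched as a small function. This is a minimal illustration, assuming the caller already knows which suffix (if any) a metric needs; the function name `prometheus_name` is hypothetical and is not part of any Autometrics or Prometheus client API.

```python
def prometheus_name(otel_name: str, suffix: str = "") -> str:
    """Convert a dot-separated metric name to its Prometheus form.

    Hypothetical helper: replaces '.' with '_' and appends the given
    suffix unless the name already ends with it.
    """
    name = otel_name.replace(".", "_")
    if suffix and not name.endswith("_" + suffix):
        name = f"{name}_{suffix}"
    return name


# The four metrics in this spec and their Prometheus names.
names = {
    "function.calls": prometheus_name("function.calls", "total"),
    "function.calls.duration": prometheus_name("function.calls.duration", "seconds"),
    "build_info": prometheus_name("build_info"),
    "function.calls.concurrent": prometheus_name("function.calls.concurrent"),
}
```

The suffix check is there so that a client library or exporter that has already appended `_total` or `_seconds` does not end up with a doubled suffix.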

### `function.calls.count`
### `function.calls`

> **Prometheus Name:** `function_calls_total`
>
> **Required Labels:** [`function`](#function), [`module`](#module), [`service.name`](#servicename), [`result`](#result), [`caller`](#caller)
>
> **Additional Labels** (if a success rate [objective](#service-level-objectives-slos) is attached to the given function): [`objective.name`](#objectivename) and [`objective.percentile`](#objectivepercentile)

**Note:** there is an [open discussion](https://github.com/orgs/autometrics-dev/discussions/4#discussioncomment-5839198) about changing this metric name to `function.calls` or `function.calls.total`.

This metric is a 64-bit monotonic counter that tracks the number of times a given function was invoked.

When this metric is exported to Prometheus, its name SHOULD be `function_calls_total`, because Prometheus/OpenMetrics specifies that counters SHOULD have the `_total` suffix. Note that library authors may need to append the suffix because not all Prometheus client libraries or exporters will do so.

If possible, libraries SHOULD start this counter off at zero (by incrementing the counter by 0) in order to expose the names of instrumented functions to visualization tools that use the metrics. Libraries SHOULD use as many of the labels as possible for the initial call to increment by zero, including those related to objectives and setting `result="ok"`.
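
The zero-initialization described above might look roughly like the following. This sketch uses a tiny in-memory counter rather than a real Prometheus client, so `LabeledCounter` and `register_function` are hypothetical names, and using an empty `caller` value for the initial series is an assumption made for illustration.

```python
from typing import Dict, Tuple

LabelKey = Tuple[Tuple[str, str], ...]


class LabeledCounter:
    """Minimal in-memory monotonic counter keyed by label set (illustrative)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.values: Dict[LabelKey, int] = {}

    def inc(self, labels: Dict[str, str], amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a monotonic counter cannot decrease")
        key = tuple(sorted(labels.items()))
        # Incrementing by 0 still creates the series, which is the point.
        self.values[key] = self.values.get(key, 0) + amount

    def get(self, labels: Dict[str, str]) -> int:
        return self.values[tuple(sorted(labels.items()))]


def register_function(counter: LabeledCounter, function: str, module: str,
                      service_name: str) -> Dict[str, str]:
    """Expose a function's series at zero before it is ever called."""
    labels = {
        "function": function,
        "module": module,
        "service_name": service_name,
        "result": "ok",
        "caller": "",  # assumption: caller is unknown at registration time
    }
    counter.inc(labels, 0)
    return labels


calls = LabeledCounter("function_calls_total")
labels = register_function(calls, "get_user", "app.users", "api")
```

After registration the series exists with value 0, so dashboards can list `get_user` even if it has not been invoked yet.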

### `function.calls.duration`

> **Prometheus Name:** `function_calls_duration_seconds`
>
> **Required Labels:** [`function`](#function), [`module`](#module), [`service.name`](#servicename)
>
> **Additional labels** (if a latency [objective](#service-level-objectives-slos) is attached to the given function): [`objective.name`](#objectivename), [`objective.percentile`](#objectivepercentile), [`objective.latency_threshold`](#objectivelatency_threshold)

This is a 64-bit floating point histogram that tracks the duration or latency of function calls.

It MUST track the duration in seconds (**not** milliseconds).
It MUST track the duration in seconds (**not** milliseconds). Libraries using OpenTelemetry SHOULD set the units in the resource metadata.

Libraries SHOULD support the [default OpenTelemetry histogram buckets](https://opentelemetry.io/docs/reference/specification/metrics/sdk/#histogram-aggregations) as label values. Libraries MAY allow users to specify custom histogram buckets.

When this metric is exported to Prometheus, its name SHOULD be `function_calls_duration_seconds`, because Prometheus/OpenMetrics specifies that metrics SHOULD include their units. Note that library authors may need to append the unit suffix because not all Prometheus client libraries or exporters will do so.
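
To make the seconds requirement concrete, here is a minimal sketch of timing a call and recording it into a histogram with cumulative (`le`-style) buckets. The `Histogram` class, the `timed` helper, and the bucket boundaries are all hypothetical; a real library would use its metrics client and the buckets the spec refers to.

```python
import bisect
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class Histogram:
    """Minimal histogram with upper-inclusive bucket bounds, in seconds."""

    def __init__(self, bounds: List[float]) -> None:
        self.bounds = sorted(bounds)
        # One extra slot at the end acts as the +Inf bucket.
        self.counts = [0] * (len(self.bounds) + 1)
        self.sum = 0.0

    def observe(self, seconds: float) -> None:
        # bisect_left returns the first bound >= seconds (le semantics).
        self.counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.sum += seconds

    def cumulative(self) -> List[int]:
        """Cumulative counts, as Prometheus exposes them per `le` bucket."""
        total, out = 0, []
        for count in self.counts:
            total += count
            out.append(total)
        return out


def timed(histogram: Histogram, fn: Callable[[], T]) -> T:
    """Run fn and record its duration in seconds, even if it raises."""
    start = time.perf_counter()  # perf_counter reports fractional seconds
    try:
        return fn()
    finally:
        histogram.observe(time.perf_counter() - start)
```

Because `time.perf_counter()` already returns seconds as a float, no unit conversion is needed; a library that measures in milliseconds would have to divide by 1000 before observing.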

### `build_info`

> **Prometheus Name:** `build_info`
>
> **Required Labels:** [`version`](#version), [`commit`](#commit), [`branch`](#branch), [`service.name`](#servicename)

This is a gauge or up/down counter.
@@ -103,6 +111,8 @@ It MUST always have the value of `1.0`.

### `function.calls.concurrent`

> **Prometheus Name:** `function_calls_concurrent`
>
> **Required Labels:** [`function`](#function), [`module`](#module), [`service.name`](#servicename)

This metric is optional. Libraries MAY provide an option to the user for enabling this on a per-function basis.