Support exposing internal metrics over OTLP rather than a Prometheus endpoint #1093
Comments
@bogdandrutu @tigrannajaryan I would like to put this in a GA milestone, since changing this in the future will break the metric names being exported.
@jrcamp I think we should send metrics directly to the backend, not through the pipeline, because if there is a problem in the pipeline these metrics will not reach the backend.
I think the current approach is a "big" hack added a long time ago. OpenCensus (which is currently used) can export directly to different backends and doesn't need this hack of exporting to Prometheus and then scraping.
@ccaraman I think this falls under self-observability; do you want to take it?
To summarize, we have two problems here:

1. Internal metrics are exposed via a Prometheus endpoint and must be scraped back in, which is inefficient and transforms metric names.
2. If internal metrics go through the Collector's own pipeline, a problem in that pipeline can prevent them from reaching the backend.

I think this issue tries to fix the first problem initially; then we can fix the second problem. @jrcamp, am I correct?
Note that the backend here may also differ from the "normal" one, e.g. it could use a completely different exporter, or the same exporter with a different config. Also, the current Prometheus endpoint actually allows running a separate Collector instance for scraping and forwarding "self-observability" metrics, i.e. making isolation possible. The problem, of course, is that you then also need to monitor that "separate Collector". Relevant issue for adding latency metrics: #542, which is blocked by the current approach of reusing the internal pipeline for self-observability.
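To make that isolation pattern concrete, here is a minimal sketch of the separate monitoring Collector, assuming the main Collector exposes its internal metrics on the default `:8888` Prometheus endpoint (the hostname and backend endpoint are hypothetical):

```yaml
# Separate "monitoring" Collector: scrapes the main Collector's
# self-observability metrics and forwards them over OTLP.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: otelcol-self
          scrape_interval: 30s
          static_configs:
            - targets: ["main-collector:8888"] # hypothetical hostname

exporters:
  otlp:
    endpoint: backend.example.com:4317 # hypothetical backend
    tls:
      insecure: true # illustrative; use real TLS settings in practice

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlp]
```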
Is there any benefit in sending these metrics as Prometheus to a separate collector? Why not use the native OTLP format?
@bogdandrutu Though, can we address the second point by just using a dedicated pipeline? If the main pipeline is having issues (getting backed up, dropping data, etc.), it shouldn't affect the other pipeline (assuming the user hasn't configured it to).
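A sketch of that idea in a single Collector config, with a dedicated `metrics/internal` pipeline isolated from the main one (exporter names and endpoints are illustrative; note that under the current approach the self-metrics would still enter via a Prometheus self-scrape):

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}
  prometheus/self: # scrapes the Collector's own metrics endpoint
    config:
      scrape_configs:
        - job_name: otelcol-self
          static_configs:
            - targets: ["127.0.0.1:8888"]

exporters:
  otlp/main:
    endpoint: backend.example.com:4317 # hypothetical
  otlp/telemetry:
    endpoint: monitoring.example.com:4317 # hypothetical, separate backend

service:
  pipelines:
    metrics: # main data pipeline
      receivers: [otlp]
      exporters: [otlp/main]
    metrics/internal: # dedicated self-observability pipeline
      receivers: [prometheus/self]
      exporters: [otlp/telemetry]
```

Since each pipeline uses a distinct exporter with its own sending queue, a backlog in `metrics` does not block `metrics/internal`.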
One minor potential benefit is that since Prometheus uses a "pull" model, the main collector won't have a dependency on a "separate" collector monitoring it. With a "push" model in OTLP, the main collector has to be configured to forward its self-observability metrics somewhere. But that's probably not critical.
Due to lack of time I am removing this from the 1.0 milestone. If there are any objections, please speak up.
@codeboten should we close this in favor of your issue about making the Collector observable? |
Closing in favor of #7532 |
Is your feature request related to a problem? Please describe.
Internal OT metrics are currently exposed through a Prometheus endpoint, which must then be scraped. This is less efficient than sending the metrics directly through the pipeline. In addition, context is lost, since OT has additional concepts such as resource labels vs. metric labels. The metric names are also transformed by being exported through Prometheus rather than sent directly through the pipeline.
This also adds complexity to the configuration, since a Prometheus scraper must be set up.
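For context, a sketch of what the current approach requires: the Collector exposes its own metrics on a Prometheus endpoint (the `:8888` address is the documented default; the scrape job is illustrative), and a scraper then has to pull them back in:

```yaml
# Expose the Collector's internal metrics for scraping.
service:
  telemetry:
    metrics:
      level: detailed
      address: 0.0.0.0:8888

# A Prometheus scraper (here, the Collector's own receiver) must then
# be configured separately to collect those metrics.
receivers:
  prometheus/internal:
    config:
      scrape_configs:
        - job_name: otelcol
          scrape_interval: 10s
          static_configs:
            - targets: ["0.0.0.0:8888"]
```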
Describe the solution you'd like
Be able to send internal metrics through the pipeline in OTLP format.
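As a hedged sketch of the requested shape, internal metrics could instead be pushed over OTLP by the Collector's own telemetry configuration, with no scrape endpoint at all. The `readers`/`periodic` layout below mirrors the OpenTelemetry SDK's periodic-reader configuration and is an assumption here, not the design this issue settled on (that discussion moved to #7532):

```yaml
service:
  telemetry:
    metrics:
      readers:
        - periodic: # push internal metrics on an interval
            exporter:
              otlp:
                protocol: grpc
                endpoint: http://localhost:4317 # hypothetical OTLP backend
```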