
Support exposing internal metrics over OTLP rather than a Prometheus endpoint #1093

Closed
jrcamp opened this issue Jun 5, 2020 · 12 comments
Labels
enhancement New feature or request

Comments

@jrcamp
Contributor

jrcamp commented Jun 5, 2020

Is your feature request related to a problem? Please describe.
Internal OT metrics are currently exposed through a Prometheus endpoint, which must then be scraped. This is less efficient than sending the metrics directly through the pipeline. In addition, context is lost, since OT has concepts such as resource labels vs. metric labels that Prometheus does not preserve. The metric names are also transformed when they pass through Prometheus rather than going directly through the pipeline.

This also results in additional complexity in the configuration since a Prometheus scraper must be configured.
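
For illustration, a minimal sketch of that extra configuration, assuming the default self-telemetry endpoint on :8888 and a placeholder OTLP backend (names and endpoints are illustrative only):

```yaml
# Sketch of the current workaround: the Collector scrapes its own
# Prometheus telemetry endpoint (:8888 by default) and forwards the
# scraped metrics through a regular pipeline.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: otel-collector-self
          scrape_interval: 10s
          static_configs:
            - targets: ["0.0.0.0:8888"]

exporters:
  otlp:
    endpoint: backend.example.com:4317  # placeholder backend

service:
  pipelines:
    metrics/self:
      receivers: [prometheus]
      exporters: [otlp]
```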

Describe the solution you'd like
Be able to send internal metrics through the pipeline in OTLP format.
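
A hypothetical sketch of what that could look like; the `telemetry` reader/exporter fields below do not exist in the Collector configuration today and are modeled on the OpenTelemetry SDK's periodic metric reader, so treat them purely as an illustration of the requested behavior:

```yaml
# Hypothetical sketch only: internal metrics pushed directly over OTLP,
# with no Prometheus endpoint and no self-scraping pipeline required.
# These field names are assumptions, not an existing Collector config.
service:
  telemetry:
    metrics:
      readers:
        - periodic:
            interval: 10000  # ms, hypothetical
            exporter:
              otlp:
                protocol: grpc
                endpoint: https://backend.example.com:4317
```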

@jrcamp
Contributor Author

jrcamp commented Jun 5, 2020

@bogdandrutu @tigrannajaryan I'd like to put this in a GA milestone, since changing this later will break the exported metric names.

@bogdandrutu bogdandrutu added this to the Beta 0.4 milestone Jun 5, 2020
@bogdandrutu
Member

@jrcamp I think we should send these metrics directly to the backend, not through the pipeline, because if there is a problem in the pipeline these metrics would never reach the backend.

@bogdandrutu
Member

I think the current approach is a "big" hack added a long time ago. OpenCensus (which is currently used) can export directly to different backends and doesn't need this hack of exporting to Prometheus and then scraping.

@jrcamp jrcamp changed the title from "Send internal metrics through the pipeline instead of scraping" to "Don't send internal metrics using scraping" Jun 5, 2020
@jrcamp
Contributor Author

jrcamp commented Jun 5, 2020

@ccaraman I think this falls under self-observability, do you want to take it?

@bogdandrutu
Member

To summarize, we have two problems here:

  1. We expose internal metrics on a Prometheus endpoint that we self-scrape and feed back into the pipeline.
  2. The data are sent to the backend using the internal pipeline, which smells bad: we are sending monitoring data about the pipeline through the same pipeline, so if the pipeline has a problem we cannot see the monitoring data.

I think this issue tries to fix the first problem initially, then we can fix the second problem. @jrcamp am I correct?

@nilebox
Member

nilebox commented Jun 9, 2020

> we should send these metrics directly to the backend, not through the pipeline

Note that the backend here may also differ from the "normal" one, e.g. it could use a completely different exporter, or the same exporter with a different config.

Also, the current Prometheus endpoint actually allows running a separate Collector instance for scraping and forwarding "self-observability" metrics, i.e. making isolation possible. The problem, of course, is that you then also need to monitor that "separate Collector".

Relevant issue for adding latency metrics: #542, which is blocked by the current approach of reusing the internal pipeline for self-observability.

@jrcamp
Contributor Author

jrcamp commented Jun 9, 2020

> Also, the current Prometheus endpoint actually allows running a separate Collector instance for scraping and forwarding "self-observability" metrics, i.e. making isolation possible. The problem, of course, is that you then also need to monitor that "separate Collector".

Is there any benefit in sending these metrics as Prometheus to a separate collector? Why not use the native OTLP format?

@jrcamp
Contributor Author

jrcamp commented Jun 9, 2020

> To summarize, we have two problems here:
>
>   1. We expose internal metrics on a Prometheus endpoint that we self-scrape and feed back into the pipeline.
>   2. The data are sent to the backend using the internal pipeline, which smells bad: we are sending monitoring data about the pipeline through the same pipeline, so if the pipeline has a problem we cannot see the monitoring data.
>
> I think this issue tries to fix the first problem initially, then we can fix the second problem. @jrcamp am I correct?

@bogdandrutu could we address the second point by just using a dedicated pipeline, though? If the main pipeline is having issues (getting backed up, dropping data, etc.), it shouldn't affect the other pipeline (assuming the user hasn't configured it to).
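
As a sketch of that isolation idea (component names and endpoints are placeholders): the self-observability metrics get their own pipeline and exporter, separate from the main data pipeline:

```yaml
# Sketch: internal metrics travel through a dedicated pipeline and exporter,
# so back-pressure or failures on the main pipeline do not (in principle)
# block self-observability data. Names and endpoints are placeholders.
receivers:
  otlp:
    protocols:
      grpc:
  prometheus/self:
    config:
      scrape_configs:
        - job_name: otel-collector-self
          static_configs:
            - targets: ["0.0.0.0:8888"]

exporters:
  otlp/main:
    endpoint: backend.example.com:4317
  otlp/monitoring:
    endpoint: monitoring.example.com:4317

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlp/main]
    metrics/self:
      receivers: [prometheus/self]
      exporters: [otlp/monitoring]
```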

@nilebox
Member

nilebox commented Jun 9, 2020

> Is there any benefit in sending these metrics as Prometheus to a separate collector? Why not use the native OTLP format?

One minor potential benefit is that since Prometheus uses a "pull" model, the main collector won't have a dependency on a "separate" collector monitoring it. With a "push" model in OTLP, the main collector has to be configured to forward its self-observability metrics somewhere.

But that's probably not critical.

@flands flands modified the milestones: Beta 0.4, Beta 0.5 Jun 16, 2020
@flands flands modified the milestones: Beta 0.5.0, Beta 0.5.1 Jul 6, 2020
@flands flands modified the milestones: Beta 0.6.0, Beta 0.7.0 Jul 15, 2020
@bogdandrutu bogdandrutu modified the milestones: Beta 0.7.0, Beta 0.8.0, GA 1.0 Jul 30, 2020
@tigrannajaryan
Member

Due to lack of time I am removing this from 1.0. If there are any objections, please speak up.

@bogdandrutu bogdandrutu removed this from the core-release-v37 milestone Jan 18, 2022
@atoulme atoulme changed the title from "Don't send internal metrics using scraping" to "Support exposing internal metrics over OTLP rather than a Prometheus endpoint" Dec 18, 2023
@mx-psi
Member

mx-psi commented Dec 21, 2023

@codeboten should we close this in favor of your issue about making the Collector observable?

@mx-psi
Member

mx-psi commented Apr 19, 2024

Closing in favor of #7532

@mx-psi mx-psi closed this as not planned Apr 19, 2024