
Support exposing internal metrics over OTLP rather than a Prometheus endpoint #1093

Closed
jrcamp opened this issue Jun 5, 2020 · 12 comments
Labels
enhancement New feature or request

Comments

@jrcamp
Contributor

jrcamp commented Jun 5, 2020

Is your feature request related to a problem? Please describe.
Internal OT metrics are currently exposed through a Prometheus endpoint, which must then be scraped. This is less efficient than sending the metrics directly through the pipeline. In addition, context is lost, since OT has concepts such as resource labels vs. metric labels that Prometheus does not preserve. The metric names are also transformed when they pass through Prometheus rather than going directly through the pipeline.

This also results in additional complexity in the configuration since a Prometheus scraper must be configured.
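
For illustration, a minimal sketch of that extra configuration, assuming the default self-telemetry endpoint on :8888 and a placeholder OTLP backend (names and endpoints are illustrative only):

```yaml
# Sketch of the current workaround: the Collector scrapes its own
# Prometheus telemetry endpoint (:8888 by default) and forwards the
# scraped metrics through a regular pipeline.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: otel-collector-self
          scrape_interval: 10s
          static_configs:
            - targets: ["0.0.0.0:8888"]

exporters:
  otlp:
    endpoint: backend.example.com:4317  # placeholder backend

service:
  pipelines:
    metrics/self:
      receivers: [prometheus]
      exporters: [otlp]
```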

Describe the solution you'd like
Be able to send internal metrics through the pipeline in OTLP format.
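
A hypothetical sketch of what that could look like; the `telemetry` reader/exporter fields below do not exist in the Collector configuration today and are modeled on the OpenTelemetry SDK's periodic metric reader, so treat them purely as an illustration of the requested behavior:

```yaml
# Hypothetical sketch only: internal metrics pushed directly over OTLP,
# with no Prometheus endpoint and no self-scraping pipeline required.
# These field names are assumptions, not an existing Collector config.
service:
  telemetry:
    metrics:
      readers:
        - periodic:
            interval: 10000  # ms, hypothetical
            exporter:
              otlp:
                protocol: grpc
                endpoint: https://backend.example.com:4317
```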

@jrcamp
Contributor Author

jrcamp commented Jun 5, 2020

@bogdandrutu @tigrannajaryan I'd like to put this in a GA milestone, since changing this later will break the exported metric names.

@bogdandrutu bogdandrutu added this to the Beta 0.4 milestone Jun 5, 2020
@bogdandrutu
Member

@jrcamp I think we should send these metrics directly to the backend, not through the pipeline, because if there is a problem in the pipeline these metrics would never reach the backend.

@bogdandrutu
Member

I think the current approach is a "big" hack added a long time ago. OpenCensus (which is currently used) can export directly to different backends and doesn't need this hack of exporting to Prometheus and then scraping.

@jrcamp jrcamp changed the title from "Send internal metrics through the pipeline instead of scraping" to "Don't send internal metrics using scraping" Jun 5, 2020
@jrcamp
Contributor Author

jrcamp commented Jun 5, 2020

@ccaraman I think this falls under self-observability, do you want to take it?

@bogdandrutu
Member

To summarize, we have two problems here:

  1. We expose internal metrics on a Prometheus endpoint that we self-scrape and feed back into the pipeline.
  2. The data are sent to the backend using the internal pipeline, which smells bad: we are sending monitoring data about the pipeline through the same pipeline, so if the pipeline has a problem we cannot see the monitoring data.

I think this issue tries to fix the first problem initially, then we can fix the second problem. @jrcamp am I correct?

@nilebox
Member

nilebox commented Jun 9, 2020

> we should send these metrics directly to the backend, not through the pipeline

Note that the backend here may also differ from the "normal" one, e.g. it could use a completely different exporter, or the same exporter with a different config.

Also, the current Prometheus endpoint actually allows running a separate Collector instance for scraping and forwarding "self-observability" metrics, i.e. making isolation possible. The problem, of course, is that you then also need to monitor that "separate Collector".

Relevant issue for adding latency metrics: #542, which is blocked by the current approach of reusing the internal pipeline for self-observability.

@jrcamp
Contributor Author

jrcamp commented Jun 9, 2020

> Also, the current Prometheus endpoint actually allows running a separate Collector instance for scraping and forwarding "self-observability" metrics, i.e. making isolation possible. The problem, of course, is that you then also need to monitor that "separate Collector".

Is there any benefit in sending these metrics as Prometheus to a separate collector? Why not use the native OTLP format?

@jrcamp
Contributor Author

jrcamp commented Jun 9, 2020

> To summarize, we have two problems here:
>
>   1. We expose internal metrics on a Prometheus endpoint that we self-scrape and feed back into the pipeline.
>   2. The data are sent to the backend using the internal pipeline, which smells bad: we are sending monitoring data about the pipeline through the same pipeline, so if the pipeline has a problem we cannot see the monitoring data.
>
> I think this issue tries to fix the first problem initially, then we can fix the second problem. @jrcamp am I correct?

@bogdandrutu could we address the second point by just using a dedicated pipeline, though? If the main pipeline is having issues (getting backed up, dropping data, etc.), it shouldn't affect the other pipeline (assuming the user hasn't configured it to).
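
As a sketch of that isolation idea (component names and endpoints are placeholders): the self-observability metrics get their own pipeline and exporter, separate from the main data pipeline:

```yaml
# Sketch: internal metrics travel through a dedicated pipeline and exporter,
# so back-pressure or failures on the main pipeline do not (in principle)
# block self-observability data. Names and endpoints are placeholders.
receivers:
  otlp:
    protocols:
      grpc:
  prometheus/self:
    config:
      scrape_configs:
        - job_name: otel-collector-self
          static_configs:
            - targets: ["0.0.0.0:8888"]

exporters:
  otlp/main:
    endpoint: backend.example.com:4317
  otlp/monitoring:
    endpoint: monitoring.example.com:4317

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlp/main]
    metrics/self:
      receivers: [prometheus/self]
      exporters: [otlp/monitoring]
```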

@nilebox
Member

nilebox commented Jun 9, 2020

> Is there any benefit in sending these metrics as Prometheus to a separate collector? Why not use the native OTLP format?

One minor potential benefit is that since Prometheus uses a "pull" model, the main collector won't have a dependency on a "separate" collector monitoring it. With a "push" model in OTLP, the main collector has to be configured to forward its self-observability metrics somewhere.

But that's probably not critical.

@flands flands modified the milestones: Beta 0.4, Beta 0.5 Jun 16, 2020
@flands flands modified the milestones: Beta 0.5.0, Beta 0.5.1 Jul 6, 2020
@flands flands modified the milestones: Beta 0.6.0, Beta 0.7.0 Jul 15, 2020
@bogdandrutu bogdandrutu modified the milestones: Beta 0.7.0, Beta 0.8.0, GA 1.0 Jul 30, 2020
@tigrannajaryan
Member

Due to lack of time I am removing this from 1.0. If there are any objections, please speak up.

@bogdandrutu bogdandrutu removed this from the core-release-v37 milestone Jan 18, 2022
@atoulme atoulme changed the title from "Don't send internal metrics using scraping" to "Support exposing internal metrics over OTLP rather than a Prometheus endpoint" Dec 18, 2023
@mx-psi
Member

mx-psi commented Dec 21, 2023

@codeboten should we close this in favor of your issue about making the Collector observable?

@mx-psi
Member

mx-psi commented Apr 19, 2024

Closing in favor of #7532

@mx-psi mx-psi closed this as not planned Apr 19, 2024