-
Notifications
You must be signed in to change notification settings - Fork 9.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OTLP Ingestion endpoint #12571
Add OTLP Ingestion endpoint #12571
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, need to fix a few small lint issues and I think we're ready to merge this over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this wait 2 days so we first release 2.46? that gives us 6 weeks to polish
Can absolutely wait. No need to rush this :) |
0a36aa0
to
2f04ded
Compare
We copy files from the otel-collector-contrib. See the README in `storage/remote/otlptranslator/README.md`. This supersedes: prometheus#11965 Signed-off-by: gouthamve <[email protected]>
2f04ded
to
c055cd1
Compare
@@ -1,6 +1,7 @@ | |||
/web/ui @juliusv | |||
/web/ui/module @juliusv @nexucis | |||
/storage/remote @csmarchbanks @cstyan @bwplotka @tomwilkie | |||
/storage/remote/otlptranslator @gouthamve @jesusvazquez |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Has the conversation been started within in OTEL about moving the translator code so that we can remove the copy/paste code? |
Yes, see: https://cloud-native.slack.com/archives/C01LSCJBXDZ/p1689161965395529 They'd like to see more involvement from the Prometheus maintainers before they're okay with moving the code. |
I think this is ready to go. I've tested it by pushing metrics from a test application and the otel collector. Found a bug in the OTEL Golang SDK in process too and updated the PR to account for it: open-telemetry/opentelemetry-go#4363 |
It is what the OTEL Golang SDK expect :( open-telemetry/opentelemetry-go#4363 Signed-off-by: Goutham <[email protected]>
6b6a924
to
08a2145
Compare
@@ -380,6 +390,7 @@ func (api *API) Register(r *route.Router) { | |||
r.Get("/status/walreplay", api.serveWALReplayStatus) | |||
r.Post("/read", api.ready(api.remoteRead)) | |||
r.Post("/write", api.ready(api.remoteWrite)) | |||
r.Post("/otlp/v1/metrics", api.ready(api.otlpWrite)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we go for this strange URL?
Is this standard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is because typically the OTEL collector expects a single URL for all 3 signals (logs, traces, metrics) and then adds suffixes for each signal i.e v1/metrics
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I just added one question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Now to write the documentation for this :) |
@gouthamve and folks - Thanks for this ❤️ Please let me know if I can help in any way writing documentation for this :) |
|
||
This files in the `prometheus/` and `prometheusremotewrite/` are copied from the OpenTelemetry Project[^1]. | ||
|
||
This is done instead of adding a go.mod dependency because OpenTelemetry depends on `prometheus/prometheus` and a cyclic dependency will be created. This is just a temporary solution and the long-term solution is to move the required packages from OpenTelemetry into `prometheus/prometheus`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am the author the normalization of metrics and labels of the Prometheus exporters in the OpenTelemetry Contrib project.
Several comments:
- I hope you guys intend to enable the code that used to be behind the feature gate, to make sure Otel metrics are converted to proper Prometheus metrics that follow the metrics naming conventions (with unit, etc.).
- OpenTelemetry plans to use namespaces everywhere in metric attributes, which will inevitably lead to very long labels in Prometheus. We may want to consider dropping the namespace prefixes in Otel attributes to get "simpler" labels.
- To improve the performance, I wanted to create a sort of cache in the the translation package, in the form of a dictionary associating a triplet (metric name/type/unit) to the corresponding Prometheus metric name. I think that would be a nice addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the original package and the review!
I hope you guys intend to enable the code that used to be behind the feature gate
Already done :) We don't want our users to disable the feature gate, hence its removed, and the functionality is enabled by default.
We may want to consider dropping the namespace prefixes in Otel attributes to get "simpler" labels.
Interesting, can you give an example?
I wanted to create a sort of cache in the the translation package, in the form of a dictionary associating a triplet (metric name/type/unit) to the corresponding Prometheus metric name.
That sounds great. We're using the same package in Mimir and I think we're seeing CPU cycles being spent on the string replaces. Looking it up using a cache sounds good. cc @bboreham @charleskorn
Do you still have plans to implement it upstream? If no, could you open an issue to track it, and we'll try to get someone to work on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otel Technical Committee recently decided to enforce namespace prefixes for all metrics attributes. All metrics semantic conventions from OpenTelemetry are expected to be updated accordingly in the future.
For example, for JVM metrics: the process.runtime.jvm.gc.duration
metric will come with the process.runtime.jvm.gc.name
and process.runtime.jvm.gc.action
attributes. This will end up as process_runtime_jvm_gc_duration_seconds_counter{process_runtime_jvm_gc_name="...", process_runtime_jvm_gc_action="..."}
in PromQL.
We could decide to simplify process_runtime_jvm_gc_name
and process_runtime_jvm_gc_action
labels to name
and action
respectively by removing the matching namespaces in metric name and its labels.
The benefit is that we get metrics and labels that look much like "classic" Prometheus metrics and labels. The problem is that the behavior could be counter-intuitive and difficult to predict in some cases (which labels get simplified or not?).
IMHO, everything would be better if OpenTelemetry didn't enforce namespaces in metrics attributes, but that boat has sailed 😅
Regarding the cache mechanism, I still had hopes to eventually implement it, but who am I kidding, I've been considering this for a year now, and haven't done a thing. I'll open an issue here ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New issue: #12627
Thanks everybody for great work on this! 💪🏽 |
I filed #12623 so that Prometheus uses a more recent version of |
prometheustranslator "github.com/prometheus/prometheus/storage/remote/otlptranslator/prometheus" | ||
) | ||
|
||
// addSingleSumNumberDataPoint converts the Gauge metric data point to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
incorrect method name in the comment
We copy files from the otel-collector-contrib. See the README in
storage/remote/otlptranslator/README.md
.This supersedes: #11965