We wish to generate key metrics for customer applications, with the following requirements:
Metrics should be generated from Spans produced by OTEL auto-instrumentation.
Metrics are flexible w.r.t. schema and content.
Metrics should be generated from 100% of Spans, regardless of sampling decision.
Span sampling rules should still be respected w.r.t:
Exporting Spans from the Collector, and
Propagating sampling decisions to child spans.
Metrics generation has minimal impact on the application being monitored (e.g. w.r.t. CPU/Memory/Network/etc.).
Currently we have considered two solutions for this problem that could be contributed back to the OTEL community, but are open to alternative solutions:
Deferred Sampling
Most of this solution is described in this issue. In summary, the sampling decision would be made in the SDK, but all Spans would be sent to the Collector, where the decision would be executed.
Once this is done, author a processor module for the Collector that produces the desired metrics from Spans (akin to, or an augmentation of, SpanMetricsProcessor).
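As a rough illustration of the deferred-sampling flow, here is a minimal sketch in plain Python. It is not Collector code: the `Span` type, its `sampled` field, and `process_batch` are hypothetical names made up for this example, and a real processor would be written in Go against the Collector's own APIs. It only shows the ordering: metrics are updated from every span first, and the SDK's sampling decision is executed afterwards.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span as received by the Collector."""
    service: str
    name: str
    duration_ms: float
    sampled: bool  # decision made in the SDK and carried along with the span


def process_batch(spans, call_counts, duration_sums):
    """Update metrics from every span, then return only the sampled spans."""
    for span in spans:
        key = (span.service, span.name)
        call_counts[key] += 1
        duration_sums[key] += span.duration_ms
    # The deferred sampling decision is executed here, after metrics are recorded.
    return [span for span in spans if span.sampled]


call_counts = Counter()
duration_sums = defaultdict(float)
batch = [
    Span("checkout", "GET /cart", 12.0, sampled=True),
    Span("checkout", "GET /cart", 8.0, sampled=False),
    Span("checkout", "POST /pay", 40.0, sampled=False),
]
exported = process_batch(batch, call_counts, duration_sums)
```

In this sketch all three spans contribute to the metrics, while only the one span marked as sampled would be exported as a Span.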
SDK Metrics Span Processing
First, author a new aggregate Sampler in each OTEL SDK that takes the sampling decision made by a provided root sampler (e.g. parentbased_traceidratio) and converts all DROP sampling decisions to RECORD_ONLY sampling decisions.
Then, author a new SpanProcessor in each OTEL SDK that extracts and exports metrics from spans.
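To make the two pieces concrete, here is a minimal sketch in plain Python. It deliberately does not use any real OTEL SDK API: `RatioSampler`, `AlwaysRecordSampler`, `MetricsSpanProcessor`, and their method signatures are hypothetical stand-ins, and the ratio check on the low bits of the trace id is an assumption for illustration only. The `Decision` values mirror the DROP, RECORD_ONLY, and RECORD_AND_SAMPLE decisions from the specification.

```python
from collections import Counter
from enum import Enum


class Decision(Enum):
    DROP = "DROP"
    RECORD_ONLY = "RECORD_ONLY"
    RECORD_AND_SAMPLE = "RECORD_AND_SAMPLE"


class RatioSampler:
    """Stand-in root sampler (hypothetical): samples a fixed share of trace ids."""

    def __init__(self, ratio):
        self.ratio = ratio

    def should_sample(self, trace_id):
        # Assumption: treat the low 16 bits of the trace id as a uniform value.
        if (trace_id & 0xFFFF) / 0x10000 < self.ratio:
            return Decision.RECORD_AND_SAMPLE
        return Decision.DROP


class AlwaysRecordSampler:
    """Wraps a root sampler and converts DROP into RECORD_ONLY."""

    def __init__(self, root):
        self.root = root

    def should_sample(self, trace_id):
        decision = self.root.should_sample(trace_id)
        return Decision.RECORD_ONLY if decision is Decision.DROP else decision


class MetricsSpanProcessor:
    """Counts every recorded span, whether or not it will be exported."""

    def __init__(self):
        self.span_counts = Counter()

    def on_end(self, span_name, decision):
        if decision is not Decision.DROP:
            self.span_counts[span_name] += 1


sampler = AlwaysRecordSampler(RatioSampler(0.25))
processor = MetricsSpanProcessor()
decisions = []
for trace_id in range(0, 0x10000, 0x1000):  # 16 evenly spaced trace ids
    decision = sampler.should_sample(trace_id)
    decisions.append(decision)
    processor.on_end("GET /cart", decision)
```

With these 16 trace ids, the root sampler's choice is preserved for the 4 sampled traces, the other 12 become RECORD_ONLY, and the processor sees all 16 spans.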
What did you expect to see?
A means to produce flexible metrics from 100% of span data regardless of sampling decision, while still supporting arbitrary sampling rules.
This may require an OTEP, but starting with this issue to raise awareness/get feedback on approach.
This was discussed in today's Spec SIG. One comment by @jsuereth stands out: Can we "do this" without Sampling? I see the answer being Yes, but we still run into problems with the Sampler API. As @thpierce notes, option 2 involves a new Sampler.
Here are some related issues:
It is difficult to compose probability samplers that could be used in a span-to-metrics pipeline: #2179
It is difficult to construct spans whose links are not known, which could be solved with a new span state similar to the one discussed in this issue. #2918
My suggestion was actually we could change the behavior of the Tracer such that:
The Sampler does its current job and responds with a sampling decision
If a "MetricSpanProcessor", an "AllSpanProcessor", or whatever hook we have is present, the Sampler decision is overridden such that:
Sampled traces remain sampled.
RECORD_ONLY traces remain recorded.
Dropped traces turn into "METRICS_ONLY" or some other denomination.
We could be more flexible with a level-based approach here, but I don't think we need to impact the existing sampler API, we only need to impact the "types of spans" we need Tracer to interact with.
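The override described above can be sketched as a small mapping. This is plain Python for illustration, not SDK code: `METRICS_ONLY` is the hypothetical new state from this comment, and `effective_decision` and `has_metrics_processor` are names invented for the example.

```python
from enum import Enum


class Decision(Enum):
    DROP = "DROP"
    RECORD_ONLY = "RECORD_ONLY"
    RECORD_AND_SAMPLE = "RECORD_AND_SAMPLE"
    METRICS_ONLY = "METRICS_ONLY"  # hypothetical; not part of the current spec


def effective_decision(sampler_decision, has_metrics_processor):
    """Return the decision the Tracer would act on.

    The Sampler is left untouched; only DROP is rewritten, and only when a
    metrics-producing processor is registered.
    """
    if has_metrics_processor and sampler_decision is Decision.DROP:
        return Decision.METRICS_ONLY
    return sampler_decision
```

For example, `effective_decision(Decision.DROP, True)` yields `Decision.METRICS_ONLY`, while every decision passes through unchanged when no such processor is registered.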
This sounds interesting. I have been experimenting with the idea of letting traces generate metrics and only sending a subset of the traces from the otel-collector to the tracing backend. I was thinking of sending only a small percentage, plus always sending traces with errors, to the tracing backend.
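Roughly, the export policy I have in mind looks like the sketch below. It is plain Python rather than Collector configuration, and `keep_trace` and `keep_per_10k` are hypothetical names for this example; metrics would still be generated from every trace before this check runs.

```python
def keep_trace(trace_id, has_error, keep_per_10k=500):
    """Decide whether a trace is forwarded to the tracing backend.

    Traces with errors are always kept. Otherwise a deterministic share of
    trace ids is kept (500 per 10,000, i.e. 5%, by default).
    """
    if has_error:
        return True
    return trace_id % 10_000 < keep_per_10k
```

Using the trace id for the percentage check keeps the decision consistent for all spans of the same trace.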