Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for OTEL_TRACES_SAMPLER #37301

Open
CaerusKaru opened this issue Sep 11, 2024 · 10 comments
Open

Support for OTEL_TRACES_SAMPLER #37301

CaerusKaru opened this issue Sep 11, 2024 · 10 comments
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. feature-request This issue requires a new behavior in the product in order be resolved. Monitor - Distro Monitor OpenTelemetry Distro needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team Service Attention Workflow: This issue is responsible by Azure service team.

Comments

@CaerusKaru
Copy link

Is your feature request related to a problem? Please describe.

The OpenTelemetry Python library has support for many default sampling schemes. Some of these allow for configuring sampling not based on a probabilistic rate, but instead on the context of the parent span. In an instance where most tracing should be ignored, except parent spans, this is incredibly useful to reduce noise. Our explicit example is an application instrumented for long polling, where we only want to capture data for inbound requests, where the trace context is retrieved from the queue. We don't want 99.99% of all traffic sampled, but we do want the inbound requests to be captured, assuming the context is properly set on those spans.

Describe the solution you'd like

Right now, this library assumes a probabilistic sampling, with explicit support for OTEL_TRACES_SAMPLER_ARG. This should be extended to also support OTEL_TRACES_SAMPLER, where the underlying logic can be shared in terms of setting up the context on the samples, but the decision on when/how to sample should be abstracted to the underlying library, allowing support of multiple sampler types.

Describe alternatives you've considered

The only other way to achieve this is to literally rip out and replace the trace sampler from the library, pretty much doing what I said above, just without explicit support.

@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Sep 11, 2024
@kristapratico kristapratico added feature-request This issue requires a new behavior in the product in order be resolved. Service Attention Workflow: This issue is responsible by Azure service team. Client This issue points to a problem in the data-plane of the library. Monitor - Distro Monitor OpenTelemetry Distro and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. labels Sep 11, 2024
@github-actions github-actions bot added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Sep 11, 2024
Copy link

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jeremydvoss @lzchen.

@kristapratico
Copy link
Member

@CaerusKaru thanks for your feature request, the team will take a look and respond as soon as possible.

@CaerusKaru
Copy link
Author

A workaround for this is to reset the sampler for the TracerProvider that Azure creates. The Azure one is simply an override of the probabilistic algorithm to replace OpenTelemetry's. If we're not intending to use that anyway, we can just revert to the default behavior:

from opentelemetry import trace
from opentelemetry.sdk.trace import sampling

trace.get_tracer_provider.sampler = sampling._get_from_env_or_default()

@lzchen
Copy link
Member

lzchen commented Sep 12, 2024

@CaerusKaru

You are using azure-monitor-opentelemetry package I am assuming to instrument?

@CaerusKaru
Copy link
Author

Yes, that's correct.

@lzchen
Copy link
Member

lzchen commented Sep 12, 2024

@CaerusKaru

The reason we don't support configuration of the sampler used in the azure-monitor-opentelemetry package is because Application Insights backend assumes a specific custom sampling algorithm to be used to power some user experiences (autogenerating metrics from itemcount derived from sampling). The sampler used can be found here. Any deviation from this sampler could possibly result in unexpected performance on the appinsights UX. Now, if you are adamant about using your own sampling algorithm, we would deem this as an advanced telemetry scenario that is unsupported, and you can use the more configurable azure-monitor-opentelemetry-exporter package in which you would have access to each component to allow for configuration. You can see documentation on how to onboard here.

@CaerusKaru
Copy link
Author

OpenTelemetry has three categories of sampling, but they all boil down to the same thing: on, off, and maybe. In the specification, there are further subdivisions, for ParentBased. In the sampler you linked above (which I am now very familiar with after hammering at this over several days), this is acknowledged: the sampler exits early if the sampler arg is 0.0 (off) or 1.0 (on). My request here is to extend that conditional to be dependent on the context of the parent, which follows the specification.

I don't see how this would impact at all the performance in the UX, as the probabilistic model is unaffected. To the UX, it's still either getting everything, nothing, or following the algorithm for sampling laid out already.

@lzchen
Copy link
Member

lzchen commented Sep 12, 2024

@CaerusKaru

That's totally fine. We would probably not include this as a configuration option for the distro since is more suited for out-of-the-box, simple configuration and the majority of users are expected to use the given built-in sampler. This is more of a product issue rather than a technical one. We would probably not include this as a configuration option for the distro since is more suited for out-of-the-box, simple configuration and the majority of users are expected to use the given built-in sampler. You are more than welcome to implement your own sampler and use piece-wise with the exporter. We would advise inheriting from the ApplicationInsightsSampler to obtain all the specific implementation details so the said experiences won't be affected (notice the special SAMPLE_RATE_KEY as well).

@CaerusKaru
Copy link
Author

That doesn't feel like it's in the spirit of the library, though. The library explicitly supports OTEL_TRACES_SAMPLER_ARG, so why should it not also support OTEL_TRACES_SAMPLER? Especially when the implementation would be relatively trivial? If it's about manpower, I'd happily offer up a PR if I knew it was likely to be accepted.

@CaerusKaru
Copy link
Author

Edit: either the above, or some update to the documentation explicitly calling out that OTEL_TRACES_SAMPLER is ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. feature-request This issue requires a new behavior in the product in order be resolved. Monitor - Distro Monitor OpenTelemetry Distro needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team Service Attention Workflow: This issue is responsible by Azure service team.
Projects
None yet
Development

No branches or pull requests

3 participants