[receiver/otlpreceiver] Support Rate Limiting #6725
Comments
Can you clarify that this is not only for OTLP but would be applicable to any pipeline with an exporter able to return this error?

Yeah, that's a good point. The collector could have some general […]

#9357 does not fully implement the OTel specification for OTLP HTTP throttling: there does not exist a way to set the `Retry-After` header.

@TylerHelmuth can you take a look?
Reading through the issue again, I agree that we could introduce more to the collector to allow components to explicitly define how they want clients to retry in known, controlled scenarios. For that use case, this issue is not complete yet.

@TylerHelmuth Thank you for your response. To support your analysis with personal experience: I have a custom processor that limits the number of unique trace IDs per service per minute. In this case, it is possible to determine the appropriate duration after which the service should be permitted to resubmit its trace data. Allowing an OTLP client to use exponential backoff is sufficient but not optimal. Optimal here means a solution that, within system limits, minimizes both the delay between when an operation of a service creates a span or trace and when that span or trace is available to be queried from a backend storage system, and the resources (CPU, memory, network, I/O) required to report the span or trace from the originating service to that backend. However, in most cases the benefit from this optimization will be small, if not negligible.
@blakeroberts-wk Can you open-source this custom processor that limits the number of unique trace IDs per service per minute?
Is your feature request related to a problem? Please describe.
The OpenTelemetry Specification outlines throttling for both gRPC and HTTP; however, the OTLP receiver does not currently support this (optional) specification.
Right now, if a processor is under pressure, the only option it has is to return an error informing the receiver to tell the client that the request failed and is not retryable.
Describe the solution you'd like
It would be neat if the receiver offered an implementation of `error` that could be returned to it to signal that it should send an appropriately formatted response to the client indicating that the request was rate limited. The format of the response should follow the semantic conventions (e.g., the HTTP receiver should return a status code of `429` and set the `Retry-After` header).

For example, the OTLP receiver could export the following `error`
implementation:

Any processor or exporter in the pipeline could return (optionally wrapping) this error:
Then when handling errors from the pipeline, the receiver could check for this error:
Describe alternatives you've considered
To accomplish rate limiting, a fork of the OTLP receiver will be used. Here are the changes: main...blakeroberts-wk:opentelemetry-collector:otlpreceiver-rate-limiting.
Additional context
The above example changes include the addition of an internal histogram metric that records server latency (`http.server.duration` or `rpc.server.duration`) to allow monitoring of the collector's latency, throughput, and error rate. This portion of the changes is not necessary to support rate limiting.

There exists an open issue regarding rate limiting (#3509); however, the suggested approach seems to involve Redis, which goes beyond what I believe is necessary for the OTLP receiver to support rate limiting.