Add Summary to the Aggregation in the SDK specifications #2704
Comments
The Java SDK data model and its OTLP metric exporters do support summaries. This means that you can implement the MetricData interface with a type of MetricDataType.SUMMARY. And you can pass … It's only the metrics API that doesn't support summaries. The API is what you would use in place of whatever is recording the individual measurements that are aggregated into your current summaries. It seems unlikely to me that the API would change to have some purpose-built facility to accommodate summaries, since histograms are intended to fill that type of use case.
I believe users should use Histogram instruments configured with Summary aggregators to achieve the desired outcome. Semantically, there is no difference between the proposed Summary and the current Histogram instrument. We recognize that it is difficult to do what I am suggesting today -- configure a specific Histogram instrument with a specific aggregator -- outside of using Views, which have certain strong requirements to use. See #1753 or #2229. In Lightstep's prototype, I created an API Hint mechanism as a way to configure specific aggregators. I have also created a "minmaxsumcount" alternative aggregator for Histogram instruments, which is a similar use case to yours: simply a different way to aggregate a distribution. https://github.com/lightstep/otel-launcher-go/tree/main/lightstep/sdk/metric#metric-instrument-hints-api For the record, one of the difficulties in specifying a Summary aggregator is that the Prometheus-equivalent data type is fixed at Cumulative temporality. What would an SDK do if you requested a summary with Delta temporality? (If this is a case that doesn't work, fine with me.)
@jack-berg If the specification group included Summary in the protocol, mainly for legacy support, why not complete the support all the way to the API? The motivation, as I stated above, is mainly migration. I would like to migrate the entire Apache Pulsar and possibly Apache BookKeeper codebase to OpenTelemetry. It's a humongous project, which might be sliced. Since it's a 10-year-old project, of course it contains Summaries, many of them. The only way I can move forward with such a migration is to have support for Summary in the Java SDK of OpenTelemetry. I'm pretty sure that those 2 projects are not the only open source projects in Apache that use Summaries. Having a path for them is a clear win for OT IMO. Your suggestion is what you called out-of-process, but this will degrade the developer experience of Pulsar committers. I want to have a Summary object that I can use and register, and that will participate in the exporting of those metrics both to the OT Collector and to the experimental Prometheus one. If it's possible for me to do that, then I probably didn't understand the recipe. @jmacd - so you're saying I can develop a Summary object which will be a Histogram but with a specific Aggregate Handle that stores the data and exposes it as quantiles? I couldn't find the code that reads that data into an exporter, so I can't yet see how it happens. Would love more hints in that direction.
I support letting each OTel SDK do what's natural for users as long as it stays within the SDK. The SDK will likely have internal interfaces for representing Aggregators and data-transfer interfaces for conveying data points between the reader and the exporter. The SDK will have a Views mechanism for choosing which Aggregator to use for each instrument (its settings, etc.). What I propose is:
With these two features combined, your migration path will replace uses of Summary with OTel Histograms having the necessary hints. For a legacy application, I would expect the hint to apply to the entire instrumentation library, so IMO this should take only a single hint to the instrumentation library, then all your Histogram instruments (in that library) will output Summary data points.
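A minimal sketch of the migration path described above, assuming a toy SDK in which a single library-level hint decides which aggregator backs every Histogram instrument. All names here (`Meter`, `hint`, `SummaryAggregator`, `BucketAggregator`, `SummaryPoint`) are hypothetical and are not part of any OpenTelemetry SDK; `SummaryPoint` only loosely mirrors the count, sum, and quantile values carried by an OTLP Summary data point.

```python
import math
from dataclasses import dataclass


@dataclass
class SummaryPoint:
    """Hypothetical container, loosely shaped like an OTLP Summary data point."""
    count: int
    total: float
    quantile_values: dict  # quantile -> value


class SummaryAggregator:
    """Hypothetical aggregator: keeps raw values and reports quantiles."""

    def __init__(self, qs=(0.5, 0.99)):
        self.qs = qs
        self.values = []

    def record(self, value):
        self.values.append(value)

    def collect(self):
        ordered = sorted(self.values)
        if not ordered:
            return SummaryPoint(0, 0.0, {})
        picked = {}
        for q in self.qs:
            # Nearest-rank quantile; a real SDK would use a streaming sketch.
            rank = max(1, math.ceil(q * len(ordered)))
            picked[q] = ordered[rank - 1]
        return SummaryPoint(len(ordered), sum(ordered), picked)


class BucketAggregator:
    """Hypothetical default: explicit-bucket histogram counts."""

    def __init__(self, bounds=(10, 100, 1000)):
        self.bounds = bounds
        self.counts = [0] * (len(bounds) + 1)  # last slot is the overflow bucket

    def record(self, value):
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.counts[i] += 1
                return
        self.counts[-1] += 1

    def collect(self):
        return list(self.counts)


class Meter:
    """Hypothetical meter for one instrumentation library."""

    def __init__(self, hint=None):
        self.hint = hint  # one hint applies to every histogram in the library

    def create_histogram(self, name):
        if self.hint == "summary":
            return SummaryAggregator()
        return BucketAggregator()
```

With `Meter(hint="summary")`, the instrumentation code keeps calling `record` on what it treats as a histogram, while collection yields summary-style points. Without the hint, the same call sites produce bucket counts.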
One option that OpenTelemetry could explore after its 1.0 milestone, if there's interest, is to specify a Summary aggregation that closely matches the Prometheus definition and that could be derived from the exponential histogram. The OTLP Summary data point will require extensions to support delta temporality. When the potential to use Summary data points with a temporality other than the present definition has been discussed in the past, there was not enough interest.
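As a rough illustration of deriving summary-style quantiles from an exponential histogram, the sketch below assumes the base-2 exponential bucketing from the OpenTelemetry metrics data model, where `base = 2**(2**-scale)` and bucket `i` covers `(base**i, base**(i+1)]`. It handles positive values only, and the function names are made up for this example. Reporting the bucket's upper bound is one simple choice among several, not a specified behavior.

```python
import math
from collections import Counter


def bucket_index(value, scale):
    """Index i with base**i < value <= base**(i + 1), where base = 2**(2**-scale)."""
    return math.ceil(math.log2(value) * 2 ** scale) - 1


def estimate_quantile(buckets, scale, q):
    """Estimate a quantile from positive-value bucket counts (index -> count)."""
    base = 2 ** (2 ** -scale)
    total = sum(buckets.values())
    rank = max(1, math.ceil(q * total))
    seen = 0
    for index in sorted(buckets):
        seen += buckets[index]
        if seen >= rank:
            # Report the bucket's upper bound; the true value lies within
            # a factor of `base` below it.
            return base ** (index + 1)
    raise ValueError("histogram is empty")


def summarize(values, scale=3, qs=(0.5, 0.9)):
    """Bucket the values, then read the requested quantiles back out."""
    buckets = Counter(bucket_index(v, scale) for v in values)
    return {q: estimate_quantile(buckets, scale, q) for q in qs}
```

The relative error of each estimate is bounded by the bucket width, so a larger `scale` gives tighter quantiles at the cost of more buckets.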
After thoroughly reading the spec, both API and SDK, and reading part of the implementations of the Java SDK and Go SDK, I finally have the missing context I needed to reply :) I would like to clarify the request: I would like the SDK specification, specifically the Aggregation part, to include Summary. Specifically, this part: Coupled with providing hints in the API to specify the aggregation for a histogram, this will allow using a Summary either by an official SDK (specifically the Java one) or by any 3rd party SDK implementing the OTel API. Something of the following form:
@jmacd If the SDK allowed free-form aggregators, it probably means I could use that to create a summary aggregator which emits metric data points of type summary. Yet there is a strong downside to this approach: it's not ergonomic, i.e., not user-friendly. Defining a histogram instrument in … @jack-berg I agree that Histogram fills that gap, so I correct myself: the API should not change, only the Aggregation part of the SDK spec needs to change. Add Summary as an aggregation, which, together with Hints, will solve it end to end.
@reyang If what I said above makes sense, can this issue be re-opened?
Here are some of the problems that I think need to be addressed in a summary aggregation:
It's not a trivial amount of work, and I question whether it's worth pursuing. Would it not be easier to switch to histograms (explicit or exponential) and a pattern where quantiles are computed by whatever system the metrics are being sent to (either at read time or write time)?
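To make the "quantiles computed by the receiving system" pattern concrete, here is a minimal sketch that estimates a quantile from explicit-bucket histogram counts using linear interpolation inside the matching bucket. It is similar in spirit to what backends do at query time, but it is not any particular backend's implementation. The function name and the assumption that the first bucket starts at 0 (non-negative measurements) are choices made for this example.

```python
def histogram_quantile(q, bounds, counts):
    """Estimate quantile q from explicit buckets.

    bounds: sorted finite upper bounds.
    counts: per-bucket counts, len(bounds) + 1 entries (last is overflow).
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations")
    rank = q * total
    seen = 0
    for i, count in enumerate(counts):
        if count and seen + count >= rank:
            if i == len(bounds):
                # The overflow bucket has no upper bound, so fall back
                # to the largest finite bound.
                return float(bounds[-1])
            lower = bounds[i - 1] if i > 0 else 0.0  # assumed lower edge
            upper = bounds[i]
            # Assume observations are spread evenly within the bucket.
            return lower + (upper - lower) * (rank - seen) / count
        seen += count
    return float(bounds[-1])
```

For example, with bounds `[10, 20, 40]` and counts `[10, 20, 10, 0]`, the median falls halfway through the second bucket. The accuracy depends entirely on how well the bucket bounds match the data, which is the trade-off against client-side summaries.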
@jack-berg, thanks a lot for answering. First, I agree with you that it's a challenging effort and involves real work. Since summaries have existed for a long time, perhaps we can reuse existing work, for example by reading how different libraries in different languages have done it and deciding on one algorithm/implementation. Second, I agree that it's questionable since some can just switch to histograms, yet here is why I think it's worth the time:
In my understanding, a Summary is, by definition, a delta temporality. It gives you the quantiles of the last X minutes. It's basically the same if you had Average aggregation or Rate - it doesn't make sense to compute it from when the process starts. Hence, I don't see any value in having a cumulative summary, thus maybe there isn't a need for temporality field in the proto.
All those questions are valid questions that need answers as part of the effort to introduce it in the specifications. Small note: Apache Pulsar and Apache BookKeeper use the Apache DataSketches libraries, which implement a family of streaming algorithms described here. The reason, by the way, is that the Prometheus implementation created more memory pressure (GC). In my opinion, it's worth having it, or at least keeping the issue open and maybe marking it as open for volunteers?
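The reading of a Summary as "the quantiles of the last X minutes" can be sketched as a sliding-window structure. This is only an illustration under stated assumptions: it keeps raw samples and sorts them on every query, whereas the libraries mentioned above rely on streaming sketches to bound memory. The class name `WindowedSummary` is hypothetical, and timestamps are passed in explicitly so the behavior is deterministic.

```python
import math
from collections import deque


class WindowedSummary:
    """Hypothetical summary over the most recent `window_seconds` of samples."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def observe(self, value, now):
        self.samples.append((now, value))

    def _expire(self, now):
        # Drop samples that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()

    def quantile(self, q, now):
        self._expire(now)
        if not self.samples:
            return None
        ordered = sorted(value for _, value in self.samples)
        # Nearest-rank quantile over the samples still in the window.
        rank = max(1, math.ceil(q * len(ordered)))
        return ordered[rank - 1]
```

Because old samples age out, the reported quantiles never cover the whole process lifetime, which is why a cumulative temporality is an awkward fit for this kind of summary.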
What are you trying to achieve?
I would like the API Specifications for Metrics to support the Summary data type
Additional context.
I saw that the Metrics Data Model and the OpenTelemetry Protocol support Summary (it appears here under Metric points -> Summary (Legacy)).
Then I tried using it in the Java SDK and I couldn't find Summary. I did find an issue that stated for the SDK to support Summary the API Specification must support it first. This is the reason I opened this issue.
I'm working on a big epic in Apache Pulsar which refactors the whole set of metrics libraries. One library that came up was OpenTelemetry. The only "small" problem is that Apache Pulsar uses Summary objects heavily, and we can't make two really big changes at once. So it made sense to ask: if the OpenTelemetry protocol itself supports Summary, why wouldn't the API and Java SDK support it as well, making this migration possible?