[Donation Proposal]: OTEL Arrow Adapter #1332
Comments
I am beyond excited to see this proposal! Excellent work, and I can't wait to see phase 2 of this project. 🚀
I think it might be good to call out some of the large technical differences in OTLP-Arrow vs. OTLP and when to use it vs. other options. I was looking over the details.
I'm definitely supportive of OTLP-Arrow as a viable alternative to OTLP where it makes sense, but I'd love if you could outline where it shines and where it doesn't. Specifically, when should you pick one vs. the other?
Thanks @jsuereth for your support. Regarding your first comment, I added the following note in the OTEP to explain the motivations behind streaming gRPC for OTLP Arrow.
For the second comment, I confirm that the protocol is stateful. We use this property to maintain schemas and dictionaries on the receiver side to amortize their cost. The small extra cost in the first messages is paid back quite quickly, yielding gains in message size as well as improvements in the compression rate. For your last question, the following two scenarios are particularly suitable for OTLP Arrow (non-exhaustive list):
A future enhancement (based on ZSTD dictionaries) should also make this project interesting for scenarios where the batch size is small.
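To make the stateful dictionary amortization described above concrete, here is a minimal sketch: the receiver caches each stream's dictionary from the first message, so later messages can carry compact integer indices instead of repeating strings. All names here (`Message`, `Receiver`, `Decode`) are illustrative, not the adapter's actual API.

```go
package main

import "fmt"

// Message is a toy stand-in for a stream payload: the dictionary is
// populated only on the first message of a stream, after which batches
// carry integer indices into the cached dictionary.
type Message struct {
	Dict    []string // sent once per stream (amortized cost)
	Indices []int    // references into the cached dictionary
}

// Receiver holds per-stream dictionary state, mirroring the idea that
// the protocol is point-to-point stateful.
type Receiver struct {
	dicts map[string][]string
}

func NewReceiver() *Receiver {
	return &Receiver{dicts: map[string][]string{}}
}

// Decode resolves a message's indices against the dictionary cached for
// streamID, updating the cache if this message carries a dictionary.
func (r *Receiver) Decode(streamID string, m Message) []string {
	if m.Dict != nil {
		r.dicts[streamID] = m.Dict // the one-time cost paid up front
	}
	dict := r.dicts[streamID]
	out := make([]string, len(m.Indices))
	for i, idx := range m.Indices {
		out[i] = dict[idx]
	}
	return out
}

func main() {
	r := NewReceiver()
	// First message pays the dictionary cost.
	fmt.Println(r.Decode("s1", Message{Dict: []string{"GET", "POST"}, Indices: []int{0, 1, 0}}))
	// Subsequent messages send only indices.
	fmt.Println(r.Decode("s1", Message{Indices: []int{1, 1, 0}}))
}
```

Resetting a connection simply drops the map entry; the next stream's first message rebuilds the state, which matches the reconnection behavior described later in the thread.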
A couple of questions:
The design currently calls for the use of gRPC streams to benefit from OTLP-Arrow transport. We believe that some of this benefit can be had even for unary gRPC and HTTP requests with large request batches to amortize the sending of dictionary and schema information. This remains an area for study. Regarding the "imbalance over time" question, we rely on the MaxAge and MaxAgeGrace settings of the gRPC connection to mitigate this kind of issue. This is discussed in the "Implementation recommendations" section of the OTEP.
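For reference, the MaxAge/MaxAgeGrace mitigation maps onto gRPC-Go's keepalive server parameters: the server periodically recycles long-lived connections, forcing exporters to reconnect (and re-send schemas/dictionaries on the new stream), which rebalances load across receivers over time. The duration values below are illustrative only, not recommendations from the OTEP:

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// Rotate each connection after at most 5 minutes, giving in-flight
	// RPCs a 30-second grace period to drain before a forced close.
	srv := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
		MaxConnectionAge:      5 * time.Minute,
		MaxConnectionAgeGrace: 30 * time.Second,
	}))
	_ = srv // a real receiver would register services and call srv.Serve(...)
}
```

The trade-off is that each forced reconnection re-pays the first-message schema/dictionary cost, so MaxConnectionAge should be long relative to the stream's amortization window.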
I updated my benchmark tool to compare the three following experiments:
The following results are based on traces coming from a production environment (~400K traces). The streaming mode is definitely the best among these 3 experiments. This protocol is stateful and allows us to:
It is interesting to note that for batches with more than 100 traces, the unary RPC mode is slightly better than the standard OTLP protocol. This is due to the organization of the data being better suited to compression.
@lquerel thanks for the update! What does the batch size reflect? Is it literally the number of traces or number of spans? If number of traces, how many spans in each trace?
@lquerel I am surprised streaming saves so much for large batch sizes. I would expect that for sufficiently large batches the dictionary is a small part of it, so re-sending it with every batch shouldn't result in much bigger payloads.
Also, if I understand correctly, the columnar format shows little advantage in compressed form (and is sometimes even worse, for very small batches). So essentially most of the gains are from streaming, not from the columnar format.
@tigrannajaryan The following pull request contains the modifications to benchmark the "unary RPC" mode for traces and logs. I'm still working on finalizing a memory optimization for metrics. These benchmarks reproduce all the steps, from the conversion between OTLP and OTLP Arrow to the serialization, the compression, ... These steps are clearly represented in the output of the benchmark tool. To run the benchmark you need some data in protobuf format: basically a Trace Request serialized in binary format and saved as a file. To run the benchmark for traces I'm using the following command.
I will soon update the benchmark tool to produce more detailed stats in order to present the distribution of spans and traces in the input file.
Can you share the
Regarding the impact of streaming on the compression ratio: not having to send dictionaries over and over again for many fields is definitely an advantage. By nature, a dictionary doesn't compress well because there aren't many repetitions, but when you can get rid of dictionaries entirely, it's a direct net gain. This is what happens in the streaming scenario, although the gain may vary a bit depending on the nature of the data set. In the unary RPC scenario, we pay the overhead of both the schema and the dictionaries, which are rebuilt for each batch; compared with a good compression algorithm, unary RPC mode does not do much better. However, in terms of allocation, serialization, and processing speed (e.g. for filtering and aggregation), the columnar format will be incomparably better (visible in phase 2).

It is theoretically possible to do better, I think, by sorting some columns to optimize the compression rate. This is a well-known optimization in the world of columnar data. However, it is quite difficult to implement in a general way in the OTel context because OTel entities are strongly hierarchical and attributes are not uniform across the entities that compose the batches. This will be the subject of a future optimization that will not impact the protocol or the data format.

Finally, the fact that the compression ratio is a bit worse with small batches was expected. The goal of this effort is to optimize batch sizes of at least 100-200 elements.
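The column-sorting effect mentioned above is easy to demonstrate with a toy experiment (this is not the adapter's code): build a low-cardinality column such as span names in arrival order, then compare its gzip-compressed size before and after sorting. Sorting groups identical values into long runs, which general-purpose compressors exploit well.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"math/rand"
	"sort"
	"strings"
)

// gzipSize returns the gzip-compressed length of b.
func gzipSize(b []byte) int {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	w.Write(b)
	w.Close()
	return buf.Len()
}

// experiment builds a 10,000-row column drawn from a small value set
// (simulating span names) and measures its compressed size in arrival
// order vs. sorted order.
func experiment() (unsorted, sorted int) {
	names := []string{"GET /users", "POST /orders", "GET /health", "DELETE /cart"}
	rng := rand.New(rand.NewSource(42)) // fixed seed for reproducibility
	col := make([]string, 10000)
	for i := range col {
		col[i] = names[rng.Intn(len(names))]
	}
	unsorted = gzipSize([]byte(strings.Join(col, "\n")))
	sort.Strings(col)
	sorted = gzipSize([]byte(strings.Join(col, "\n")))
	return
}

func main() {
	u, s := experiment()
	fmt.Printf("unsorted=%d bytes, sorted=%d bytes\n", u, s)
}
```

The difficulty noted in the comment is that real OTel batches interleave many hierarchical entities, so you cannot freely reorder one column without reordering every sibling column consistently.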
For the record:
We did a study of the impact of unary vs. streaming gRPC. There is nothing preventing us from using unary RPC, but you will need large batches to get good compression and you will still be leaving compression "on the table".
It is point-to-point stateful. You can reset a connection at any time and it will rebuild the state needed in a future connection.
I'm not very familiar with this project, and so might be rehashing something that has already been discussed, but thought I would highlight that Parquet does not support UnionArrays, something the current schema appears to use extensively. There was a proposal to add it a while back, apache/parquet-format#44, but it has stalled out. Given adoption of parquet v2, released almost 10 years ago, is still ongoing, I wouldn't hold out much hope for broad ecosystem support even if this were to be picked up again. In general, support for UnionArray within the arrow ecosystem is fairly inconsistent, and I would strongly advise against using them if ecosystem interoperability, especially with parquet-based tooling, is a key design principle. Otherwise very excited to see movement in this space, keep up the good work 👍
Regarding protocol, has Arrow Flight RPC been considered? It's essentially a gRPC service built to move Arrow Record Batches over the network.
@tustvold Thank you for your point about UnionArrays. We absolutely want to avoid taking steps that limit the future use of Parquet. While we were aiming for network compression on a bridge between two OTel collectors, it is natural that users want to record files of telemetry data for other uses. |
@jacobmarble Our implementation uses Arrow IPC under the cover of a gRPC stream encapsulating Arrow IPC messages. We took this approach (a) to fit with the OTel collector, which provides a rich set of helpers for setting up and monitoring gRPC connections, and (b) to make it easy to upgrade by sharing the gRPC server instance used by an OTLP receiver, so that one server can support both protocols on the same port.
@open-telemetry/technical-committee please consider whether we are willing to accept this donation based on the technical diligence carried out by @tigrannajaryan in open-telemetry/oteps#171 (comment). Before focusing on current, unresolved feedback, I want to step back and remind us of the goal here. The goal is to have an OTel-Arrow project supporting the generation and exchange of column-oriented telemetry data using OpenTelemetry's data model.

When @lquerel presented his first proposal on this, members of @open-telemetry/technical-committee including myself and @tigrannajaryan requested the project be split into at least two phases. The first phase was to bring a high-compression bridge feature to the OTel collector and pave the way for further development of OTel-Arrow. The first phase is essentially complete at this point.

@tigrannajaryan's results indicate that slightly more compression ought to be achievable, by comparison against a bespoke, row-based encoding. He is correct, and we take the potential for a slight compression improvement as a bug in our implementation. @lquerel is currently working on an enhancement to overcome this problem. While @tigrannajaryan's results show there is an easier way to achieve a high-compression bridge between OTel collectors, a bespoke encoding does not bring us closer to supporting column-oriented telemetry data in the Arrow ecosystem. I see the result ultimately as confirming that our implementation, built on the Arrow-Go library, has good performance. Moreover, this sort of bespoke, row-oriented encoding will not be able to achieve the significant gains in processing speed that Arrow and its columnar representation will offer in Phase 2.

Accepting this donation means agreeing to form a new OTel group focused on the kind of technical and user feedback we are now receiving. Here are some items that we regard as future work blocked until this donation is accepted:
We should not stall this donation because there is room for technical improvement in our Phase 1 product. Further, I suggest we should approve and merge @lquerel's OTEP, as it captures our Phase 1 results. The kinds of discussion points raised by @tigrannajaryan should be discussed later, when we move to add protocol support to the core opentelemetry-proto repository, not at this stage.

On this last point, @tigrannajaryan's feedback was well received. I agree we should not require the use of Arrow to achieve reasonably good compression in OpenTelemetry. I support the notion that OTLP's existing, row-oriented protocol would benefit from a dictionary mechanism.

Since @tigrannajaryan demonstrated that a bespoke encoding can easily achieve very good results, it occurred to us that we should try to separate our work on bringing a streaming bridge to the OTel collector from our work on the OTel-Arrow adapter. The observation is that the changes made to the OTLP exporter and receiver components in https://github.com/open-telemetry/experimental-arrow-collector for streaming are decoupled from the OTel-Arrow adapter itself; with some work, the OTLP exporter and receiver components could support pluggable stream encodings. For instance, we could engineer the core OTLP exporter and receiver components to support both the OTel-Arrow mechanism (based on Arrow IPC) and the bespoke mechanism designed by @tigrannajaryan; this also appears to be a way we could build support for versioning in the OTel-Arrow protocol.
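A minimal sketch of what "pluggable stream encodings" could look like in Go. All names here (`StreamEncoding`, `register`, `passthrough`) are hypothetical illustrations, not the collector's actual API: the exporter and receiver would agree on an encoding by name, then hand each stream's batches to the selected codec.

```go
package main

import "fmt"

// StreamEncoding is a hypothetical plug-in point: one implementation per
// wire format (e.g. an OTel-Arrow codec, a bespoke row-based codec).
type StreamEncoding interface {
	Name() string                 // identifier negotiated between exporter and receiver
	Encode(batch []string) []byte // per-stream, possibly stateful, encoder
}

// passthrough is a trivial encoding standing in for a real codec.
type passthrough struct{}

func (passthrough) Name() string { return "passthrough/v1" }

func (passthrough) Encode(batch []string) []byte {
	var out []byte
	for _, s := range batch {
		out = append(out, s...)
		out = append(out, '\n')
	}
	return out
}

// encodings is a registry the receiver consults when a stream announces
// which encoding (and version) it will use.
var encodings = map[string]StreamEncoding{}

func register(e StreamEncoding) { encodings[e.Name()] = e }

func main() {
	register(passthrough{})
	// The receiver picks the codec named in the stream's first message.
	e := encodings["passthrough/v1"]
	fmt.Printf("%q\n", e.Encode([]string{"span-a", "span-b"}))
}
```

Versioning falls out of the same mechanism: "otel-arrow/v2" would simply register as a distinct encoding name alongside "otel-arrow/v1".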
The TC looked at the donation proposal and the associated OTEP and discussed it with Collector maintainers. Here are TC's observations:
Based on the above the TC makes the following recommendations:
Independently from the above as it was discovered during the OTEP review, OpenTelemetry should consider adding a dictionary encoding extension to OTLP to gain significant wire size benefits. |
@lquerel if you can adjust your OTEP to match the conditions listed in the TC response we should be able to also merge the OTEP. |
@tigrannajaryan yes I plan to update this OTEP soon to reflect my last changes and to integrate the conditions listed in the TC response. Thanks. |
@tigrannajaryan We believe the OTEP has been updated as you requested. Are we clear to merge this donation and the OTEP? |
The GC voted to accept the donation (#1332) on July 6, 2023.
I've requested a formal statement from F5 and Lightstep's sales and marketing departments as detailed in the community donation process, after which I will close this issue. Thanks all! |
ServiceNow formerly known as Lightstep acknowledges the marketing guidelines (https://github.com/open-telemetry/community/blob/main/marketing-guidelines.md) |
F5 acknowledges the marketing guidelines (https://github.com/open-telemetry/community/blob/main/marketing-guidelines.md) |
The donation is complete. Thank you especially to @lquerel and F5. |
Description
This project extends, in a compatible way, the existing OTLP protocol with a generic columnar representation for metrics, logs, and traces based on the cross-language Apache Arrow transport system for columnar data. This extension significantly improves the efficiency of the OTLP protocol for scenarios involving the transmission of large batches of OTLP entities. Results show a 2-3 times better compression rate for typical data, while Apache Arrow’s zero-copy serialization and deserialization techniques help lower overhead.
The first phase of this project will deliver the following primary components, contained in the donated repository:
In a second phase of the project, we propose developing new OpenTelemetry components and mechanisms for exchanging and processing data with the Apache Arrow ecosystem, including:
The repository for donation: https://github.com/f5/otel-arrow-adapter
Collector repo fork containing the drop-in compatible exporter and receiver components under development: https://github.com/open-telemetry/experimental-arrow-collector
More details are in the associated OTEP text: https://github.com/lquerel/oteps/blob/main/text/0156-columnar-encoding.md. The OTEP is still pending, unmerged.
Benefits to the OpenTelemetry community
Compared to the existing OpenTelemetry protocol this compatible extension has the following improvements:
Reasons for donation
The proposed protocol is tightly integrated with the existing OTLP protocol. A fallback mechanism between the two protocols has been implemented via a new pair of OTLP Arrow Exporter/Receiver that are drop-in compatible with the OTLP Exporter/Receiver core component, justifying the integration of this extension directly into the upstream project.
Repository
https://github.com/f5/otel-arrow-adapter
Existing usage
The project is developed and maintained jointly by F5 and Lightstep. F5 seeks to align the open-source community built around NGINX with the open-source community built around OpenTelemetry; this effort will deliver industrial-grade analytics capability to the telemetry data produced by its software components for its customers. Improving the compression performance and representation of multivariate time series are also among F5's goals. Lightstep, wanting to recommend OpenTelemetry collectors for its customers to use, seeks to improve the compression performance and efficiency achieved by OTLP when used for bulk data transport.
We are actively testing the components for donation on production data using an OpenTelemetry collector built from our drop-in compatible OTLP exporter and receiver. We are extending test coverage and eliminating gaps. We are planning to start beta tests with the community by providing documentation and tooling for benchmarking and troubleshooting.
Maintenance
F5 and Lightstep will continue to develop and maintain the project. We will encourage and help all new contributors to participate in this project. We are open to suggestions and ideas.
Our current roadmap is as follows:
a) New client SDK natively supporting OTLP Arrow and multivariate metrics.
b) Continuing the migration to Apache Arrow to achieve end-to-end performance gains.
c) Integrating processing, aggregation, and filtering capabilities that leverage the Apache Arrow ecosystem.
d) Parquet integration.
Licenses
This project is licensed under the terms of the Apache 2.0 open source license.
Trademarks
The Apache Arrow project is used in this project.
Other notes
No response