
Define how OTLP data can be stored in a file #1443

Closed
tigrannajaryan opened this issue Feb 16, 2021 · 21 comments · Fixed by #2235
Labels
help wanted, spec:protocol

Comments

@tigrannajaryan
Member

tigrannajaryan commented Feb 16, 2021

OTLP currently defines how telemetry can be serialized and transferred over the network.

There are use cases that require writing telemetry to a file and then reading it back, e.g. storing test data or the ability to transfer telemetry from a DMZ that only allows file transfers: https://github.com/open-telemetry/opentelemetry-collector/issues/2450

We need to define how OTLP data can be serialized and stored in a file. It should be possible to store multiple batches of telemetry and then read them back. All of the telemetry signals should be supported. The file format should be optimized for append and sequential-read scenarios.

The stored telemetry should contain metadata that describes what signal type it is. It should be possible to store telemetry of mixed signals in one file unambiguously.

Optional requirements:

  • Ability for random access of records.
  • Compression of stored records.
  • Support for both binary (e.g. Protobuf) and human-readable formats (e.g. JSON).

Note that an experimental file exporter exists in the Collector (https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/fileexporter) that outputs records in OTLP/JSON format; however, it lacks the metadata that tells the signal type and thus cannot be used in mixed-signal scenarios.

tigrannajaryan added the spec:protocol label Feb 16, 2021
@hdost

hdost commented Feb 16, 2021

This seems like it should be moved to open-telemetry/opentelemetry-proto, as it's literally a data model question.

@bogdandrutu
Member

@tigrannajaryan let's start with the support for binary/protobuf since that is marked as stable, then we can add/fix the support for JSON when it is marked as stable.

Protobuf defines some of these things here:
https://developers.google.com/protocol-buffers/docs/techniques?csw=1#streaming

It provides helpers such as writeDelimitedTo:
https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/MessageLite.html#writeDelimitedTo-java.io.OutputStream-
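
For illustration, a minimal Go sketch of the same idea (a varint length prefix followed by the serialized bytes), assuming the OTLP Go proto bindings at go.opentelemetry.io/proto/otlp; the file name and message contents are placeholders, not a proposed format:

```go
package main

import (
	"encoding/binary"
	"io"
	"os"

	tracepb "go.opentelemetry.io/proto/otlp/trace/v1"
	"google.golang.org/protobuf/proto"
)

// appendDelimited writes a varint length prefix followed by the serialized
// message, analogous to writeDelimitedTo in the Java Protobuf API.
func appendDelimited(w io.Writer, msg proto.Message) error {
	body, err := proto.Marshal(msg)
	if err != nil {
		return err
	}
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(body)))
	if _, err := w.Write(lenBuf[:n]); err != nil {
		return err
	}
	_, err = w.Write(body)
	return err
}

func main() {
	f, err := os.OpenFile("telemetry.bin", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Placeholder: an empty batch of trace data.
	if err := appendDelimited(f, &tracepb.TracesData{}); err != nil {
		panic(err)
	}
}
```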

@tigrannajaryan
Member Author

This seems like it should be moved to open-telemetry/opentelemetry-proto, as it's literally a data model question.

Historically this repo contains the OTLP specification, although the data structures are defined in the proto repo.

@tigrannajaryan
Member Author

let's start with the support for binary/protobuf since that is marked as stable, then we can add/fix the support for JSON when it is marked as stable.

I agree.

Protobuf defines some of these things here:
https://developers.google.com/protocol-buffers/docs/techniques?csw=1#streaming

Protobuf's recommendation is only about the size / delimiting. I think it would be very nice to have a way to store mixed signals in one file, which means we either have to also store the type of the message or define a new message type which is a oneof of traces, metrics, or logs.

Either way I think it will be very useful to spend a bit of time thinking this through properly and define a format that can evolve over time, even if it only handles basic use cases for now. We can have a simple per-file header (e.g. with a signature and a version, possibly also a compression method) and a per-message header (with the message size, and maybe also the message type), both with some reserved bits for future expansion. Alternatively, we could reuse some existing container format instead of inventing our own.

I would also like to understand if we can make this also work as a framing protocol over arbitrary continuous byte streams; that would be an added bonus (we would get OTLP/TCP for free :-) ). Probably not, since the constraints are different, but it is still a useful thought exercise.
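
To make the header idea a bit more concrete, here is a rough, purely hypothetical sketch of such a layout in Go; every field name, size, and constant below is made up for discussion and is not a proposal for the actual format:

```go
package otlpfile

// Hypothetical layout, for discussion only:
//
//   file header : magic (4 bytes) | version (1) | compression (1) | reserved (2)
//   per record  : signal type (1) | reserved (1) | payload size (4, big endian) | payload bytes
//
// The payload would be a serialized OTLP request (traces, metrics, or logs)
// identified by the signal type byte, so mixed signals can live in one file.

const (
	Magic   = "OTLP" // file signature (hypothetical)
	Version = 1

	CompressionNone = 0
	CompressionGzip = 1

	SignalTraces  = 1
	SignalMetrics = 2
	SignalLogs    = 3
)

// FileHeader is written once at the beginning of the file.
type FileHeader struct {
	Magic       [4]byte
	Version     uint8
	Compression uint8
	Reserved    [2]byte // reserved bits for future expansion
}

// RecordHeader precedes every serialized message in the file.
type RecordHeader struct {
	Signal   uint8
	Reserved uint8
	Size     uint32 // length of the payload that follows
}
```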

@bogdandrutu
Member

@tigrannajaryan I think we can start by agreeing on the goals first.

  • Do we need support for multiple telemetry signals in the same file? Personally I would say no, since we don't offer this in any of the other places, gRPC, Kafka, etc. Why is this a special case compared to other places where we don't? If we do it here, should we do it everywhere?
  • Support for both binary (e.g. Protobuf) and human-readable formats (e.g. JSON). Is this in the same file?
  • Ability for random access of records. Clarify this: do we need to build an index? Or is the ability to read the size, as protobuf recommends now, good enough?
  • Compression of stored records. Do we need to encode this in the file's metadata? What are the properties of a compressed file? Do we need random access for compressed files?

@tigrannajaryan
Member Author

@bogdandrutu Good questions. Some thoughts below.

Do we need support for multiple telemetry signals in the same file? Personally I would say no, since we don't offer this in any of the other places, gRPC, Kafka, etc. Why is this a special case compared to other places where we don't? If we do it here, should we do it everywhere?

We do support it in OTLP/gRPC. The OTLP network protocol includes a mechanism that allows the requests for different signals to be distinguished. It is part of the protocol. We don't tell the user: "you need to figure out yourself how to send traces and metrics so that they don't end up hitting the same port on the receiver, where the receiver will not understand them because it does not know what you sent".

We rely on the type-based demultiplexing feature of gRPC. We also have clearly defined URL-based demultiplexing in OTLP/HTTP.

I think we need a similar thing for files, otherwise we are telling the user "it is on you to decide how to differentiate files that contain different signal types, and if you accidentally mix them up when reading, it is going to result in decoding errors".

If we don't have this built into the format then the user has to make these decisions (e.g. use the filename as the indicator of the signal type), and that is going to be fragile and error-prone.

You are right that we don't support it in Kafka exporter and I think that is also a mistake.

IMO, this is important to have in order to make sure your data can be read without additional information coming from elsewhere (i.e. the knowledge that this file contains traces).

Support for both binary (e.g. Protobuf) and human-readable formats (e.g. JSON). Is this in the same file?

Not a strong opinion, but it can probably be different files (different formats altogether).

Ability for random access of records. Clarify this: do we need to build an index? Or is the ability to read the size, as protobuf recommends now, good enough?

I think random access is not necessary, but I wanted to list it to open a discussion. I cannot think of a use case where it is important. If someone truly needed this, they could store OTLP Protobufs as individual records in an sqlite file database.

Compression of stored records. Do we need to encode this in the file's metadata? What are the properties of a compressed file? Do we need random access for compressed files?

It is either a compression method per file or per record. If it is per file, then it goes in the file header; otherwise it has to be repeated per record. However, I do not see why it would be needed per record. It's not like we need different compression methods per record.

@anuraaga
Contributor

anuraaga commented Feb 17, 2021

I think we need to define what a record is (which could apply to message exporters like Kafka too). Is a record a span, a trace, or just arbitrary resource-spans? If it's a span or a trace, a span could potentially be represented as resource-spans, but it seems a bit shoehorned to only have a single span by convention. For a trace, I guess resource-spans doesn't work anyway.

In practice, I think for storage, either span or trace is more useful than arbitrary resource-spans.

@tigrannajaryan
Member Author

For the use cases listed above it is beneficial to be able to store the full ResourceSpans, ResourceMetrics, and ResourceLogs messages defined in the proto.

There may be use cases where you want to store just individual spans, etc., but I think that needs to be considered separately. Also, nothing prevents storing a single Span in a ResourceSpans message; it is not terribly inefficient even in that case, and the overhead will be small. You can also store a trace in ResourceSpans; we have no way of knowing whether it is a full trace or just a collection of spans unless we start looking into trace IDs, which I think is out of scope for this issue.

@kumoroku

Maybe a naive question, but how would this relate to buffering to disk when exporters can't talk to their destinations? Completely orthogonal, or would we use the same format? @pmm-sumo

@jsuereth
Contributor

I agree w/ @bogdandrutu that we might want to identify use cases for supported file formats.

As @kumoroku points out, buffering to disk would be important and POSSIBLY requires random access / indexing by time. It gets fuzzy, if you're bundling similarly to OTLP, when/what data you drop if you need to treat everything as a bundle of Resource*s.

@tigrannajaryan
Member Author

tigrannajaryan commented Feb 17, 2021

We had a discussion today in the Log SIG.

The buffering use case is different and yes, @jsuereth, you are right, it has different requirements. At the minimum it requires a way to delete already-consumed data (since it is a queue). In the past I used sqlite as a container for on-disk buffers in a log collector and it worked very well; it allowed choosing between performance and durability (since fsync is controllable). However, sqlite is likely overkill for simple cases where you just want to store OTLP data in a file.

I am not sure we can serve all these different use cases with a single solution. The on-disk buffer also likely has more relaxed format stability guarantees (it is more acceptable to lose a disk buffer when upgrading a Collector version than to render a previously written OTLP file unreadable).

So, there is possibly a need for several different solutions targeted at different use cases.

@rakyll
Contributor

rakyll commented Feb 22, 2021

What about setting string encoding rules initially? Storage requirements are use-case specific and will very likely need custom exporters. But when converting to JSON, it's not always clear how binary fields should be serialized, and currently it's up to the exporters. Would it make sense to begin this work by setting the JSON encoding rules and making them a part of the spec?

@tigrannajaryan
Member Author

@rakyll there is an "Alpha" spec for OTLP/JSON, see https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md#otlphttp-request

JSON-encoded Protobuf payloads use proto3 standard defined JSON Mapping for mapping between Protobuf and JSON, with one deviation from that mapping: the trace_id and span_id byte arrays are represented as case-insensitive hex-encoded strings, they are not base64-encoded like it is defined in the standard JSON Mapping. The hex encoding is used for trace_id and span_id fields in all OTLP Protobuf messages, e.g. the Span, Link, LogRecord, etc. messages.

We also already have implementations of this format in the Collector's OTLP/HTTP receiver and exporter.
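
To illustrate that deviation, a trimmed, made-up OTLP/JSON payload (field names following the proto3 JSON mapping of the current OTLP proto) carries traceId and spanId as hex strings rather than base64:

```json
{
  "resourceSpans": [
    {
      "resource": {
        "attributes": [
          { "key": "service.name", "value": { "stringValue": "example-service" } }
        ]
      },
      "scopeSpans": [
        {
          "spans": [
            {
              "traceId": "5b8efff798038103d269b633813fc60c",
              "spanId": "eee19b7ec3c1b174",
              "name": "example-span"
            }
          ]
        }
      ]
    }
  ]
}
```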

@tigrannajaryan
Member Author

The JSON format depends on open-telemetry/opentelemetry-proto#268

@sirianni
Contributor

Also, nothing prevents storing a single Span in a ResourceSpans message; it is not terribly inefficient even in that case, and the overhead will be small.

FWIW we are doing this internally at Confluent. Many sink connectors in the Kafka Connect ecosystem require a 1:1 relationship between a Kafka message and a row in the data store (BigQuery, Elasticsearch, etc.). Having a single metric/span per record allows us to use those connectors to write OTel data to those data stores. We've implemented this by writing an OTel Collector processor that splits incoming OTLP protos into one per metric/span. Happy to contribute this upstream if there is interest.

@alexvanboxel

A solution for a file format would be to use the CloudEvents spec. It lists a lot of mappings for different formats (JSON, Avro, Protobuf) and different bindings (Kafka, MQTT, Pubsub, ...) for events.

I see 4 different combinations for files (from most compact to most verbose):

It may be counter-intuitive to embed a Protobuf payload in an Avro file, but it fits nicely into the CloudEvents concept, as the format is only a container (it can embed anything). The spec only mandates id, source, specversion, and type. For the OTLP spec we could add another required attribute, datacontenttype.

type could be for example: org.opentelemetry.otlp.traces.v1, org.opentelemetry.otlp.metrics.v1, and org.opentelemetry.otlp.logs.v1.
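
For example, a JSON-format CloudEvent wrapping a Protobuf-encoded trace payload could look roughly like this (all attribute values are illustrative, the data_base64 value is a truncated placeholder, and the type/datacontenttype naming is not something any spec defines today):

```json
{
  "specversion": "1.0",
  "id": "a5c2a6c7-0001",
  "source": "/collector/fileexporter",
  "type": "org.opentelemetry.otlp.traces.v1",
  "datacontenttype": "application/x-protobuf",
  "data_base64": "Ci4KDAoKc2VydmljZS5uYW1l..."
}
```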

@atoulme
Contributor

atoulme commented Dec 21, 2021

Taking from this conversation and using a very restrictive filter to pick up the minimum set of requirements, here is a PR with the serialization format I came up with: #2235

@atoulme
Contributor

atoulme commented Dec 21, 2021

Unaddressed requirements: buffering to disk, and most performance improvements around making this export format anything but a simple dump format. I would not recommend depending on the file approach for queueing, just as Tigran mentions here. That might be its own spec, applied to sqlite, redis, and kafka.

@GrahamLea
Contributor

@sirianni - I'm interested in the ResourceSpan-splitting Collector Processor that you mentioned above.
Did you ever end up contributing or open sourcing it anywhere?

@sirianni
Contributor

sirianni commented Jul 1, 2022 via email

@GrahamLea
Contributor

Ah, send_batch_max_size=1. Very clever. Thanks!
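
For context, this refers to the Collector's batch processor with send_batch_max_size capping each outgoing batch at one item, so each exported record carries a single span/metric/log point. A rough, hypothetical config sketch (receiver and exporter names are placeholders):

```yaml
processors:
  batch:
    # Cap each outgoing batch at one item.
    send_batch_max_size: 1
    send_batch_size: 1

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [kafka]
```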
