
Define how OTLP data can be stored in a file #1443

Closed
tigrannajaryan opened this issue Feb 16, 2021 · 21 comments · Fixed by #2235
Labels
help wanted, spec:protocol

Comments

@tigrannajaryan
Member

tigrannajaryan commented Feb 16, 2021

OTLP currently defines how telemetry can be serialized and transferred over the network.

There are use cases that require writing telemetry to a file and then reading it back, e.g. storing test data or the ability to transfer telemetry from a DMZ that only allows file transfers: https://github.com/open-telemetry/opentelemetry-collector/issues/2450

We need to define how OTLP data can be serialized and stored in a file. It should be possible to store multiple batches of telemetry and then read them back. All of the telemetry signals should be supported. The file format should be optimized for append and sequential-read scenarios.

The stored telemetry should contain metadata that describes what signal type it is. It should be possible to store telemetry of mixed signals in one file unambiguously.

Optional requirements:

  • Ability for random access of records.
  • Compression of stored records.
  • Support for both binary (e.g. Protobuf) and human-readable formats (e.g. JSON).

Note that an experimental file exporter exists in the Collector (https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/fileexporter) that outputs records in OTLP/JSON format; however, it lacks the metadata that tells the signal type and thus cannot be used in mixed-signal scenarios.

tigrannajaryan added the spec:protocol label Feb 16, 2021
@hdost

hdost commented Feb 16, 2021

This seems like it should be moved to open-telemetry/opentelemetry-proto, as it's literally a data model question.

@bogdandrutu
Member

@tigrannajaryan let's start with the support for binary/protobuf since that is marked as stable, then we can add/fix the support for JSON when it is marked as stable.

Protobuf defines some of these things here:
https://developers.google.com/protocol-buffers/docs/techniques?csw=1#streaming

It provides helpers such as writeDelimitedTo:
https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/MessageLite.html#writeDelimitedTo-java.io.OutputStream-
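
For illustration, a minimal Go sketch of the same idea (a varint length prefix followed by the serialized bytes), assuming the OTLP Go proto bindings at go.opentelemetry.io/proto/otlp; the file name and message contents are placeholders, not a proposed format:

```go
package main

import (
	"encoding/binary"
	"io"
	"os"

	tracepb "go.opentelemetry.io/proto/otlp/trace/v1"
	"google.golang.org/protobuf/proto"
)

// appendDelimited writes a varint length prefix followed by the serialized
// message, analogous to writeDelimitedTo in the Java Protobuf API.
func appendDelimited(w io.Writer, msg proto.Message) error {
	body, err := proto.Marshal(msg)
	if err != nil {
		return err
	}
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(body)))
	if _, err := w.Write(lenBuf[:n]); err != nil {
		return err
	}
	_, err = w.Write(body)
	return err
}

func main() {
	f, err := os.OpenFile("telemetry.bin", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Placeholder: an empty batch of trace data.
	if err := appendDelimited(f, &tracepb.TracesData{}); err != nil {
		panic(err)
	}
}
```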

@tigrannajaryan
Member Author

This seems like it should be moved to open-telemetry/opentelemetry-proto, as it's literally a data model question.

Historically this repo contains the OTLP specification, although the data structures are defined in the proto repo.

@tigrannajaryan
Member Author

let's start with the support for binary/protobuf since that is marked as stable, then we can add/fix the support for JSON when it is marked as stable.

I agree.

Protobuf defines some of these things here:
https://developers.google.com/protocol-buffers/docs/techniques?csw=1#streaming

Protobuf's recommendation is only about the size / delimiting. I think it would be very nice to have a way to store mixed signals in one file, which means we either have to also store the type of the message or define a new message type which is a oneof of traces, metrics, or logs.

Either way I think it will be very useful to spend a bit of time thinking this through properly and define a format that can evolve over time, even if it only handles basic use cases for now. We can have a simple per-file header (e.g. with a signature and a version, possibly also a compression method) and a per-message header (with the message size, and maybe also the message type), both with some reserved bits for future expansion. Alternatively, we could reuse some existing container format instead of inventing our own.

I would also like to understand if we can make this also work as a framing protocol over arbitrary continuous byte streams; that would be an added bonus (we would get OTLP/TCP for free :-) ). Probably not, since the constraints are different, but it is still a useful thought exercise.
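
To make the header idea a bit more concrete, here is a rough, purely hypothetical sketch of such a layout in Go; every field name, size, and constant below is made up for discussion and is not a proposal for the actual format:

```go
package otlpfile

// Hypothetical layout, for discussion only:
//
//   file header : magic (4 bytes) | version (1) | compression (1) | reserved (2)
//   per record  : signal type (1) | reserved (1) | payload size (4, big endian) | payload bytes
//
// The payload would be a serialized OTLP request (traces, metrics, or logs)
// identified by the signal type byte, so mixed signals can live in one file.

const (
	Magic   = "OTLP" // file signature (hypothetical)
	Version = 1

	CompressionNone = 0
	CompressionGzip = 1

	SignalTraces  = 1
	SignalMetrics = 2
	SignalLogs    = 3
)

// FileHeader is written once at the beginning of the file.
type FileHeader struct {
	Magic       [4]byte
	Version     uint8
	Compression uint8
	Reserved    [2]byte // reserved bits for future expansion
}

// RecordHeader precedes every serialized message in the file.
type RecordHeader struct {
	Signal   uint8
	Reserved uint8
	Size     uint32 // length of the payload that follows
}
```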

@bogdandrutu
Member

@tigrannajaryan I think we can start by agreeing on the goals first.

  • Do we need support for multiple telemetry signals in the same file? Personally I would say no, since we don't offer this in any of the other places, gRPC, Kafka, etc. Why is this a special case compared to other places where we don't? If we do it here, should we do it everywhere?
  • Support for both binary (e.g. Protobuf) and human-readable formats (e.g. JSON). Is this in the same file?
  • Ability for random access of records. Clarify this: do we need to build an index? Or is the ability to read the size, as protobuf recommends now, good enough?
  • Compression of stored records. Do we need to encode this in the file's metadata? What are the properties of a compressed file? Do we need random access for compressed files?

@tigrannajaryan
Member Author

@bogdandrutu Good questions. Some thoughts below.

Do we need support for multiple telemetry signals in the same file? Personally I would say no, since we don't offer this in any of the other places, gRPC, Kafka, etc. Why is this a special case compared to other places where we don't? If we do it here, should we do it everywhere?

We do support it in OTLP/gRPC. The OTLP network protocol includes a mechanism that allows the requests for different signals to be distinguished. It is part of the protocol. We don't tell the user: "you need to figure out yourself how to send traces and metrics so that they don't end up hitting the same port on the receiver, where the receiver will not understand them because it does not know what you sent".

We rely on the type-based demultiplexing feature of gRPC. We also have clearly defined URL-based demultiplexing in OTLP/HTTP.

I think we need a similar thing for files, otherwise we are telling the user "it is on you to decide how to differentiate files that contain different signal types, and if you accidentally mix them up when reading, it is going to result in decoding errors".

If we don't have this built into the format then the user has to make these decisions (e.g. use the filename as the indicator of the signal type), and that is going to be fragile and error-prone.

You are right that we don't support it in Kafka exporter and I think that is also a mistake.

IMO, this is important to have in order to make sure your data can be read without additional information coming from elsewhere (i.e. the knowledge that this file contains traces).

Support for both binary (e.g. Protobuf) and human-readable formats (e.g. JSON). Is this in the same file?

Not a strong opinion, but it can probably be different files (different formats altogether).

Ability for random access of records. Clarify this: do we need to build an index? Or is the ability to read the size, as protobuf recommends now, good enough?

I think random access is not necessary, but I wanted to list it to open a discussion. I cannot think of a use case where it is important. If someone truly needed this, they could store OTLP Protobufs as individual records in an sqlite file database.

Compression of stored records. Do we need to encode this in the file's metadata? What are the properties of a compressed file? Do we need random access for compressed files?

It is either a compression method per file or per record. If it is per file, then it goes in the file header; otherwise it has to be repeated per record. However, I do not see why it would be needed per record. It's not like we need different compression methods per record.

@anuraaga
Contributor

anuraaga commented Feb 17, 2021

I think we need to define what a record is (which could apply to message exporters like Kafka too). Is a record a span, a trace, or just arbitrary resource-spans? If it's a span or a trace, a span could potentially be represented as resource-spans, but it seems a bit shoehorned to only have a single span by convention. For a trace, I guess resource-spans doesn't work anyway.

In practice, I think for storage, either span or trace is more useful than arbitrary resource-spans.

@tigrannajaryan
Member Author

For the use cases listed above it is beneficial to be able to store the full ResourceSpans, ResourceMetrics, and ResourceLogs messages defined in the proto.

There may be use cases where you want to store just individual spans, etc., but I think that needs to be considered separately. Also, nothing prevents storing a single Span in a ResourceSpans message; it is not terribly inefficient even in that case, and the overhead will be small. You can also store a trace in ResourceSpans; we have no way of knowing whether it is a full trace or just a collection of spans unless we start looking into trace IDs, which I think is out of scope for this issue.

@kumoroku

Maybe a naive question, but how would this relate to buffering to disk when exporters can't talk to their destinations? Completely orthogonal, or would we use the same format? @pmm-sumo

@jsuereth
Contributor

I agree w/ @bogdandrutu that we might want to identify use cases for supported file formats.

As @kumoroku points out, buffering to disk would be important and POSSIBLY requires random access / indexing by time. It gets fuzzy, if you're bundling similarly to OTLP, when/what data you drop if you need to treat everything as a bundle of Resource*s.

@tigrannajaryan
Member Author

tigrannajaryan commented Feb 17, 2021

We had a discussion today in the Log SIG.

The buffering use case is different and yes, @jsuereth, you are right, it has different requirements. At the minimum it requires a way to delete already-consumed data (since it is a queue). In the past I used sqlite as a container for on-disk buffers in a log collector and it worked very well; it allowed choosing between performance and durability (since fsync is controllable). However, sqlite is likely overkill for simple cases where you just want to store OTLP data in a file.

I am not sure we can serve all these different use cases with a single solution. The on-disk buffer also likely has more relaxed format stability guarantees (it is more acceptable to lose a disk buffer when upgrading a Collector version than to render a previously written OTLP file unreadable).

So, there is possibly a need for several different solutions targeted at different use cases.

@rakyll
Contributor

rakyll commented Feb 22, 2021

What about setting string encoding rules initially? Storage requirements are use-case specific and will very likely need custom exporters. But when converting to JSON, it's not always clear how binary fields should be serialized, and currently it's up to the exporters. Would it make sense to begin this work by setting the JSON encoding rules and making them a part of the spec?

@tigrannajaryan
Member Author

@rakyll there is an "Alpha" spec for OTLP/JSON, see https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md#otlphttp-request

JSON-encoded Protobuf payloads use proto3 standard defined JSON Mapping for mapping between Protobuf and JSON, with one deviation from that mapping: the trace_id and span_id byte arrays are represented as case-insensitive hex-encoded strings, they are not base64-encoded like it is defined in the standard JSON Mapping. The hex encoding is used for trace_id and span_id fields in all OTLP Protobuf messages, e.g. the Span, Link, LogRecord, etc. messages.

We also already have implementations of this format in the Collector's OTLP/HTTP receiver and exporter.
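
To illustrate that deviation, a trimmed, made-up OTLP/JSON payload (field names following the proto3 JSON mapping of the current OTLP proto) carries traceId and spanId as hex strings rather than base64:

```json
{
  "resourceSpans": [
    {
      "resource": {
        "attributes": [
          { "key": "service.name", "value": { "stringValue": "example-service" } }
        ]
      },
      "scopeSpans": [
        {
          "spans": [
            {
              "traceId": "5b8efff798038103d269b633813fc60c",
              "spanId": "eee19b7ec3c1b174",
              "name": "example-span"
            }
          ]
        }
      ]
    }
  ]
}
```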

@tigrannajaryan
Member Author

The JSON format depends on open-telemetry/opentelemetry-proto#268

@sirianni
Contributor

Also, nothing prevents storing a single Span in a ResourceSpans message; it is not terribly inefficient even in that case, and the overhead will be small.

FWIW we are doing this internally at Confluent. Many sink connectors in the Kafka Connect ecosystem require a 1:1 relationship between a Kafka message and a row in the data store (BigQuery, Elasticsearch, etc.). Having a single metric/span per record allows us to use those connectors to write OTel data to those data stores. We've implemented this by writing an OTel Collector processor that splits incoming OTLP protos into one per metric/span. Happy to contribute this upstream if there is interest.

@alexvanboxel

A solution for a file format would be to use the CloudEvents spec. It lists a lot of mappings for different formats (JSON, Avro, Protobuf) and different bindings (Kafka, MQTT, Pubsub, ...) for events.

I see 4 different combinations for files (from most compact to most verbose):

It may be counter-intuitive to embed a Protobuf payload in an Avro file, but it fits nicely into the CloudEvents concept, as the format is only a container (it can embed anything). The spec only mandates id, source, specversion, and type. For the OTLP spec we could add another required attribute, datacontenttype.

type could be for example: org.opentelemetry.otlp.traces.v1, org.opentelemetry.otlp.metrics.v1, and org.opentelemetry.otlp.logs.v1.
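
For example, a JSON-format CloudEvent wrapping a Protobuf-encoded trace payload could look roughly like this (all attribute values are illustrative, the data_base64 value is a truncated placeholder, and the type/datacontenttype naming is not something any spec defines today):

```json
{
  "specversion": "1.0",
  "id": "a5c2a6c7-0001",
  "source": "/collector/fileexporter",
  "type": "org.opentelemetry.otlp.traces.v1",
  "datacontenttype": "application/x-protobuf",
  "data_base64": "Ci4KDAoKc2VydmljZS5uYW1l..."
}
```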

@atoulme
Contributor

atoulme commented Dec 21, 2021

Taking from this conversation and using a very restrictive filter to pick up the minimum set of requirements, here is a PR with the serialization format I came up with: #2235

@atoulme
Contributor

atoulme commented Dec 21, 2021

Unaddressed requirements: buffering to disk, and most performance improvements around making this export format anything but a simple dump format. I would not recommend depending on the file approach for queueing, just as Tigran mentions here. That might be its own spec, applied to sqlite, redis, and kafka.

@GrahamLea
Contributor

@sirianni - I'm interested in the ResourceSpan-splitting Collector Processor that you mentioned above.
Did you ever end up contributing or open sourcing it anywhere?

@sirianni
Contributor

sirianni commented Jul 1, 2022 via email

@GrahamLea
Contributor

Ah, send_batch_max_size=1. Very clever. Thanks!
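
For context, this refers to the Collector's batch processor with send_batch_max_size capping each outgoing batch at one item, so each exported record carries a single span/metric/log point. A rough, hypothetical config sketch (receiver and exporter names are placeholders):

```yaml
processors:
  batch:
    # Cap each outgoing batch at one item.
    send_batch_max_size: 1
    send_batch_size: 1

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [kafka]
```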
