[exporter/elasticsearch] Ability to control final document structure for logs #35444

Closed
mauri870 opened this issue Sep 26, 2024 · 13 comments · Fixed by #35637

@mauri870
Contributor

mauri870 commented Sep 26, 2024

Component(s)

exporter/elasticsearch

Is your feature request related to a problem? Please describe.

At Elastic we are working on transitioning Beats to be OTel receivers. During this migration we decided to forward structured Beats events in the LogRecord body, so that processors can interact with the body (the Beats event) as they see fit.

We need to preserve the structure and fields that come from the body and use them as the final document persisted in Elasticsearch, without any decoration or envelope added by the ES exporter. In summary, the receiver and processors in the pipeline have already aligned the structure in the body of the log record, and we want the exporter to act as a passthrough for the body data, converting it to JSON that is then ingested directly into Elasticsearch.

Currently, there are different supported mapping modes, but none offer this level of flexibility in the output structure.

Describe the solution you'd like

Essentially, we want the exporter to take the body of each LogRecord as a map and convert it directly into a separate document for Elasticsearch. This assumes that the receivers or processors earlier in the pipeline have already prepared the body with all the fields that will appear in the final document. The Elasticsearch exporter will then function as a passthrough, simply moving each LogRecord body into Elasticsearch as its own document without any modifications.

Describe alternatives you've considered

I'd love to hear ideas on how to support this use case, but we've thought of some approaches for this.

  1. Support the encoding extension framework in the elasticsearchexporter

This looks promising. I worked on a PoC using the jsonlogencodingextension and it does exactly what we need: it parses the LogRecord body, converts the map into JSON, and MarshalLogs returns the result as a JSON-serialized byte slice.

Unfortunately it has some caveats. The jsonlogencodingextension only serializes the first log record body. This means that if all the log records are inside a single plog.Logs, it will only serialize the first LogRecord, which will not work for us. We need each LogRecord to become a separate document. We can cheat a bit in order to marshal a single LogRecord:

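// marshalLog wraps a single LogRecord in a fresh plog.Logs so that the
// encoding extension, which only serializes the first record, emits one
// document per record.
func (e *elasticsearchExporter) marshalLog(_ context.Context, record plog.LogRecord) ([]byte, error) {
	export := plog.NewLogs()
	exportrls := export.ResourceLogs().AppendEmpty()
	exportsl := exportrls.ScopeLogs().AppendEmpty()
	exportlogs := exportsl.LogRecords().AppendEmpty()
	// Copy the record into the new container; resource and scope stay empty.
	record.CopyTo(exportlogs)
	return e.marshaller.MarshalLogs(export)
}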

This works, but it is a hack of sorts. Any other encoding extension can be plugged in and will work, but the output might not be what users expect. For example, with the otlp_json encoding, users will likely expect a plog.Logs to become a single entry with an array of LogRecords, not separate documents. This quirk of the jsonencoder is questioned here. For us it is exactly what we need, but the behavior seems 'strange' for use with other encodings.

  2. Support for a jsonbody mapping mode

This mapping mode would basically take the body of a LogRecord as a map, serialize it into json and that would be the final document to be ingested into Elasticsearch. This solution seems more straightforward and simple, but it does not benefit the otel ecosystem like the push for encoding support does.
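
To make the proposed passthrough concrete, here is a minimal sketch that uses only the Go standard library. The `logRecord` type and the `encodeBodyAsDocument` function are hypothetical stand-ins for illustration (a real implementation would work on `plog.LogRecord` and `pcommon.Map`); the point is only that the body map becomes the whole document and everything else on the record is ignored.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// logRecord is a hypothetical stand-in for plog.LogRecord; only the fields
// needed for this illustration are modeled.
type logRecord struct {
	Timestamp  string         // record-level timestamp, ignored by the passthrough
	Attributes map[string]any // record attributes, ignored by the passthrough
	Body       any            // record body; the passthrough requires a map
}

// encodeBodyAsDocument serializes the record body as JSON without adding any
// envelope fields. It returns an error if the body is not a map.
func encodeBodyAsDocument(rec logRecord) ([]byte, error) {
	body, ok := rec.Body.(map[string]any)
	if !ok {
		return nil, fmt.Errorf("body must be a map, got %T", rec.Body)
	}
	return json.Marshal(body)
}

func main() {
	rec := logRecord{
		Timestamp: "2024-09-26T00:00:00Z",
		Body: map[string]any{
			"@timestamp": "1754-08-30T22:43:41.128654848Z",
			"id":         1,
			"key":        "value",
		},
	}
	doc, err := encodeBodyAsDocument(rec)
	if err != nil {
		panic(err)
	}
	// Prints: {"@timestamp":"1754-08-30T22:43:41.128654848Z","id":1,"key":"value"}
	fmt.Println(string(doc))
}
```

Rejecting non-map bodies is a design choice of this sketch; falling back to some other encoding for string bodies would be equally plausible.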

Additional context

For context, we have logs similar to this:

logs := plog.NewLogs()
resourceLogs := logs.ResourceLogs().AppendEmpty()
scopeLogs := resourceLogs.ScopeLogs().AppendEmpty()
logRecords := scopeLogs.LogRecords()
logRecord := logRecords.AppendEmpty()

// custom fields
bodyMap := pcommon.NewMap()
bodyMap.PutStr("@timestamp", "1754-08-30T22:43:41.128654848Z")
bodyMap.PutInt("id", 1)
bodyMap.PutStr("key", "value")

bodyMap.CopyTo(logRecord.Body().SetEmptyMap())

With the current ES exporter and mapping mode none, the final document that is sent to Elasticsearch looks like this:

{
   "@timestamp":"1754-08-30T22:43:41.128654848Z", // This is not the timestamp from the log record body
   "Body":{
      "@timestamp":"1754-08-30T22:43:41.128654848Z",
      "id":1,
      "key":"value"
   },
   "Scope":{
      "name":"",
      "version":""
   },
   "SeverityNumber":0,
   "TraceFlags":0
}

We would like for it to be:

{
   "@timestamp":"1754-08-30T22:43:41.128654848Z",
   "id": 1,
   "key": "value"
}
mauri870 added the enhancement and needs triage labels Sep 26, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

mauri870 changed the title from "Ability to control final document structure for logs" to "[exporter/elasticsearch] Ability to control final document structure for logs" Sep 26, 2024
@carsonip
Contributor

carsonip commented Oct 2, 2024

Questions:

  • Will the body be a string or a map?
  • ES exporter functionality can be largely divided into 3 areas: mapping and encoding (controls the final document structure), routing (dynamic routing to different indices based on attributes), and indexing (batching, sending bulk requests, retries). The proposed Beats passthrough will basically bypass mapping and go straight to indexing, but do you need the ability to route dynamically based on attributes?

@mauri870
Contributor Author

mauri870 commented Oct 2, 2024

Will the body be a string or a map?

For our use case it will always be a map. I think this is wise in general because we can properly encode the type information we need in the map, which helps serialize this data properly at the end of the pipeline. Processors can also be added, and they can interact with this map more easily.

The proposed Beats passthrough will basically bypass mapping and go straight to indexing, but do you need the ability to route dynamically based on attributes?

Not at this time. It seems like it could be useful, though. I'm not sure we should dismiss the possibility of supporting it if the effort isn't too great.

@felixbarny
Contributor

I think routing to a specific data stream will be important for the Beats event data passthrough mode. You'll want to be able to define to which data stream the event is sent. IINM, you even want to express metrics as a log record that will then be routed to metric data streams. So I think the attribute-based routing is still very relevant.

@carsonip
Contributor

carsonip commented Oct 2, 2024

/label -needs-triage

github-actions bot removed the needs triage label Oct 2, 2024
@carsonip
Contributor

carsonip commented Oct 2, 2024

For our use case it will always be a map. I think this is wise in general because we can properly encode the type information we need in the map, which helps serialize this data properly at the end of the pipeline. Processors can also be added, and they can interact with this map more easily.

Got it. So the Beats passthrough mapping mode will use attributes to route, but the document (the payload sent to ES) will be exactly the encoded version of the body, without regard to other fields in the OTel LogRecord data structure.

What about dedot and dedup? Will the body map be in a structure that is already dedotted and deduplicated, such that dedot and dedup in the ES exporter can be bypassed and the map translates directly to the resulting document?

@mauri870
Contributor Author

mauri870 commented Oct 2, 2024

Got it. So the Beats passthrough mapping mode will use attributes to route, but the document (the payload sent to ES) will be exactly the encoded version of the body, without regard to other fields in the OTel LogRecord data structure.

That is correct.

What about dedot and dedup? Will the body map be in a structure that is already dedotted and deduplicated, such that dedot and dedup in the ES exporter can be bypassed and the map translates directly to the resulting document?

I spoke with the team, and we don't require support for dedot, dedup, or any transformation of the body. The final document must match the exact structure of the body of the LogRecord.

@carsonip
Contributor

carsonip commented Oct 2, 2024

Sounds good, I imagine that's not too hard to accomplish.

@felixbarny
Contributor

I guess the remaining question is on having a new mapping mode specific to the ES exporter or to somehow integrate with the encoder extensions.

Do you have thoughts on that, @carsonip?

@carsonip
Contributor

carsonip commented Oct 2, 2024

I briefly looked at the encoder extensions and the current usages of them in exporters e.g. fileexporter

  • jsonlogencodingextension would definitely not work out of the box as mentioned, due to the limitation of only processing the first log record.
  • imo the encoder extensions interface to convert a plog.Logs that may contain multiple log records to a single []byte does not match our abstraction. We will need 1 json per log record, not to mention they may need to be routed dynamically to different indices.
  • The workaround to split every log record into its own plog.Logs would need to sit inside the ES exporter code, and would need to be written carefully to avoid copying, e.g. by shifting the log records and scope logs instead of creating copies.
  • In theory adopting encoding extension would enable users to encode them in whatever way they like, with a custom encoding extension.
  • Data stream routing is currently done by injecting data_stream.* fields. This means that if we use data stream routing, the body will be translated 100% to the resulting JSON with the exception of the data stream fields. Otherwise, data stream routing will need to be handled before the ES exporter.
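
The "one JSON document per log record, routed by attributes" requirement from the points above can be sketched as follows. This is an illustration under stated assumptions, not the exporter's implementation: the `record` type, `dataStreamName`, and `buildBulkPayload` are hypothetical, the `generic` and `default` fallbacks are assumed, and the routing target is carried in the bulk action line rather than injected into the document as data_stream.* fields.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// record is a hypothetical stand-in for a log record: attributes are used
// only for routing, the body is the document.
type record struct {
	Attributes map[string]string
	Body       map[string]any
}

// dataStreamName derives a logs data stream name from the data_stream.*
// attributes. The "generic" and "default" fallbacks are assumptions.
func dataStreamName(attrs map[string]string) string {
	dataset, namespace := "generic", "default"
	if v := attrs["data_stream.dataset"]; v != "" {
		dataset = v
	}
	if v := attrs["data_stream.namespace"]; v != "" {
		namespace = v
	}
	return "logs-" + dataset + "-" + namespace
}

// buildBulkPayload emits one action line and one document line per record,
// in the newline-delimited format accepted by the Elasticsearch _bulk API.
func buildBulkPayload(records []record) ([]byte, error) {
	var buf bytes.Buffer
	for _, rec := range records {
		action, err := json.Marshal(map[string]any{
			"create": map[string]any{"_index": dataStreamName(rec.Attributes)},
		})
		if err != nil {
			return nil, err
		}
		doc, err := json.Marshal(rec.Body) // body is passed through unmodified
		if err != nil {
			return nil, err
		}
		buf.Write(action)
		buf.WriteByte('\n')
		buf.Write(doc)
		buf.WriteByte('\n')
	}
	return buf.Bytes(), nil
}

func main() {
	payload, err := buildBulkPayload([]record{
		{Attributes: map[string]string{"data_stream.dataset": "nginx"}, Body: map[string]any{"id": 1}},
		{Body: map[string]any{"id": 2}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(payload))
}
```

Because each record yields its own action line, records in the same batch can target different data streams, which a single []byte per plog.Logs cannot express.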

@mauri870
Contributor Author

mauri870 commented Oct 4, 2024

IMHO we don't have enough info on how to support the encoding extension properly, specifically because of the differences between the jsonlogencodingextension and the other extensions, which would require additional logic to be implemented. We are blocked from making progress on EDOT until we can figure out how to support the use case described in this issue.

Wdyt of going with a more conservative approach with a new mapping mode, and in the meantime we can look into how to support the encoding extension properly? I have a PoC here that can serve as an initial implementation.

@carsonip
Contributor

carsonip commented Oct 4, 2024

sgtm. A new mapping mode will be fairly straightforward to implement. Marking it as experimental will be fine.

@mauri870
Contributor Author

mauri870 commented Oct 7, 2024

Thanks. I have submitted a pull request with the implementation.

andrzej-stencel pushed a commit that referenced this issue Oct 17, 2024
#### Description

This PR implements a new mapping mode `bodymap` that works by
serializing each LogRecord body as-is into a separate document for
ingestion.

Fixes #35444

#### Testing

#### Documentation

---------

Co-authored-by: Carson Ip <[email protected]>