
Add file log exporter #6316

Closed
hypnoce opened this issue Nov 15, 2021 · 12 comments
@hypnoce
Contributor

hypnoce commented Nov 15, 2021

Is your feature request related to a problem? Please describe.
The current file exporter/receiver lacks several features that would make it usable in scenarios beyond debugging:

  • mirroring a log directory on another machine and enriching logs with resource data (pod name, namespace, ...) (dynamic file names and counts)
  • ability to write all types of telemetry data (logs, metrics, and spans) to disk and read them back, in scenarios like FileExporter file separation #5008 where logs are shipped to another location and ingested back into another OTel pipeline
  • ability to archive all telemetry data on disk
  • ability to create a directory per pod name

Further reading: #4997

Describe the solution you'd like
A file exporter flexible enough to cover many scenarios

Describe alternatives you've considered
Creating a simple filelog exporter (#6306). No other solution in the OTel ecosystem worked.

Requirements of file exporter

  • File location
    • the exporter is scoped to a directory where all files will be written; `/` and `.` are valid directories
    • each file path can be constructed from the context of the resource and the record (log record, span, metric) being written; this context can be extracted from the attributes (e.g. ./logs/{namespace}/otel.{podname}.{original_file_name}). Alternatively, a single file can be statically defined per exporter.
  • File handling/lifecycle
    • create/open behavior is configurable: mode, truncate, create, ...
    • optional file size and age constraints
    • strategy for handling constraint violations: rotate (with a maximum number of rotations, compressed backups, ...), discard the record, erase the file content, ...
  • File format/encoding
    • support for logs, metrics, and spans
    • a single encoding is supported per exporter; the encoding can be a generic one that includes, for each record, its type and version. Examples of possible encodings: otel-json, otel-binary, body, escaped, ... Multiple encodings can be supported by using multiple exporters (and optionally routing processors).
    • optional record constraints: maximum record size in bytes
    • strategy for handling constraint violations: discard, trim left/right/center, ...
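The per-record path construction described above can be sketched with Go's text/template. This is an illustrative assumption only: the `renderPath` helper and the `{{.namespace}}` syntax are invented for this sketch (the issue's examples use `{namespace}`-style placeholders), and none of this corresponds to an existing collector API.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderPath fills a path template from record/resource attributes.
// missingkey=error makes rendering fail if a referenced attribute is absent,
// instead of silently producing "<no value>" in the file path.
func renderPath(pattern string, attrs map[string]string) (string, error) {
	tmpl, err := template.New("path").Option("missingkey=error").Parse(pattern)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, attrs); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	attrs := map[string]string{
		"namespace":          "payments",
		"podname":            "api-7f9c",
		"original_file_name": "app.log",
	}
	p, err := renderPath("./logs/{{.namespace}}/otel.{{.podname}}.{{.original_file_name}}", attrs)
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // ./logs/payments/otel.api-7f9c.app.log
}
```

Failing on missing attributes (rather than writing to a half-rendered path) is one way an exporter could surface misconfigured templates early.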

I would decouple the file format specification into a separate issue. A formal, general, replayable file format/encoding could be reused in other contexts, such as exporting to other storage backends like S3 or Kafka.
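Taken together, the requirements above might translate into a collector configuration along these lines. Every field name in this sketch is invented for illustration; it is not an implemented schema:

```yaml
exporters:
  file:
    # Path template resolved per record from resource/record attributes
    # (placeholder syntax and all option names are hypothetical).
    path: ./logs/{namespace}/otel.{podname}.{original_file_name}
    format: otel-json          # a single encoding per exporter
    rotation:
      max_megabytes: 100       # size constraint
      max_age_days: 7          # age constraint
      max_backups: 3
      compress: true
    max_record_bytes: 65536    # optional record constraint
    on_record_too_large: discard   # or trim_left / trim_right / trim_center
```

Supporting a second encoding would then mean declaring a second exporter instance, optionally fed by a routing processor, as described above.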

@tigrannajaryan
Member

Let's make sure we approach this from the perspective of the actual real-world use cases.

mirroring a log directory on another machine/pod (dynamic file names and counts)

This does not seem to be a Collector job. There are other tools that can do this (e.g. rsync), not clear why it needs to be a Collector feature.

ability to write all types (logs, metrics and spans) of telemetry data on disk and readback in scenarios like FileExporter file separation #5008 where logs are shipped to another location and ingested back into another OTEL pipeline.
Ability to archive all telemetry data on disk

Both of these use cases can likely be served with one standardized file format. They do not appear to require the file format to be customizable by the end user in any way; in fact, it may be preferable that the file format is not customizable, so that mistakes are not possible. I believe that if we want a standard file format, it is best to define it as part of the specification, for which we have an open issue: open-telemetry/opentelemetry-specification#1443

each file path can be constructed based on context from the resource and the record (log record, span, metric) being written. This context can be extracted from the attributes. Or a single file can be statically defined per exporter.

I do not see which of the listed use cases requires this. Perhaps add a use case to justify it, or remove the requirement.

@hypnoce
Contributor Author

hypnoce commented Nov 16, 2021

This does not seem to be a Collector job. There are other tools that can do this (e.g. rsync), not clear why it needs to be a Collector feature.

Rsync can indeed work, but with many drawbacks (it does not discover new directories/files, does not support tailing, ...).
Some people have suggested

while true; do
  inotifywait -r -e modify,create,delete /directory   # block until anything under /directory changes (-r = recursive)
  rsync -avz /directory /target                       # then re-sync the whole tree to the target
done

which adds significant overhead when log files are updated frequently.

I still believe this is a valid use case for a log collector: it collects logs, ships them, and routes them to files with a configurable format. Multiplying log collection tools and adding many sidecars to a pod increases operational complexity as well as resource requirements. It is an actual use case that I currently face.
For instance, using fluentbit's tail input, the file output with a pattern, and this formatter, I can construct a pipeline that mirrors a log directory. Using the same technology (fluentd) I can build more sophisticated pipelines, thus serving many use cases.

I do not see which of the listed use cases requires this. Perhaps add a use case to justify it, or remove the requirement.

It was for the first use case, where the target file name and location cannot be determined at pipeline creation time. Another use case is being able to write to a different file/location based on the Kubernetes pod_name that produced the data.

I believe if we want a standard file format then it is best to define it as part of the specification, for which we have an open issue

Agree.

@hypnoce
Contributor Author

hypnoce commented Feb 15, 2022

@tigrannajaryan is there something I'm missing about rsync for my use case?

@hypnoce hypnoce changed the title from "Add file exporter" to "Add file log exporter" Feb 15, 2022
@tigrannajaryan
Copy link
Member

@hypnoce there are currently #7840 and open-telemetry/opentelemetry-specification#2235 in progress, which may help cover your use cases. The Collector PR is about a receiver, but once the JSON format is standardized it should be easy to make the argument that we can also have an exporter for the same format. Please review and comment on those two PRs.

@github-actions
Contributor

github-actions bot commented Nov 8, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Nov 8, 2022
@hypnoce
Contributor Author

hypnoce commented Mar 14, 2023

Hey all,
my use case is a bit different. I need to be able to write logs in a configurable format to dynamic files chosen by resource attributes, e.g. output.info, output.warn, and output.error files selected by severity, each containing only the matching lines.
Something like this operator, but with the filename as a Go template as well: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/file_output.md
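The severity-based file routing described above could be sketched as follows. The `severityFile` helper and the output file names are hypothetical illustrations, not an existing operator option:

```go
package main

import (
	"fmt"
	"strings"
)

// severityFile maps a log record's severity to an output file name,
// mirroring the output.info / output.warn / output.error split
// described in the comment above. The naming scheme is invented here.
func severityFile(severity string) string {
	switch strings.ToUpper(severity) {
	case "ERROR", "FATAL":
		return "output.error"
	case "WARN", "WARNING":
		return "output.warn"
	default:
		return "output.info"
	}
}

func main() {
	for _, sev := range []string{"INFO", "warn", "ERROR"} {
		fmt.Printf("%s -> %s\n", sev, severityFile(sev))
	}
}
```

In a templated-filename design, the same effect could be achieved by exposing the severity as a template variable in the output path rather than hard-coding a switch.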
WDYT ?
Thanks

@github-actions github-actions bot removed the Stale label Apr 7, 2023
@github-actions
Contributor

github-actions bot commented Jun 7, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Jun 7, 2023
@remram44

remram44 commented Jun 7, 2023

Please don't close it. Nice bot 😅

@github-actions github-actions bot removed the Stale label Jun 7, 2023
@github-actions
Contributor

github-actions bot commented Aug 7, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Aug 7, 2023
@remram44

remram44 commented Aug 7, 2023

Feature is still wanted

@github-actions github-actions bot removed the Stale label Aug 7, 2023
@github-actions
Contributor

github-actions bot commented Oct 9, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Oct 9, 2023
Contributor

github-actions bot commented Dec 8, 2023

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Dec 8, 2023