
Filelog receiver failing with High memory usage error and never recovers after that #29330

Closed
parepallykiran opened this issue Nov 17, 2023 · 4 comments
Labels: bug (Something isn't working), receiver/filelog, Stale

@parepallykiran

Component(s)

receiver/filelog

What happened?

Description

While running the OTel Collector as a pod on a Kubernetes cluster with a limited memory allocation of 150 MB (memory_limiter processor with limit_mib set to 70): if we stop and restart the Collector while it is processing files, and after the restart a single poll returns more data than the receiver can handle within the allocated memory, the receiver fails with the error below and never recovers.

2023-11-16T22:20:27.703Z	error	consumerretry/logs.go:87	Max elapsed time expired. Dropping data.	{"kind": "receiver", "name": "filelog/log", "data_type": "logs", "error": "data refused due to high memory usage", "dropped_items": 100}
github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal/consumerretry.(*logsConsumer).ConsumeLogs
	/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal@v0.89.0/consumerretry/logs.go:87
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/adapter.(*receiver).consumerLoop
	/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/adapter/receiver.go:125
2023-11-16T22:20:27.703Z	error	adapter/receiver.go:127	ConsumeLogs() failed	{"kind": "receiver", "name": "filelog/log", "data_type": "logs", "error": "data refused due to high memory usage"}
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/adapter.(*receiver).consumerLoop
	/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/adapter/receiver.go:127

Steps to Reproduce

  • Start the OTel Collector with limited memory resources, e.g. 100 MB (a sketch configuration reflecting these settings follows this list).
    • Make sure persistent checkpointing is enabled so the Collector resumes from where it left off.
    • Set max_concurrent_files to 2.
    • Set max_batches to 1.
  • Let it read files for some time, then stop the process.
  • Append 150 MB of data to an existing file.
  • Start the Collector process again.
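
For reference, a configuration along these lines should approximate the setup described in the steps. This is a hypothetical minimal sketch, not the reporter's actual configuration (which was not shared); the include path, storage directory, and debug exporter are assumptions, while the receiver name (filelog/log), limit_mib: 70, max_concurrent_files, and max_batches come from the report.

```yaml
# Hypothetical sketch only; the reporter's actual configuration was not shared.
extensions:
  file_storage:
    directory: /var/lib/otelcol/file_storage   # assumed checkpoint location

receivers:
  filelog/log:
    include: [ /var/log/app/*.log ]            # assumed log path
    storage: file_storage                      # persistent checkpointing
    max_concurrent_files: 2
    max_batches: 1
    retry_on_failure:
      enabled: true                            # matches the consumerretry lines in the log

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 70

exporters:
  debug: {}

service:
  extensions: [ file_storage ]
  pipelines:
    logs:
      receivers: [ filelog/log ]
      processors: [ memory_limiter ]
      exporters: [ debug ]
```

With a setup like this, the memory_limiter refuses data once limit_mib is exceeded, and the "Max elapsed time expired. Dropping data." line indicates the receiver's retry_on_failure gave up after its maximum elapsed time.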

Expected Result

  • Expected the receiver to poll only as much data from a given file as it can handle, and to process the data in chunks after starting back up. Even if the receiver runs into memory issues, it should recover from the failure.

Actual Result

  • After the restart, the Collector polls all available data from a single file and fails when more data is available than can be held in memory.

Collector version

v0.89.0

Environment information

Environment

OS: alpine

OpenTelemetry Collector configuration

No response

Log output

2023-11-16T22:20:27.703Z	error	consumerretry/logs.go:87	Max elapsed time expired. Dropping data.	{"kind": "receiver", "name": "filelog/log", "data_type": "logs", "error": "data refused due to high memory usage", "dropped_items": 100}
github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal/consumerretry.(*logsConsumer).ConsumeLogs
	/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal@v0.89.0/consumerretry/logs.go:87
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/adapter.(*receiver).consumerLoop
	/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/adapter/receiver.go:125
2023-11-16T22:20:27.703Z	error	adapter/receiver.go:127	ConsumeLogs() failed	{"kind": "receiver", "name": "filelog/log", "data_type": "logs", "error": "data refused due to high memory usage"}
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/adapter.(*receiver).consumerLoop
	/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/adapter/receiver.go:127

Additional context

No response

@parepallykiran added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Nov 17, 2023
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@VihasMakwana (Contributor)

@parepallykiran Thanks for reporting! I think the issue here is that the memory_limiter processor is refusing to accept any more data. The filelog receiver scans the data line by line, not all at once. Can you share your config?
I will try to reproduce this and let you know!

Also, when you say "150MB", do you mean 150 MB in one single line, or multiple lines adding up to 150 MB?

@djaglowski removed the needs triage (New item requiring triage) label on Nov 27, 2023
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions bot added the Stale label on Jan 29, 2024
@djaglowski (Member)

Closing as we have had no response since November.
