
add_kubernetes_metadata panics when a Beat shuts down #34219

Closed
cmacknz opened this issue Jan 9, 2023 · 7 comments · Fixed by #34647
Labels: bug, Team:Cloudnative-Monitoring, Team:Elastic-Agent, v8.6.0, v8.7.0

cmacknz (Member) commented Jan 9, 2023

I've observed an occasional panic in the add_kubernetes_metadata processor where a channel can be closed twice. This typically occurs when the Beat is being shut down. Here is a sample from the 8.6 Elastic Agent logs.

{"log.level":"error","@timestamp":"2022-12-31T19:01:47.997Z","message":"Harvester crashed with: harvester panic with: close of closed channel\ngoroutine 230 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x64\ngithub.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.startHarvester.func1.1()\n\t/go/src/github.com/elastic/beats/filebeat/input/filestream/internal/input-logfile/harvester.go:167 +0x64\npanic({0x104366b00, 0x10492c890})\n\t/usr/local/go/src/runtime/panic.go:844 +0x26c\ngithub.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata.(*cache).stop(0x14000697650)\n\t/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/cache.go:97 +0x30\ngithub.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).Close(0x140006a25c0)\n\t/go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:311 +0x84\ngithub.com/elastic/beats/v7/libbeat/processors.Close({0x104959438, 0x140006a25c0})\n\t/go/src/github.com/elastic/beats/libbeat/processors/processor.go:58 +0x8c\ngithub.com/elastic/beats/v7/libbeat/publisher/processing.(*group).Close(0x140008f8120)\n\t/go/src/github.com/elastic/beats/libbeat/publisher/processing/processors.go:95 +0x11c\ngithub.com/elastic/beats/v7/libbeat/processors.Close({0x104959910, 0x140008f8120})\n\t/go/src/github.com/elastic/beats/libbeat/processors/processor.go:58 +0x8c\ngithub.com/elastic/beats/v7/libbeat/publisher/processing.(*group).Close(0x140008f80f0)\n\t/go/src/github.com/elastic/beats/libbeat/publisher/processing/processors.go:95 +0x11c\ngithub.com/elastic/beats/v7/libbeat/processors.Close({0x104959910, 0x140008f80f0})\n\t/go/src/github.com/elastic/beats/libbeat/processors/processor.go:58 +0x8c\ngithub.com/elastic/beats/v7/libbeat/publisher/pipeline.(*client).Close.func1()\n\t/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/client.go:167 +0x278\nsync.(*Once).doSlow(0x14000bf7e20, 0x14000743740)\n\t/usr/local/go/src/sync/once.go:68 +0xf0\nsync.(*Once).Do(0x14000bf7e20, 0x14000743740)\n\t/usr/local/go/src/sync/once.go:59 +0x48\ngithub.com/elastic/beats/v7/libbeat/publisher/pipeline.(*client).Close(0x14000bf7dd0)\n\t/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/client.go:148 +0x80\ngithub.com/elastic/beats/v7/filebeat/beater.(*countingClient).Close(0x14000bbd0b0)\n\t/go/src/github.com/elastic/beats/filebeat/beater/channels.go:145 +0x40\ngithub.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.startHarvester.func1({0x104971ef8, 0x14000bb29c0})\n\t/go/src/github.com/elastic/beats/filebeat/input/filestream/internal/input-logfile/harvester.go:219 +0xb1c\ngithub.com/elastic/go-concert/unison.(*TaskGroup).Go.func1()\n\t/go/pkg/mod/github.com/elastic/[email protected]/unison/taskgroup.go:163 +0xec\ncreated by github.com/elastic/go-concert/unison.(*TaskGroup).Go\n\t/go/pkg/mod/github.com/elastic/[email protected]/unison/taskgroup.go:159 +0x130\n","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log.origin":{"file.line":168,"file.name":"input-logfile/harvester.go"},"id":"filestream-monitoring-agent","ecs.version":"1.6.0","log.logger":"input.filestream"}

Here is a more nicely formatted callstack:

Harvester crashed with: harvester panic with: close of closed channel
goroutine 230 [running]:
runtime/debug.Stack()
    /usr/local/go/src/runtime/debug/stack.go:24 +0x64
github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.startHarvester.func1.1()
    /go/src/github.com/elastic/beats/filebeat/input/filestream/internal/input-logfile/harvester.go:167 +0x64
panic({0x104366b00, 0x10492c890})
    /usr/local/go/src/runtime/panic.go:844 +0x26c
github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata.(*cache).stop(0x14000697650)
    /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/cache.go:97 +0x30
github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).Close(0x140006a25c0)
    /go/src/github.com/elastic/beats/libbeat/processors/add_kubernetes_metadata/kubernetes.go:311 +0x84
github.com/elastic/beats/v7/libbeat/processors.Close({0x104959438, 0x140006a25c0})
    /go/src/github.com/elastic/beats/libbeat/processors/processor.go:58 +0x8c
github.com/elastic/beats/v7/libbeat/publisher/processing.(*group).Close(0x140008f8120)
    /go/src/github.com/elastic/beats/libbeat/publisher/processing/processors.go:95 +0x11c
github.com/elastic/beats/v7/libbeat/processors.Close({0x104959910, 0x140008f8120})
    /go/src/github.com/elastic/beats/libbeat/processors/processor.go:58 +0x8c
github.com/elastic/beats/v7/libbeat/publisher/processing.(*group).Close(0x140008f80f0)
    /go/src/github.com/elastic/beats/libbeat/publisher/processing/processors.go:95 +0x11c
github.com/elastic/beats/v7/libbeat/processors.Close({0x104959910, 0x140008f80f0})
    /go/src/github.com/elastic/beats/libbeat/processors/processor.go:58 +0x8c
github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*client).Close.func1()
    /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/client.go:167 +0x278
sync.(*Once).doSlow(0x14000bf7e20, 0x14000743740)
    /usr/local/go/src/sync/once.go:68 +0xf0
sync.(*Once).Do(0x14000bf7e20, 0x14000743740)
    /usr/local/go/src/sync/once.go:59 +0x48
github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*client).Close(0x14000bf7dd0)
    /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/client.go:148 +0x80
github.com/elastic/beats/v7/filebeat/beater.(*countingClient).Close(0x14000bbd0b0)
    /go/src/github.com/elastic/beats/filebeat/beater/channels.go:145 +0x40
github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.startHarvester.func1({0x104971ef8, 0x14000bb29c0})
    /go/src/github.com/elastic/beats/filebeat/input/filestream/internal/input-logfile/harvester.go:219 +0xb1c
github.com/elastic/go-concert/unison.(*TaskGroup).Go.func1()
    /go/pkg/mod/github.com/elastic/[email protected]/unison/taskgroup.go:163 +0xec
created by github.com/elastic/go-concert/unison.(*TaskGroup).Go
    /go/pkg/mod/github.com/elastic/[email protected]/unison/taskgroup.go:159 +0x130
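For illustration only (this is not the Beats code; the type and field names below are made up), a minimal sketch of how a shutdown path that unconditionally closes a channel panics with "close of closed channel" when it runs twice for the same shared instance:

```go
package main

import "sync"

// cache stands in for a shared resource whose shutdown closes a channel.
type cache struct {
	done chan struct{}
}

// stop closes the channel unconditionally, so a second call panics with
// "close of closed channel".
func (c *cache) stop() {
	close(c.done)
}

func main() {
	c := &cache{done: make(chan struct{})}

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ { // two clients shutting down the same shared instance
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.stop()
		}()
	}
	wg.Wait() // one of the two calls panics the process
}
```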
cmacknz added the bug, Team:Cloudnative-Monitoring, v8.6.0, and Team:Elastic-Agent labels on Jan 9, 2023
elasticmachine (Collaborator) commented:

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

cmacknz removed the Team:Cloudnative-Monitoring label on Jan 11, 2023
cmacknz (Member, Author) commented Jan 11, 2023

This happens even when not running on Kubernetes. It's not yet clear whether filestream or the add_kubernetes_metadata processor is at fault.

lucabelluccini (Contributor) commented:

Hello @cmacknz, do we agree this should have no impact other than noise at shutdown/restart of the agents?

cmacknz (Member, Author) commented Feb 2, 2023

Yes, no known impact besides noise in the logs on shutdown.

andrewkroh (Member) commented:

This add_kubernetes_metadata.(*cache).stop(...) panic has also been observed with the Filebeat aws-s3 input under Agent when it is used in SQS mode.

According to users on Discuss, the impact is high in this case because the input stops deleting the SQS messages it has received.

"I am able to see ELB logs in Elasticsearch cluster but my SQS inflight messages are keep on increasing. Looks like agent is not able to delete all the messages."

The input uses a unique Beat pipeline.Client to process each SQS message and therefore closes pipeline.Clients frequently (separate pipeline.Clients prevent mutex contention between the SQS/S3 readers); a sketch of this pattern follows the stack trace below.

# From Agent 8.6.1 logs.
% pbpaste| jq -r .message
sync.(*Once).Do(...)
sync/once.go:59
github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*client).Close(0x55d9bdaaf7e0?)
github.com/elastic/beats/v7/libbeat/publisher/pipeline/client.go:148 +0x59
github.com/elastic/beats/v7/filebeat/beater.(*countingClient).Close(0xc000d764b0?)
github.com/elastic/beats/v7/filebeat/beater/channels.go:145 +0x22
github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsS3EventProcessor).processS3Events(0xc0001b7340, {0x55d9bbb86f98, 0xc000984fc0}, 0xc0016059c0, {0xc000e91400?, 0x0?})
github.com/elastic/beats/v7/x-pack/filebeat/input/awss3/sqs_s3_event.go:329 +0x811
github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsS3EventProcessor).ProcessSQS(0xc0001b7340, {0x55d9bbb86f98, 0xc000984fc0}, 0xc00103e180)
github.com/elastic/beats/v7/x-pack/filebeat/input/awss3/sqs_s3_event.go:138 +0x333
github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsReader).Receive.func1({0xc000cbb0e0, 0xc000d7e6a0, 0xc000d7e690, 0x0, 0x0, 0xc000d7e670, 0xc000d7e680, {}}, {0xc0f2a0b38f07cb57, 0x34689750ef, ...})
github.com/elastic/beats/v7/x-pack/filebeat/input/awss3/sqs.go:91 +0x1bb
created by github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsReader).Receive
github.com/elastic/beats/v7/x-pack/filebeat/input/awss3/sqs.go:83 +0x5a5
panic: close of closed channel
goroutine 185 [running]:
github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata.(*cache).stop(...)
github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata/cache.go:97
github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata.(*kubernetesAnnotator).Close(0xc000140780?)
github.com/elastic/beats/v7/libbeat/processors/add_kubernetes_metadata/kubernetes.go:311 +0x4f
github.com/elastic/beats/v7/libbeat/processors.Close(...)
github.com/elastic/beats/v7/libbeat/processors/processor.go:58
github.com/elastic/beats/v7/libbeat/publisher/processing.(*group).Close(0x7f47458a7fff?)
github.com/elastic/beats/v7/libbeat/publisher/processing/processors.go:95 +0x159
github.com/elastic/beats/v7/libbeat/processors.Close(...)
github.com/elastic/beats/v7/libbeat/processors/processor.go:58
github.com/elastic/beats/v7/libbeat/publisher/processing.(*group).Close(0x0?)
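A minimal sketch of the per-message client pattern described above; the types and names are illustrative, not the actual aws-s3 input code. Each SQS message gets its own publishing client that is closed as soon as the message is handled, so the Close path of anything shared between clients is exercised very frequently:

```go
package main

import "fmt"

// publisher stands in for a Beats pipeline client; the real interface is richer.
type publisher interface {
	Publish(event string)
	Close() error
}

type fakeClient struct{ id int }

func (c *fakeClient) Publish(event string) { fmt.Printf("client %d publishes %q\n", c.id, event) }
func (c *fakeClient) Close() error         { fmt.Printf("client %d closed\n", c.id); return nil }

// processSQSMessage opens a dedicated client per message (avoiding mutex
// contention between the SQS/S3 readers) and closes it as soon as the
// message is handled.
func processSQSMessage(newClient func() publisher, msg string) {
	client := newClient()
	defer client.Close()
	client.Publish(msg)
}

func main() {
	next := 0
	newClient := func() publisher { next++; return &fakeClient{id: next} }
	for _, msg := range []string{"sqs-message-1", "sqs-message-2"} {
		processSQSMessage(newClient, msg)
	}
}
```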

andrewkroh (Member) commented:

Duplicates: #32200

rdner (Member) commented Mar 7, 2023

The actual fix is in PR #34761.
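For context, one common way to make this kind of channel-closing stop() idempotent is to guard it with sync.Once. The sketch below only illustrates that general class of fix and is not necessarily what #34761 implements:

```go
package main

import "sync"

// cache mirrors the illustrative type from the earlier sketch.
type cache struct {
	done     chan struct{}
	stopOnce sync.Once
}

// stop is now safe to call any number of times, from any goroutine.
func (c *cache) stop() {
	c.stopOnce.Do(func() { close(c.done) })
}

func main() {
	c := &cache{done: make(chan struct{})}
	c.stop()
	c.stop() // no-op instead of a panic
}
```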
