Describe the bug
Falcosidekick is crashing frequently with a fatal error: concurrent map iteration and map write. After updating Falcosidekick to run with a multi-replica configuration (replicaCount=3), enabling buffered output (falco.buffered_outputs=true), and adjusting the rate and burst limits (rate: 10, burst: 20), failures are still observed.
Initially, the WebUI output was enabled, which caused instability. Disabling the WebUI output improved stability somewhat, but the application continues to crash when handling events.
Errors logged during operation:
fatal error: concurrent map iteration and map write
How to reproduce it
Deploy Falcosidekick with the following configuration:
replicaCount=3
falco.buffered_outputs=true
rate: 10
burst: 20
Enable Slack and Elasticsearch outputs.
Disable the WebUI output.
Trigger multiple Falco events to observe Falcosidekick crashes.
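For reference, the reproduction settings above correspond roughly to a Helm values sketch like the following (the key paths, URL, and hostport are assumptions that depend on the chart version, not a verified configuration):

```yaml
# values.yaml sketch for the falcosidekick Helm chart -- placeholder values
replicaCount: 3

config:
  slack:
    webhookurl: "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder
  elasticsearch:
    hostport: "https://opensearch.example.com:9200"             # placeholder
  webui:
    url: ""   # WebUI output disabled

falco:
  buffered_outputs: true
  rate: 10
  burst: 20
```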
Expected behavior
Falcosidekick should remain stable and handle a high volume of Falco events in a multi-replica setup without crashing.
Environment
Falco version: 0.38.2
Falcosidekick version: 2.29.0
Kubernetes version: v1.28.11
System info:
Machine architecture: x86_64
Kernel: 6.1.0-23-amd64 (Debian 6.1.99-1, built on 2024-07-15)
Operating system: Debian GNU/Linux 11 (bullseye)
Cloud provider: AWS EC2 t3.large instances
Installation method: Deployed via Helm
Additional context
Despite disabling the WebUI (which initially caused instability), the system continues to crash when forwarding events to OpenSearch (via the elasticsearch output) and Slack. Buffered outputs have been enabled to optimize performance, but the issue persists across all replicas.
I ran tests with a vanilla config.yaml and only Slack enabled as an output (using a mock server to avoid Slack's rate limiting), and I wasn't able to replicate the issue with 2.29.0 at ~100 req/s (which is a ridiculous rate for security alerts in real life).
Hello again Thomas.
Let me know what more I can tell you. I will say that, looking at our Elasticsearch, I see several minutes of 5-10 logs, then a few hundred thousand logs all at once. This makes no sense to me, as the only alerting rules I have apply to a cluster of some 30 nodes, and those are just the k8s audit rules and our own custom rule for SSH intrusion detection. There should not be this insane volume.
What can I provide to help reproduce the error?
Stack traces and further details can be found in the log attached and in the original conversation with Thomas Labarussias:
https://kubernetes.slack.com/archives/CMWH3EH32/p1726712928741829
falcologs1.txt