Missing event metadata #3246

Open
imreczegledi-form3 opened this issue Jun 12, 2024 · 3 comments

imreczegledi-form3 commented Jun 12, 2024

Hi 👋

We are seeing some false-positive alerts on empty events, similar to #3234 and #2700 (I hope I can help with these cases as well).

Missing event metadata

  • almost every field is null, -1, or 4294967295
{"hostname":"minikube","output":"12:15:24.058348969: Warning Account Manipulation in SSH detected 
...
{"container.id":"host","container.image.repository":null,"container.image.tag":null,"container.name":"host","evt.res":"SUCCESS","evt.time":1718108124058348969,"evt.type":"openat","fd.name":"my_sshd_config","group.gid":4294967295,"group.name":"","k8s.ns.name":null,"k8s.pod.name":null,"proc.cmdline":"bash","proc.cwd":"","proc.exepath":"","proc.pcmdline":null,"proc.pid":11453,"proc.ppid":0,"proc.sid":-1,"user.loginname":"","user.loginuid":-1,"user.name":"","user.uid":4294967295}
...
}

The Falco rule is Account Manipulation in SSH, but the issue is not rule-specific.

Based on my local tests, the root cause is a too-small bufSizePreset parameter. This buffer is crucial when Falco has to handle a "process flood" (e.g. a process spawning hundreds of child processes).

To simulate a "process flood" I created a small Go script which triggers the rule 1000 times in separate child processes (on the host).

```go
...
cmd = exec.CommandContext(ctx, "timeout", "5s", "tail", "-f", "/home/ubuntu/my_sshd_config")
...
```

This is how you can reproduce the issue; a fuller sketch of such a reproducer is shown below.
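A hedged sketch of what such a reproducer could look like (this is not the original script; the file path, process count, and timeouts are assumptions based on the snippet above):

```go
// Hypothetical "process flood" reproducer: spawn many short-lived child
// processes that each open the watched file. The path, count, and timeouts
// are assumptions based on the snippet above, not the original script.
package main

import (
	"context"
	"os/exec"
	"sync"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each invocation is a new child process opening the file the rule watches.
			cmd := exec.CommandContext(ctx, "timeout", "5s", "tail", "-f", "/home/ubuntu/my_sshd_config")
			_ = cmd.Run()
		}()
	}
	wg.Wait()
}
```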

Test env

  • EC2 (Ubuntu, t3.small) with minikube
  • Deployed Falco chart version 4.3.0

Results

| bufSizePreset | Logged events | Events with missing metadata | Ratio |
|---|---|---|---|
| 1 (1 MB) | 447 | 77 | 0.172 |
| 2 (2 MB) | 582 | 69 | 0.118 |
| 3 (3 MB) | 570 | 48 | 0.084 |
| 4 (4 MB) - Falco default | 738 | 41 | 0.055 |
| 5 (16 MB) | 998 | 0 | - |

As the table above shows, increasing the buffer decreases the number of events without metadata. With an appropriately sized buffer the issue disappears and the Falco logs contain only fully enriched events.

bufSizePreset can be set to values between 1 and 10; a sketch of how this might be configured is shown below.
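For reference, a minimal sketch of raising the preset through the Helm chart values, assuming the chart exposes it under driver.modernEbpf.bufSizePreset (the exact key may differ between chart versions, so verify against the chart's values.yaml):

```yaml
# Hypothetical values.yaml override for the Falco Helm chart
# (key names are an assumption; verify against your chart version).
driver:
  modernEbpf:
    bufSizePreset: 5   # 1-10; higher presets allocate a larger per-CPU ring buffer
```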

Ideas

  • Probably, the event enrichment logic uses some space from the bufSizePreset buffer
  • Due to the heavy load (caused by the new processes), event enrichment doesn't have enough space to work
  • Maybe dropping these "empty" events would be better
  • A bufSizePreset-specific debug message (with logic that can measure buffer utilisation) would be very useful

Looking forward to your answers and ideas (I might have missed something).

@incertum
Contributor

> As the table above shows, increasing the buffer decreases the number of events without metadata.

This is expected, as Falco builds up internal state to serve you all of the information (see the source code: https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/parsers.cpp). If too many events are dropped on the kernel side, the state engine cannot work properly. Perhaps the adaptive syscalls blog post (https://falco.org/blog/adaptive-syscalls-selection/) can provide more insights, and the base_syscalls feature may be of interest to you in general.
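For context, a minimal sketch of what the base_syscalls setting might look like in falco.yaml; the field names here are assumptions to verify against the falco.yaml shipped with your Falco version:

```yaml
# Hypothetical falco.yaml snippet: restrict collection to the syscalls Falco
# needs for state building plus whatever the loaded rules require.
# Field names are assumptions; check the falco.yaml for your version.
base_syscalls:
  custom_set: []   # no additional user-defined syscalls
  repair: true     # let Falco compute the minimal state-building syscall set
```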

> A bufSizePreset-specific debug message (with logic that can measure buffer utilisation) would be very useful

Have you explored the internal automatic drop alerts or Falco metrics (https://falco.org/docs/metrics/falco-metrics/) as an alternative? Both expose drop counters from which you can infer how the buffer is holding up.
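As an illustration, enabling periodic metrics in falco.yaml might look roughly like this; the field names are assumptions based on the linked metrics docs and should be verified against your Falco version:

```yaml
# Hypothetical falco.yaml snippet enabling periodic metrics, including
# kernel-side event/drop counters. Field names are assumptions; verify
# against the falco.yaml reference for your version.
metrics:
  enabled: true
  interval: 15m
  output_rule: true                     # emit metrics as a Falco internal event
  kernel_event_counters_enabled: true   # include event and drop counters
```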


Some more general info:

Btw, your example log shows "container.name":"host", so all container fields are expected to be null; see https://falco.org/docs/reference/rules/supported-fields/#field-class-container etc.

Regarding user names and group names: is the host /etc directory mounted and available? We have had issues with minikube support in the past, since some mounts or setup differ from an actual Kubernetes cluster; perhaps some of this is caused by that.
How do you use minikube? Which driver? See also https://falco.org/docs/install-operate/third-party/learning/

@imreczegledi-form3
Author

Thanks, I will check the blog post regarding adaptive syscalls.


driver: modern-bpf

I don't think it's a minikube compatibility issue because, as you can see in the table above, the majority of events are perfectly enriched, like:

{"hostname":"minikube","output":"13:21:03.940424837: Warning Account Manipulation in SSH detected ...
 "output_fields": {"container.id":"host","container.image.repository":null,"container.image.tag":null,"container.name":"host","evt.res":"SUCCESS","evt.time":1718198463940424837,"evt.type":"openat","fd.name":"/home/ubuntu/my_sshd_config","group.gid":1001,"group.name":"<NA>","k8s.ns.name":null,"k8s.pod.name":null,"proc.cmdline":"tail -f /home/ubuntu/my_sshd_config","proc.cwd":"","proc.exepath":"/usr/bin/tail","proc.pcmdline":"timeout 5s tail -f /home/ubuntu/my_sshd_config","proc.pid":12194,"proc.ppid":12192,"proc.sid":-1,"user.loginname":"docker","user.loginuid":1000,"user.name":"docker","user.uid":1000}}

So the root cause still seems to be around the state engine / dropped events.

@poiana
Contributor

poiana commented Sep 15, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale
