[Auditbeat] Fix up socket dataset runaway CPU usage #19764
Pinging @elastic/siem (Team:SIEM)
Great investigation @andrewstucki, thanks for fixing this! It explains the profiles shared in the discuss thread, with `onSockDestroyed()` taking most of the CPU while only doing a map lookup.
LGTM, needs a changelog entry
Just added on to your previous changelog entry and rebased from master to hopefully fix the weird CI mage issues.
I've shared a custom 7.8.1 build with this fix on the discuss thread.
* Fix up socket dataset * Add Changelog entry (cherry picked from commit cb4cedc)
* upstream/master: Add `docker logs` support to the Elastic Log Driver (elastic#19531) [Elastic Agent] Fix saving of agent configuration on Windows to have proper ACLs (elastic#19793) Send the config revision down to the endpoint application. (elastic#19759) [Elastic Agent] Add support for multiple hosts in connection to kibana (elastic#19628) Remove the downloadConfig and retryConfig from plugin/process.Application and plugin/service.Application. (elastic#19603) Update go version to 1.14.4 (elastic#19753) ci: set builds as skipped when they do not match the trigger (elastic#19750) [Auditbeat] Fix up socket dataset runaway CPU usage (elastic#19764) Convert cloudfoundry input to v2 (elastic#19717)
* Fix up socket dataset * Add Changelog entry
…unaway CPU usage (elastic#19783) * [Auditbeat] Fix up socket dataset runaway CPU usage (elastic#19764) * Fix up socket dataset * Add Changelog entry (cherry picked from commit f1ef970) * fix up changelog * Fix changelog
…lastic#19781) * Fix up socket dataset * Add Changelog entry (cherry picked from commit f1ef970)
What does this PR do?
Fix for auditbeat runaway CPU usage: #19141
Here's the explanation. Basically everything was pretty much as described in the previous PR (#19033); the only additional things I found were that:
1. A `*socket` is terminated by another socket with a different kernel `tid`.
2. It's moved to the `closing` LRU list.
3. A new `*socket` is added to the state `socks` map with the ptr reference pointing to it.
4. `onSockTerminated` is called again.
5. In `onSockTerminated` the socket is pruned again from the `socks` map with the call to `delete(s.socks, sock.sock)`.
6. The `socks` map now refers to the new `*socket` rather than the old one.
7. The old `*socket` times out.
8. `onSockDestroyed` is called on it with the code that's doing the peek on the `socketLRU` in the reaper code.
9. Because of the pruning from the `socks` map in step 5, the lookup in `s.socks` that `onSockDestroyed` was running failed: `found` was returning `false` and the function was returning.
10. With each `s.socketLRU.peek()` the same socket was getting returned over and over, resulting in the reaper routine getting wedged in a tight `for` loop (hence the high CPU usage).
The fix
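To make the wedge and the change in this section concrete, here is a minimal, self-contained sketch. It is not the Auditbeat code: the `state`, `lru`, `destroyByLookup`, `destroyByRef`, `reap`, and `staleState` names are hypothetical, the LRU is modeled as a plain slice, and `maxIter` exists only so the lookup-based variant terminates in a demo. The scenario assumed is the one from the steps above: a socket is still queued for reaping, but its key is already gone from the `socks` map.

```go
package main

import "fmt"

// socket is a hypothetical, stripped-down stand-in for the dataset's *socket.
type socket struct {
	sock uintptr // kernel socket pointer, used as the map key
}

// state models only the two structures involved in the bug: the socks map
// and the list of sockets waiting to be reaped (index 0 is the oldest).
type state struct {
	socks map[uintptr]*socket
	lru   []*socket
}

// remove drops sock from the map (if the map still points at it) and from the LRU.
func (s *state) remove(sock *socket) {
	if s.socks[sock.sock] == sock {
		delete(s.socks, sock.sock)
	}
	for i, e := range s.lru {
		if e == sock {
			s.lru = append(s.lru[:i], s.lru[i+1:]...)
			return
		}
	}
}

// destroyByLookup mimics the old behaviour: find the socket through the map
// and return early when the key is missing, leaving the LRU untouched.
func (s *state) destroyByLookup(key uintptr) bool {
	sock, found := s.socks[key]
	if !found {
		return false
	}
	s.remove(sock)
	return true
}

// destroyByRef mimics the fix: act on the object the reaper already holds.
func (s *state) destroyByRef(sock *socket) {
	s.remove(sock)
}

// reap peeks at the oldest LRU entry until the LRU is empty or maxIter
// iterations have run, and returns the number of iterations performed.
func reap(s *state, byRef bool, maxIter int) int {
	n := 0
	for len(s.lru) > 0 && n < maxIter {
		head := s.lru[0] // peek: does not remove the entry
		if byRef {
			s.destroyByRef(head)
		} else {
			s.destroyByLookup(head.sock)
		}
		n++
	}
	return n
}

// staleState builds the problematic situation: the socket is queued in the
// LRU, but its key has already been deleted from the socks map.
func staleState() *state {
	old := &socket{sock: 0x1}
	return &state{socks: map[uintptr]*socket{}, lru: []*socket{old}}
}

func main() {
	fmt.Println(reap(staleState(), false, 1000)) // prints 1000: spins until the cap
	fmt.Println(reap(staleState(), true, 1000))  // prints 1: entry removed on first pass
}
```

Without the iteration cap, the lookup-based variant would never leave the loop, which matches the tight `for` loop seen in the profiles. Passing the pointer removes the dependency on the map still containing the entry.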
Basically we pass a reference to the `*socket` object in the reaper's `onSockDestroyed` call. That way we don't have to look up the socket in `s.socks` and can instead handle the socket closure directly.
Related issues