
Cherry-pick #19764 to 7.8: [Auditbeat] Fix up socket dataset runaway CPU usage #19781

Merged: 1 commit into elastic:7.8 on Jul 9, 2020

Conversation

@andrewstucki commented on Jul 9, 2020

Cherry-pick of PR #19764 to 7.8 branch. Original message:

What does this PR do?

Fix for auditbeat runaway CPU usage: #19141

Here's the explanation: everything was pretty much as described in the previous PR (#19033); the only additional things I found were:

  1. When a *socket is terminated by another socket with a different kernel tid, it's moved to the closing LRU list.
  2. The new *socket is added to the state's socks map, with the ptr reference pointing to it.
  3. The reaper comes along and hits the following code path:
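	// Reaper loop: peek looks at the head of the closing LRU without removing it;
	// non-socket items are dequeued via get().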
	for item := s.closing.peek(); item != nil && item.Timestamp().Before(deadline); {
		if sock, ok := item.(*socket); ok {
			s.onSockTerminated(sock)
		} else {
			s.closing.get()
		}
		item = s.closing.peek()
	}
  4. The old "terminated" socket is now in a "closing" state, so onSockTerminated is called on it again.
  5. In onSockTerminated the socket is pruned again from the socks map with the call to delete(s.socks, sock.sock).
  6. The problem is that the socks map now refers to the new *socket rather than the old one.
  7. Eventually, if the new *socket times out, onSockDestroyed is called on it from the code that does the peek on the socketLRU in the reaper.
  8. That call used the socket pointer whose entry had already been deleted from the socks map in step 5.
  9. onSockDestroyed was running the following code:
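	// Look up the socket by its kernel pointer; if it has already been pruned
	// from s.socks, bail out (without removing the entry from the LRU).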
	sock, found = s.socks[ptr]
	if !found {
		return nil
	}
  10. found was false, so the function returned without doing anything.
  11. Because of the call to s.socketLRU.peek(), the same socket kept getting returned over and over, wedging the reaper routine in a tight for loop (hence the high CPU usage); a minimal sketch of this pattern follows the list.
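To make the wedge concrete, here is a minimal standalone sketch (not the Beats code; the toy lru type and the known map are hypothetical stand-ins for the socket LRU and s.socks) of a drain loop that peeks at the head and relies on the handler to remove it. When the handler bails out early, the head is never dequeued and peek keeps returning the same item:

	package main

	import "fmt"

	// Toy LRU: peek returns the head without removing it, get removes it.
	type lru struct{ items []int }

	func (l *lru) peek() (int, bool) {
		if len(l.items) == 0 {
			return 0, false
		}
		return l.items[0], true
	}

	func (l *lru) get() (int, bool) {
		v, ok := l.peek()
		if ok {
			l.items = l.items[1:]
		}
		return v, ok
	}

	func main() {
		known := map[int]bool{}     // stands in for s.socks; the entry for 42 was already deleted
		l := &lru{items: []int{42}}
		for i := 0; i < 3; i++ {    // bounded here for the demo; the real loop had no bound
			v, ok := l.peek()
			if !ok {
				break
			}
			if !known[v] {
				// Bug pattern: continuing without l.get() leaves 42 at the head,
				// so the next iteration sees it again -> tight loop.
				fmt.Println("unknown item, spinning on", v)
				continue
			}
			l.get()
		}
	}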

The fix

We pass a reference to the *socket object in the reaper's onSockDestroyed call; that way we don't have to look up the socket in s.socks and can instead handle the socket closure directly.
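For illustration, here is a rough sketch of the shape of that change, with stub types and an assumed signature rather than the exact code that was merged: onSockDestroyed takes the *socket the reaper already holds and only falls back to the s.socks lookup when it isn't given one.

	// Hypothetical package name for this sketch; socket and state are stubs,
	// not the real dataset types.
	package sockets

	type socket struct{ /* fields omitted */ }

	type state struct {
		socks map[uintptr]*socket
	}

	// The reaper passes in the *socket it already peeked from the LRU, so the
	// s.socks lookup only remains as a fallback for callers without a pointer.
	func (s *state) onSockDestroyed(ptr uintptr, sock *socket) error {
		if sock == nil {
			var found bool
			if sock, found = s.socks[ptr]; !found {
				return nil
			}
		}
		// ... handle the socket closure directly using sock ...
		_ = sock
		return nil
	}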

Related issues

* Fix up socket dataset
* Add Changelog entry

(cherry picked from commit cb4cedc)
@elasticmachine (Collaborator) commented:

Pinging @elastic/siem (Team:SIEM)

@botelastic (bot) added and then removed the needs_team label (indicates that the issue/PR needs a Team:* label) on Jul 9, 2020
@elasticmachine (Collaborator) commented:

💚 Build Succeeded


Build stats

  • Build Cause: [Pull request #19781 opened]

  • Start Time: 2020-07-09T13:47:29.213+0000

  • Duration: 48 min 40 sec

Test stats 🧪

  • Failed: 0

  • Passed: 230

  • Skipped: 49

  • Total: 279

@andrewstucki merged commit 479c9f0 into elastic:7.8 on Jul 9, 2020
@andrewstucki deleted the backport_19764_7.8 branch on July 9, 2020 at 15:05
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…lastic#19781)

* Fix up socket dataset
* Add Changelog entry

(cherry picked from commit f1ef970)