Add a ring-buffer reporter to libbeat #28750

Merged
merged 6 commits into from
Feb 15, 2022

Conversation

@michel-laterman michel-laterman commented Nov 1, 2021

What does this PR do?

Add a ring buffer reporter that when enabled will store configured
namespaces in a buffer to allow operators to view recent metrics
history. Defaults are to gather the stats namespace every 10s for 10m.
Must be explicitly enabled along with the HTTP endpoints.
The buffer endpoint is intended to be used for diagnostics reporting.
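
For illustration, the defaults above (one sample every 10s, kept for 10m) imply a buffer of 60 entries. The following is a minimal sketch of such a fixed-size ring buffer, not the code in this PR: the field names mirror the struct quoted in the review below, while `newRingBuffer`, `add`, and `snapshot` are hypothetical names chosen for this example, and the size is assumed to be at least one.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ringBuffer is an illustrative fixed-capacity buffer (not the PR's code).
type ringBuffer struct {
	mu      sync.Mutex // guards the fields below
	entries []interface{}
	i       int  // next write position
	full    bool // true once the buffer has wrapped at least once
}

// newRingBuffer allocates a buffer; size is assumed to be >= 1.
func newRingBuffer(size int) *ringBuffer {
	return &ringBuffer{entries: make([]interface{}, size)}
}

// add stores v, overwriting the oldest entry once the buffer is full.
func (r *ringBuffer) add(v interface{}) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.entries[r.i] = v
	r.i = (r.i + 1) % len(r.entries)
	if r.i == 0 {
		r.full = true
	}
}

// snapshot returns a copy of the stored entries, oldest first.
func (r *ringBuffer) snapshot() []interface{} {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.full {
		return append([]interface{}(nil), r.entries[:r.i]...)
	}
	out := make([]interface{}, 0, len(r.entries))
	out = append(out, r.entries[r.i:]...)
	return append(out, r.entries[:r.i]...)
}

func main() {
	// 10m of history sampled every 10s needs 60 slots.
	size := int((10 * time.Minute) / (10 * time.Second))
	rb := newRingBuffer(size)
	for n := 1; n <= 65; n++ {
		rb.add(n)
	}
	s := rb.snapshot()
	fmt.Println(len(s), s[0], s[len(s)-1]) // prints: 60 6 65
}
```

After 65 writes into 60 slots, the five oldest samples have been overwritten, so the snapshot starts at sample 6.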

Why is it important?

We wish to gather recent metrics data with the elastic-agent diagnostics command.
The buffer endpoint can be used to collect such data.

Also once this is implemented we may more easily disable the log metrics reporting functionality.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Compile a Beat and run it with the following options enabled in the config:

http.enabled: true
http.buffer.enabled: true

Related issues

Note that this functionality will not be backported; the number of cases that metrics could help debug is minimal.

@michel-laterman michel-laterman added enhancement Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team labels Nov 1, 2021
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 1, 2021

mergify bot commented Nov 1, 2021

This pull request does not have a backport label. Could you fix it @michel-laterman? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v\d.\d.\d is the label to automatically backport to the 7.\d branch. \d is the digit

NOTE: backport-skip has been added to this pull request.

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 1, 2021
@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Nov 1, 2021

elasticmachine commented Nov 1, 2021

💚 Build Succeeded

Build stats

  • Start Time: 2022-02-11T22:44:39.843+0000

  • Duration: 17 min 40 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)


mergify bot commented Nov 15, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b monitoring-ring-buffer upstream/monitoring-ring-buffer
git merge upstream/master
git push upstream monitoring-ring-buffer

Add a ring buffer reporter that when enabled will store configured
namespaces in a buffer to allow operators to view recent metrics
history. Defaults are to gather the stats namespace every 10s for 10m.
Must be explicitly enabled, along with monitoring, and the HTTP
endpoint. The buffer endpoint is intended to be used for diagnostics
reporting.

botelastic bot commented Dec 20, 2021

Hi!
We just realized that we haven't looked into this PR in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Dec 20, 2021
@botelastic botelastic bot removed the Stalled label Dec 21, 2021

botelastic bot commented Jan 20, 2022

Hi!
We just realized that we haven't looked into this PR in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Jan 20, 2022

ph commented Jan 20, 2022

This is interesting, @michel-laterman. Let me know when it's ready for review.

@botelastic botelastic bot removed the Stalled label Jan 20, 2022

michel-laterman commented Jan 31, 2022

Finally got around to running memory/cpu usage tests:
Running metricbeat 8.1-SNAPSHOT on ubuntu 18.04 (left) and 20.04 (right).

When http + buffer is enabled on 20.04 we see:
buffered-view
The Metricbeat instance running with the buffer has an RSS of 162-163MB (the same as the 18.04 instance without the buffer enabled).
The memory usage on 18.04 generally should be disregarded as I believe the OS difference injects too much noise into the result.

Just enabling the HTTP endpoint on 20.04 we can see the rss size drop to 159-160MB (share average also drops by around 1MB):
http-no-buffer-view

And disabling both the buffer and endpoint leads to an rss size of around 159MB:
no-http-view

CPU usage does not meaningfully change with the buffer enabled.
With default settings, enabling the buffer does not lead to any major change in memory usage. However, if more namespaces are added to the buffer, or if extra process information is collected (through the system module), the numbers can increase.

@michel-laterman michel-laterman added backport-7.17 Automated backport to the 7.17 branch with mergify backport-v8.2.0 Automated backport with mergify labels Feb 8, 2022
@mergify mergify bot removed the backport-skip Skip notification from the automated backport with mergify label Feb 8, 2022
@michel-laterman michel-laterman removed the backport-7.17 Automated backport to the 7.17 branch with mergify label Feb 8, 2022
@michel-laterman michel-laterman marked this pull request as ready for review February 8, 2022 17:23
@elasticmachine

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@blakerouse blakerouse left a comment

Overall this looks really good.

It has great test coverage and it's easy to understand. I feel like I should have found something to comment on, but it's really good; I see nothing to change!

ph commented Feb 9, 2022

Going to restart the tests; it has been a few months since the PR was created.

/test

ph commented Feb 9, 2022

/test

@ph ph left a comment

Looks good to me, adding only a small nit comment.


// ringBuffer is a buffer with a fixed number of items that can be tracked.
//
// we assume that the size of the buffer is greater than one.

Suggested change
// we assume that the size of the buffer is greater than one.
// We assume that the size of the buffer is greater than one.

func (r *reporter) Stop() {
close(r.done)
r.wg.Wait()
// Clear entries?

Remove comment so there is no confusion on the godoc of the function, no need to clear the ring buffer there.

Comment on lines 79 to 86
switch r.(type) {
case error:
err = r.(error)
case string:
err = fmt.Errorf(r.(string))
default:
err = fmt.Errorf("handle attempted to panic with %v", r)
}

Suggested change
switch r.(type) {
case error:
err = r.(error)
case string:
err = fmt.Errorf(r.(string))
default:
err = fmt.Errorf("handle attempted to panic with %v", r)
}
switch r := r.(type) {
case error:
err = r
case string:
err = errors.New(r)
default:
err = fmt.Errorf("handle attempted to panic with %v", r)
}

Needs "errors" added to imports.

require.NoError(t, err)
defer r.Body.Close()

body, err := ioutil.ReadAll(r.Body)

s/ioutil/io/

entries []interface{}
i int
full bool
mu sync.Mutex

Conventionally mutexes are placed above the field they protect.

@michel-laterman michel-laterman requested review from a team as code owners February 10, 2022 17:11
@botelastic botelastic bot added the Team:Automation Label for the Observability productivity team label Feb 10, 2022
@michel-laterman

/test

1 similar comment
@kuisathaverat

/test

@michel-laterman michel-laterman merged commit 6769d47 into elastic:main Feb 15, 2022
@michel-laterman michel-laterman deleted the monitoring-ring-buffer branch February 15, 2022 16:59
v1v added a commit to v1v/beats that referenced this pull request Feb 21, 2022
…into feature/use-with-kind-k8s-env

* 'feature/use-with-kind-k8s-env' of github.com:v1v/beats: (52 commits)
  ci: home is declared within withBeatsEnv
  ci: use withKindEnv step
  ci: use getBranchesFromAliases and support next-patch-8 (elastic#30400)
  Update fields.yml (elastic#29609)
  Heartbeat: fix browser metrics and trace mappings (elastic#30258)
  Apply light edits to 8.0 changelog (elastic#30351)
  packetbeat/beater: make sure Npcap installation runs before interfaces are needed (elastic#30396)
  Add a ring-buffer reporter to libbeat (elastic#28750)
  Osquerybeat: Add install verification for osquerybeat (elastic#30388)
  update windows matrix support (elastic#30373)
  Refactor of metricbeat process-gathering metrics and system/process (elastic#30076)
  adjust next changelog wording (elastic#30371)
  [Metricbeat] azure: move event report into loop validDim loop (elastic#29945)
  fix: report GitHub Check before the cache (elastic#30372)
  Add support for non-unique keys in Kafka output headers (elastic#30369)
  ci: 6 major branch reached EOL (elastic#30357)
  reduce Elastic Agent shut down time by stopping processes concurrently (elastic#29650)
  [Filebeat] Add message to register encode/decode debug logs (elastic#30271)
  [libbeat] kafka message header support (elastic#29940)
  Heartbeat: set duration to zero for syntax errors (elastic#30227)
  ...
Labels
backport-v8.2.0 Automated backport with mergify enhancement Team:Automation Label for the Observability productivity team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team v8.2.0
6 participants