libbeat/cmd/instance: report cgroup stats #21113
@simitt I seem to recall you had something more elaborate in mind. Could you please take a look and let me know if you think more is needed?
nice! LGTM
💔 Tests Failed
Pinging @elastic/integrations (Team:Integrations)
@ycombinator would you mind taking a quick look at this? Or else please suggest someone else to take a look. We'd like to get this in for 7.10, so we can get the Stack Monitoring UI updated to accurately reflect server performance when running in ESS/ECE.
LGTM.
[EDIT] Jenkins CI failures are consistent with those on the latest master
build and, therefore, unrelated to this PR.
* libbeat/cmd/instance: report cgroup stats (cherry picked from commit b4c7a93)
…peline-2.0 * upstream/master: libbeat/cmd/instance: report cgroup stats (elastic#21113) Configurable index template loading (elastic#21212) [Ingest Manager] Thread safe sorted set (elastic#21290)
* upstream/master: (417 commits) libbeat/cmd/instance: report cgroup stats (elastic#21113) Configurable index template loading (elastic#21212) [Ingest Manager] Thread safe sorted set (elastic#21290) Change mirror of kafka download (elastic#19645) [Ingest manager] Copy Action store on upgrade (elastic#21298) [CI] Pipeline 2.0 for monorepos (elastic#20104) Stop running agent container as root by default (elastic#21213) Stop running auditbeat container as root by default (elastic#21202) Fix autodiscover flaky tests (elastic#21242) [Ingest Manager] Enabled dev builds (elastic#21241) Fix librpm installation in auditbeat build (elastic#21239) Fix prometheus default config (elastic#21253) Fix dev guide test command (elastic#21254) Move aws lambda metricset to GA (elastic#21255) [Docs] Typo in table syntax (elastic#20227) [ECS] Adds related.hosts to capture all hostnames and host identifiers on an event. (elastic#21160) Add recursive split to httpjson (elastic#21214) [DOCS] Add beat specific start widgets (elastic#21217) Fix timestamp handling in remote_write (elastic#21166) Fix aws, azure and googlecloud compute dashboards (elastic#21098) ...
* upstream/master: (399 commits) libbeat/cmd/instance: report cgroup stats (elastic#21113) Configurable index template loading (elastic#21212) [Ingest Manager] Thread safe sorted set (elastic#21290) Change mirror of kafka download (elastic#19645) [Ingest manager] Copy Action store on upgrade (elastic#21298) [CI] Pipeline 2.0 for monorepos (elastic#20104) Stop running agent container as root by default (elastic#21213) Stop running auditbeat container as root by default (elastic#21202) Fix autodiscover flaky tests (elastic#21242) [Ingest Manager] Enabled dev builds (elastic#21241) Fix librpm installation in auditbeat build (elastic#21239) Fix prometheus default config (elastic#21253) Fix dev guide test command (elastic#21254) Move aws lambda metricset to GA (elastic#21255) [Docs] Typo in table syntax (elastic#20227) [ECS] Adds related.hosts to capture all hostnames and host identifiers on an event. (elastic#21160) Add recursive split to httpjson (elastic#21214) [DOCS] Add beat specific start widgets (elastic#21217) Fix timestamp handling in remote_write (elastic#21166) Fix aws, azure and googlecloud compute dashboards (elastic#21098) ...
* upstream/master: (60 commits) libbeat/cmd/instance: report cgroup stats (elastic#21113) Configurable index template loading (elastic#21212) [Ingest Manager] Thread safe sorted set (elastic#21290) Change mirror of kafka download (elastic#19645) [Ingest manager] Copy Action store on upgrade (elastic#21298) [CI] Pipeline 2.0 for monorepos (elastic#20104) Stop running agent container as root by default (elastic#21213) Stop running auditbeat container as root by default (elastic#21202) Fix autodiscover flaky tests (elastic#21242) [Ingest Manager] Enabled dev builds (elastic#21241) Fix librpm installation in auditbeat build (elastic#21239) Fix prometheus default config (elastic#21253) Fix dev guide test command (elastic#21254) Move aws lambda metricset to GA (elastic#21255) [Docs] Typo in table syntax (elastic#20227) [ECS] Adds related.hosts to capture all hostnames and host identifiers on an event. (elastic#21160) Add recursive split to httpjson (elastic#21214) [DOCS] Add beat specific start widgets (elastic#21217) Fix timestamp handling in remote_write (elastic#21166) Fix aws, azure and googlecloud compute dashboards (elastic#21098) ...
Gosigar's cgroups GetStatsForProcesses can return a nil Stats pointer and no error when the ["blkio", "cpu", "cpuacct", "memory"] subsystems are on the root cgroup. Related elastic#21113
Gosigar's cgroups GetStatsForProcesses can return a nil Stats pointer and no error when the ["blkio", "cpu", "cpuacct", "memory"] subsystems are on the root cgroup. Related #21113
…21334) * libbeat/cmd/instance: report cgroup stats (#21113) * libbeat/cmd/instance: report cgroup stats (cherry picked from commit b4c7a93) * Fix panic in cgroups monitoring Gosigar's cgroups GetStatsForProcesses can return a nil Stats pointer and no error when the ["blkio", "cpu", "cpuacct", "memory"] subsystems are on the root cgroup. Related #21113 Co-authored-by: Adrian Serrano <[email protected]>
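The nil-pointer case described in these commits can be illustrated with a minimal sketch. It is not the code from the fix: `Stats`, `statsFunc`, `errNoStats`, and `memoryLimit` are hypothetical stand-ins, assumed here only to mimic a lookup that can return a nil pointer together with a nil error.

```go
package main

import (
	"errors"
	"fmt"
)

// Stats is a hypothetical stand-in for the value a cgroups library returns.
// It is NOT the gosigar type; the single field is an assumption.
type Stats struct {
	MemoryLimitBytes uint64
}

// statsFunc is a hypothetical lookup. Like the call described above, it may
// return (nil, nil) when the process sits on the root cgroup.
type statsFunc func(pid int) (*Stats, error)

// errNoStats marks the "no stats and no error" case so callers can skip
// reporting instead of dereferencing a nil pointer.
var errNoStats = errors.New("no cgroup stats available")

// memoryLimit checks both the error and the pointer before using the stats.
func memoryLimit(get statsFunc, pid int) (uint64, error) {
	stats, err := get(pid)
	if err != nil {
		return 0, fmt.Errorf("error getting cgroup stats: %w", err)
	}
	if stats == nil {
		// A nil error does not imply a usable result.
		return 0, errNoStats
	}
	return stats.MemoryLimitBytes, nil
}

func main() {
	// Simulates a process whose subsystems are all on the root cgroup.
	rootCgroup := func(pid int) (*Stats, error) { return nil, nil }

	if _, err := memoryLimit(rootCgroup, 1); err != nil {
		fmt.Println("skipping cgroup metrics:", err)
	}
}
```

Returning a sentinel error is one possible design; a caller could equally treat the nil result as "nothing to report" and return early without an error.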
* Update to elastic/beats@ef6274d0d1e3 Brings in elastic/beats#21113 * tests/system: add geo.country_name to approvals elastic/elasticsearch#61523 * Update docs * systemtest: another country_name approval
Add field mappings for the cgroups metrics added to beats monitoring in elastic/beats#21113, and required by elastic/kibana#79050.
}
selfStats, err := cgroups.GetStatsForProcess(pid)
if err != nil {
	logp.Err("error getting group status: %v", err)
Shouldn't that be "error getting cgroup stats"? The error message puzzled me initially..
Indeed :)
Thanks for pointing that out, will be fixed by #23413
What does this PR do?
Report cgroup limits/stats on Linux, similar to what Elasticsearch reports through node stats: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html
Metric names are based on (but not exactly the same as) the system.process.cgroup.* fields.
Why is it important?
This is important for reporting accurate resource usage in containerised environments.
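To make the point concrete, here is a rough sketch of why cgroup limits change how CPU usage should be read. It is not what this PR implements: `cpuLimit` and `usagePct` are hypothetical helpers, and the sketch assumes the cgroup v1 CFS settings `cpu.cfs_quota_us` and `cpu.cfs_period_us`, where a quota of -1 means no limit.

```go
package main

import "fmt"

// cpuLimit converts cgroup v1 CFS settings into a number of CPUs.
// quotaUs and periodUs correspond to cpu.cfs_quota_us and cpu.cfs_period_us.
// ok is false when no limit applies (for example quota = -1) or the period
// is not a positive value.
func cpuLimit(quotaUs, periodUs int64) (cpus float64, ok bool) {
	if quotaUs <= 0 || periodUs <= 0 {
		return 0, false
	}
	return float64(quotaUs) / float64(periodUs), true
}

// usagePct expresses CPU time consumed during a wall-clock interval as a
// percentage of the capacity offered by the given number of CPUs.
func usagePct(cpuNanos, wallNanos int64, cpus float64) float64 {
	if wallNanos <= 0 || cpus <= 0 {
		return 0
	}
	return float64(cpuNanos) / (float64(wallNanos) * cpus) * 100
}

func main() {
	const hostCPUs = 8 // assumed host size, for illustration only

	// A container allowed 200ms of CPU time per 100ms period.
	limit, ok := cpuLimit(200000, 100000)
	if !ok {
		limit = hostCPUs // no cgroup limit: fall back to the host CPU count
	}

	// One second of CPU time consumed during one second of wall-clock time.
	fmt.Printf("against cgroup limit: %.1f%%\n", usagePct(1e9, 1e9, limit))
	fmt.Printf("against host CPUs:    %.1f%%\n", usagePct(1e9, 1e9, hostCPUs))
}
```

The same amount of CPU time looks far smaller when measured against the host than against the container's quota, which is the gap that reporting cgroup stats is meant to close.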
Checklist
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] I have made corresponding change to the default configuration files
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
I couldn't see docs or tests to update - please point me to them if there are any.
How to test this PR locally
Related issues
Closes #14691