Documenting metrics messages in Beats logs #36163
Merged
Changes from 2 commits
16 commits
ad6112c
Documenting metrics messages in Beats logs
ycombinator 252e92b
Add include statements & other structure
kilfoyle 64cb9d2
Update libbeat/docs/metrics-in-logs.asciidoc
ycombinator 054ab3a
Update libbeat/docs/metrics-in-logs.asciidoc
ycombinator a47b0b5
Update libbeat/docs/metrics-in-logs.asciidoc
ycombinator 19002c9
Update libbeat/docs/metrics-in-logs.asciidoc
ycombinator f21bf3f
Update libbeat/docs/metrics-in-logs.asciidoc
ycombinator eca83f6
Update libbeat/docs/metrics-in-logs.asciidoc
ycombinator 20a2e7b
Update libbeat/docs/metrics-in-logs.asciidoc
ycombinator 26f328a
Update libbeat/docs/metrics-in-logs.asciidoc
ycombinator 11ff8ac
Review feedback
ycombinator f7fc24d
Review feedback
ycombinator f2dcf63
Format JSON
ycombinator f52b188
Replace occurrences of Beat with variable
ycombinator c5b265b
Add link to unstructured logs' parsing script
ycombinator 4e80569
Clarifying pipeline size metrics
ycombinator
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
|
||
|
||
Every 30 seconds (by default), a Beat collects a _snapshot_ of metrics about itself. From this snapshot, the Beat computes a _delta snapshot_, which contains only the metrics that have _changed_ since the last snapshot. Note that the values of the metrics are the values at the time the snapshot is taken, **NOT** the _difference_ in values from the last snapshot. | ||
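As an illustration of the delta snapshot idea described above, here is a minimal Python sketch. It is not the Beats implementation: the function name `delta_snapshot` and the flat dictionaries keyed by dotted metric paths are assumptions made for this example only.

```python
def delta_snapshot(previous, current):
    """Return the metrics in `current` whose values differ from `previous`.

    Both arguments are flat dicts mapping a dotted metric path to its value
    (an assumption for this sketch). The result carries the current values,
    not the differences between the two snapshots.
    """
    return {
        path: value
        for path, value in current.items()
        if path not in previous or previous[path] != value
    }


previous = {"beat.runtime.goroutines": 212, "libbeat.output.events.acked": 40}
current = {"beat.runtime.goroutines": 212, "libbeat.output.events.acked": 48}

# Only the changed metric is kept, and it is reported as 48 rather than 8.
print(delta_snapshot(previous, current))
```

If the returned dictionary is empty, nothing changed since the last snapshot, which corresponds to the case where no metrics log entry is emitted.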
|
||
|
||
If this delta snapshot contains _any_ metrics (indicating that at least one metric changed since the last snapshot), this delta snapshot is serialized as JSON and emitted in the Beat's logs at the `INFO` log level. Here is an example of such a log entry: | ||
|
||
|
||
[source,json] | ||
---- | ||
{"log.level":"info","@timestamp":"2023-07-14T12:50:36.811Z","log.logger":"monitoring","log.origin":{"file.name":"log/log.go","file.line":187},"message":"Non-zero metrics in the last 30s","service.name":"filebeat","monitoring":{"metrics":{"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":0}}}},"cpu":{"system":{"ticks":692690,"time":{"ms":60}},"total":{"ticks":3167250,"time":{"ms":150},"value":3167250},"user":{"ticks":2474560,"time":{"ms":90}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":32},"info":{"ephemeral_id":"2bab8688-34c0-4522-80af-db86948d547d","uptime":{"ms":617670096},"version":"8.6.2"},"memstats":{"gc_next":57189272,"memory_alloc":43589824,"memory_total":275281335792,"rss":183574528},"runtime":{"goroutines":212}},"filebeat":{"events":{"active":5,"added":52,"done":49},"harvester":{"open_files":6,"running":6,"started":1}},"libbeat":{"config":{"module":{"running":15}},"output":{"events":{"acked":48,"active":0,"batches":6,"total":48},"read":{"bytes":210},"write":{"bytes":26923}},"pipeline":{"clients":15,"events":{"active":5,"filtered":1,"published":51,"total":52},"queue":{"acked":48}}},"registrar":{"states":{"current":14,"update":49},"writes":{"success":6,"total":6}},"system":{"load":{"1":0.91,"15":0.37,"5":0.4,"norm":{"1":0.1138,"15":0.0463,"5":0.05}}}},"ecs.version":"1.6.0"}} | ||
---- | ||
|
||
[discrete] | ||
== Details | ||
|
||
Focusing on the `.monitoring.metrics` field, its value is: | ||
|
||
[source,json] | ||
---- | ||
{"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":0}}}},"cpu":{"system":{"ticks":692690,"time":{"ms":60}},"total":{"ticks":3167250,"time":{"ms":150},"value":3167250},"user":{"ticks":2474560,"time":{"ms":90}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":32},"info":{"ephemeral_id":"2bab8688-34c0-4522-80af-db86948d547d","uptime":{"ms":617670096},"version":"8.6.2"},"memstats":{"gc_next":57189272,"memory_alloc":43589824,"memory_total":275281335792,"rss":183574528},"runtime":{"goroutines":212}},"filebeat":{"events":{"active":5,"added":52,"done":49},"harvester":{"open_files":6,"running":6,"started":1}},"libbeat":{"config":{"module":{"running":15}},"output":{"events":{"acked":48,"active":0,"batches":6,"total":48},"read":{"bytes":210},"write":{"bytes":26923}},"pipeline":{"clients":15,"events":{"active":5,"filtered":1,"published":51,"total":52},"queue":{"acked":48}}},"registrar":{"states":{"current":14,"update":49},"writes":{"success":6,"total":6}},"system":{"load":{"1":0.91,"15":0.37,"5":0.4,"norm":{"1":0.1138,"15":0.0463,"5":0.05}}}} | ||
|
||
---- | ||
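Field paths such as `.beat.runtime.goroutines` each name one leaf of this nested object. The following Python sketch lists every leaf with its dotted path, which can make a long metrics object easier to scan. The helper name `flatten` is hypothetical, and the input is a hand-trimmed excerpt of the object above rather than a complete log entry.

```python
import json


def flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf in a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


# A shortened excerpt of the metrics object, trimmed by hand for this example.
metrics = json.loads(
    '{"beat":{"runtime":{"goroutines":212}},'
    '"libbeat":{"output":{"events":{"acked":48,"total":48}}}}'
)

for path, value in flatten(metrics):
    print(path, value)
```

The paths printed are relative to `.monitoring.metrics`.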
|
||
The following tables explain the meaning of salient fields under `.monitoring.metrics` and provide hints that might help in troubleshooting Beats issues. | ||
|
||
|
||
[cols="1,1,2,2"] | ||
|=== | ||
| Field path (relative to `.monitoring.metrics`) | Type | Meaning | Troubleshooting hints | ||
|
||
| `.beat` | Object | Information that is common to all Beats, e.g. version, goroutines, file handles, CPU, memory | | ||
| `.libbeat` | Object | Information about the publisher pipeline and output, also common to all Beats | | ||
ifeval::["{beatname_lc}"=="filebeat"] | ||
| `.filebeat` | Object | Information specific to Filebeat, e.g. harvester, events | | ||
|
||
endif::[] | ||
|=== | ||
|
||
[cols="1,1,2,2"] | ||
|=== | ||
| Field path (relative to `.monitoring.metrics.beat`) | Type | Meaning | Troubleshooting hints | ||
|
||
| `.runtime.goroutines` | Integer | Number of goroutines running | If this number grows over time, it may indicate a goroutine leak | ||
|=== | ||
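One rough way to check whether a value such as `.runtime.goroutines` "grows over time" is to look at its most recent samples across consecutive metrics log entries. The Python sketch below is a heuristic only: the helper name `grows_steadily` and the five-sample window are arbitrary assumptions, not anything defined by Beats.

```python
def grows_steadily(values, min_samples=5):
    """Return True if the last `min_samples` values never decrease
    and the newest value is higher than the oldest one in that window."""
    recent = values[-min_samples:]
    if len(recent) < min_samples:
        return False
    never_decreases = all(a <= b for a, b in zip(recent, recent[1:]))
    return never_decreases and recent[-1] > recent[0]


# Hypothetical goroutine counts taken from successive log entries.
print(grows_steadily([212, 212, 215, 230, 260, 310]))  # steadily rising
print(grows_steadily([212, 215, 210, 214, 211, 213]))  # fluctuating
```

A positive result is a prompt to investigate further, not proof of a leak.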
|
||
[cols="1,1,2,2"] | ||
|=== | ||
| Field path (relative to `.monitoring.metrics.libbeat`) | Type | Meaning | Troubleshooting hints | ||
|
||
| `.pipeline.events.active` | Integer | Number of events currently in the libbeat publisher pipeline. | If this number grows over time, it may indicate that the Beat (e.g. Filebeat) is producing events faster than the output can consume. Consider increasing the number of output workers (if this setting is supported by the output; Elasticsearch and Logstash outputs support this setting). If this number reaches the maximum queue size (`queue.mem.events` for the in-memory queue), it may indicate backpressure on the Beat, implying that the Beat may need to stop ingesting more events from the source. | ||
|
||
| `.output.events.total` | Integer | Number of events currently being processed by the output. | If this number grows over time, it may indicate that the output destination (e.g. Logstash pipeline or Elasticsearch cluster) is not able to accept events as fast as the Beat is sending them. | ||
|
||
| `.output.events.acked` | Integer | Number of events acknowledged by the output destination. | Generally, we want this number to be the same as `.output.events.total` as this indicates that the output destination has reliably received all the events sent to it. | ||
| `.output.events.failed` | Integer | Number of events that the Beat tried to send to the output destination, but that the destination failed to receive. | Generally, we want this field to be absent or its value to be 0. When the value is greater than zero, it's useful to check the Beat's logs right before this log entry's `@timestamp` to see if there are any connectivity issues with the output destination. Note that failed events are not lost or dropped; they will be sent back to the publisher pipeline for retrying later. | ||
|
||
|=== | ||
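The hints for the `.output.events` fields can be applied to a single metrics object with a small check like the Python sketch below. The function name `output_hints` and the wording of the messages are assumptions for illustration. A missing field is treated as 0, since the delta snapshot only contains metrics that changed.

```python
def output_hints(metrics):
    """Return a list of hint strings for the output event counters."""
    events = metrics.get("libbeat", {}).get("output", {}).get("events", {})
    total = events.get("total", 0)
    acked = events.get("acked", 0)
    failed = events.get("failed", 0)  # usually absent when nothing failed

    hints = []
    if failed > 0:
        hints.append(f"{failed} events failed; check the logs just before this entry")
    if acked != total:
        hints.append(f"acked ({acked}) differs from total ({total})")
    return hints


healthy = {"libbeat": {"output": {"events": {"acked": 48, "total": 48}}}}
degraded = {"libbeat": {"output": {"events": {"acked": 1960, "failed": 48, "total": 2008}}}}

print(output_hints(healthy))
print(output_hints(degraded))
```

The first call returns an empty list. The second returns two hints, one for the failed events and one for the gap between `acked` and `total`.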
|
||
ifeval::["{beatname_lc}"=="filebeat"] | ||
[cols="1,1,2,2"] | ||
|=== | ||
| Field path (relative to `.monitoring.metrics.filebeat`) | Type | Meaning | Troubleshooting hints | ||
|
||
| `.events.active` | Integer | Number of events being actively processed by Filebeat (including events Filebeat has already sent to the libbeat publisher pipeline, but not including events the pipeline has sent to the output). | If this number grows over time, it may indicate that Filebeat inputs are harvesting events too fast for the pipeline and output to keep up. | ||
|
||
|=== | ||
endif::[] | ||
|
||
ifeval::["{beatname_lc}"=="filebeat"] | ||
[discrete] | ||
== Useful commands | ||
|
||
[discrete] | ||
=== Check if Filebeat is processing events | ||
|
||
[source] | ||
---- | ||
$ cat beat.log | jq -r '[.["@timestamp"],.monitoring.metrics.filebeat.events.active,.monitoring.metrics.libbeat.pipeline.events.active,.monitoring.metrics.libbeat.output.events.total,.monitoring.metrics.libbeat.output.events.acked,.monitoring.metrics.libbeat.output.events.failed//0] | @tsv' | sort | ||
---- | ||
|
||
Example output: | ||
|
||
[source] | ||
---- | ||
2023-07-14T11:24:36.811Z 1 1 38033 38033 0 | ||
2023-07-14T11:25:06.811Z 1 1 17 17 0 | ||
2023-07-14T11:25:36.812Z 1 1 16 16 0 | ||
2023-07-14T11:26:06.811Z 1 1 17 17 0 | ||
2023-07-14T11:26:36.811Z 2 2 21 21 0 | ||
2023-07-14T11:27:06.812Z 1 1 18 18 0 | ||
2023-07-14T11:27:36.811Z 1 1 17 17 0 | ||
2023-07-14T11:28:06.811Z 1 1 18 18 0 | ||
2023-07-14T11:28:36.811Z 1 1 16 16 0 | ||
2023-07-14T11:37:06.811Z 1 1 270 270 0 | ||
2023-07-14T11:37:36.811Z 1 1 16 16 0 | ||
2023-07-14T11:38:06.811Z 1 1 17 17 0 | ||
2023-07-14T11:38:36.811Z 1 1 16 16 0 | ||
2023-07-14T11:41:36.811Z 3 3 323 323 0 | ||
2023-07-14T11:42:06.811Z 3 3 17 17 0 | ||
2023-07-14T11:42:36.812Z 4 4 18 18 0 | ||
2023-07-14T11:43:06.811Z 4 4 17 17 0 | ||
2023-07-14T11:43:36.811Z 2 2 17 17 0 | ||
2023-07-14T11:47:06.811Z 0 0 117 117 0 | ||
2023-07-14T11:47:36.811Z 2 2 14 14 0 | ||
2023-07-14T11:48:06.811Z 3 3 17 17 0 | ||
2023-07-14T11:48:36.811Z 2 2 17 17 0 | ||
2023-07-14T12:49:36.811Z 3 3 2008 1960 48 | ||
2023-07-14T12:50:06.812Z 2 2 18 18 0 | ||
2023-07-14T12:50:36.811Z 5 5 48 48 0 | ||
---- | ||
|
||
The columns here are: | ||
|
||
1. `.@timestamp` | ||
2. `.monitoring.metrics.filebeat.events.active` | ||
3. `.monitoring.metrics.libbeat.pipeline.events.active` | ||
4. `.monitoring.metrics.libbeat.output.events.total` | ||
5. `.monitoring.metrics.libbeat.output.events.acked` | ||
6. `.monitoring.metrics.libbeat.output.events.failed` | ||
endif::[] |
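For environments where `jq` is not available, the same six columns can be extracted with a short Python sketch. This is an assumed equivalent, not part of Beats: the function name `metrics_row` is hypothetical, and unlike the `jq` command it skips log lines that carry no `monitoring.metrics` object.

```python
import json


def metrics_row(line):
    """Return the six columns for one JSON log line, or None if it has no metrics."""
    entry = json.loads(line)
    metrics = entry.get("monitoring", {}).get("metrics")
    if metrics is None:
        return None
    output_events = metrics.get("libbeat", {}).get("output", {}).get("events", {})
    return [
        entry.get("@timestamp"),
        metrics.get("filebeat", {}).get("events", {}).get("active"),
        metrics.get("libbeat", {}).get("pipeline", {}).get("events", {}).get("active"),
        output_events.get("total"),
        output_events.get("acked"),
        output_events.get("failed", 0),  # absent means no failures were recorded
    ]


# A trimmed version of the example log entry shown earlier.
line = (
    '{"@timestamp":"2023-07-14T12:50:36.811Z","monitoring":{"metrics":'
    '{"filebeat":{"events":{"active":5}},'
    '"libbeat":{"output":{"events":{"acked":48,"total":48}},'
    '"pipeline":{"events":{"active":5}}}}}}'
)

row = metrics_row(line)
# Missing values are printed as empty cells in the tab-separated output.
print("\t".join("" if value is None else str(value) for value in row))
```

To process a whole log file, call `metrics_row` on each line and sort the resulting rows by the first column.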
Just curious: what does this line do?
I think it doesn't do anything; it's just a leftover from when we had to identify X-Pack content. I left it in when I copied the content over from the other docs page because I'm not 100% sure it's no longer used.