-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Filebeat] Update handling of elasticsearch server logs #30018
Conversation
This pull request does not have a backport label. Could you fix it @matschaffer? 🙏
NOTE: |
I'm able to exercise the resulting pipeline via dev tools by passing this:
But the tests fail out after seeing no hits in the resulting index. I don't see how to find the filebeat logs which probably contain the info we need to find out why ingest is failing. |
💚 Build Succeeded
Expand to view the summary
Build stats
Test stats 🧪
💚 Flaky test reportTests succeeded. 🤖 GitHub commentsTo re-run your PR in the CI, just comment with:
|
This is rather a drive-by comment without having any context, so I might say something stupid... In 8.0, Beats adopted data streams, so |
Yeah that’s a possibility. The message contains json which has a timestamp, but it’s unclear if the outer doc from filebeat has one. |
Filebeat always adds EDIT: example PR for fixing some issues: #29425 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this change, can Elasticsearch 7 logs still be read?
Yeah I’m not sure there’s a 7.x test. Maybe good to add one. |
…a timestamp field
Requires for |
At this point the PR is intended to be complete and ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes required by the adoption of data stream look good to me.
Commenting to make sure we don't forget as a follow up. As during an upgrade of Elasticsearch 7.last to 8.0, Beats 7.last needs to be able to keep reading the logs. Because of this we need to discuss as a follow up what we do about 7.last. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not run the change locally to test it against stack monitoring but browsed through the changes. In what I see, in most cases only the values changes but not the key which is the critical part.
Also I like how simple the 8 pipelines have become which was the goal of the change in Elasticsearch. I left a few follow up comments that do NOT have to be adressed in this PR but later on.
{"@timestamp":"2020-04-16T13:47:36.309Z", "log.level": "WARN", "message":"index name [.apm-custom-link] starts with a dot '.', in the next major version, index names starting with a dot are reserved for hidden indices and system indices" , "service.name":"ES_ECS","process.thread.name":"elasticsearch[n1][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.deprecation.cluster.metadata.MetadataCreateIndexService","type":"deprecation","cluster.uuid":"ZGYecRsDQPK_-ktRec3ZGQ","node.id":"Ni-9zbrZRm24wm7_zNtMTw","node.name":"n1","cluster.name":"es800"} | ||
{"@timestamp":"2020-04-16T13:55:56.365Z", "log.level": "WARN", "message":"index name [.monitoring-alerts-7] starts with a dot '.', in the next major version, index names starting with a dot are reserved for hidden indices and system indices" , "service.name":"ES_ECS","process.thread.name":"elasticsearch[n1][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.deprecation.cluster.metadata.MetadataCreateIndexService","type":"deprecation","cluster.uuid":"ZGYecRsDQPK_-ktRec3ZGQ","node.id":"Ni-9zbrZRm24wm7_zNtMTw","node.name":"n1","cluster.name":"es800"} | ||
{"@timestamp":"2020-04-16T13:56:14.697Z", "log.level": "WARN", "message":"[types removal] Using the _type field in queries and aggregations is deprecated, prefer to use a field instead." , "service.name":"ES_ECS","process.thread.name":"elasticsearch[n1][search][T#7]","log.logger":"org.elasticsearch.deprecation.index.query.QueryShardContext","type":"deprecation","cluster.uuid":"ZGYecRsDQPK_-ktRec3ZGQ","node.id":"Ni-9zbrZRm24wm7_zNtMTw","node.name":"n1","cluster.name":"es800"} | ||
{"@timestamp":"2022-01-27T11:48:45.809Z", "log.level":"CRITICAL", "data_stream.dataset":"deprecation.elasticsearch","data_stream.namespace":"default","data_stream.type":"logs","elasticsearch.elastic_product_origin":"","elasticsearch.event.category":"compatible_api","elasticsearch.http.request.x_opaque_id":"v7app","event.code":"create_index_with_types","message":"[types removal] Using include_type_name in create index requests is deprecated. The parameter will be removed in the next major version." , "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"deprecation.elasticsearch","process.thread.name":"elasticsearch[runTask-0][transport_worker][T#8]","log.logger":"org.elasticsearch.deprecation.rest.action.admin.indices.RestCreateIndexAction","trace.id":"0af7651916cd43dd8448eb211c80319c","elasticsearch.cluster.uuid":"5alW33KLT16Lp1SevDqDSQ","elasticsearch.node.id":"tVLnAGLgQum5ca6z50aqbw","elasticsearch.node.name":"runTask-0","elasticsearch.cluster.name":"runTask"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a follow up we should try to find some longer log files to make sure different kind of events work as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pgomulka could you help with a better log sample?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can generate more deprecations. But structurally they will be the same.
I guess what would be useful to have a "typical" WARN/CRITICAL deprecation message in additional to these two CRITICAL compatible category deprecations.
but as stated before, structurally they will be the same. Just different values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added two additional log samples for non-compatible-api deprecations. The difference is that the elasticsearch.event.category
is not compatible_api
which can likely be used by a UI to distinguish the type of deprecation.
Also one of the log samples has "log.level": "WARN",
which means that the deprecation will not be removed in next version (as opposed to CRITICAL)
updated gist with deprecation logs: https://gist.github.com/pgomulka/4c4842958dbb3812338ab81d2a4365ba
udpated gist with all commands to generate logs: https://gist.github.com/pgomulka/285b4daa6a48ac6c17207b98fa06337e
target_field: elasticsearch.slowlog.duration | ||
ignore_missing: true | ||
- grok: | ||
field: elasticsearch.slowlog.message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a follow up, I'm wondering why ES in 8.0 not puts this out as JSON already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably also a question for @pgomulka to help out with. Looks like the message
field is basically structured, so would be nice to split it out into the json.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in ESv8 JSON slow log's message field only contains shardId.
Take a look at formatted log line
{
"@timestamp": "2022-01-27T11:42:31.395Z",
"log.level": "DEBUG",
"elasticsearch.slowlog.id": null,
"elasticsearch.slowlog.message": "[test_1][0]",
"elasticsearch.slowlog.search_type": "QUERY_THEN_FETCH",
"elasticsearch.slowlog.source": "{}",
"elasticsearch.slowlog.stats": "[]",
"elasticsearch.slowlog.took": "115.3micros",
"elasticsearch.slowlog.took_millis": 0,
"elasticsearch.slowlog.total_hits": "2 hits",
"elasticsearch.slowlog.total_shards": 1,
"ecs.version": "1.2.0",
"service.name": "ES_ECS",
"event.dataset": "elasticsearch.index_search_slowlog",
"process.thread.name": "elasticsearch[runTask-0][search][T#5]",
"log.logger": "index.search.slowlog.query",
"elasticsearch.cluster.uuid": "5alW33KLT16Lp1SevDqDSQ",
"elasticsearch.node.id": "tVLnAGLgQum5ca6z50aqbw",
"elasticsearch.node.name": "runTask-0",
"elasticsearch.cluster.name": "runTask"
}
I guess having that shardId in its own field would be nice. Is this what you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the grok pattern used below, it seems there could be much more then just the shardId. But if it only contains the shard id, then the massive grok pattern below becomes obsolete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could it be because the message field in plaintext logs (7.x and 6.x .log files) contained all emitted as structured plaintext. So effectively all fields into message field?
It looks to me that this pattern could parse it (not really needed because JSON logs will be emitted into .json file - not .log)
https://github.com/elastic/beats/blob/master/filebeat/module/elasticsearch/slowlog/test/test.log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming this does not exist anymore in 8.0 and as this is only the pipeline for 8.0 logs, seems like this could be cleaned up then. @matschaffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks ok to me, the python module integration test passed. Meaning that the Source matches the expected format in Elasticsearch. For the view of Filebat this seams to be correct, I haven't tested with a real Elasticsearch instance or Stack Monitoring.
Please make sure follow-up issues are created, for 7.last and the elasticsearch problem in the field content. https://github.com/elastic/beats/pull/30018/files#r794456519
- set: | ||
field: event.dataset | ||
value: elasticsearch.deprecation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@klacabane Create a followup issue for this change.
Tested the Stack Monitoring UI with the following procedure:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've did a end to end test with the following PR.
I can confirm that the events are send into Elasticsearch correctly associated with their dataset.
Co-authored-by: Noémi Ványi <[email protected]> Co-authored-by: klacabane <[email protected]> (cherry picked from commit 27d44ce)
Thanks @klacabane for the work and everyone that contributed. @klacabane When I've tested this PR, I've thought about the elasticsearch integration. We need to make sure that the integration have the same features of this module. @andresrc |
@ph We'll verify the elasticsearch integration in elastic/kibana#119109 |
) Co-authored-by: Noémi Ványi <[email protected]> Co-authored-by: klacabane <[email protected]> (cherry picked from commit 27d44ce) Co-authored-by: Mat Schaffer <[email protected]>
@klacabane I would be good to see the testing with the integration prioritised. For 7.x I expect that it mostly worked as it was taken over from the module. But seeing the changes we had to make above I'm pretty convinced it stops working for 8.0 data. |
@matschaffer @klacabane The tests we have today for the logs in place, depend on us to update these logs for the tests which is one of the reasons we didn't spot this earlier. In addition to the tests we have, I would like to see us adding something around collecting ES logs to our e2e suite to collect logs from "actual" ES instances. Please have a conversation with @mdelapenya |
…k-version-after-8-0-creation * upstream/master: (69 commits) Update stale config following (elastic#30082) Make include_matches backwards compatible with 7.x config (elastic#30032) [Filebeat] Update handling of elasticsearch server logs (elastic#30018) Remove SSL3 support from libbeat and its documentation. (elastic#30071) Revert "Packaging: rename arm64 suffix to aarch64 in the tar.gz artifacts ONLY (elastic#28813)" (elastic#30083) [libbeat] Add script processor to all beats (elastic#29752) Add fonts to support more different types of characters for multiple languages (elastic#29861) libbeat/reader: Fix messge conversion to beat.Event (elastic#30057) probot[stale]: ignore issues with the tag flaky-test (elastic#30065) [DOCS] Add redirect for GSuite module (elastic#30034) [Automation] Update elastic stack version to 8.1.0-aa69d697 for testing (elastic#30012) Remove msitools install for windows build, using the latest docker image with msitools preinstalled (elastic#30040) filebeat/generator/fields: fix dropped error (elastic#29943) Include the error message with auditd module events (elastic#30009) [Metricbeat] gcp: add firestore metricset (elastic#29918) probot: update stale dates (elastic#29997) Metricbeat enterprise search module: add xpack.enabled support (elastic#29871) x-pack/packetbeat: install Npcap at start-up when required (elastic#29112) [Filebeat] Fix panic in decode_cef when recovering from invalid data (elastic#30038) Correctly fixe how selected packages are defined (elastic#30039) ...
@ruflin agreed. I'd like to propose we use our internal postmortem process to have a discussion about how we ended up in this situation and how we could prevent it from happening again. |
@matschaffer @ruflin Please let me know if you will have a discussion on how to improve on either ES or filebeat side. I will be happy to help. |
@pgomulka Glad you are bringing this up. I agree that the ES team should maintain it eventually. @jasonrhodes Can we follow up with the ES team on this in case you agree? |
Status: in progress
What does this PR do?
Fixes the elasticsearch module so that 8.0.0 ES json logs can be ingested.
server
,slowlog
anddeprecation
logs format did change in 8.0 and a dedicate pipeline was implemented for each log type.gc
kept the same format so we didn't have to update the ingestion pipeline for that. Not sure aboutaudit
yet.Why is it important?
The current pipeline is unable to process elasticsearch JSON logs from 8.0.0. They just get silently dropped.
Missing these logs means we can't show relevant ES logs as part of the stack monitoring UI.
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Author's Checklist
How to test this PR locally
Ensure you can run module tests locally. See #30014 for reference.
Run
INTEGRATION_TESTS=1 BEAT_STRICT_PERMS=false TESTING_FILEBEAT_MODULES=elasticsearch ES_PASS=changeme pytest tests/system/test_modules.py -v
to test just the elasticsearch module.Optionally:
curl -k -u elastic:changeme -H 'Content-Type: application/json' http://localhost:9200/_security/user/kibana_system/_password -d'{"password": "changeme"}'
) to launch and inspect the logs in kibanaRelated issues
Fixes #29880