Add drop and explicit tests to avoid duplicate ingest of elasticsearch logs #30440

Merged
merged 8 commits into from
Feb 21, 2022

matschaffer
Contributor

@matschaffer matschaffer commented Feb 17, 2022

What does this PR do?

Adds a "drop" to the elasticsearch pipeline as well as explicit "mixed" test logs to ensure we won't attempt to ingest logs across mismatched pipelines.

Why is it important?

Without it, the elasticsearch slowlog pipeline will attempt to ingest logs from all 5 filesets shipped by the elasticsearch filebeat module.

Additionally the test cases help guard against removal of the drop processors.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

See https://www.elastic.co/guide/en/beats/devguide/master/filebeat-modules-devguide.html#_test for setup instructions. Specify TESTING_FILEBEAT_MODULES=elasticsearch to test the elasticsearch module directly.

Related issues

Fixes #30428
Related #30164
Related #16540

Use cases

The Elasticsearch log4j2.properties config specifies that server, deprecation, slowlog, and audit logs are written to Console:

grep -R "Console" log4j2.properties
appender.rolling.type = Console
appender.deprecation_rolling.type = Console
appender.index_search_slowlog_rolling.type = Console
appender.index_indexing_slowlog_rolling.type = Console
appender.audit_rolling.type = Console

On the Kubernetes node where the elasticsearch container is running, all of those logs end up in one file:

/var/log/containers/__elasticsearch-.log

The Filebeat pod has the whole /var/log folder mounted and reads log files from /var/log/containers/.

The elasticsearch module has 5 filesets, so the Kubernetes log file is read 5 times and every message is shipped to each pipeline.

This PR works around the duplicate log storage by dropping non-matching logs at the top of the pipeline.
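
To make the duplication concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not the module's actual ingest pipeline: the fileset names, the `dataset` key, and the sample messages are assumptions chosen for illustration, and the real drop condition lives in the pipeline YAML.

```python
from typing import Optional

# Assumed names for the module's five filesets (illustrative only).
FILESETS = ["server", "deprecation", "slowlog", "audit", "gc"]


def run_pipeline(fileset: str, event: dict, drop_mismatched: bool) -> Optional[dict]:
    """Model one fileset's pipeline; returning None means the event was dropped."""
    # The "drop at the top of the pipeline": discard events of another type.
    if drop_mismatched and event["dataset"] != fileset:
        return None
    return {**event, "pipeline": fileset}


def ingest(events: list, drop_mismatched: bool) -> list:
    """Send every event through every fileset pipeline, which is what happens
    when all filesets read the same container log file."""
    stored = []
    for event in events:
        for fileset in FILESETS:
            doc = run_pipeline(fileset, event, drop_mismatched)
            if doc is not None:
                stored.append(doc)
    return stored


# Hypothetical "mixed" log: several log types interleaved in one file.
mixed_log = [
    {"dataset": "server", "message": "node started"},
    {"dataset": "slowlog", "message": "slow query"},
    {"dataset": "audit", "message": "access granted"},
]

without_drop = ingest(mixed_log, drop_mismatched=False)
with_drop = ingest(mixed_log, drop_mismatched=True)
print(len(without_drop), len(with_drop))  # 15 3
```

In this model, 3 log lines produce 15 stored documents without the drop and 3 with it, one per line, each stored only by its matching pipeline.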

The server pipeline already contained a drop to avoid duplicate logging.
For the slowlog pipeline, duplicates were partially guarded against in testing by the grok on `elasticsearch.slowlog`, but it is probably better to drop explicitly and avoid duplicate logging.
test-audit-docker.log also contains a mixed case, but it was overlooked in the expected file until elastic#30164 added the appropriate drop statements.
@matschaffer matschaffer added the Filebeat, Team:Integrations, Feature:Stack Monitoring, backport-v8.0.0, backport-v8.3.0, backport-v8.1.0, and backport-v8.2.0 labels Feb 17, 2022
@matschaffer matschaffer self-assigned this Feb 17, 2022
@elasticmachine
Collaborator

Pinging @elastic/stack-monitoring (Stack monitoring)

@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot added and then removed the needs_team label (indicates that the issue/PR needs a Team:* label) Feb 17, 2022
@elasticmachine
Collaborator

elasticmachine commented Feb 17, 2022

💚 Build Succeeded

Build stats

  • Start Time: 2022-02-21T01:30:45.265+0000

  • Duration: 99 min 59 sec

Test stats 🧪

Test Results

  • Failed: 0

  • Passed: 8673

  • Skipped: 1122

  • Total: 9795

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@matschaffer
Contributor Author

/test

@matschaffer
Contributor Author

The docs failure seems to be unrelated:

14:33:31 INFO:build_docs:To github.com:elastic/built-docs
14:33:31 INFO:build_docs: ! [rejected]                  beats_30440 -> beats_30440 (fetch first)

If the above test doesn't fix it, I'll merge main on Monday.

@dedemorton
Contributor

@elasticmachine run elasticsearch-ci/docs

Contributor

@klacabane klacabane left a comment

Nice set of tests :)

Contributor

@tetianakravchenko tetianakravchenko left a comment

Thank you for adding tests!
I've tried these pipeline adjustments, as well as the audit pipeline changes from #30164, in a k8s environment: everything works, with no duplication.

@mergify
Contributor

mergify bot commented Feb 17, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 30428-drop-non-matching-es-logs upstream/30428-drop-non-matching-es-logs
git merge upstream/main
git push upstream 30428-drop-non-matching-es-logs

@matschaffer matschaffer merged commit 7b67384 into elastic:main Feb 21, 2022
mergify bot pushed a commit that referenced this pull request Feb 21, 2022
…h logs (#30440)

* Ensure we drop server logs that show up in deprecation pipeline

* Add note about deprecation dataset normalization

* Add test for mixed es server logs

This pipeline already contained a drop to avoid duplicate logging.

* Ensure we drop server logs that show up in slowlog pipeline

This was partially guarded against in testing due to the grok on `elasticsearch.slowlog` but probably better to explicitly drop and avoid duplicate logging.

* Add "mixed" test for elasticsearch audit logs

test-audit-docker.log also contains a case but it was overlooked in the expected file until #30164 added the appropriate drop statements.

* Changelog entry

* Remove duplicated filebeat header

(cherry picked from commit 7b67384)
@matschaffer matschaffer deleted the 30428-drop-non-matching-es-logs branch February 21, 2022 04:18
v1v added a commit to v1v/beats that referenced this pull request Feb 21, 2022
…nd-k8s-env

* upstream/main:
  fix typos and improve sentences (elastic#30432)
  Add drop and explicit tests to avoid duplicate ingest of elasticsearch logs (elastic#30440)
  {,x-pack/}auditbeat: replace uses of github.com/pkg/errors with stdlib equivalents (elastic#30321)
  Spelling fix (elastic#30439)
  packetbeat/beater: make sure Npcap installation runs before interfaces are needed in all cases (elastic#30438)
  Add BC about Homebrew no longer being available in 8.0 (elastic#30419)
  Install gawk as a replacement for mawk in Docker containers. (elastic#30452)
  Clean up python-related system tests (elastic#30415)
  Fix TestNewModuleRegistry flakiness (elastic#30453)
  [Filebeat] [auditd]: Support EXECVE events with truncated argument list (elastic#30382)
  Set `log.offset` to the start of the reported line in filestream (elastic#30445)
  clarify SelectedPackageTypes meaning and improve its usage (elastic#30142)
  [elasticsearch module] serialize shards properties (elastic#30408)
  Add docs about hints and templates autodiscovery priority (elastic#30343)
matschaffer added a commit that referenced this pull request Feb 21, 2022
… ingest of elasticsearch logs (#30487)

Co-authored-by: Mat Schaffer <[email protected]>
matschaffer added a commit that referenced this pull request Feb 22, 2022
…h logs (#30440) (#30488)

* Ensure we drop server logs that show up in deprecation pipeline

* Add note about deprecation dataset normalization

* Add test for mixed es server logs

This pipeline already contained a drop to avoid duplicate logging.

* Ensure we drop server logs that show up in slowlog pipeline

This was partially guarded against in testing due to the grok on `elasticsearch.slowlog` but probably better to explicitly drop and avoid duplicate logging.

* Add "mixed" test for elasticsearch audit logs

test-audit-docker.log also contains a case but it was overlooked in the expected file until #30164 added the appropriate drop statements.

* Changelog entry

* Remove duplicated filebeat header

(cherry picked from commit 7b67384)

Co-authored-by: Mat Schaffer <[email protected]>
v1v added a commit to v1v/beats that referenced this pull request Feb 22, 2022
…ckaging-docker

* upstream/main: (26 commits)
  Update docker/distribution to 2.8.0 (elastic#30462)
  Add `parsers` examples to `filestream` reference configuration (elastic#30529)
  extend documentation about setting orchestrator.cluster fields (elastic#30518)
  Forward-port 8.0.1 changelog to main (elastic#30522)
  Switch skip to use `CI` (elastic#30512)
  packetbeat/beater: don't attempt to install npcap when already installed (elastic#30509)
  Fix Docker module: rename fields on dashboards (elastic#30500)
  fix typos and improve sentences (elastic#30432)
  Add drop and explicit tests to avoid duplicate ingest of elasticsearch logs (elastic#30440)
  {,x-pack/}auditbeat: replace uses of github.com/pkg/errors with stdlib equivalents (elastic#30321)
  Spelling fix (elastic#30439)
  packetbeat/beater: make sure Npcap installation runs before interfaces are needed in all cases (elastic#30438)
  Add BC about Homebrew no longer being available in 8.0 (elastic#30419)
  Install gawk as a replacement for mawk in Docker containers. (elastic#30452)
  Clean up python-related system tests (elastic#30415)
  Fix TestNewModuleRegistry flakiness (elastic#30453)
  [Filebeat] [auditd]: Support EXECVE events with truncated argument list (elastic#30382)
  Set `log.offset` to the start of the reported line in filestream (elastic#30445)
  clarify SelectedPackageTypes meaning and improve its usage (elastic#30142)
  [elasticsearch module] serialize shards properties (elastic#30408)
  ...
v1v added a commit that referenced this pull request Mar 2, 2022
…-29710

* '8.1' of github.com:elastic/beats: (51 commits)
  refactor pushDockerImages (#30414) (#30624)
  ci: add windows-2022 in the extended meta-stage (#30528) (#30630)
  Curate k8s testing versions to only keep the actively maintained (#30619) (#30625)
  [8.1](backport #30355) Add Beats upgrade docs for 8.0 (#30612)
  Remove references to gcp from the Functionbeat docs (#30579) (#30609)
  x-pack/auditbeat/module/system/socket: defend against exec with zero arguments (#30586) (#30597)
  [MySQL Enterprise] Adding default paths values to manifest.yml (#30598) (#30604)
  metricbeat - fix elasticsearch and kibana integration tests failures in 8.0 (#30566) (#30594)
  Install gawk as a replacement for mawk in Docker containers. (#30452) (#30465)
  [Filebeat] Remove RecordedFuture dataset from Threat Intel module (#30564) (#30568)
  Adjust the documentation of `backoff` options in filestream input (#30552) (#30557)
  packetbeat/beater: help the GC clean up the Npcap installer if it's not used (#30513) (#30546)
  Osquerybeat: Add install verification for osquerybeat (#30388) (#30404)
  Update docker/distribution to 2.8.0 (#30462) (#30540)
  Add `parsers` examples to `filestream` reference configuration (#30529) (#30537)
  [8.1](backport #30068) ZooKeeper module: Adapt to ZooKeeper 3.6+ `mntr` response fields' changes. (#30360)
  [8.1](backport #30512) Switch skip to use `CI` (#30525)
  Forward-port 8.0.1 changelog to 8.1 (#30517)
  packetbeat/beater: don't attempt to install npcap when already installed (#30509) (#30511)
  Add drop and explicit tests to avoid duplicate ingest of elasticsearch logs (#30440) (#30488)
  ...
Labels
backport-v8.0.0, backport-v8.1.0, backport-v8.2.0, backport-v8.3.0, Feature:Stack Monitoring, Filebeat, Team:Integrations
Development

Successfully merging this pull request may close these issues.

Drop non-matching logs inside elasticsearch filebeat module 8.0 pipelines
5 participants