Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Filebeat] Sort array fields in generated data #25320

Merged
merged 5 commits into from
May 13, 2021

Conversation

legoguy1000
Copy link
Contributor

What does this PR do?

Currently the array fields in generated data don't have any sorting or consistent ordering. This PR sorts any field that is a list/array when the generated data files are created.

Why is it important?

Whenever data files are generated, there are a large number of changes in the commit due to array fields changing order of the items even when the contents didn't actually change. This makes it very difficult to actually find what changed, if anything.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

cd beats/filebeat
GENERATE=true TESTING_FILEBEAT_MODULES=<any module> mage -v pythonIntegTest

The generated data above should always come out the same and the list orders shouldn't change

Related issues

Use cases

Screenshots

image

Logs

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Apr 26, 2021
@elasticmachine
Copy link
Collaborator

elasticmachine commented Apr 26, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #25320 updated

  • Start Time: 2021-05-13T00:09:56.302+0000

  • Duration: 137 min 51 sec

  • Commit: a018262

Test stats 🧪

Test Results
Failed 0
Passed 13713
Skipped 2285
Total 15998

Trends 🧪

Image of Build Times

Image of Tests

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 13713
Skipped 2285
Total 15998

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label May 3, 2021
Copy link
Contributor

@leehinman leehinman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this will make things more consistent.

@@ -194,6 +194,9 @@ def _test_expected_events(self, test_file, objects):
for k, obj in enumerate(objects):
objects[k] = self.flatten_object(obj, {}, "")
clean_keys(objects[k])
for key in objects[k].keys():
if isinstance(objects[k][key], list) and len(objects[k][key]) > 0 and not isinstance(objects[k][key][0], dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little worried about the last check to see if the first element is a dict, if any of the items in the list can't be compared to one another with "<" we will get a runtime error. What about switching to:

if isinstance(objects[k][key], list):
    objects[k][key].sort(key=str)

That will force string comparison for everything.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never thought about forcing everything to a string. That's a really interesting idea. I'll try that tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be working, but I'm running into an issue that I had before. When running the test against all the modules GENERATE=true mage -v pythonIntegTest, I end up with some errors below. Any thoughts?
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things work fine when I do modules individually

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird. I'm not seeing those errors when I run your PR locally.

anything in the build/system-tests/run/<failed test name> directory? It looks like some documents got left in elasticsearch expected event.module=kafka but got redis.

Might need to clean out docker images and re-run the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll schwack them and start clean

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still got the same error. This is what shows in the output.log file under build/system-tests/run/<failed test name>

2021-05-04T21:36:37.226-0200	INFO	instance/beat.go:651	Home path: [/go/src/github.com/elastic/beats/filebeat] Config path: [/go/src/github.com/elastic/beats/filebeat] Data path: [/go/src/github.com/elastic/beats/filebeat/data] Logs path: [/go/src/github.com/elastic/beats/filebeat/logs]
2021-05-04T21:36:37.237-0200	DEBUG	[beat]	instance/beat.go:704	Beat metadata path: /go/src/github.com/elastic/beats/filebeat/data/meta.json
2021-05-04T21:36:37.237-0200	INFO	instance/beat.go:659	Beat ID: 32ac6545-bf91-437e-8aa3-a64e3359883a
2021-05-04T21:36:37.238-0200	INFO	instance/beat.go:407	filebeat stopped.
2021-05-04T21:36:37.238-0200	ERROR	instance/beat.go:963	Exiting: data path already locked by another beat. Please make sure that multiple beats are not sharing the same data path (path.data).
Exiting: data path already locked by another beat. Please make sure that multiple beats are not sharing the same data path (path.data).
/go/src/github.com/elastic/beats/filebeat/filebeat.test -systemTest -e -d * -once -c /go/src/github.com/elastic/beats/filebeat/build/system-tests/run/test_modules.Test.test_fileset_file_076_elasticsearch/filebeat.yml -E setup.ilm.enabled=false -modules=elasticsearch -M elasticsearch.*.enabled=false -M elasticsearch.gc.enabled=true -M elasticsearch.gc.var.input=file -M elasticsearch.gc.var.paths=[/go/src/github.com/elastic/beats/filebeat/module/elasticsearch/gc/test/test.log] -M *.*.input.close_eof=true

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like filebeat didn't exit before the next test of a module started.

I'm going to approve and see if we get the same errors with CI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I still would like to have the generated data files updated so follow on PRs don't have to worry about the unrelated changes to the data files. So if you're able to run all the modules together and update the files, can you push those changes to the branch??

@legoguy1000 legoguy1000 marked this pull request as ready for review May 5, 2021 20:21
@elasticmachine
Copy link
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@mergify
Copy link
Contributor

mergify bot commented May 10, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b sort-generated-data-lists upstream/sort-generated-data-lists
git merge upstream/master
git push upstream sort-generated-data-lists

@leehinman leehinman force-pushed the sort-generated-data-lists branch 3 times, most recently from 23b9383 to c41305c Compare May 12, 2021 01:03
@mergify
Copy link
Contributor

mergify bot commented May 12, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b sort-generated-data-lists upstream/sort-generated-data-lists
git merge upstream/master
git push upstream sort-generated-data-lists

@legoguy1000
Copy link
Contributor Author

@leehinman I was able to run all the filebeat modules successfully. Ready for CI tests.

@legoguy1000
Copy link
Contributor Author

I ran it multiple times and no change in order of array fields.

@leehinman leehinman added the backport-v7.14.0 Automated backport with mergify label May 13, 2021
@leehinman leehinman merged commit 52f2265 into elastic:master May 13, 2021
mergify bot pushed a commit that referenced this pull request May 13, 2021
- sort arrays by using string representation of values
- update generated data

(cherry picked from commit 52f2265)

# Conflicts:
#	x-pack/filebeat/module/azure/signinlogs/test/signinlogs.log-expected.json
#	x-pack/filebeat/module/cyberark/corepas/test/generated.log-expected.json
#	x-pack/filebeat/module/f5/bigipafm/test/generated.log-expected.json
#	x-pack/filebeat/module/fortinet/fortimanager/test/generated.log-expected.json
#	x-pack/filebeat/module/imperva/securesphere/test/generated.log-expected.json
#	x-pack/filebeat/module/iptables/log/test/geo.log-expected.json
#	x-pack/filebeat/module/iptables/log/test/icmp.log-expected.json
#	x-pack/filebeat/module/iptables/log/test/iptables.log-expected.json
#	x-pack/filebeat/module/iptables/log/test/ipv6.log-expected.json
#	x-pack/filebeat/module/iptables/log/test/ubiquiti.log-expected.json
#	x-pack/filebeat/module/panw/panos/test/global_protect.log-expected.json
#	x-pack/filebeat/module/panw/panos/test/pan_inc_other.log-expected.json
#	x-pack/filebeat/module/panw/panos/test/pan_inc_threat.log-expected.json
#	x-pack/filebeat/module/panw/panos/test/pan_inc_traffic.log-expected.json
#	x-pack/filebeat/module/panw/panos/test/userid.log-expected.json
#	x-pack/filebeat/module/snort/log/test/generated.log-expected.json
#	x-pack/filebeat/module/sonicwall/firewall/test/generated.log-expected.json
#	x-pack/filebeat/module/sophos/utm/test/generated.log-expected.json
#	x-pack/filebeat/module/tomcat/log/test/generated.log-expected.json
@legoguy1000 legoguy1000 deleted the sort-generated-data-lists branch May 13, 2021 15:47
leehinman pushed a commit that referenced this pull request May 14, 2021
- sort arrays by using string representation of values
- update generated data

(cherry picked from commit 52f2265)
leehinman pushed a commit that referenced this pull request May 14, 2021
- sort arrays by using string representation of values
- update generated data

(cherry picked from commit 52f2265)

Co-authored-by: Alex Resnick <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v7.14.0 Automated backport with mergify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants