Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[filebeat][google workspace] - Added canonical sorting over fingerprint keys array to maintain key order #40055

Merged
merged 6 commits into from
Jul 2, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.next.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
- Fix request trace filename handling in http_endpoint input. {pull}39410[39410]
- Fix filestream not correctly tracking the offset of a file when using the `include_message` parser. {pull}39873[39873] {issue}39653[39653]
- Upgrade github.com/hashicorp/go-retryablehttp to mitigate CVE-2024-6104 {pull}40036[40036]
- Fix for Google Workspace duplicate events issue by adding canonical sorting over fingerprint keys array to maintain key order. {pull}40055[40055] {issue}39859[39859]

*Heartbeat*

Expand Down
5 changes: 3 additions & 2 deletions x-pack/filebeat/module/google_workspace/config/common.js
Original file line number Diff line number Diff line change
Expand Up @@ -25,15 +25,16 @@ var googleWorkspace = (function () {
"json.id.applicationName",
"json.id.customerId",
];
var dynKeyArr = [];
Copy link
Member

@andrewkroh andrewkroh Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole section looks like it is working around limitations in the beat fingerprint processor. And it's not entirely overcoming those limitations (hashing arrays and objects). I think the would be reasonable to enhance the fingerprint processor to accommodate this usage (the ES fingerprint processor does this). Or move this _id processing into ingest node.

But I would consider fingerprinting on the JSON string that is contained in message if the full contents are suitable for a fingerprint (i.e. each time an event is requested the api returns the same JSON).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried atm to change the fingerprint processor behaviour since changing it there would impact all modules utilising the processor and could possibly cause an initial wave of duplicates all across the board since the sorting will impact the fingerprint. I'll bring it up in the team sync once to discuss this and then make the change. For now I'm merging this change(otherwise it will go past FF) and we'll see and observe if this impacts duplicate creation in any way.

Object.keys(evt.Get("json.events")).forEach(function(evtsKey) {
var key = "json.events."+evtsKey;
var value = evt.Get(key);
if (!Array.isArray(value) && !(typeof value === "object")) {
keys.push(key);
dynKeyArr.push(key);
}
});
new processor.Fingerprint({
fields: keys,
fields: keys.concat(dynKeyArr.sort()),
target_field: "@metadata._id",
ignore_missing: true,
fail_on_error: false,
Expand Down
Loading