Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/stanza] key_value_parser: Allow values that contain the delimiter string #29629

Merged
merged 3 commits into from
Dec 5, 2023

Conversation

jmalloc
Copy link
Contributor

@jmalloc jmalloc commented Dec 4, 2023

Description:

This PR fixes an issue in the key_value_parser operator wherein values containing the delimiter (=, by default), would cause a parse failure.

Given the log body:

msg="Message successfully sent at 2023-12-04 06:47:31.204222276 +0000 UTC m=+5115.932279346"

The key_value_parser would fail with the following error:

expected '...<snip>...' to split by '=' into two items, got 3

After this PR, the key_value_parser instead emits the following key/value pair:

("msg", "Message successfully sent at 2023-12-04 06:47:31.204222276 +0000 UTC m=+5115.932279346")

As a result of this change, it is now also possible to parse unquoted values that contain the delimiter. For example foo=bar=spam is parsed as ("foo", "bar=spam"). Because there was an existing test case for this failure mode I had considered explicitly disallowing this behaviour for unquoted values. However, I couldn't think of a reason to prefer an error so I decided to float the PR as-is first. FWIW, this behaviour has some prior art, for example parsing of environment variables.

Link to tracking Issue:

Testing:

I've added an additional test case named quoted-value-contains-delimiter that tests the problematic message I encountered in the wild.

I've also replaced the existing invalid-pair test as a result of the changed behaviour mentioned above.

Documentation:

Copy link

linux-foundation-easycla bot commented Dec 4, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@jmalloc jmalloc marked this pull request as ready for review December 4, 2023 08:28
@jmalloc jmalloc requested a review from a team December 4, 2023 08:28
Copy link
Contributor

@VihasMakwana VihasMakwana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for this fix.

Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jmalloc. This makes sense to me.

"quoted-value-contains-delimiter",
func(kv *Config) {},
&entry.Entry{
Body: `msg="Message successfully sent at 2023-12-04 06:47:31.204222276 +0000 UTC m=+5115.932279346"`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also have a test which involves 3+ key value pairs? This would validate we have the correct interaction with pair_delimiter, and help ensure we do not in the future limit the number of pairs to 2.

@jmalloc
Copy link
Contributor Author

jmalloc commented Dec 4, 2023

Per @djaglowski's request I've added a multiple-values-contain-delimiter test case in 7b4ef32

@djaglowski djaglowski merged commit 96e3c0c into open-telemetry:main Dec 5, 2023
83 checks passed
@github-actions github-actions bot added this to the next release milestone Dec 5, 2023
@jmalloc
Copy link
Contributor Author

jmalloc commented Dec 5, 2023

Thanks, all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants