-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add json_array_parser parser and assign_keys transformer #30644
Add json_array_parser parser and assign_keys transformer #30644
Conversation
|
pkg/stanza/operator/parser/headerless_jarray/headerless_jarray.go
Outdated
Show resolved
Hide resolved
pkg/stanza/operator/parser/headerless_jarray/headerless_jarray.go
Outdated
Show resolved
Hide resolved
pkg/stanza/operator/parser/headerless_jarray/headerless_jarray.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @RoeiDimi, the design feels right. I left a lot of comments but mostly just nits. A few minor things I think we need to take care of though.
a8388f3
to
7f93ec8
Compare
@RoeiDimi, I was able to resolve the issue by adding the new dependency directly to the root module. Thanks for the new features and for iterating on the design with me. |
Thanks @djaglowski, it's been a pleasure working together! One last question though - assign_keys can't really parse the values straight into attributes (the way csv_parser can). i mean, I now understand that all this may not solve my initial issue completely.
we have a generic exporter code that reads all the key-values from attributes and sends them. Can you think of a way I could use the json_array_parser -> assign_keys design to achieve that? |
@RoeiDimi, looks like we lost track of the goal somewhat but luckily you should be able to flatten after assigning keys: operators:
- type: json_array_parser
parse_from: body
parse_to: attributes.tmp
- type: assign_keys
keys: ["a", "b", "c"]
field: attributes.tmp
- type: flatten
field: attributes.tmp Edit: I should point out this is not entirely by luck, but rather is a natural benefit of the highly decomposed approach we've always pursued with these operators. With a robust set of simple operations, we can typically compose them to do what's needed. |
@djaglowski yeah that's pretty cool, thank you so much! :) I apologize but I have another question : so I tried running the collector with all the following variations but none worked -
|
@djaglowski, @RoeiDimi Shouldn't we have the functionality in OTTL language? I was able to achive the same functionality I believe using
./tmp/logs/5.json
output:
|
@sumo-drosiek @djaglowski , this suggestion may be interesting as after i ran large-scale stress tests it seems that using the 3 operators together (json_array_parser, assign_keys, flatten) result in a massive performance hit. Seems like chaining operators is not very efficient in general as the initial design with a single operator (headerless_jarray_parser) that does the same logic as in json_array_parser+assign_keys together was significantly better performance-wise. anyway @sumo-drosiek I'd like to test your suggestion, i tried adding this to the builder config -
but im getting and error saying the revision is wrong - is it not Thanks guys for helping me through this entirely new world of OTEL collector hehe.. |
@RoeiDimi It should be |
Hi @djaglowski For my benchmarks I'm using Azure kubernetes, single node, single pod, 5 cpus. I tested the following:
It also makes sense in a way as I totally get the refactor we went for from a modularity point of view but is there any chance you'd agree to now also adding |
@RoeiDimi, thanks for providing detailed performance numbers. I think given the notable performance benefit, it's reasonable to revisit the combined operator.
If we were to apply the same terminology update as we arrived at, we are talking about a
The benefit of this approach is that the operator just has the ability to apply an optional behavior, but one that is closely related to the first step of parsing the array. WDYT? |
@djaglowski yes, that's a good, clean idea |
**Description:** <Describe what has changed.> <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> Adding a feature following #30644. This feature allow json_array_parser parser to accept a comma-delimited header and for every json array it parses, output a map which contains the header fileds as keys and the matching values are the ones parsed from the input json array. This feature as added mainly for performance reasons as from a functional POV, this is mostly similar to chaining the 2 operators: `json_array_parser -> assign_keys ` **Link to tracking Issue:** <Issue number if applicable> #30321 **Testing:** <Describe what testing was performed and which tests were added.> - unittests - End to end tests Used generated traffic on a running otel collector thats using the parser and verified the data is as expected in the end table and performance looks good **Documentation:** <Describe the documentation added.> - [json_array_parser.md](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/58cc91ca30eabbd35c074d79db8630fc474164d9/pkg/stanza/docs/operators/json_array_parser.md)
…try#30644) **Description:** <Describe what has changed.> <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> We are using otel-collector as an infrastructure and receive many types of data from a client. The client's sent data is always a form of json and in one use case the json is a simple headerless jarray and so we need a way to parse it and match headers to each field (something similar to what csv_parser does but also supports types supported in a json format and nested objects) **Link to tracking Issue:** <Issue number if applicable> open-telemetry#30321 **Testing:** <Describe what testing was performed and which tests were added.> * unittests All the tests found in csv_parser were copied and adjusted adding test scenarios for different types (numbers, booleans, null) as well as a test for parsing a nested object as a part of the jarray * End to end tests Used generated traffic on a running otel collector thats using the parser and verified the data is as expected in the end table **Documentation:** <Describe the documentation added.> * [json_array_parser.md](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/beacea489ff4ae61c0bac4f477c04748944c9054/pkg/stanza/docs/operators/json_array_parser.md) * [assign_keys.md](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/beacea489ff4ae61c0bac4f477c04748944c9054/pkg/stanza/docs/operators/assign_keys.md) --------- Co-authored-by: Daniel Jaglowski <[email protected]>
**Description:** <Describe what has changed.> <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> Adding a feature following open-telemetry#30644. This feature allow json_array_parser parser to accept a comma-delimited header and for every json array it parses, output a map which contains the header fileds as keys and the matching values are the ones parsed from the input json array. This feature as added mainly for performance reasons as from a functional POV, this is mostly similar to chaining the 2 operators: `json_array_parser -> assign_keys ` **Link to tracking Issue:** <Issue number if applicable> open-telemetry#30321 **Testing:** <Describe what testing was performed and which tests were added.> - unittests - End to end tests Used generated traffic on a running otel collector thats using the parser and verified the data is as expected in the end table and performance looks good **Documentation:** <Describe the documentation added.> - [json_array_parser.md](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/58cc91ca30eabbd35c074d79db8630fc474164d9/pkg/stanza/docs/operators/json_array_parser.md)
Description:
We are using otel-collector as an infrastructure and receive many types of data from a client. The client's sent data is always a form of json and in one use case the json is a simple headerless jarray and so we need a way to parse it and match headers to each field (something similar to what csv_parser does but also supports types supported in a json format and nested objects)
Link to tracking Issue:
#30321
Testing:
All the tests found in csv_parser were copied and adjusted adding test scenarios for different types (numbers, booleans, null) as well as a test for parsing a nested object as a part of the jarray
Used generated traffic on a running otel collector thats using the parser and verified the data is as expected in the end table
Documentation: