Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] Function which converts slice into map #35256

Open
djaglowski opened this issue Sep 17, 2024 · 10 comments
Open

[pkg/ottl] Function which converts slice into map #35256

djaglowski opened this issue Sep 17, 2024 · 10 comments
Labels
discussion needed Community discussion needed enhancement New feature or request pkg/ottl

Comments

@djaglowski
Copy link
Member

djaglowski commented Sep 17, 2024

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

I often hear from users that it's difficult to work with slices in OTTL and/or backends. Issues such as #29289 may alleviate this to some extent, but I believe in many cases users would prefer to work around slices by converting them to maps. This obviously comes with some assumptions, but I think there may be a reasonable way to provide this functionality.

Describe the solution you'd like

A new function called Associate, which requires two parameters:

  1. A path to an array.
  2. A "sub-path", defined by a slice of strings, which traverses to the desired key in each element.

e.g. Given a log record with the following attributes:

attributes:
  hello: world
  things:
    - name: foo
      value: 2
    - name: bar
      value: 5

A user could write something like Associate(attributes["things"], ["name"]) and expect the following output:

attributes:
  hello: world
  things:
    foo:
      name: foo
      value: 2
    bar:
      name: bar
      value: 5

A slightly more complicated example highlights the notion of a sub-path. I'm not sure if OTTL has a mechanism for this, but if the array contains more complicated values, it may be necessary to drill further into the object to find its key.

attributes:
  hello: world
  things:
    - value: 2
      details:
        name: foo
    - value: 5
      details:
        name: bar

In the above case, Associate(attributes["things"], ["details", "name"]) would produce:

attributes:
  hello: world
  things:
    foo:
      value: 2
      details:
        name: foo
    bar:
      value: 5
      details:
        name: bar

It may make sense for users to specify, using an optional boolean, whether they wish to delete the key field, since it becomes redundant. (Or maybe this is just the default/only behavior.) Using the previous example, we'd expect:

attributes:
  hello: world
  things:
    foo:
      value: 2
      details: {}
    bar:
      value: 5
      details: {}

Finally, this comes with some caveats about uniqueness of keys. I think since arrays are ordered it is simple enough to say that in the case of duplicates, the first or last one wins.

Describe alternatives you've considered

No response

Additional context

No response

@djaglowski djaglowski added enhancement New feature or request needs triage New item requiring triage labels Sep 17, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Copy link
Member Author

Another option would be useful as well, to allow the value to simultaneously be specified via a sub-path.

attributes:
  hello: world
  things:
    - name: foo
      details:
        value: 2
    - name: bar
      details:
        value: 5

Associate(attributes["things"], ["name"], ["details", "value"] ) would produce:

attributes:
  hello: world
  things:
    foo: 2
    bar: 5

@bacherfl
Copy link
Contributor

I would gladly work on a PR if this function should be added

@gantrior
Copy link

I would be glad if this gets implemented

@bacherfl
Copy link
Contributor

I have created a draft PR now to outline how this could be implemented. Will keep it in draft mode for now, until the code owners have agreed that this should be added

@TylerHelmuth
Copy link
Member

I'm not totally clear on what problems we're solving, but I acknowledge that this transformation could be possible.

I'd like to talk some more about edge cases:

  • What happens with nested slices?
  • What happens if the slice is not a []map[string]any?
  • What is the associated field has a value that isn't a string?

I also think the name needs to be more direct. Associate is not self-descriptive enough in my opinion. Something like SliceToMap is clearer.

@TylerHelmuth TylerHelmuth added discussion needed Community discussion needed and removed needs triage New item requiring triage labels Sep 25, 2024
@djaglowski
Copy link
Member Author

djaglowski commented Sep 25, 2024

I'm not totally clear on what problems we're solving

Here's an example of where this would be useful. Windows Event Logs contain content like the following:

    <EventData>
        <Data Name='SubjectUserSid'>S-1-0-0</Data>
        <Data Name='SubjectUserName'>-</Data>
        <Data Name='SubjectDomainName'>-</Data>
        <Data Name='SubjectLogonId'>0x0</Data>
        <Data Name='TargetUserSid'>S-1-0-0</Data>
        <Data Name='TargetUserName'>mr_rogers</Data>
        <Data Name='TargetDomainName'>-</Data>
        <Data Name='Status'>0xc000006d</Data>
        <Data Name='FailureReason'>%%2313</Data>
        <Data Name='SubStatus'>0xc0000064</Data>
        <Data Name='LogonType'>3</Data>
        <Data Name='LogonProcessName'>NtLmSsp </Data>
        <Data Name='AuthenticationPackageName'>NTLM</Data>
        <Data Name='WorkstationName'>D-508</Data>
        <Data Name='TransmittedServices'>-</Data>
        <Data Name='LmPackageName'>-</Data>
        <Data Name='KeyLength'>0</Data>
        <Data Name='ProcessId'>0x0</Data>
        <Data Name='ProcessName'>-</Data>
        <Data Name='IpAddress'>194.169.175.45</Data>
        <Data Name='IpPort'>0</Data>
    </EventData>

We could debate which way this should be parsed but one reasonable representation would be a slice of "Data" maps, where each contains a key "Name" and a value.

Given such a scenario, it is very difficult to work with the slice in OTTL. I cannot for example refer directly to the "ProcessId" field unless I know exactly where to expect it in the slice, and even then this would be fragile.

Converting this to a map allows me to refer to it in a standard way, e.g. attributes["EventData"]["ProcessID"]

In practice, folks at my company have run into many similar situations. As you noted, not all cases can be solved by this because slices may contain varying data. Still, I would think that a function like this could be very useful without being fragile.


What happens if the slice is not a []map[string]any?

I would think this is a requirement. The function could validate this before doing anything and then nop if not satisfied.

What happens with nested slices?

I think this is covered if it must be []map[string]any

What is the associated field has a value that isn't a string?

It can probably be a requirement. In theory we could tostring some other types but I doubt that would be very useful.

I also think the name needs to be more direct. Associate is not self-descriptive enough in my opinion. Something like SliceToMap is clearer.

Seems reasonable to me.

@bacherfl
Copy link
Contributor

@djaglowski what would you say makes more sense in case one of the slice items is not a []map[string]any, or the path to the key or value can not be resolved? Either

a) Return an error and exit the function
b) Ignore the current item and proceed with attempting to convert the remaining slice items

In my draft I went with option a for now, but looking at it again, there also might be arguments for option b.

@djaglowski
Copy link
Member Author

I can see users wanting either option, but personally I would typically prefer a best effort transformation.

There may be subsequent statements which depend on a struct, or on a particular key/value in the struct which could be available in spite of the error.

For example, say [ { "k": "foo", "v": "one" }, "invalid", { "k": "bar", "v": "two" } ] is the input. If the function returns { "foo": "one", "bar": "two" } (and say I assign it to the body), then set(attribute["foo"], body["foo"]) still works.

I think there would be a wide variety of use cases here, with some being broken by any error, and others being fairly resilient.
I know the transform processor has a setting called error_mode which gives the user control over this, and if I'm understanding correctly, the the ignore setting is basically intended for best effort, which implies to me that the functions should make a best effort at all times but also return the error.

@bacherfl
Copy link
Contributor

I can see users wanting either option, but personally I would typically prefer a best effort transformation.

There may be subsequent statements which depend on a struct, or on a particular key/value in the struct which could be available in spite of the error.

For example, say [ { "k": "foo", "v": "one" }, "invalid", { "k": "bar", "v": "two" } ] is the input. If the function returns { "foo": "one", "bar": "two" } (and say I assign it to the body), then set(attribute["foo"], body["foo"]) still works.

I think there would be a wide variety of use cases here, with some being broken by any error, and others being fairly resilient. I know the transform processor has a setting called error_mode which gives the user control over this, and if I'm understanding correctly, the the ignore setting is basically intended for best effort, which implies to me that the functions should make a best effort at all times but also return the error.

Thanks @djaglowski, that makes sense - I will adapt the implementation accordingly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion needed Community discussion needed enhancement New feature or request pkg/ottl
Projects
None yet
Development

No branches or pull requests

4 participants