fix(regex_parser transform): Correctly assign capture group fields
Previously, the transform was incorrectly mapping the capture indexes of the matched pattern across the capture groups of all patterns. For example, with a config like:

```toml
[sources.in]
type = "stdin"

[transforms.regex]
type = "regex_parser"
inputs = ["in"]
patterns = [
  '^blah \((?P<socket_code>[0-9]+): (?:[^\)]+)\) while (?P<timeout_while>.)',
  "^notblah (?P<close_while>.+)$",
]

[sinks.out]
inputs = ["regex"]
type = "console"
encoding.codec = "json"
```

and an input line of:

```
notblah something
```

the transform would end up setting both the `socket_code` and `close_while` fields:

```json
{
  "close_while": "something",
  "host": "jesse-thinkpad",
  "socket_code": "something",
  "source_type": "stdin",
  "timestamp": "2020-07-22T19:30:12.647060371Z"
}
```

This change simply updates `RegexParser.capture_names` to also be a `Vec` holding the capture information for each pattern, similar to `capture_logs`, and later uses the same match index to access it.

A couple of questions came up while I was looking at this:

- It looks like, if multiple patterns match, the transform simply chooses the first one. Is this what we want? It was indirectly discussed in #2493, but a preference wasn't explicitly stated, and it doesn't appear to be documented (in `master`) for the new `patterns` field. Once we decide what the behavior should be, I can document it and/or change the implementation if needed. I might have expected it to apply each matching pattern.
- I expected to still see the deprecated `regex` parameter in the (unreleased) documentation, just marked as deprecated, but it appears to have been dropped wholesale in https://github.com/timberio/vector/pull/2493/files#diff-4d642800436bfa506ff51f7b75556d9dL41. I just wanted to clarify whether this is the expected process for deprecating parameters.

Fixes: #3096

Signed-off-by: Jesse Szwedko <[email protected]>
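The bug and the fix can be sketched in plain Rust (std only). This is a simplified, hypothetical model, not Vector's actual code: `ParserSketch`, `fields`, `buggy_fields`, and `example_parser` are illustrative names, and the capture values are passed in as a pre-built slice rather than obtained from the `regex` crate. It assumes the index of the matching pattern is already known (the first match, per the behavior described in the commit message).

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified model of the parser state (not Vector's real
/// struct): one entry per pattern, each listing the
/// (capture group index, field name) pairs of that pattern's named groups.
struct ParserSketch {
    capture_names: Vec<Vec<(usize, String)>>,
}

impl ParserSketch {
    /// Fixed behavior: select the names belonging to the pattern that
    /// matched (`match_index`), so only that pattern's groups become fields.
    /// `captures[i]` is the text of group `i` of the matching pattern,
    /// or `None` if that group did not participate in the match.
    fn fields(&self, match_index: usize, captures: &[Option<&str>]) -> BTreeMap<String, String> {
        let mut out = BTreeMap::new();
        for (idx, name) in &self.capture_names[match_index] {
            if let Some(Some(value)) = captures.get(*idx) {
                out.insert(name.clone(), value.to_string());
            }
        }
        out
    }
}

/// Buggy behavior: names from *all* patterns in one flat list, applied by
/// group index to the captures of whichever pattern matched.
fn buggy_fields(all_names: &[(usize, &str)], captures: &[Option<&str>]) -> BTreeMap<String, String> {
    let mut out = BTreeMap::new();
    for &(idx, name) in all_names {
        if let Some(Some(value)) = captures.get(idx) {
            out.insert(name.to_string(), value.to_string());
        }
    }
    out
}

/// Capture indexes for the two patterns in the example config. Group 0 is
/// the whole match, and the `(?:...)` group in pattern 0 does not capture.
fn example_parser() -> ParserSketch {
    ParserSketch {
        capture_names: vec![
            vec![(1, "socket_code".to_string()), (2, "timeout_while".to_string())],
            vec![(1, "close_while".to_string())],
        ],
    }
}

fn main() {
    let parser = example_parser();
    // `notblah something` matches pattern 1 only: group 0 is the whole
    // line and group 1 is `close_while`.
    let captures = [Some("notblah something"), Some("something")];

    // Flatten every pattern's names into one list, as the old code effectively did.
    let flattened: Vec<(usize, &str)> = parser
        .capture_names
        .iter()
        .flatten()
        .map(|(i, n)| (*i, n.as_str()))
        .collect();

    println!("buggy: {:?}", buggy_fields(&flattened, &captures));
    println!("fixed: {:?}", parser.fields(1, &captures));
}
```

In this model the buggy path sets both `close_while` and `socket_code` to `"something"` (group 1 of the matching pattern is looked up under pattern 0's name too), mirroring the JSON output above, while the fixed path yields only `close_while`. Keeping one name list per pattern and indexing it with the same match index used for the capture locations is what removes the cross-pattern mix-up.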