additional keys following name.sub where sub is 2 characters being missed #50

Closed
errantpixel opened this issue Dec 6, 2017 · 3 comments · Fixed by #73

@errantpixel

In CEF messages from a certain vendor, the additional key-value pairs follow a convention like ad.duration or ad.custom. If such a key is ad.nn, it gets parsed as if it were part of the value of the last valid key.

If you receive

Dec 6 20:25:25 loghost CEF:0|Vendor|Device|Version|13|my message|5|dvchost=loghost cat=traffic deviceSeverity=notice ad.nn=TEST src=192.168.0.1 destinationPort=53 ...
You end up with

deviceSeverity => "notice ad.nn=TEST"

in the parsed output. This appears to occur with any x.y key where the "y" is only two characters, and I can't figure out where in the codec that's breaking.

@fpompermaier

Same problem for me.

@jsvd
Member

jsvd commented May 28, 2019

confirmed:

irb(main):032:0> puts "ad.vd=Test".scan(LogStash::Codecs::CEF::EXTENSION_KEY_VALUE_SCANNER).map {|k,v| "key: \"#{k}\"\nvalue: \"#{v}\"" }.join("\n\n")
key: "vd"
value: "Test"
=> nil
irb(main):033:0> puts "ad.aavd=Test".scan(LogStash::Codecs::CEF::EXTENSION_KEY_VALUE_SCANNER).map {|k,v| "key: \"#{k}\"\nvalue: \"#{v}\"" }.join("\n\n")
key: "ad.aavd"
value: "Test"

This seems to be because the key pattern is defined as:

  # That sequence must begin with one or more `\w` (word: alphanumeric + underscore), which _optionally_ may be followed
  # by "subkey" sequence consisting of a literal dot (`.`) followed by a non-whitespace character, then one or more word
  # characters, and then one or more characters that do not convey semantic meaning within CEF (e.g., literal-pipe (`|`),
  # whitespace (`\s`), literal-dot (`.`), literal-equals (`=`), or literal-backslash ('\')).
  EXTENSION_KEY_PATTERN = /(?:\w+(?:\.[^\s]\w+[^\|\s\.\=\\]+)?(?==))/

So if a dot is present, the pattern requires at least 3 characters after it:

  1. a non-whitespace character
  2. one or more word characters
  3. one or more characters matching [^\|\s\.\=\\], i.e. anything except a pipe, whitespace, dot, equals sign, or backslash
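To check that reading, here is a small sketch that tries subkeys of different lengths against the key pattern quoted above. The regex is copied into a local constant so the snippet runs without the codec loaded; KEY_PATTERN and first_key are names made up for this example, not part of the codec.

```ruby
# Local copy of the key pattern quoted above.
# KEY_PATTERN is a hypothetical name used only in this example.
KEY_PATTERN = /(?:\w+(?:\.[^\s]\w+[^\|\s\.\=\\]+)?(?==))/

# Returns the first substring of a single "key=value" pair that the
# key pattern accepts as a key (hypothetical helper for this example).
def first_key(pair)
  pair[KEY_PATTERN]
end

# Subkeys of one and two characters lose their "ad." prefix;
# three or more characters are matched as a full dotted key.
%w[ad.n=Test ad.nn=Test ad.nnn=Test ad.duration=Test].each do |pair|
  puts "#{pair.inspect} -> #{first_key(pair).inspect}"
end
```

With one or two characters after the dot there is nothing left for the third element to consume, so the optional subkey group is skipped and only the part after the dot is found as a key.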

It seems like we could make the second of these (the \w+) optional or remove it altogether. Some testing is necessary to make sure we don't break the parsing elsewhere.
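As a rough sketch of that idea, the snippet below compares the key pattern quoted above with a relaxed variant in which the subkey is just a dot followed by one or more characters from the final character class. RELAXED_KEY_PATTERN is an assumption made for illustration and is not necessarily the change that ended up in the codec; the constant names and the sample extension string (taken from the log line in the report) are only for this example.

```ruby
# Key pattern as quoted above, copied locally so the codec is not required.
ORIGINAL_KEY_PATTERN = /(?:\w+(?:\.[^\s]\w+[^\|\s\.\=\\]+)?(?==))/

# Hypothetical relaxed variant: the subkey is a dot followed by one or more
# characters that are not pipe, whitespace, dot, equals, or backslash.
# This is an illustration only, not the pattern shipped by the codec.
RELAXED_KEY_PATTERN = /(?:\w+(?:\.[^\|\s\.\=\\]+)?(?==))/

# Extension part of the sample message from the original report.
EXTENSION = "dvchost=loghost cat=traffic deviceSeverity=notice " \
            "ad.nn=TEST src=192.168.0.1 destinationPort=53"

# Neither pattern has capture groups, so String#scan returns the matched keys.
puts "original: #{EXTENSION.scan(ORIGINAL_KEY_PATTERN).inspect}"
puts "relaxed:  #{EXTENSION.scan(RELAXED_KEY_PATTERN).inspect}"
```

On this one sample, the relaxed variant finds ad.nn as a whole key and leaves the other keys unchanged, while the dotted IP address is still not mistaken for a key because no equals sign follows it. That is only a single input, so it does not replace the broader testing mentioned above.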

@colinsurprenant
Contributor

This should be fixed with #73 and v6.0.1 has been published.
