Allow multiple patterns in parser transforms #1477

Closed
binarylogic opened this issue Jan 3, 2020 · 3 comments
Labels
domain: parsing (anything related to parsing within Vector)
meta: good first issue (anything that is good for new contributors)
meta: idea (anything in the idea phase; needs further discussion and consensus before work can begin)
needs: approval (needs review and approval before work can begin)
needs: requirements (needs a list of requirements before work can begin)
type: enhancement (a value-adding code change that enhances existing functionality)

Comments

@binarylogic
Contributor

We should think about making it easier to parse a variety of log formats in the parsing transforms (regex_parser, grok_parser, etc.).

Use case

An interesting use case is parsing common multi-format logs, such as Ruby on Rails logs:

Started GET "/" for 127.0.0.1 at 2012-03-10 14:28:14 +0100
Processing by HomeController#index as HTML
  Rendered text template within layouts/application (0.0ms)
  Rendered layouts/_assets.html.erb (2.0ms)
  Rendered layouts/_top.html.erb (2.6ms)
  Rendered layouts/_about.html.erb (0.3ms)
  Rendered layouts/_google_analytics.html.erb (0.4ms)
Completed 200 OK in 79ms (Views: 78.8ms | ActiveRecord: 0.0ms)

Here someone might want to define a set of regular expressions (or grok expressions) that together parse all of these lines. It'd be neat to classify (namespace) the resulting events as well. For example:

[transforms.parse_rails_logs]
  type = "regex_parser"
  type_field = "type"
  regex.http_request = '^Started (?P<method>[^\s]*) "[^"]*" for (?P<remote_addr>[^\s]*)'
  regex.template_render = "..."
  regex.http_response = "..."

Would produce events like:

{"http_request": {"method": "GET", "remote_addr": "127.0.0.1"}}
{"template_render": {"path": "layouts/_top.html.erb", "duration_ms": 2.6}}
// ...
{"http_response": {"status": 200, "duration_ms": 79}}
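
To make the proposed behavior concrete, here is a minimal Python sketch of "first matching named pattern wins, captures nested under the pattern name". It is not Vector code. Only the http_request pattern is based on the one in this issue; the template_render and http_response patterns, and the names PATTERNS and classify, are assumptions made up for illustration. Captures stay as strings (no type coercion), and type_field is ignored.

```python
import re

# Hypothetical pattern set. Only http_request is adapted from the issue text;
# the other two patterns are illustrative guesses at what "..." might contain.
PATTERNS = {
    "http_request": re.compile(
        r'^Started (?P<method>\S+) "[^"]*" for (?P<remote_addr>\S+)'
    ),
    "template_render": re.compile(
        r"^\s*Rendered (?:.* within )?(?P<path>\S+) \((?P<duration_ms>[\d.]+)ms\)"
    ),
    "http_response": re.compile(
        r"^Completed (?P<status>\d+) .* in (?P<duration_ms>[\d.]+)ms"
    ),
}


def classify(line):
    """Return {pattern_name: captures} for the first matching pattern, else None."""
    for name, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            # Nest the named captures under the pattern name to avoid
            # field-name collisions between different log types.
            return {name: match.groupdict()}
    return None
```

Lines that match none of the patterns (such as the "Processing by ..." line above) return None here; whether the real transform should drop them, pass them through, or mark them as unparsed is one of the open requirements.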

The nested format is opinionated, but I've found it helpful for avoiding field-name conflicts in downstream storage systems. We could just as easily namespace the fields with a prefix (http_request.method).
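
The two layouts carry the same information, so one can be derived from the other. A small Python sketch of going from the nested form to the prefixed form follows; the helper name flatten and the "." separator are assumptions for illustration, not an existing Vector option.

```python
def flatten(event, sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in event.items():
        if isinstance(value, dict):
            # Recurse so deeper nesting is prefixed with every parent key.
            for sub_key, sub_value in flatten(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat
```

For example, flatten({"http_request": {"method": "GET"}}) yields a single-level event with the key http_request.method.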

This would pair nicely with #1437.

@binarylogic added the type: enhancement, transform: regex_parser, meta: idea, needs: approval, and needs: requirements labels Jan 3, 2020
@binarylogic added this to the Improve Parsing milestone Jan 3, 2020
@binarylogic removed this from the Improve Parsing milestone Aug 2, 2020
@binarylogic
Contributor Author

Partially completed in #2493.

@itkovian
Contributor

itkovian commented Oct 9, 2020

I'd like to have nested data in the results, see e.g., #4465.

@binarylogic
Contributor Author

Closing in favor of #5726
