Convert Filebeat iis.access to ECS #9084

Merged on Nov 22, 2018 (13 commits)

webmat commented Nov 14, 2018

Caveats

  • user agent is encoded in the log. Decoding to get better UA parsing results.
    • It's not quoted, so spaces are replaced with + signs. Parens and slashes are not replaced, though.
    • Using urldecode on it makes it more palatable to user_agent parser.
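
The decoding step described above can be illustrated outside the pipeline. This is a minimal Python sketch, not the ingest pipeline change itself: the function name and the sample user agent are hypothetical, and it assumes that `+` for space is the encoding that matters.

```python
from urllib.parse import unquote_plus


def decode_iis_user_agent(raw: str) -> str:
    """Undo the '+'-for-space encoding IIS applies to unquoted user agents."""
    # unquote_plus turns '+' into ' ' and also resolves any %XX escapes.
    # Parentheses and slashes pass through unchanged.
    return unquote_plus(raw)


# Hypothetical sample value, shaped like an IIS-logged user agent.
raw_ua = "Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36"
print(decode_iis_user_agent(raw_ua))
```

The decoded string is what a user agent parser expects to see, which is why parsing results improve after this step.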

Renames

  • iis.access.server_ip => destination.ip
  • iis.access.remote_ip => source.ip
  • iis.access.method => http.request.method
  • iis.access.url => url.path
  • iis.access.query_string => url.query
  • iis.access.port => destination.port
  • iis.access.user_name => user.name
  • iis.access.referrer => http.request.referrer
  • iis.access.response_code => http.response.status_code
  • iis.access.hostname => destination.domain
  • iis.access.user_agent.original => user_agent.original
  • iis.access.geoip => source.geo
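
As a rough illustration of what the renames above mean for an event, here is a Python sketch that applies the same mapping to a flat event with dotted keys. The `rename_to_ecs` helper and the sample event are hypothetical; in the PR the renaming happens in the grok patterns and the ingest pipeline, not in code like this.

```python
# Old iis.access.* names and their ECS replacements, as listed in this PR.
ECS_RENAMES = {
    "iis.access.server_ip": "destination.ip",
    "iis.access.remote_ip": "source.ip",
    "iis.access.method": "http.request.method",
    "iis.access.url": "url.path",
    "iis.access.query_string": "url.query",
    "iis.access.port": "destination.port",
    "iis.access.user_name": "user.name",
    "iis.access.referrer": "http.request.referrer",
    "iis.access.response_code": "http.response.status_code",
    "iis.access.hostname": "destination.domain",
    "iis.access.user_agent.original": "user_agent.original",
    "iis.access.geoip": "source.geo",
}


def rename_to_ecs(event: dict) -> dict:
    """Return a copy of a flat, dotted-key event using the ECS names."""
    renamed = {}
    for key, value in event.items():
        new_key = key
        for old, new in ECS_RENAMES.items():
            # Match the field itself or any of its sub-fields
            # (for example iis.access.geoip.country_iso_code).
            if key == old or key.startswith(old + "."):
                new_key = new + key[len(old):]
                break
        renamed[new_key] = value
    return renamed


sample = {
    "iis.access.remote_ip": "85.181.35.98",
    "iis.access.geoip.country_iso_code": "DE",
    "iis.access.sub_status": 0,
}
print(rename_to_ecs(sample))
```

Fields without an ECS counterpart, such as `iis.access.sub_status`, keep their original name.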

TODO

  • Convert status code, port, byte counts, request times to int
  • Update ECS-migration.yml file
  • Changelog
  • Create field aliases
  • Output user_agent to ECS field names
    • Next step: figure out what to do with fully broken down OS version numbers, when ECS supports one version string :-)
  • List decoding of user agent in breaking changes

@ruflin ruflin mentioned this pull request Nov 14, 2018
@webmat webmat self-assigned this Nov 14, 2018
@webmat webmat requested a review from ruflin November 14, 2018 21:33
@webmat webmat added the in progress, module, Filebeat, ecs labels Nov 14, 2018
@ruflin ruflin left a comment

Add changelog and migration file.

@webmat webmat removed the in progress label Nov 20, 2018
webmat commented Nov 20, 2018

@ruflin Ready for review.

@ruflin ruflin added the review label Nov 20, 2018
@ruflin ruflin changed the title from "[WIP] Convert Filebeat iis.access to ECS" to "Convert Filebeat iis.access to ECS" Nov 20, 2018
@ruflin ruflin left a comment

For user_agent problem, see other issue.

"%{TIMESTAMP_ISO8601:iis.access.time} %{IPORHOST:iis.access.server_ip} %{WORD:iis.access.method} %{URIPATH:iis.access.url} %{NOTSPACE:iis.access.query_string} %{NUMBER:iis.access.port} %{NOTSPACE:iis.access.user_name} %{IPORHOST:iis.access.remote_ip} %{NOTSPACE:iis.access.agent} %{NOTSPACE:iis.access.referrer} %{NUMBER:iis.access.response_code} %{NUMBER:iis.access.sub_status} %{NUMBER:iis.access.win32_status} %{NUMBER:iis.access.request_time_ms}",
"%{TIMESTAMP_ISO8601:iis.access.time} %{NOTSPACE:iis.access.site_name} %{WORD:iis.access.method} %{URIPATH:iis.access.url} %{NOTSPACE:iis.access.query_string} %{NUMBER:iis.access.port} %{NOTSPACE:iis.access.user_name} %{IPORHOST:iis.access.remote_ip} %{NOTSPACE:iis.access.agent} %{NOTSPACE:iis.access.cookie} %{NOTSPACE:iis.access.referrer} %{NOTSPACE:iis.access.hostname} %{NUMBER:iis.access.response_code} %{NUMBER:iis.access.sub_status} %{NUMBER:iis.access.win32_status} %{NUMBER:iis.access.body_sent.bytes} %{NUMBER:iis.access.body_received.bytes} %{NUMBER:iis.access.request_time_ms}",
"%{TIMESTAMP_ISO8601:iis.access.time} %{NOTSPACE:iis.access.site_name} %{NOTSPACE:iis.access.server_name} %{IPORHOST:iis.access.server_ip} %{WORD:iis.access.method} %{URIPATH:iis.access.url} %{NOTSPACE:iis.access.query_string} %{NUMBER:iis.access.port} %{NOTSPACE:iis.access.user_name} %{IPORHOST:iis.access.remote_ip} HTTP/%{NUMBER:iis.access.http_version} %{NOTSPACE:iis.access.agent} %{NOTSPACE:iis.access.cookie} %{NOTSPACE:iis.access.referrer} %{NOTSPACE:iis.access.hostname} %{NUMBER:iis.access.response_code} %{NUMBER:iis.access.sub_status} %{NUMBER:iis.access.win32_status} %{NUMBER:iis.access.body_sent.bytes} %{NUMBER:iis.access.body_received.bytes} %{NUMBER:iis.access.request_time_ms}"
"%{TIMESTAMP_ISO8601:iis.access.time} %{IPORHOST:destination.ip} %{WORD:http.request.method} %{URIPATH:url.path} %{NOTSPACE:url.query} %{NUMBER:destination.port:int} %{NOTSPACE:user.name} %{IPORHOST:source.ip} %{NOTSPACE:iis.access.agent} %{NOTSPACE:http.request.referrer} %{NUMBER:http.response.status_code:int} %{NUMBER:iis.access.sub_status:int} %{NUMBER:iis.access.win32_status:int} %{NUMBER:iis.access.request_time_ms:int}",
IPORHOST here seems also one of these fields where we should do a follow up PR to make it only IP as otherwise it could break on ingest time.
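
To make the concern concrete: IPORHOST also matches hostnames, and a hostname written into an IP-typed field would be rejected at ingest time. A small Python sketch of the distinction follows; the helper name and sample values are hypothetical.

```python
import ipaddress


def is_ip_literal(value: str) -> bool:
    """True only for values that parse as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


# A hostname is a legal IPORHOST match but not a valid value for an IP field.
for candidate in ("192.168.0.1", "2001:db8::1", "webserver01"):
    print(candidate, is_ip_literal(candidate))
```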

webmat commented Nov 21, 2018

@ruflin I've worked a bit on the user_agent parsing. You were right about the mapping issue. So this is resolved. My work has revealed a few things:

  • I noticed that in IIS logs the UA is not quoted, so spaces are replaced with +. I'm adding a urldecode step (introduced in 6.1) to get more precise UA parsing.
  • I added a different UA to the test log, one that does get its OS major and minor version parsed successfully. This revealed a problem, outlined in the next point.
  • The UA parser gives us fully broken down version fields (major, minor, patch), but no whole "version" field, whereas in ECS we only define "version".
    • This is showing up as a problem for user_agent.os.version. I haven't fixed this yet; I only created temporary fields so the tests don't break.

I think there are several ways we can go about this.

  • ECS should likely add support for version breakdown fields
  • Tackling the user_agent in this PR is informative. But we may want to close the other PRs without the UA fix, and do a subsequent PR only about getting UA right in all access logs.

Mathieu Martin added 12 commits November 22, 2018 14:00
webmat commented Nov 22, 2018

This is ready for a final review.

Please make sure to check out the caveat above. I've listed this "breaking change" (really an improvement) separately from the ECS transition.

ruflin commented Nov 22, 2018

For the version: I prefer the full version string. We had it broken down in an early version of ECS, but I didn't see the benefits. Elasticsearch offers prefix queries, which means queries on major.minor, for example, are possible. The only thing not possible is a range comparison such as > major, but I would rather see ES get a version type that supports all these types of queries.
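
A sketch of the prefix-query idea, written as a Python helper that builds the request body. The helper name is hypothetical, and the example assumes `user_agent.os.version` is indexed as a keyword holding the full version string.

```python
import json


def os_version_prefix_query(prefix: str) -> dict:
    """Build an Elasticsearch prefix query body on the full version string."""
    return {"query": {"prefix": {"user_agent.os.version": prefix}}}


# Find every event whose OS version starts with "10.12".
print(json.dumps(os_version_prefix_query("10.12"), indent=2))
```

A prefix is a plain string match, so `10.1` also matches `10.12.6`. Appending the separator (`10.1.`) narrows the match to that minor version, at the cost of missing a value that is exactly `10.1`.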

@ruflin ruflin left a comment

LGTM. Ready for merge.

For the user_agent decoding: Do we need this only for IIS or also in other modules?

description: >
  The major version of the operating system.
type: alias
path: user_agent.os.major
We could probably use a script processor to concatenate the version fields. I initially hoped the join processor would do this, but it seems to have a different purpose.
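
The concatenation itself is simple. Here is a Python sketch of the logic such a script would need; in a pipeline it would presumably be a script processor rather than Python, and the function name is hypothetical.

```python
from typing import Optional


def join_version(major: Optional[str],
                 minor: Optional[str] = None,
                 patch: Optional[str] = None) -> Optional[str]:
    """Rebuild a single version string from broken-down components."""
    parts = []
    for part in (major, minor, patch):
        if part is None or part == "":
            # Stop at the first missing component so we never emit "10..3".
            break
        parts.append(str(part))
    return ".".join(parts) if parts else None


print(join_version("10", "12", "6"))
print(join_version("10", "12"))
```

Stopping at the first gap keeps the result a valid prefix of the real version, which matters if the field is later searched with prefix queries.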

Yeah, but all of this refining needs to go in a shared pipeline. We're not going to duplicate this work everywhere, on the eve of starting to share pipelines ;-)

webmat commented Nov 22, 2018

> For the user_agent decoding: Do we need this only for IIS or also in other modules?

No. In all of the open source web server modules I've been working on, the user agent was always a quoted string. None of them needed to encode the spaces by replacing them with a +. IIS is the only one where I've seen that.

webmat commented Nov 22, 2018

Sorry, had missed this comment #9084 (comment) before reading your review.

Reconstructing the version string is out of scope for this PR, IMO. The user_agent IN processor doesn't give it to us, at this time. I'm down with requesting that change to the IN team, or reconstructing it, but I don't think it should block this mapping to ECS.

I've added this to #9208 so we don't lose track of this.

@webmat webmat merged commit 04c951d into elastic:master Nov 22, 2018
@webmat webmat deleted the ecs-iis-access branch November 22, 2018 21:44
webmat added a commit that referenced this pull request Jan 11, 2019
…ccess logs (#9955)

- Introduce the IPv6 zone workaround for the iis.access log as well, resolving #9836.
- Update the IPv6 zone fix (#9869) for iis.error to use the ECS `.address` field instead of a transient field.
- Convert many fields under `iis.error.*` to ECS. Previous field names are field aliases towards the new corresponding ECS field:
  - iis.error.remote_ip => source.address
  - iis.error.remote_port => source.port
  - iis.error.server_ip => destination.address
  - iis.error.server_port => destination.port
  - iis.error.http_version => http.version
  - iis.error.method => http.request.method
  - iis.error.url => url.original
  - iis.error.response_code => http.response.status_code
  - iis.error.geoip.* => source.geo.*
  - read_timestamp => event.created (not aliased, still used elsewhere)
- Update field aliases introduced in #9084 to point to `.address` instead of `.ip`, since this value can be ambiguous. The IP field is populated with the cleaned up IP without the zone. This is also true for the `.ip` fields populated by the error logs.
  - iis.access.remote_ip => source.address
  - iis.access.server_ip => destination.address
- Coerce to long: source.port, destination.port and http.response.status_code in the iis.error fileset
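
The `.address` / `.ip` split described in this commit message can be sketched as follows. This is an illustrative Python version under assumed names, not the pipeline code from #9955: the raw value always goes to `address`, and `ip` is set only when the value, minus any IPv6 zone, is a valid IP.

```python
import ipaddress


def split_address(raw: str) -> dict:
    """Keep the raw value as 'address'; fill 'ip' only with a clean IP."""
    fields = {"address": raw}
    # An IPv6 zone is the part after '%', e.g. "fe80::1%12".
    bare, _, _zone = raw.partition("%")
    try:
        ipaddress.ip_address(bare)
    except ValueError:
        # Not an IP (for example a hostname): leave 'ip' unset.
        return fields
    fields["ip"] = bare
    return fields


print(split_address("fe80::1%12"))
print(split_address("example.local"))
```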