Introduce source.address and destination.address. #247
Can you give an example where this would play out? If an event contains an IP address string as the host, then why don't we just put that in the destination.domain field? Seeing an IP address string in a domain field is somewhat common, and I'm not sure it is a problem. Also, I'm concerned about populating the destination.ip field with information that is not coming directly from a packet or session event.
I think I prefer the simpler "just put whatever you get" approach for the destination.domain field.
I'm in favor of this change, as it provides a good place for a value when it's not yet clear what it is. So far we have used domain, but to me it never felt obvious to put it there, especially for sockets.
@MikePaquette This would create a situation where source.domain and destination.domain would contain IP addresses in almost all flow events, however. I'd much rather introduce this field that's known to be kind of a mess, whose name doesn't mean anything specific (address is more generic than domain), but that is always guaranteed to hold the value of whatever that endpoint was.
I think adding generic address fields introduces ambiguity and will make it more difficult to query information. The field provides more flexibility when generating data, but the cost is that you need to query more fields to find information (ip, domain, and now address), and you altogether lose the ability to do CIDR searches if the value is an IP. It's a trade-off between doing the work at ingest time to classify the data or pushing that work to query time. (I'm considering the case where you have either an IP address or a hostname, and ignoring unix sockets. For unix sockets, perhaps having a separate field distinctly for them would make sense.)
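The CIDR concern can be illustrated with a small sketch. It uses Python's standard `ipaddress` module as a stand-in for a typed IP field; the sample values and the `in_cidr` helper are hypothetical and are not part of ECS or Beats.

```python
import ipaddress


def in_cidr(value, cidr):
    """Return True if value parses as an IP address inside cidr.

    Hypothetical helper: a free-form address string has to be parsed
    first, and non-IP values (hostnames, socket paths) never match.
    """
    try:
        ip = ipaddress.ip_address(value)
    except ValueError:
        return False
    return ip in ipaddress.ip_network(cidr)


# Made-up endpoint values of the three kinds discussed in this thread.
addresses = ["10.1.2.3", "example.com", "/var/run/app.sock", "192.168.0.7"]
matches = [a for a in addresses if in_cidr(a, "10.0.0.0/8")]
print(matches)
```

With only a generic string field, this parsing step would have to happen at query time for every candidate value, which is the ingest-time versus query-time trade-off described above.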
I think this is also about providing a field when we can't do it at ingest time. It could even be a post-processing job to split up the fields but it would still know what default field it should be based on. For me
@andrewkroh The way I envision this (and perhaps my definition isn't clear enough) this field works exactly the reverse way: If you have an IP:
If you have a domain:
If you have a socket:
Depending on the event stream being observed, the overwhelming majority of events will fall cleanly within expectations, using the expected field (e.g. access logs will mostly contain IP addresses). But if you then rely solely on the field you expect*, you are at risk of filtering out the weird events that fall outside of that expectation. Whether it's local access via a unix socket or it's a resolved name in the place of the IP address (httpd's HostnameLookups set to On). The goal of this field is to allow looking for the long tail of weirdness. To have a place where you know if you look there, you have that endpoint's address, whatever form it took. Also, the unix socket is so rare that I'd much rather have only
* Unless you ensure to always consider the events with "missing value" as well.
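One possible reading of this scheme, as a minimal sketch rather than the actual ECS or Beats behavior: `address` always receives the raw value, and `ip` or `domain` is added only when the value can be classified. The `endpoint_fields` helper, the "leading slash means unix socket" rule, the sample values, and the choice to leave unclassified fields absent are all assumptions made for illustration.

```python
import ipaddress


def endpoint_fields(raw):
    """Build the fields for one endpoint from a raw value (hypothetical helper).

    address always holds the raw value; ip or domain is added only when
    the value can be classified. Socket paths stay in address only.
    """
    fields = {"address": raw}
    try:
        fields["ip"] = str(ipaddress.ip_address(raw))
        return fields
    except ValueError:
        pass
    # Assumption: anything that looks like a filesystem path is a unix socket.
    if not raw.startswith("/"):
        fields["domain"] = raw
    return fields


events = [endpoint_fields(v) for v in ["10.1.2.3", "example.com", "/var/run/app.sock"]]

# Filtering on the expected field alone drops the non-IP events...
with_ip = [e for e in events if "ip" in e]
# ...while address is present on every event, whatever form the endpoint took.
long_tail = [e["address"] for e in events if "ip" not in e]
print(long_tail)
```

In this sketch, a query on `ip` alone sees one of the three events, while `address` still exposes the hostname and the socket path.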
@webmat your explanation is helpful, and I am OK to define such fields as extended fields in ECS. I would prefer a more descriptive/explicit field name that has less potential for confusion for users. For example, Related optimization question: would there be any advantage for subsequent queries/aggs/viz's if in the "domain" or "socket" cases, ECS converters populated the
+1 on using
@MikePaquette For missing values, leaving the fields absent is the simplest option, and the one I think we should advocate. For the name of the field, I do think
I meant to paste the Beats PR in my comment above: elastic/beats#8941
LGTM besides the changelog.
CHANGELOG.md
@@ -38,6 +38,7 @@ All notable changes to this project will be documented in this file based on the
* Reintroduce a streamlined `user_agent` field set. #240
* Add `geo.name` for ad hoc location names. #248
* Add `event.timezone` to allow for proper interpretation of incomplete timestamps. #258
* Add fields `source.address` and `destination.address`. #247
It is also added to client and server, it seems.
Nice catch, will fix
Documentation helps clarify purpose nicely.
As discussed earlier today, let's see if we can introduce this in Beta 2.
This would be very useful, as the field that's always guaranteed to have the endpoint address, no matter if it's an IP, a hostname or a unix socket.