Introduce `source.address` and `destination.address`. #247

webmat · 2018-12-06T21:09:19Z

As discussed earlier today, let's see if we can introduce this in Beta 2.

This would be very useful as the one field that's always guaranteed to hold the endpoint address, whether it's an IP, a hostname or a unix socket.

MikePaquette

Can you give an example where this would play out? If an event contains an IP address string as the host, then why don't we just put that in the destination.domain field? Seeing an IP address string in a domain field is somewhat common, and I'm not sure it is a problem. Also, I'm concerned about populating the destination.ip field with information that is not coming directly from a packet or session event.

I think I prefer the simpler "just put whatever you get" into the destination.domain field.

ruflin

I'm in favor of this change as it provides a good place for a value when it's not yet clear what it is. So far we used domain, but to me it never felt obvious to put it there, especially for sockets.

webmat · 2018-12-07T15:11:39Z

@MikePaquette This would create a situation where source.domain and destination.domain in almost all flow events would contain IP addresses, however.

I'd much rather introduce this field that's known to be kind of a mess, whose name doesn't mean anything specific (address is more generic than domain), but is always guaranteed to have a value of whatever that endpoint was.

andrewkroh · 2018-12-07T20:34:54Z

I think adding generic address fields introduces ambiguity and will make it more difficult to query information. The field provides more flexibility when generating data, but the cost is that you need to query more fields to find information (ip, domain, and now address) and you altogether lose the ability to do CIDR searches if the value is an IP.

It's a trade-off between doing the work at ingest time to classify the data or pushing that work to query-time.

(I'm considering the case where you have either an IP address or hostname, and ignoring unix sockets. For unix sockets, perhaps having a separate field specifically for them would make sense.)
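
To illustrate the CIDR concern: a range match needs the value parsed as an IP, whereas a plain string only supports exact or prefix matching. A minimal sketch using Python's standard `ipaddress` module as a stand-in for the datastore's IP type; the helper name `in_cidr` is hypothetical.

```python
import ipaddress


def in_cidr(value: str, cidr: str) -> bool:
    """Return True only if value parses as an IP that falls inside cidr.

    Hostnames and socket paths cannot be parsed as IPs, so they never match.
    """
    try:
        return ipaddress.ip_address(value) in ipaddress.ip_network(cidr)
    except ValueError:
        return False
```

A range such as 10.128.0.0/9 cannot be expressed as a string prefix, which is why the value has to be typed as an IP for this kind of query.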

ruflin · 2018-12-10T12:00:46Z

I think this is also about providing a field when we can't do it at ingest time. It could even be a post-processing job to split up the fields but it would still know what default field it should be based on.

For me address is not really the field we should encourage users to query on.

webmat · 2018-12-10T14:15:40Z

@andrewkroh The way I envision this (and perhaps my definition isn't clear enough) this field works exactly the reverse way:

If you have an IP:

.address == the IP
.ip == the IP
.domain == empty (unless you do a reverse DNS query)

If you have a domain:

.address == the domain
.domain == the domain
.ip == empty (unless you do a DNS query)

If you have a socket:

.address == the socket
.domain == empty
.ip == empty
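
The three cases above can be sketched as a small ingest-time helper. This is only an illustration under assumptions: the function name `populate_endpoint` is hypothetical, and treating any value that starts with `/` as a unix socket is a simplification, not part of the proposal.

```python
import ipaddress


def populate_endpoint(raw: str) -> dict:
    """Build the endpoint fields for one observed address value."""
    fields = {"address": raw}  # .address always holds the value as observed
    try:
        ipaddress.ip_address(raw)
    except ValueError:
        # Not an IP. Assumption: paths starting with "/" are unix sockets,
        # anything else is treated as a domain.
        if not raw.startswith("/"):
            fields["domain"] = raw
    else:
        fields["ip"] = raw
    return fields
```

Fields that do not apply are simply left absent rather than set to a placeholder.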

Depending on the event stream being observed, the overwhelming majority of events will fall cleanly within expectations, using the expected field (e.g. access logs will mostly contain IP addresses). But if you then rely solely on the field you expect*, you are at risk of filtering out the weird events that fall outside of that expectation, whether that's local access via a unix socket or a resolved name in place of the IP address (httpd's HostnameLookups set to On).

The goal of this field is to allow looking for the long tail of weirdness. To have a place where you know if you look there, you have that endpoint's address, whatever form it took.

Also, the unix socket is so rare that I'd much rather have only .ip, .domain and .address with these semantics, than have .ip, .domain and .socket.

* Unless you ensure to always consider the events with "missing value" as well.
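
The long-tail argument can be illustrated with a few made-up events (the values below are invented for the example, and the events are flattened to just the endpoint fields):

```python
# Three hypothetical access-log events, already split into endpoint fields.
events = [
    {"address": "10.1.2.3", "ip": "10.1.2.3"},
    {"address": "proxy.example.com", "domain": "proxy.example.com"},
    {"address": "/var/run/app.sock"},
]

# Relying solely on the expected field silently drops events that lack it.
with_ip = [e for e in events if "ip" in e]

# .address is present on every event, so the unusual ones stay visible.
long_tail = [e["address"] for e in events if "ip" not in e]
```

Here `with_ip` keeps only the first event, while `long_tail` surfaces the hostname and the socket path.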

MikePaquette · 2018-12-10T14:47:23Z

@webmat your explanation is helpful, and I am OK to define such fields as extended fields in ECS.

I would prefer a more descriptive/explicit field name that has less potential for confusion for users. For example: might_be_address, ip_or_domain, ip_or_host, ip_or_domain_or_socket.

Related optimization question: would there be any advantage for subsequent queries/aggs/viz's if, in the "domain" or "socket" cases, ECS converters populated the .ip fields with a constant known value (e.g. 0.0.0.0) rather than leaving them unpopulated? (I am not sure how the IP datatype works when the field is empty.)

ruflin · 2018-12-10T15:06:23Z

+1 on using .address and putting it into extended. I think we really need the field to make ingestion easy, and so far it's the best name we came up with.

webmat · 2018-12-10T15:25:03Z

@MikePaquette For missing values, leaving the fields absent is the simplest option, and the one I think we should advocate.

For the name of the field, I do think .address makes most sense. It's simple, and it can also become a new pattern: we've been discussing using service.address to contain addresses such as ODBC-style DSN strings. Address contains the whole thing as observed (well, with password anonymized) and then if your pipeline extracts domain/IP/port/user out of that, all of this goes to the expected fields. But .address is where the full raw value goes.
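
A rough sketch of that pattern, assuming a URL-style connection string rather than a true ODBC `key=value` DSN. The helper name `split_service_address`, the output keys `port` and `user`, and the `xxxxx` mask are all hypothetical choices for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def split_service_address(raw: str) -> dict:
    """Keep the full value in "address" (password masked) and extract parts."""
    parts = urlsplit(raw)
    netloc = parts.netloc
    if parts.password:
        # Mask only the password portion of the user-info section.
        netloc = netloc.replace(f":{parts.password}@", ":xxxxx@", 1)
    fields = {"address": urlunsplit(parts._replace(netloc=netloc))}
    if parts.hostname:
        # Assumption: the host is a name; an IP would go to the ip field instead.
        fields["domain"] = parts.hostname
    if parts.port:
        fields["port"] = parts.port
    if parts.username:
        fields["user"] = parts.username
    return fields
```

For instance, `split_service_address("postgres://app:s3cret@db.example.com:5432/orders")` keeps the whole string in `address` with the password masked, and places the host, port and user in their own keys.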

webmat · 2018-12-10T15:25:46Z

I meant to paste the Beats PR in my comment above: elastic/beats#8941

ruflin

LGTM besides the changelog.

ruflin · 2018-12-11T08:35:06Z

CHANGELOG.md

@@ -38,6 +38,7 @@ All notable changes to this project will be documented in this file based on the
 * Reintroduce a streamlined `user_agent` field set. #240
 * Add `geo.name` for ad hoc location names. #248
 * Add `event.timezone` to allow for proper interpretation of incomplete timestamps. #258
+* Add fields `source.address` and `destination.address`. #247


It is also added to client and server, it seems.

Nice catch, will fix

MikePaquette

Documentation helps clarify purpose nicely.

webmat self-assigned this Dec 6, 2018

webmat added the 1.0.0-beta2 label Dec 6, 2018

webmat requested review from ruflin and MikePaquette December 6, 2018 21:10

MikePaquette reviewed Dec 6, 2018

ruflin reviewed Dec 7, 2018

webmat force-pushed the address branch from 8b32ba0 to fdd74e2 Compare December 7, 2018 15:53

Mathieu Martin added 3 commits December 7, 2018 12:22

Introduce source.address and destination.address.

2ef3b95

Changelog

9b2c7a0

Add the address field to client and server as well.

7f65b2f

webmat force-pushed the address branch from fdd74e2 to 7f65b2f Compare December 7, 2018 17:26

Fix 'conjuction' typo in client as well

e4e568a

ruflin approved these changes Dec 11, 2018

ruflin mentioned this pull request Dec 11, 2018

Parse also the port from log sources elastic/beats#9460

Closed

MikePaquette approved these changes Dec 11, 2018

Update changelog to also list cli/srv

8aa12a9

webmat merged commit 930bb23 into elastic:master Dec 11, 2018

webmat deleted the address branch December 11, 2018 13:43

narph mentioned this pull request Jan 24, 2023

Support the ingest of the source address and the source port separately elastic/beats#34371

Open
