Ingest node: IP Address Processor #38064

Open
jakelandis opened this issue Jan 31, 2019 · 6 comments
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >feature Team:Data Management Meta label for data/management team


@jakelandis
Contributor

There is a need to properly handle IPv6 zone_ids; see #37107.

The IP data type only allows 128 bits to be indexed, so an IPv6 address with a zone_id will fail to parse as an IP. Elasticsearch can't simply drop the zone_id at index time, since that would silently reduce the fidelity of the data, and there is no desire to support zone_ids at a low level.

The current workaround is to use Grok to split the address from the zone_id; this works, but can be cumbersome to implement.
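For illustration, the split that the Grok workaround performs can be sketched in Python. RFC 4007 writes a scoped address in text form as <address>%<zone_id>, so the split amounts to cutting at the first '%' and checking that the left-hand part is a plain IPv6 address. The function name split_zone_id is hypothetical, and the standard ipaddress module is used only to validate the address part.

```python
import ipaddress


def split_zone_id(value):
    """Split an RFC 4007 scoped address such as 'fe80::1%eth0' into (address, zone_id).

    zone_id is None when the value has no '%'. A ValueError is raised when the
    address part is invalid, or when a zone_id is attached to a non-IPv6 address.
    """
    address, sep, zone_id = value.partition("%")
    if not sep:
        # No zone_id: accept any IPv4 or IPv6 address as-is.
        ipaddress.ip_address(address)
        return address, None
    if not zone_id:
        raise ValueError("empty zone_id in %r" % value)
    # The 128-bit part on its own is what the ip field type can index.
    ipaddress.IPv6Address(address)
    return address, zone_id


print(split_zone_id("fe80::1%eth0"))  # ('fe80::1', 'eth0')
print(split_zone_id("::1%0"))         # ('::1', '0')
print(split_zone_id("10.0.0.1"))      # ('10.0.0.1', None)
```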

For these reasons, I propose an IP Address Processor for the ingest node.

The IP Address Processor will be able to

  • split an IPv6 address into its 128-bit address and its zone_id.

It may also be able to

  • categorize an IP as either IPv4 or IPv6.
  • categorize an IPv4 address by class (A to E).
  • categorize an IPv6 address by type (unicast, anycast, multicast, loopback, or unspecified).
  • extract an IPv4 address that is encoded inside an IPv6 address.
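A minimal sketch of the optional categorizations, assuming Python's standard ipaddress module; ipv4_class, ipv6_type and describe are hypothetical names, and embedded_ipv4 is an illustrative output key. Two caveats apply. Anycast addresses are allocated from the unicast address space and are syntactically indistinguishable from unicast (RFC 3513), so they cannot be detected from the address alone and are reported as unicast here. Only the IPv4-mapped form (::ffff:a.b.c.d) is extracted; other embeddings such as 6to4 or Teredo are not handled.

```python
import ipaddress


def ipv4_class(addr):
    """Classful category (A to E) of an IPv4Address, based on its first octet."""
    first = addr.packed[0]
    if first < 128:
        return "A"
    if first < 192:
        return "B"
    if first < 224:
        return "C"
    if first < 240:
        return "D"  # 224.0.0.0/4, multicast
    return "E"      # 240.0.0.0/4, reserved


def ipv6_type(addr):
    """Coarse IPv6 type; anycast cannot be distinguished, so it reports as unicast."""
    if addr.is_unspecified:
        return "unspecified"
    if addr.is_loopback:
        return "loopback"
    if addr.is_multicast:
        return "multicast"
    return "unicast"


def describe(value):
    """Return the version plus the class (IPv4) or the type and any mapped IPv4 (IPv6)."""
    addr = ipaddress.ip_address(value)
    info = {"ip_version": addr.version}
    if addr.version == 4:
        info["ip_class"] = ipv4_class(addr)
    else:
        info["ip_type"] = ipv6_type(addr)
        if addr.ipv4_mapped is not None:  # ::ffff:a.b.c.d form only
            info["embedded_ipv4"] = str(addr.ipv4_mapped)
    return info


print(describe("192.168.1.10"))  # {'ip_version': 4, 'ip_class': 'C'}
print(describe("ff02::1"))       # {'ip_version': 6, 'ip_type': 'multicast'}
```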

IPv6 zone_id: https://tools.ietf.org/html/rfc4007
IPv6 address: https://tools.ietf.org/html/rfc3513

@jakelandis jakelandis added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Jan 31, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@jakelandis
Contributor Author

related: #36145

@probakowski
Contributor

I've started working on it; my initial proposal for the config looks like:

{
  "field": "ip", /* can be string or array of strings - see #49573 */
  "target_field": "target", /* ip without zone id, optional, defaults to "field" value */
  "zone_id_field": "zone_id", /* stores zone id if present, optional */
  "version_field": "ip_version", /* version of IP address, either 4 or 6, optional */
  "class_field": "ipv4_class", /* class of IPv4 address, A to E, optional */
  "type_field": "ip_type" /* type of address, one of: [loopback, linklocal, unicast, multicast], optional */
}
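To make the flat-field proposal concrete, here is a hypothetical Python sketch of how such a processor might apply this config to a flat document (a plain dict). The function name ip_processor is an assumption, the class boundaries are the classic first-octet ranges, and list inputs (#49573) are left out for brevity.

```python
import bisect
import ipaddress


def ip_processor(doc, config):
    """Hypothetical sketch of the flat-field config applied to a flat dict document."""
    raw = doc[config["field"]]
    address, sep, zone_id = raw.partition("%")
    addr = ipaddress.ip_address(address)  # ValueError if the address part is invalid
    if sep and addr.version != 6:
        raise ValueError("zone_id is only valid on IPv6: %r" % raw)

    # target_field defaults to field, i.e. the zone_id is stripped in place.
    doc[config.get("target_field", config["field"])] = address
    if sep and "zone_id_field" in config:
        doc[config["zone_id_field"]] = zone_id
    if "version_field" in config:
        doc[config["version_field"]] = addr.version
    if addr.version == 4 and "class_field" in config:
        # First-octet boundaries: A < 128 <= B < 192 <= C < 224 <= D < 240 <= E
        index = bisect.bisect_right([128, 192, 224, 240], addr.packed[0])
        doc[config["class_field"]] = "ABCDE"[index]
    if "type_field" in config:
        if addr.is_loopback:
            ip_type = "loopback"
        elif addr.is_link_local:
            ip_type = "linklocal"
        elif addr.is_multicast:
            ip_type = "multicast"
        else:
            ip_type = "unicast"
        doc[config["type_field"]] = ip_type
    return doc
```

For example, {"ip": "fe80::1%eth0"} with zone_id_field, version_field and type_field configured would become {"ip": "fe80::1", "zone_id": "eth0", "ip_version": 6, "ip_type": "linklocal"}.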

I've also thought about adding network matching by CIDR, something like:

  "network": {
    "field": "network_name",
    "networks": {
      "192.168.0.0/16": "netA",
      "128.1.1.0/24": "netB"
    }
  }

but I think this type of task is better suited to the enrich processor.
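For reference, the CIDR matching idea boils down to a containment test, sketched here with Python's ipaddress module. The function name match_network is hypothetical, the networks table mirrors the snippet in the comment, and the first matching entry in table order wins. The linear scan is only meant to show the semantics, not how a real implementation would look up networks.

```python
import ipaddress


def match_network(value, networks):
    """Return the name of the first CIDR block (in table order) that contains value, else None."""
    addr = ipaddress.ip_address(value)
    for cidr, name in networks.items():
        # 'in' is simply False when the address and network versions differ.
        if addr in ipaddress.ip_network(cidr):
            return name
    return None


networks = {"192.168.0.0/16": "netA", "128.1.1.0/24": "netB"}
print(match_network("192.168.5.4", networks))  # netA
print(match_network("128.1.1.200", networks))  # netB
print(match_network("::1", networks))          # None
```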

The processor will work with a single IP address as well as a list, in line with #49573. For a list, every output field will also be a list, with null in each position where a value can't be computed; a field is skipped entirely if its list contains only nulls.
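The list behavior described here can be sketched as follows. The helper names are hypothetical, and version_or_none stands in for any per-address computation. Results stay position-aligned with the input, with None where a value can't be computed, and None is returned for the whole field (meaning "do not set it") when every entry is None.

```python
import ipaddress


def version_or_none(value):
    """Example per-address computation: the IP version, or None if unparseable."""
    try:
        return ipaddress.ip_address(value.partition("%")[0]).version
    except ValueError:
        return None


def apply_to_list(values, compute):
    """Map compute over values, keeping positions aligned with None for failures.

    Returns None (meaning: do not set the target field) when every entry is None.
    """
    results = [compute(v) for v in values]
    if all(r is None for r in results):
        return None
    return results


print(apply_to_list(["10.0.0.1", "not-an-ip", "::1%0"], version_or_none))  # [4, None, 6]
print(apply_to_list(["x", "y"], version_or_none))                          # None
```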

@probakowski probakowski self-assigned this Dec 11, 2019
@jakelandis
Contributor Author

Can I suggest the following config instead:

{
  "field": "ip", /* can be string or array of strings - see #49573 */
  "target_field": "target" /* the object that contains information about the IP address, such as IP version (4 or 6), IP class (A to E), IP type (e.g. loopback, unicast), and IPv6 zone ID */
}

so for a config of

{ 
  "field" : "myip",
  "target_field" : "mytarget"
}

an input document of

{ "myip" : "::1%0" }

results in a document like:

{
   "myip":"::1%0",
   "mytarget":{
      "ip":"::1",
      "ip_zone":"0",
      "ip_version":6,
      "ip_class":"?? is this a thing with ipv6 ?? , if not don't add",
      "ip_type":"loopback"
   }
}
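A hypothetical Python sketch of this nested-object variant, including the proposed default of not overwriting an existing target. The function and parameter names (ip_info, process, override) are assumptions. Because classful addressing (A to E) only exists for IPv4, this sketch emits ip_class for IPv4 only, which is one way to resolve the open question in the example.

```python
import bisect
import ipaddress


def ip_info(raw):
    """Build the proposed target object for one address (hypothetical sketch)."""
    address, sep, zone = raw.partition("%")
    addr = ipaddress.ip_address(address)  # ValueError if the address part is invalid
    info = {"ip": address, "ip_version": addr.version}
    if sep:
        info["ip_zone"] = zone
    if addr.version == 4:
        # Classful A to E only exists for IPv4, so IPv6 gets no ip_class.
        index = bisect.bisect_right([128, 192, 224, 240], addr.packed[0])
        info["ip_class"] = "ABCDE"[index]
    if addr.is_unspecified:
        info["ip_type"] = "unspecified"
    elif addr.is_loopback:
        info["ip_type"] = "loopback"
    elif addr.is_multicast:
        info["ip_type"] = "multicast"
    else:
        info["ip_type"] = "unicast"
    return info


def process(doc, field, target_field, override=False):
    """Write ip_info(doc[field]) to doc[target_field]; by default keep an existing target."""
    if target_field in doc and not override:
        return doc
    doc[target_field] = ip_info(doc[field])
    return doc


print(process({"myip": "::1%0"}, "myip", "mytarget"))
```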

We will want to work with @webmat on the names and types, since I would expect "mytarget" to really be something like the ECS host field. We would also want to default to not overriding an existing target.

@jakelandis
Contributor Author

If enriching/fixing incoming ECS data is a use case, we should consider allowing

{ 
  "field" : "host.ip",
  "target_field" : "host"
}

to work as well, with an option (defaulting to true) to replace the ip field.

e.g.

{
   "host":{
      "ip":"::1%0"
   }
}

results in

{
   "host":{
      "ip":"::1",
      "ip_zone":"0",
      "ip_version":6,
      "ip_class":"?? is this a thing with ipv6 ?? , if not don't add",
      "ip_type":"loopback"
   }
}

(note that ::1%0 is no longer present)
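A hypothetical Python sketch of this in-place ECS variant, where field and target_field are dotted paths and replace_ip (an assumed option name) defaults to true, so the zone_id is stripped from the original field. ip_class is left out because address classes only exist for IPv4.

```python
import ipaddress


def enrich_in_place(doc, field, target_field, replace_ip=True):
    """Hypothetical sketch: add ip details under target_field (both dotted paths)."""
    *parents, leaf = field.split(".")
    holder = doc
    for key in parents:
        holder = holder[key]
    address, sep, zone = holder[leaf].partition("%")
    addr = ipaddress.ip_address(address)  # ValueError if the address part is invalid

    target = doc
    for key in target_field.split("."):
        target = target.setdefault(key, {})
    if replace_ip:
        holder[leaf] = address  # e.g. the original "::1%0" becomes "::1"
    if sep:
        target["ip_zone"] = zone
    target["ip_version"] = addr.version
    if addr.is_unspecified:
        target["ip_type"] = "unspecified"
    elif addr.is_loopback:
        target["ip_type"] = "loopback"
    elif addr.is_multicast:
        target["ip_type"] = "multicast"
    else:
        target["ip_type"] = "unicast"
    return doc


print(enrich_in_place({"host": {"ip": "::1%0"}}, "host.ip", "host"))
```

With replace_ip=False the same call keeps "ip": "::1%0" and only adds the derived fields next to it.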

@rjernst rjernst added the Team:Data Management Meta label for data/management team label May 4, 2020
@phiz71

phiz71 commented Aug 11, 2021

Hello @jakelandis, do you have any news on this issue?
