Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Geo-Match Enrich Processor #47243

Merged
merged 9 commits into from
Oct 7, 2019
Merged

Geo-Match Enrich Processor #47243

merged 9 commits into from
Oct 7, 2019

Conversation

talevy
Copy link
Contributor

@talevy talevy commented Sep 27, 2019

Geo-Match Enrich Processor

this commit introduces a geo-match enrich processor that looks up a specific
geo_point field in the enrich-index for all entries that have a geo_shape match field
that meets some specific relation criteria with the input field.

For example, the enrich index may contain documents with zipcodes and their respective
geo_shape. Ingesting documents with a geo_point field can be enriched with which zipcode
they associate according to which shape they are contained within.

this commit also refactors some of the MatchProcessor by moving a lot of the shared code to
AbstractEnrichProcessor.

Closes #42639.

this commit introduces a geo-match enrich processor that looks up a specific
`geo_point` field in the enrich-index for all entries that have a geo_shape match field
that meets some specific relation criteria with the input field.

For example, the enrich index may contain documents with zipcodes and their respective
geo_shape. Ingesting documents with a geo_point field can be enriched with which zipcode
they associate according to which shape they are contained within.

this commit also refactors some of the MatchProcessor by moving a lot of the shared code to
AbstractEnrichProcessor.

Closes elastic#42639.
@talevy talevy added :Analytics/Geo Indexing, search aggregations of geo points and shapes :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP labels Sep 27, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo

@talevy
Copy link
Contributor Author

talevy commented Sep 27, 2019

Note to review:

  • this still need more tests
  • I was not sure how to extend EnrichPolicy to support passing in the shape_relation for the geo-enrich-processor without conflating the policy with things unrelated to Match

Please let me know if this looks reasonable. If so, then I will continue to clean it up further, otherwise I will await further instruction to refactor!

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @talevy! The current abstraction looks good. I left a few comments.

I was not sure how to extend EnrichPolicy to support passing in the shape_relation for the geo-enrich-processor without conflating the policy with things unrelated to Match

I think shape_relation should be a pipeline configuration option and not a policy configuration, because this setting doesn't affect the enrich index being created and it is merely a query parameter.

// No need to also configure index_options, because keyword type defaults to 'docs'.
} else if (EnrichPolicy.GEO_MATCH_TYPE.equals(policy.getType())) {
matchFieldMapping = (builder) -> builder.field("type", "geo_shape");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that geo_shape field type uses doc values by default, right?
Do you know whether the default is going to change in the future?
I just want to make sure we never store more than what is needed for enrich. (since doc values are not being used for querying)

Copy link
Contributor Author

@talevy talevy Sep 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem is that since doc-values aren't supported, neither is doc-values: false. I was debating whether this should be fixed upstream so that one is allowed to set that. Currently, it errors out if you attempt to set it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will follow-up in another PR to expose doc-values parsing to geo_shape. I think it should understand that parameter, even though it only allows it to be false. For now, I think this is the best that can be done. Thankfully, I know when doc-values on shapes will be supported so I will be sure to update this!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 That sounds good to me.

@talevy talevy requested a review from martijnvg October 1, 2019 13:57
@talevy
Copy link
Contributor Author

talevy commented Oct 2, 2019

run elasticsearch-ci/bwc
run elasticsearch-ci/default-distro

@talevy
Copy link
Contributor Author

talevy commented Oct 3, 2019

@elasticmachine update branch

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for working on this!

assertThat(statsResponse.getCoordinatorStats().size(), equalTo(1));
String localNodeId = getInstanceFromNode(ClusterService.class).localNode().getId();
assertThat(statsResponse.getCoordinatorStats().get(0).getNodeId(), equalTo(localNodeId));
assertThat(statsResponse.getCoordinatorStats().get(0).getRemoteRequestsTotal(), greaterThanOrEqualTo(1L));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this should be equalTo(1), since only 1 index request is ingested.
This test suite also restarts the node between each test, so this should be ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good catch. didn't update this.

@talevy
Copy link
Contributor Author

talevy commented Oct 3, 2019

I've only added support for geo_point for now. after merging, I will open an issue for supporting geo_shape. The problem is that parsing either geo_point or geo_shape gets really confusing since geo_point supports so many different field value types. I didn't want to hold up this PR because of that

@talevy
Copy link
Contributor Author

talevy commented Oct 4, 2019

run elasticsearch-ci/2
run elasticsearch-ci/bwc
run elasticsearch-ci/default-distro

@talevy talevy merged commit c20fa4d into elastic:enrich Oct 7, 2019
@talevy talevy deleted the enrich-geo branch October 7, 2019 18:24
talevy added a commit to talevy/elasticsearch that referenced this pull request Oct 7, 2019
this commit introduces a geo-match enrich processor that looks up a specific
`geo_point` field in the enrich-index for all entries that have a geo_shape match field
that meets some specific relation criteria with the input field.

For example, the enrich index may contain documents with zipcodes and their respective
geo_shape. Ingesting documents with a geo_point field can be enriched with which zipcode
they associate according to which shape they are contained within.

this commit also refactors some of the MatchProcessor by moving a lot of the shared code to
AbstractEnrichProcessor.

Closes elastic#42639.
talevy added a commit that referenced this pull request Oct 7, 2019
this commit introduces a geo-match enrich processor that looks up a specific
`geo_point` field in the enrich-index for all entries that have a geo_shape match field
that meets some specific relation criteria with the input field.

For example, the enrich index may contain documents with zipcodes and their respective
geo_shape. Ingesting documents with a geo_point field can be enriched with which zipcode
they associate according to which shape they are contained within.

this commit also refactors some of the MatchProcessor by moving a lot of the shared code to
AbstractEnrichProcessor.

Closes #42639.
@polyfractal
Copy link
Contributor

@talevy This merged into enrich, and that branch merged into master/7.5 right? Should we label this ticket with versions too so it's clear this landed as well?

@martijnvg
Copy link
Member

@polyfractal I think the fact that #48039 is versioned, is sufficient? Also PRs on feature branches are typically not versioned and labelled non-issue in order to avoid confusion about features that have not been released yet.

@polyfractal
Copy link
Contributor

@martijnvg 👍 makes sense, that works for me. Was just trying to hunt down if this made it as part of the enrich feature or not, since it was also being tracked semi-independent of the enrich feature.

I think I was also a bit confused on what it did too... thought it was a separate processor, rather than being a policy of enrich. So that makes more sense now too :)

@talevy
Copy link
Contributor Author

talevy commented Oct 16, 2019

what Martijn said :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants