Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support GeoJSON for geo_point #85120

Merged
merged 8 commits into from
Mar 23, 2022

Conversation

craigtaverner
Copy link
Contributor

@craigtaverner craigtaverner commented Mar 18, 2022

Support GeoJSON for points when the mapper specifies geo_point.

Closes #47815
Closes #84599

@craigtaverner craigtaverner added the :Analytics/Geo Indexing, search aggregations of geo points and shapes label Mar 18, 2022
subParser.nextToken();
if (subParser.currentToken() == Token.START_ARRAY) {
coordinates = new ArrayList<>();
subParser.nextToken();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should stricter here. We expect a two / three dimensional double array. WE should only accept that else throw an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do that on line 527. I noticed that the code here from before does a little validation during the parsing and then more validation on the results further down. I think we could do even more validation, like enforcing that you do not mix and match multiple types of data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a lot more validation now

@@ -492,6 +519,14 @@ public static GeoPoint parseGeoPoint(XContentParser parser, GeoPoint point, fina
} else {
return point.parseGeoHash(geohash, effectivePoint);
}
} else if (coordinates != null) {
if (geojsonType == null || geojsonType.toLowerCase(Locale.ROOT).equals("point") == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check that we have no other elements in GeoJson (lat, lon or geohash)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've generalized the validation now to cover all types.

}
}

public void testQueryPointFromMultiPoint() throws Exception {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is a duplicate of the test testQueryPointFromMultiPointFormats but with the number of supported formats reduced to those supported by the GeoJSON parser (so removing point as double[] and lat,lon string). This is because the current implementation added point geojson to the point parser, instead of adding point parsing to the geojson parser. The question is whether we should do it the other way round. The consequence of adding point support to the GeoJSON parser is that geo_shape mappers will start understanding the alternative versions of point (double[] and lat,lon string). Is that a good or a bad consequence?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've generalized the test, keeping only test data separate (so geo_point tests all four formats, while geo_shape tests only WKT and GeoJSON).

NumberFormatException numberFormatException = null;
String geojsonType = null;
ArrayList<Double> coordinates = null;
class NumberFormatExceptionHandler {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class exists only to keep the old behaviour that NumberFormatException is only handled at the end of parsing. However, there is no obvious reason why we would need that. It seems to me perfectly sufficient to throw the exception at the point that NumberFormatException arises, right? We have a number of other exceptions being thrown immediately during parsing, so why should this particular exception be treated differently. If we allow it to be thrown during parsing, this inner class can get replaced by a simple utility method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On theory I have is that it could be that when throwing exceptions during parsing, it is impossible to continue parsing (because Objects are not closed, or fully parsed), so the entire import (or entire query) is failed. This can be fixed by catching exceptions, finishing parsing the inner object and then re-throw the exception (which is what we see happening with NumberFormatException) and then the outer code can decide whether to catch and continue, or re-throw. I remember seeing some comment somewhere about this case. So perhaps NumberFormatException is considered a case we want to just log and continue, while other exceptions are considered serious enough to break the entire import (or query)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, after discussions with @iverase we decided to remove this. The work done in #40447 should cover the requirements to finish parsing objects, so we don't need to capture exceptions anymore.

@craigtaverner craigtaverner marked this pull request as ready for review March 22, 2022 10:52
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 22, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @craigtaverner, I've created a changelog YAML for you.

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just left a small nit, otherwise LGTM

@craigtaverner
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/part-2

@craigtaverner
Copy link
Contributor Author

This work was enhanced further in #85442

@craigtaverner craigtaverner deleted the geojson_for_geo_point branch October 20, 2023 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support GeoJson on geo_point fields Add GeoJSON Point format support for geo_point field value formats
4 participants