Add aggregation support for geo_shape fields #50834

Closed
wants to merge 141 commits into from

Conversation

talevy
Contributor

@talevy talevy commented Jan 10, 2020

This PR introduces doc-values support for geo_shape fields.
This includes aggregation support for the following existing
geo_point aggregations:

  • geohash_grid
  • geotile_grid
  • geo_centroid
  • geo_bounds

The geo_distance aggregation is the only one not supported in this PR.
Scripting support is also not implemented.

Closes #37206.
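
As an illustration of what the description above enables, here is a minimal sketch of a search request body that runs three of the listed aggregations. The field name `location` is hypothetical and is assumed to be mapped as `geo_shape`; the aggregation names (`tiles`, `viewport`, `center`) are arbitrary labels.

```python
import json

# Hypothetical field "location", assumed to be a geo_shape field.
search_body = {
    "size": 0,  # only aggregation results are of interest here
    "aggs": {
        "tiles": {"geotile_grid": {"field": "location", "precision": 8}},
        "viewport": {"geo_bounds": {"field": "location"}},
        "center": {"geo_centroid": {"field": "location"}},
    },
}

print(json.dumps(search_body, indent=2))
```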

talevy and others added 30 commits May 21, 2019 11:12
This commit introduces a new data-structure
for reading and writing EdgeTrees that write/read
serialized versions of the tree.

This tree is the basis of Polygon trees, which will also represent
any holes in more complex polygons.
The GeometryTree represents an Elasticsearch Geometry
object. This includes collections like MultiPoint
and GeometryCollection.

For the initial implementation, only polygons without
holes are supported.

In a follow-up PR, the GeometryTree will be the object that
interacts with doc-value reading and writing.
- min and max values of coordinates were difficult to
  track; this fixes that by introducing a new Extent
  object
- Instead of re-wrapping ByteRef into a StreamInput, a stream
  input is made once
- a new getExtent() method is introduced for use by aggregations like
  geo_bounds
- re-use bounding-box containment checks
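
The idea of tracking coordinate bounds in a single object can be sketched as follows. This is a hypothetical Python stand-in, not the PR's Java `Extent` class; the field and method names are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Extent:
    """Running bounding box over added coordinates (illustrative only)."""
    min_x: float = math.inf
    min_y: float = math.inf
    max_x: float = -math.inf
    max_y: float = -math.inf

    def add_point(self, x, y):
        # Widen the box just enough to include (x, y).
        self.min_x = min(self.min_x, x)
        self.min_y = min(self.min_y, y)
        self.max_x = max(self.max_x, x)
        self.max_y = max(self.max_y, y)

    def intersects(self, other):
        # Closed-interval overlap test on both axes.
        return (self.min_x <= other.max_x and self.max_x >= other.min_x
                and self.min_y <= other.max_y and self.max_y >= other.min_y)


extent = Extent()
for x, y in [(1, 2), (-3, 5), (0, 0)]:
    extent.add_point(x, y)
```

An aggregation such as geo_bounds could then read the four values directly instead of re-deriving them from the shape.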
* Add GeometryTree support for point/multipoint

This commit adds support for MultiPoint and Point
shapes to be stored in GeometryTree.

To represent the collection of points, a KDBush-style structure is used:
an array sorted recursively along alternating dimensions (x, then y).
This work is inspired by https://github.com/mourner/kdbush

The purpose of this reader is to check whether any subset of the
points in the kd-tree are contained within the bounding-box query.
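
A minimal sketch of the idea, assuming planar `(x, y)` tuples and a closed bounding box. The function names are hypothetical, and for brevity this version fully sorts each sub-range rather than using the selection step the kdbush library uses.

```python
def kd_sort(points, lo=0, hi=None, axis=0):
    """Arrange points[lo:hi] in place so the median on `axis` sits in the
    middle, then recurse on both halves with the other axis."""
    if hi is None:
        hi = len(points)
    if hi - lo <= 1:
        return
    points[lo:hi] = sorted(points[lo:hi], key=lambda p: p[axis])
    mid = (lo + hi) // 2
    kd_sort(points, lo, mid, 1 - axis)
    kd_sort(points, mid + 1, hi, 1 - axis)


def any_point_in_box(points, box, lo=0, hi=None, axis=0):
    """True if at least one point lies inside box = (min_x, min_y, max_x, max_y)."""
    if hi is None:
        hi = len(points)
    if hi <= lo:
        return False
    min_x, min_y, max_x, max_y = box
    mid = (lo + hi) // 2
    x, y = points[mid]
    if min_x <= x <= max_x and min_y <= y <= max_y:
        return True
    split = points[mid][axis]
    # The left half holds values <= split on this axis, the right half >= split,
    # so a half is visited only if the box can reach it.
    if box[axis] <= split and any_point_in_box(points, box, lo, mid, 1 - axis):
        return True
    if box[axis + 2] >= split and any_point_in_box(points, box, mid + 1, hi, 1 - axis):
        return True
    return False


pts = [(0, 0), (5, 5), (2, 8), (9, 1), (7, 7), (3, 3)]
kd_sort(pts)
print(any_point_in_box(pts, (6, 6, 8, 8)))
```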

* unify reader interface and cleanup multipoint usage

* respond to review
The main change here is that edge-trees originally
checked whether the queried extent could be contained
within its shape. Since line-strings have no inner boundaries,
this check is not useful; the line-crosses check plus the extent bounds
check is sufficient.
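
To illustrate why no containment check is needed for line-strings, here is a rough planar sketch that combines a cheap extent rejection with a segment-crossing test against the box edges. All names are hypothetical, and the extra "vertex inside the box" shortcut is an assumption of this sketch, not something the commit describes.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _within(a, b, p):
    """True if p lies in the bounding box of segment a-b (for collinear points)."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_cross(a, b, c, d):
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _within(a, b, c)) or (o2 == 0 and _within(a, b, d))
            or (o3 == 0 and _within(c, d, a)) or (o4 == 0 and _within(c, d, b)))


def line_intersects_box(line, box):
    min_x, min_y, max_x, max_y = box
    xs = [x for x, _ in line]
    ys = [y for _, y in line]
    # Extent check: disjoint bounding boxes can never intersect.
    if max(xs) < min_x or min(xs) > max_x or max(ys) < min_y or min(ys) > max_y:
        return False
    # Shortcut (assumption of this sketch): a vertex inside the box is a hit.
    if any(min_x <= x <= max_x and min_y <= y <= max_y for x, y in line):
        return True
    corners = [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y)]
    edges = list(zip(corners, corners[1:] + corners[:1]))
    return any(segments_cross(a, b, c, d)
               for a, b in zip(line, line[1:]) for c, d in edges)
```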
To aid in keeping aggregation logic as simple as possible,
the MultiGeoPointValues object that returns GeoPoint values
for fields from doc-values is updated to return implementations
of a geo-value object that can represent either points or shapes.
talevy and others added 3 commits February 10, 2020 12:49
Lucene removed GeoRelationUtils, and so this commit
inlines ES's usage of this utility class.
…2020)

* Fix and document tiling semantics for shapes

This commit resolves several issues in the geogrid shape tiler:

1. Fixes geohash brute-force tiling to be equivalent to recursive geohash
   tiling.
2. Fixes geotile tiling so that shapes outside of the geotile bounds are
   discarded.
3. Changes TriangleTree#relate to be a specific relation against tiles,
   such that intersections of tiles on the southern and western bounds
   of the shape are counted.
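
The second point can be sketched with the standard Web Mercator tile formulas. This is a simplified illustration that tiles only a shape's bounding box, not the shape itself; the function names are hypothetical and the latitude limit is computed from the projection rather than taken from the PR.

```python
import math

# Web Mercator latitude limit, roughly 85.0511 degrees.
MAX_LAT = math.degrees(math.atan(math.sinh(math.pi)))


def lon_to_tile_x(lon, zoom):
    tiles = 1 << zoom
    x = math.floor((lon + 180.0) / 360.0 * tiles)
    return min(max(x, 0), tiles - 1)


def lat_to_tile_y(lat, zoom):
    tiles = 1 << zoom
    rad = math.radians(min(max(lat, -MAX_LAT), MAX_LAT))
    merc = math.log(math.tan(rad) + 1.0 / math.cos(rad))
    y = math.floor((1.0 - merc / math.pi) / 2.0 * tiles)
    return min(max(y, 0), tiles - 1)


def tiles_covering_bbox(min_lon, min_lat, max_lon, max_lat, zoom):
    """Tiles (x, y) touched by a bounding box; empty if it lies fully
    outside the latitude range that geotiles can represent."""
    if min_lat > MAX_LAT or max_lat < -MAX_LAT:
        return set()
    x0, x1 = lon_to_tile_x(min_lon, zoom), lon_to_tile_x(max_lon, zoom)
    y0, y1 = lat_to_tile_y(max_lat, zoom), lat_to_tile_y(min_lat, zoom)
    return {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}


print(sorted(tiles_covering_bbox(-10, -10, 10, 10, 1)))
```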

* more cleanup

* in silico

* fix a few more edge cases and mute tests for more debugging

- Extent -> BoundingBox had a bug where 180/-180 and 90/-90 were
  treated as infinities.
- awaitfixed a few edge-case tests
- added muted test for checking that tile hashes of points along a
  tile reflect the same tiles returned by the tiler's setValues

* fix checkstyle
talevy and others added 2 commits February 13, 2020 09:31
Due to how geometries are encoded, it is important to compare the
bounds of a shape against the encoded latitude bounds for
geo-tiles.
talevy and others added 6 commits February 20, 2020 07:51
This commit reflects comments made by Adrien in #50834
surrounding the Extent serialization.

It re-orders and negates a few values in order to save more space.
This commit modifies the centroid-calculator/dimensional-shape-type
to properly support polygons that have no area
and lines that have no length. Previously, N/A was returned for the
centroid values; it is better to downcast the shape type to
the appropriate lower-dimensional type.

Closes #52303
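
The downcasting behaviour can be sketched as a fallback chain: area-weighted centroid first, then length-weighted, then a plain average of the vertices. This is a planar illustration with hypothetical names, not the PR's centroid calculator.

```python
import math


def centroid_with_downcast(ring):
    """ring: closed list of (x, y) vertices (first == last).
    Returns (shape_type, cx, cy), falling back to a lower dimension
    when the polygon has no area or the line has no length."""
    segments = list(zip(ring, ring[1:]))

    # Polygon: area-weighted centroid (shoelace formula).
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in segments:
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 != 0:
        return ("polygon", cx / (3 * area2), cy / (3 * area2))

    # Line: length-weighted average of segment midpoints.
    length = lx = ly = 0.0
    for (x0, y0), (x1, y1) in segments:
        seg = math.hypot(x1 - x0, y1 - y0)
        length += seg
        lx += seg * (x0 + x1) / 2
        ly += seg * (y0 + y1) / 2
    if length != 0:
        return ("line", lx / length, ly / length)

    # Point: plain average of the (identical) vertices.
    n = len(ring)
    return ("point", sum(x for x, _ in ring) / n, sum(y for _, y in ring) / n)
```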
This PR adds support for the `doc_values` field mapping parameter.

Both `true` and `false` are supported by the GeoShapeFieldMapper;
only `false` is supported by the LegacyGeoShapeFieldMapper.

relates #37206
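
For illustration, an index mapping that sets the parameter might look like the following sketch. The field name `location` is hypothetical, and the body is shown as a Python dict that serializes to the JSON a create-index request would carry.

```python
import json

# Hypothetical field "location"; doc_values set explicitly on a geo_shape field.
index_body = {
    "mappings": {
        "properties": {
            "location": {"type": "geo_shape", "doc_values": True}
        }
    }
}

print(json.dumps(index_body, indent=2))
```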
talevy added a commit that referenced this pull request Feb 24, 2020
This commit reflects comments made by Adrien in #50834
surrounding the Extent serialization.

It re-orders and negates a few values in order to save more space.
talevy and others added 6 commits February 26, 2020 08:58
Sometimes individual triangles within a polygon have very small
areas (around 1e-11) while the whole polygon's area is
zero. This results in an infinite value for the centroid point
representing that triangle. This commit ignores such values
instead of adding them.

Addresses #52774
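
A minimal sketch of the guard, assuming the centroid is accumulated as a weighted sum of per-triangle centroids. The class and method names are hypothetical.

```python
import math


class CentroidAccumulator:
    """Weighted running centroid that skips non-finite contributions."""

    def __init__(self):
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_weight = 0.0

    def add(self, x, y, weight):
        # Ignore infinite or NaN values instead of poisoning the sums.
        if not (math.isfinite(x) and math.isfinite(y) and math.isfinite(weight)):
            return False
        self.sum_x += x * weight
        self.sum_y += y * weight
        self.sum_weight += weight
        return True

    def centroid(self):
        if self.sum_weight == 0:
            return None
        return (self.sum_x / self.sum_weight, self.sum_y / self.sum_weight)


acc = CentroidAccumulator()
acc.add(1.0, 1.0, 2.0)
acc.add(math.inf, 0.0, 1e-11)  # skipped
acc.add(3.0, 3.0, 2.0)
```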
This PR cleans up some aspects of GeoShapeCellValues
to support the specialization of bounded geo_shape
geo-grid aggregations.

This refactor reverts some of the BoundedCellValues
constructs. Instead, BoundedGeoTileGridTiler and
BoundedGeoHashGridTiler are introduced.

As part of this change, the definition/semantics of
geo_grid aggs with bounds on geo_point are modified
to match the behavior for geo_shapes, where it is
the tile of the point that must intersect the bounds
for the point to be counted.
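
The changed semantics for points can be sketched with geotiles: the point is counted when its tile overlaps the bounds, even if the point itself lies outside them. The names are hypothetical, the overlap test treats edges as inclusive (an assumption), and bounds that cross the dateline are not handled.

```python
import math

MAX_LAT = math.degrees(math.atan(math.sinh(math.pi)))  # about 85.0511


def tile_of(lon, lat, zoom):
    tiles = 1 << zoom
    rad = math.radians(min(max(lat, -MAX_LAT), MAX_LAT))
    merc = math.log(math.tan(rad) + 1.0 / math.cos(rad))
    x = math.floor((lon + 180.0) / 360.0 * tiles)
    y = math.floor((1.0 - merc / math.pi) / 2.0 * tiles)
    return min(max(x, 0), tiles - 1), min(max(y, 0), tiles - 1)


def tile_bounds(x, y, zoom):
    """Returns (min_lon, min_lat, max_lon, max_lat) of a tile."""
    tiles = 1 << zoom

    def row_lat(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / tiles))))

    return (x / tiles * 360.0 - 180.0, row_lat(y + 1),
            (x + 1) / tiles * 360.0 - 180.0, row_lat(y))


def point_counts(lon, lat, zoom, bounds):
    """True if the tile containing the point overlaps bounds
    = (min_lon, min_lat, max_lon, max_lat)."""
    x, y = tile_of(lon, lat, zoom)
    t_min_lon, t_min_lat, t_max_lon, t_max_lat = tile_bounds(x, y, zoom)
    b_min_lon, b_min_lat, b_max_lon, b_max_lat = bounds
    return (t_min_lon <= b_max_lon and t_max_lon >= b_min_lon
            and t_min_lat <= b_max_lat and t_max_lat >= b_min_lat)
```

For example, at zoom 1 the point (lon 10, lat 10) is outside the bounds (100, 20, 120, 30), but its tile covers longitudes 0 to 180 and so overlaps them, which means the point is counted.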
@bpintea bpintea added v7.8.0 and removed v7.7.0 labels Mar 25, 2020
@talevy
Contributor Author

talevy commented Apr 17, 2020

Closing in favor of individual PRs

@talevy talevy closed this Apr 17, 2020
@talevy talevy deleted the geoshape-doc-values branch May 7, 2020 16:35
Labels
:Analytics/Geo Indexing, search aggregations of geo points and shapes
Development

Successfully merging this pull request may close these issues.

Doc values support for geo shapes.
7 participants