[Proposal]: Low Level Implementation Details for Geo Shape Aggregation and future Geo related Features #92
Comments
Some quick initial feedback:
I think we can start by doing this in the existing FieldData package? Longer term, I'd like to separate all of the core geo logic into a core geo module, but that will likely require pulling some of the aggregation framework into a separate module (more research needed here).
I think we should go with Alternative 1 and keep the current aggregations and
Hi @nknize
When you say to introduce the shape-related counterparts in the Geo plugin, you mean the Geo-Spatial repo, right? Is this understanding correct?
Yes
Then I guess you mean Alternative 2, not Alternative 1, in your earlier response, because in Alternative 1 we keep everything (the new aggregations on Geo_Shape) in the core itself. Please let me know if I am missing something.
Hi,
Once we have all the aggregations for GeoShape and GeoPoint in the Geo module, we will further evaluate the migration plan for moving the aggregations from the Geo module to the Geo-Spatial repo. This will be Phase 2.
Updated point number 2 of the proposal above. Pros:
Introduction
This proposal describes the low level implementation details for the Geo Shape aggregations. We cover which components/classes/interfaces the Geo Shape aggregations require and where those components should live.
Throughout this proposal we use the terms components, classes, and interfaces interchangeably. Because we focus on low level implementation details, we assume the reader already knows how indexing and querying work for GeoShape in OpenSearch, what a doc value is, and how doc values are used in the aggregation framework.
Goal
The goal of this proposal is to answer the following questions.
Note: "Module" here refers to the different libs and modules folders provided in the OpenSearch core repo.
P0
P1
#1
to the module we decide to move?
Assumptions
Why do we need to answer the questions above?
The answer to these questions will help us do the following:
Current Architecture
As of OpenSearch version 2.1, the Geo related aggregations (which are only GeoPoint aggregations) live in the OpenSearch core repo. There are no aggregations on Geo Shape. The GeoShape field mapper is initialized via the Geo module in the OpenSearch repo.
We have created a new repo named Geo-Spatial, which hosts GeoJSON related features. XY Point and XY Shape indexing and querying are being developed and will be released as part of the 2.3 release (tentative). The intent of that repo is to provide and encapsulate all geo-related features via that repo.
New Aggregations which we are building
In the current roadmap, we have four aggregations that we will implement on Geo Shape. Below are those aggregations.
To provide more context, these aggregations are already implemented for the GeoPoint field. We are also planning to implement the Geo Line aggregation and the GeoHex Grid aggregation (a customer has already reached out about this), both of which are already supported in Elasticsearch.
Proposed Solution
The solution is to move the aggregations, indexing, and querying for both GeoPoint and GeoShape to the Geo-Spatial plugin provided by the Geo-Spatial repo. Because this is a breaking change, we propose a phased approach that makes sure we progress incrementally toward the final goal.
Phase 1
Step 1: Move the aggregations (four aggregation classes plus tests) one at a time to the Geo module folder in the OpenSearch core repo. The aggregation we move at each step will be the one we are currently developing for GeoShape.
Why do we need this, and what are the pros?
The aggregation code for GeoPoint and GeoShape is the same, so we can abstract the common logic between GeoPoint and GeoShape rather than duplicating it.
Will there be customer impact?
No. There will be no customer impact, as these classes are built directly into the OpenSearch core min distribution.
What will happen if we don’t do this?
We would either need to copy the same code/logic from the GeoPoint aggregations to the GeoShape aggregations, or create more interfaces in the server folder of OpenSearch that would need to be migrated later on.
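To illustrate why Step 1 should be invisible to users, here is a minimal sketch of name-based aggregation registration. It assumes, hypothetically, that the server and each bundled module contribute aggregations to a single registry at node startup. `AggregationRegistrySketch`, `AggregationProvider`, `Registry`, `providing`, and the `geo_bounds` name in the demo are illustrative only; this is not how OpenSearch's plugin extension points are actually declared.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: these types are illustrative and are not OpenSearch APIs.
public class AggregationRegistrySketch {

    /** Stand-in for an aggregation implementation. */
    interface Aggregation {
        String describe();
    }

    /** Anything that contributes aggregations: the server itself or a bundled module. */
    interface AggregationProvider {
        Map<String, Supplier<Aggregation>> aggregations();
    }

    /** Collects the aggregations of every provider loaded at node startup. */
    static final class Registry {
        private final Map<String, Supplier<Aggregation>> byName = new LinkedHashMap<>();

        Registry(List<AggregationProvider> providers) {
            for (AggregationProvider provider : providers) {
                provider.aggregations().forEach((name, factory) -> {
                    // Two providers claiming one name is a startup error, not a silent override.
                    if (byName.putIfAbsent(name, factory) != null) {
                        throw new IllegalStateException("duplicate aggregation [" + name + "]");
                    }
                });
            }
        }

        /** Requests only ever refer to an aggregation by name. */
        Aggregation create(String name) {
            Supplier<Aggregation> factory = byName.get(name);
            if (factory == null) {
                throw new IllegalArgumentException("unknown aggregation [" + name + "]");
            }
            return factory.get();
        }
    }

    /** Builds a provider that contributes a single aggregation. */
    static AggregationProvider providing(String aggName, String origin) {
        Supplier<Aggregation> factory = () -> () -> aggName + " from " + origin;
        return () -> Map.of(aggName, factory);
    }

    public static void main(String[] args) {
        // Before Step 1: the server contributes the aggregation itself.
        Registry before = new Registry(List.of(providing("geo_bounds", "server")));
        // After Step 1: the server contributes nothing geo-related; the bundled Geo module does.
        AggregationProvider serverWithoutGeo = () -> Map.of();
        Registry after = new Registry(List.of(serverWithoutGeo, providing("geo_bounds", "geo module")));

        System.out.println(before.create("geo_bounds").describe()); // geo_bounds from server
        System.out.println(after.create("geo_bounds").describe());  // geo_bounds from geo module
    }
}
```

Because a request only names the aggregation, it resolves the same way whether the server or a module bundled in the min distribution provides it.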
Step 2: Abstract out the common logic for the GeoShape and GeoPoint aggregations, and make both aggregations depend on this logic.
Why do we need this, and what are the pros?
This abstracts out the common logic so that we can reuse it across aggregations.
Will there be customer impact?
No, as the aggregations will still be built into the OpenSearch min distribution.
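As a sketch of what the shared logic in Step 2 could look like, assume a hypothetical interface that exposes only a document value's lat/lon bounding box: a GeoPoint becomes a zero-size box and a GeoShape contributes its enclosing box, so one bounds collector serves both aggregations. `GeoBoundedValue`, `GeoPointValue`, `GeoShapeValue`, and `BoundsCollector` are assumptions for illustration, not OpenSearch classes, and the sketch deliberately ignores boxes that cross the antimeridian.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types exist in OpenSearch; they only illustrate Step 2.
public class SharedGeoBoundsSketch {

    /** The only thing the shared logic needs from a doc value: its lat/lon bounding box. */
    interface GeoBoundedValue {
        double minLat();
        double maxLat();
        double minLon();
        double maxLon();
    }

    /** A GeoPoint is treated as a zero-size box. */
    record GeoPointValue(double lat, double lon) implements GeoBoundedValue {
        public double minLat() { return lat; }
        public double maxLat() { return lat; }
        public double minLon() { return lon; }
        public double maxLon() { return lon; }
    }

    /** A GeoShape contributes its enclosing box (assumed to be available from its doc values). */
    record GeoShapeValue(double minLat, double maxLat, double minLon, double maxLon)
        implements GeoBoundedValue {}

    /** Bounds logic written once and shared by the GeoPoint and GeoShape aggregations. */
    static final class BoundsCollector {
        private double top = Double.NEGATIVE_INFINITY;
        private double bottom = Double.POSITIVE_INFINITY;
        private double left = Double.POSITIVE_INFINITY;
        private double right = Double.NEGATIVE_INFINITY;

        // Simplification: does not handle boxes that cross the antimeridian (dateline).
        void collect(GeoBoundedValue value) {
            top = Math.max(top, value.maxLat());
            bottom = Math.min(bottom, value.minLat());
            left = Math.min(left, value.minLon());
            right = Math.max(right, value.maxLon());
        }

        /** Returns {top, left, bottom, right}. */
        double[] bounds() {
            return new double[] {top, left, bottom, right};
        }
    }

    public static void main(String[] args) {
        BoundsCollector points = new BoundsCollector();
        for (GeoBoundedValue value : List.of(new GeoPointValue(10, 20), new GeoPointValue(-5, 30))) {
            points.collect(value);
        }

        BoundsCollector shapes = new BoundsCollector();
        shapes.collect(new GeoShapeValue(-5, 10, 20, 30));

        System.out.println(Arrays.toString(points.bounds())); // [10.0, 20.0, -5.0, 30.0]
        System.out.println(Arrays.toString(shapes.bounds())); // [10.0, 20.0, -5.0, 30.0]
    }
}
```

The design choice is to push the point/shape difference into the value type, so the collector, and any aggregation built on it, has a single code path.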
Repeat the above two steps for each aggregation that we launch for GeoShape. This makes sure that we move toward the right abstraction.
Step 3: Once all the aggregations have moved to the Geo module, move the field mappers, queries, and ValueSource related code (for both GeoShape and GeoPoint), integration tests, etc. to the Geo module.
Why do we need this, and what are the pros?
This abstracts out the common logic so that we can reuse it across aggregations.
Will there be customer impact?
No, as the aggregations will still be built into the OpenSearch min distribution.
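Once the field mappers and value-source code live in the Geo module (Step 3), an aggregation needs a way to pick its GeoPoint or GeoShape implementation from the field being aggregated. The following is a minimal sketch, assuming a hypothetical registry keyed by aggregation name and values-source type; `GeoValuesSourceType`, `GeoFieldType`, and `GeoAggregatorRegistry` are illustrative names, and each "aggregator" is just a descriptive string so the example stays self-contained.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: illustrative types only, not OpenSearch's values-source classes.
public class GeoValuesSourceSketch {

    /** Kinds of doc values a geo field can expose to aggregations. */
    enum GeoValuesSourceType { GEO_POINT, GEO_SHAPE }

    /** Stand-in for the field type that a geo field mapper (moved to the Geo module) would provide. */
    record GeoFieldType(String name, GeoValuesSourceType valuesSourceType) {}

    /** Maps (aggregation name, values-source type) to the implementation that handles that pair. */
    static final class GeoAggregatorRegistry {
        private final Map<String, Map<GeoValuesSourceType, Function<GeoFieldType, String>>> suppliers =
            new HashMap<>();

        // Each supplier returns a description in place of a real aggregator object.
        void register(String aggregation, GeoValuesSourceType type, Function<GeoFieldType, String> supplier) {
            suppliers.computeIfAbsent(aggregation, name -> new EnumMap<>(GeoValuesSourceType.class))
                .put(type, supplier);
        }

        String build(String aggregation, GeoFieldType field) {
            Function<GeoFieldType, String> supplier =
                suppliers.getOrDefault(aggregation, Map.of()).get(field.valuesSourceType());
            if (supplier == null) {
                throw new IllegalArgumentException("[" + aggregation + "] does not support field ["
                    + field.name() + "] of type " + field.valuesSourceType());
            }
            return supplier.apply(field);
        }
    }

    public static void main(String[] args) {
        GeoAggregatorRegistry registry = new GeoAggregatorRegistry();
        // The Geo module registers both implementations under one aggregation name.
        registry.register("geo_bounds", GeoValuesSourceType.GEO_POINT, field -> "point bounds on " + field.name());
        registry.register("geo_bounds", GeoValuesSourceType.GEO_SHAPE, field -> "shape bounds on " + field.name());

        GeoFieldType location = new GeoFieldType("location", GeoValuesSourceType.GEO_POINT);
        GeoFieldType boundary = new GeoFieldType("boundary", GeoValuesSourceType.GEO_SHAPE);
        System.out.println(registry.build("geo_bounds", location)); // point bounds on location
        System.out.println(registry.build("geo_bounds", boundary)); // shape bounds on boundary
    }
}
```

Keeping the field types and the per-type registrations in the same module means the whole geo surface can later move to the Geo-Spatial repo as one unit.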
Phase 2
Step 1: Write a proposal to move the Geo module to the Geo-Spatial repo, either as a separate plugin or under the GeoSpatial plugin. Analyze the impact and gather community feedback on the best way to provide backward compatibility and a seamless migration.
Why do we need this, and what do we want to achieve?
This will give us insight into what the community wants and what extra support we will need to provide for a seamless upgrade.
Step 2: Move the actual code from the Geo module to the Geo-Spatial repo and release it as a breaking change in an OpenSearch major version release (the exact version depends on various factors).
At a high level, this is a simple move of code from the Geo module to the Geo-Spatial plugin. The complexity lies in how customers will migrate. Because this is a breaking change, it will ship in a major release of OpenSearch, and we will use the feedback gathered in Step 1 to give customers all the tools they need for backward compatibility. Example:
Below are the pros and cons of the above solution:
Pros:
Cons:
Alternatives
Alternative 1
Provide aggregations on GeoShape as part of the OpenSearch min distribution by implementing the GeoShape aggregations directly in the OpenSearch core itself (without moving the aggregations even to the Geo module in the core repo).
Pros:
Cons:
The cons listed here can be mitigated by working with the OpenSearch team on dedicated reviewers and review bandwidth for PRs. Also, as we move the code to the Geo module, we expect few or no conflicts, since in the best case no one apart from the geo-spatial team will be working on it. Regarding con #3, we currently have no plan or customer use case for it. Even if one appears, we can still use the common interfaces that we create in the Geo module.
Overall, this solution does have some cons that can cause delays and postpone launches, but it is the most customer obsessed way.
Alternative 2
Create the new GeoShape aggregations directly in the Geo-Spatial repo, and do all the refactoring required in the core OpenSearch repo to avoid code duplication. As a backlog task, move the (GeoPoint) aggregations to the new repo, and add new geo features to the Geo-Spatial repo only.
Pros:
Cons:
Alternative 3
Create the new GeoShape aggregations directly in the Geo-Spatial repo, and do all the refactoring required in the core OpenSearch repo to avoid code duplication. Keep adding new features to the Geo-Spatial repo only.
Pros:
Cons:
Feedback Required
The sections above present the phased approach as the proposed solution, but we also want to seriously consider Alternative 2. We would like to know what the community thinks of Alternative 2, and whether we should pursue it instead of the phased approach, which is long and time-consuming.
Appendix
Frequently Asked Question
Q: If, after Step 1 of Phase 2, we find out that the community doesn’t want yet another plugin for geo related indexing and querying, will the Phase 1 effort be in vain?
No, it won’t be in vain: we would have developed interfaces that can be reused in the XY aggregations, and we would have carved the geo related code out into a module that can evolve on its own, even while it remains in the core repo.
Useful Links: