-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for DynamicTemplates. #2969
base: main
Are you sure you want to change the base?
Conversation
Thanks for that PR, I finally had some time to have a look. Restrictions I see in this current draft:
Besides that this would need some documentation not only in the javadoc, but in the reference documentation as well. |
Honestly I believe it's not good to support this or even improving this draft |
Another thought that I just had: the dynamic templates definition must not necessarily come from a Seems quite some effort to add this; if you don't want to go on with this, just drop it. |
@sothawo I'm not convinced by this draft at the moment. First, the metadata from the cluster is not in the correct location. I just placed it there for testing purposes, but I haven't found a secure place to retrieve the metadata from the cluster. We should focus on the types and the conversion after that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I think about this the more problems seem to arise. We can define on a property that it is part of a dynamic template. But we do not know in which index this entity might be persisted to or retrieved from. So we'd need to load the dynamic template mapping from the cluster for every index name we encounter.
In order to not do this with every save or load, this information would need to be cached. Even if we have this in a cache this add aditional calls even for applications that never need or use this.
For applications that use for example a date in the index name this would mean that over the time we are building up cache data for multiple index names that contain the same data.
I don't think that this architectural and runtime overhead is justified by the single use case - that has only come up ince in all these years that I am maintained SDE
@@ -602,6 +603,15 @@ protected <T> SearchDocumentResponse.EntityCreator<T> getEntityCreator(ReadDocum | |||
// region Entity callbacks | |||
protected <T> T maybeCallbackBeforeConvert(T entity, IndexCoordinates index) { | |||
|
|||
// get entity metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be called for every conversion of every entity, so that if the user saves 100 entities we have 100 calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, until I find a better place to cache/load metadata 😄
c0fa91c
to
d8fec9a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are too many things that won't work (efficiently):
- the user might not have the access rights to read the mappings of all indices in a cluster
- even if so, this only gets the indices from the cluster at one point in time. While the application is running, additional indices can get created in in the cluster, either from outside by some other tool or by some naming, rolling pattern from Elasticsearch defined by index templates. Or the user can have SpEL in the indexname like I described in this post
Consider the following case:
The indexname is defined with something like @Document(indexName = "log-#{T(java.time.LocalDate).now().toString()}")
combined with in index template to define the mapping. This will result in indices named log-2024-09-4, log-2024-9-5 etc be defined all while the application is running. We would constantly need to reload and adjust the collection of index mapping information.
When searching the user might search the index log-*, we'd need to check every returned search document to find from which index it came and would need to then eventually load the index mapping information.
I don't think that we should add this runtime, network traffic and memory overhead for this single mapping usecase
@@ -241,6 +248,45 @@ public Map<String, Object> getMapping() { | |||
return responseConverter.indicesGetMapping(getMappingResponse, indexCoordinates); | |||
} | |||
|
|||
@Override | |||
public ClusterMapping getClusterMapping() { | |||
GetMappingRequest getMappingRequest = requestConverter.indicesGetMappingRequest(IndexCoordinates.of("*")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the user might not have the permission to read that information from all indices in a cluster
src/main/java/org/springframework/data/elasticsearch/client/elc/IndicesTemplate.java
Outdated
Show resolved
Hide resolved
src/main/java/org/springframework/data/elasticsearch/client/elc/IndicesTemplate.java
Outdated
Show resolved
Hide resolved
|
Signed-off-by: Youssef Aouichaoui <[email protected]>
Signed-off-by: Youssef Aouichaoui <[email protected]>
Signed-off-by: Youssef Aouichaoui <[email protected]>
Signed-off-by: Youssef Aouichaoui <[email protected]>
Signed-off-by: Youssef Aouichaoui <[email protected]>
Signed-off-by: Youssef Aouichaoui <[email protected]>
Signed-off-by: Youssef Aouichaoui <[email protected]>
We cannot guarantee to have the information/mapping of an index when an entity is written. The index name that the document is written to might not exist yet, it might get created by Elasticsearch when the document comes in and the mapping can then come from an index template, but there would be no way for Spring Data Elasticsearch to convert to entity to a document using that mapping. Neither can we guarantee to have this information when reading documents from Elasticsearch. We can try to get the information for all indices (assuming we have the access rights to do so), but there can be indices created from outside the application in between the time this information is retrieved and a search result needs converting. Users read indices with Spring Data Elasticsearch that is created for example by log ingestion frameworks. I don't see that we can provide a reliable and performant solution here by using dynamic templates to solve the problem from #2774 |
I don't see any issues here. Are you suggesting closing this draft? |
I still see the issues already mentioned:
This call might fail as the user might not have access to all indices in a cluster, for example when the cluster serves multiple users/tenants, when kibana is installed which stores data in it's own indices etc. So we cannot guarantee to have the information about an index that we need. We need to have the index mapping information when converting an entity to a document. But it can be that the index where the document is to be stored does not exist yet. Assume we have an index template for index names like log-* which contains a mapping with a dynamic template, and the user saves some data to log-2024-09-10 and this index does not exist yet. We cannot do the mapping resepcting the dynamic template because Elasticsearch will only create that index when the document is sent to the cluster, but that's to0 late for our conversion. We do not necessarily have the information about an index when reading response data. Let's assume we start the application and read all the index information that we can and that does not fail. To stick with the example we get the index mappings for all the log-* indices until the current date like log-2024-09-10. On the next day, the user searches for some data in log-*. Now it can be that some log ingestion mechanism has inserted something into log-2024-09-11 before the search started. This index now exists, but our application does not know about this. We then could get documents from log-2024-09-10 that we can convert and some from log-2024-09-11 where we don't have the index mapping in the cache. As long as these issues are not resolved we cannot reliably offer this as a solution |
Signed-off-by: Youssef Aouichaoui <[email protected]>
Closes #2774