Support for DynamicTemplates. #2969

youssef3wi · 2024-08-20T14:58:06Z

You have read the Spring Data contribution guidelines.
There is a ticket in the bug tracker for the project in our issue tracker. Add the issue number to the Closes #issue-number line below
You use the code formatters provided here and have them applied to your changes. Don’t submit any formatting related changes.
You submit test cases (unit or integration tests) that back your changes.
You added yourself as author in the headers of the classes you touched. Amend the date range in the Apache license header if needed. For new types, add the license header (copy from another file and set the current year only).

Closes #2774

I'm uncertain if this will be merged, so the best approach is to use converters.

sothawo · 2024-08-24T17:00:33Z

Thanks for that PR, I finally had some time to have a look.

Restrictions I see in this current draft:

This currently only is for properties of type Map<String, Object> right?
Only match from the dynamic mapping is supported, we should add unmatch as well
Supporting path_match and path_unmatch will get pretty ugly, as the mapping from Java property paths to elasticsearch fields in dot notation is hard, we have that in some places, I think that might not be the effort to have it in here
One thing that is missing is converting the values. The dynamic mapping could for example define a date format and then we need to do a conversion on writing and reading. Or the user might have a custom conversion for all string type or something like that. We'd need to a apply this to these properties as well.
the reading part in the readEntity method should go into its own function, readEntity gets too long.

Besides that this would need some documentation not only in the javadoc, but in the reference documentation as well.

youssef3wi · 2024-08-24T17:39:08Z

Honestly I believe it's not good to support this or even improving this draft

sothawo · 2024-08-25T07:05:41Z

Another thought that I just had: the dynamic templates definition must not necessarily come from a @DynamicTemplates annotation on the Java entity. It could have been written to the index mapping from outside ot could be contained in a json that is referred to with the @Mappings annotation. So it would be necessary to retrieve the current mappping from the cluster to find out the relevant settings here. This would have to be done once and not on every write/read and stored somehow on the entity - doing stuff like this in the ractive world easily gets nasty..

Seems quite some effort to add this; if you don't want to go on with this, just drop it.

youssef3wi · 2024-08-27T14:05:20Z

@sothawo I'm not convinced by this draft at the moment.

First, the metadata from the cluster is not in the correct location. I just placed it there for testing purposes, but I haven't found a secure place to retrieve the metadata from the cluster.

We should focus on the types and the conversion after that.

sothawo

The more I think about this the more problems seem to arise. We can define on a property that it is part of a dynamic template. But we do not know in which index this entity might be persisted to or retrieved from. So we'd need to load the dynamic template mapping from the cluster for every index name we encounter.
In order to not do this with every save or load, this information would need to be cached. Even if we have this in a cache this add aditional calls even for applications that never need or use this.
For applications that use for example a date in the index name this would mean that over the time we are building up cache data for multiple index names that contain the same data.

I don't think that this architectural and runtime overhead is justified by the single use case - that has only come up ince in all these years that I am maintained SDE

sothawo · 2024-08-30T04:33:52Z

src/main/java/org/springframework/data/elasticsearch/core/AbstractElasticsearchTemplate.java

@@ -602,6 +603,15 @@ protected <T> SearchDocumentResponse.EntityCreator<T> getEntityCreator(ReadDocum
 	// region Entity callbacks
 	protected <T> T maybeCallbackBeforeConvert(T entity, IndexCoordinates index) {

+		// get entity metadata


This would be called for every conversion of every entity, so that if the user saves 100 entities we have 100 calls.

Yes, until I find a better place to cache/load metadata 😄

sothawo

There are too many things that won't work (efficiently):

the user might not have the access rights to read the mappings of all indices in a cluster
even if so, this only gets the indices from the cluster at one point in time. While the application is running, additional indices can get created in in the cluster, either from outside by some other tool or by some naming, rolling pattern from Elasticsearch defined by index templates. Or the user can have SpEL in the indexname like I described in this post

Consider the following case:
The indexname is defined with something like @Document(indexName = "log-#{T(java.time.LocalDate).now().toString()}") combined with in index template to define the mapping. This will result in indices named log-2024-09-4, log-2024-9-5 etc be defined all while the application is running. We would constantly need to reload and adjust the collection of index mapping information.

When searching the user might search the index log-*, we'd need to check every returned search document to find from which index it came and would need to then eventually load the index mapping information.

I don't think that we should add this runtime, network traffic and memory overhead for this single mapping usecase

sothawo · 2024-09-04T17:19:34Z

src/main/java/org/springframework/data/elasticsearch/client/elc/IndicesTemplate.java

@@ -241,6 +248,45 @@ public Map<String, Object> getMapping() {
 		return responseConverter.indicesGetMapping(getMappingResponse, indexCoordinates);
 	}

+	@Override
+	public ClusterMapping getClusterMapping() {
+		GetMappingRequest getMappingRequest = requestConverter.indicesGetMappingRequest(IndexCoordinates.of("*"));


the user might not have the permission to read that information from all indices in a cluster

src/main/java/org/springframework/data/elasticsearch/client/elc/IndicesTemplate.java

youssef3wi · 2024-09-04T18:40:15Z

@sothawo

We can only identify current classes that use Document as an annotation by initializing the repository (a Spring application) or by using reflection. I tried the latter approach, but it is not effective.
For getting all mappings, the goal is to reduce the number of calls per index.

Signed-off-by: Youssef Aouichaoui <[email protected]>

sothawo · 2024-09-06T12:21:11Z

We cannot guarantee to have the information/mapping of an index when an entity is written. The index name that the document is written to might not exist yet, it might get created by Elasticsearch when the document comes in and the mapping can then come from an index template, but there would be no way for Spring Data Elasticsearch to convert to entity to a document using that mapping.

Neither can we guarantee to have this information when reading documents from Elasticsearch. We can try to get the information for all indices (assuming we have the access rights to do so), but there can be indices created from outside the application in between the time this information is retrieved and a search result needs converting. Users read indices with Spring Data Elasticsearch that is created for example by log ingestion frameworks.
Even if we can notice in the process of mapping a document to an entity that this document comes from an index where we have no information about, it's not possible to get the information at that time. This would mean a call to the cluster from within the mapping process. At this place we have no access to any (Reactive)ElasticsearchOperations instance and we shouldn't. And even if we would add such a dependency, we would then need to provide a new reactive Mapper for that.

I don't see that we can provide a reliable and performant solution here by using dynamic templates to solve the problem from #2774

youssef3wi · 2024-09-09T12:22:41Z

I don't see any issues here.
This draft can resolve the problem, but it remains a draft due to the types of conversions.

Are you suggesting closing this draft?

sothawo · 2024-09-10T10:26:38Z

I still see the issues already mentioned:

public ClusterMapping getClusterMapping() {
		GetMappingRequest getMappingRequest = requestConverter.indicesGetMappingRequest(IndexCoordinates.of("*"));

This call might fail as the user might not have access to all indices in a cluster, for example when the cluster serves multiple users/tenants, when kibana is installed which stores data in it's own indices etc. So we cannot guarantee to have the information about an index that we need.

We need to have the index mapping information when converting an entity to a document. But it can be that the index where the document is to be stored does not exist yet. Assume we have an index template for index names like log-* which contains a mapping with a dynamic template, and the user saves some data to log-2024-09-10 and this index does not exist yet. We cannot do the mapping resepcting the dynamic template because Elasticsearch will only create that index when the document is sent to the cluster, but that's to0 late for our conversion.

We do not necessarily have the information about an index when reading response data. Let's assume we start the application and read all the index information that we can and that does not fail. To stick with the example we get the index mappings for all the log-* indices until the current date like log-2024-09-10. On the next day, the user searches for some data in log-*. Now it can be that some log ingestion mechanism has inserted something into log-2024-09-11 before the search started. This index now exists, but our application does not know about this. We then could get documents from log-2024-09-10 that we can convert and some from log-2024-09-11 where we don't have the index mapping in the cache.

As long as these issues are not resolved we cannot reliably offer this as a solution

Signed-off-by: Youssef Aouichaoui <[email protected]>

spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Aug 20, 2024

youssef3wi force-pushed the gh-2774 branch from f052ae2 to 5a582ac Compare August 27, 2024 14:04

sothawo reviewed Aug 30, 2024

View reviewed changes

youssef3wi force-pushed the gh-2774 branch 3 times, most recently from c0fa91c to d8fec9a Compare September 3, 2024 08:53

sothawo reviewed Sep 4, 2024

View reviewed changes

youssef3wi added 7 commits September 6, 2024 13:22

Support serialization for DynamicTemplates.

cc809c7

Signed-off-by: Youssef Aouichaoui <[email protected]>

Support deserialization for DynamicTemplates.

4805a46

Signed-off-by: Youssef Aouichaoui <[email protected]>

Extract method to populate fields [dynamic].

cdf3fd2

Signed-off-by: Youssef Aouichaoui <[email protected]>

Get templates from cluster.

9226ab7

Signed-off-by: Youssef Aouichaoui <[email protected]>

Support custom classes for dynamic templates.

ce3a615

Signed-off-by: Youssef Aouichaoui <[email protected]>

Refresh entities mapping cache.

9b66c69

Signed-off-by: Youssef Aouichaoui <[email protected]>

Polishing.

e38fbb3

Signed-off-by: Youssef Aouichaoui <[email protected]>

youssef3wi force-pushed the gh-2774 branch from d8fec9a to e38fbb3 Compare September 6, 2024 11:22

Get only the current index mapping.

9ac79d1

Signed-off-by: Youssef Aouichaoui <[email protected]>

mp911de force-pushed the main branch from ad1a6c3 to 98716a8 Compare October 9, 2024 07:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for DynamicTemplates. #2969

Support for DynamicTemplates. #2969

youssef3wi commented Aug 20, 2024

sothawo commented Aug 24, 2024

youssef3wi commented Aug 24, 2024

sothawo commented Aug 25, 2024

youssef3wi commented Aug 27, 2024 •

edited

Loading

sothawo left a comment

sothawo Aug 30, 2024

youssef3wi Aug 30, 2024

sothawo left a comment

sothawo Sep 4, 2024

youssef3wi commented Sep 4, 2024

sothawo commented Sep 6, 2024

youssef3wi commented Sep 9, 2024

sothawo commented Sep 10, 2024

Support for DynamicTemplates. #2969

Are you sure you want to change the base?

Support for DynamicTemplates. #2969

Conversation

youssef3wi commented Aug 20, 2024

sothawo commented Aug 24, 2024

youssef3wi commented Aug 24, 2024

sothawo commented Aug 25, 2024

youssef3wi commented Aug 27, 2024 • edited Loading

sothawo left a comment

Choose a reason for hiding this comment

sothawo Aug 30, 2024

Choose a reason for hiding this comment

youssef3wi Aug 30, 2024

Choose a reason for hiding this comment

sothawo left a comment

Choose a reason for hiding this comment

sothawo Sep 4, 2024

Choose a reason for hiding this comment

youssef3wi commented Sep 4, 2024

sothawo commented Sep 6, 2024

youssef3wi commented Sep 9, 2024

sothawo commented Sep 10, 2024

youssef3wi commented Aug 27, 2024 •

edited

Loading