Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storing metadata information in GetResult #38373

Merged
merged 23 commits into from
May 23, 2019
Merged

Conversation

sandmannn
Copy link
Contributor

This PR adds a new parameter to the constructor of document field isMetadataField which describes whether the DocumentField in question is metadata field or not. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.

@@ -350,15 +352,15 @@ public static GetResult fromXContentEmbedded(XContentParser parser, String index
}
} else if (FIELDS.equals(currentFieldName)) {
while(parser.nextToken() != XContentParser.Token.END_OBJECT) {
DocumentField getField = DocumentField.fromXContent(parser);
DocumentField getField = DocumentField.fromXContent(parser, false);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjernst This DocumentField is not inside of _source, but I assumed that it is not a metadata field based on the way the FIELDS component is filled during serialization https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/get/GetResult.java#L277-L282

@colings86 colings86 added the :Search Foundations/Mapping Index mappings, including merging and defining field types label Feb 5, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@mayya-sharipova
Copy link
Contributor

@sandmannn Thank you for your PR. Can you please sing CLA, so after that we can start CI test for this PR.

@sandmannn
Copy link
Contributor Author

Hi, I actually signed it several times, but the check is still red. Should I provide a Transaction ID?

@javanna
Copy link
Member

javanna commented Feb 25, 2019

sorry @sandmannn it looks like we had some technical problems at the time that you signed the CLA, we apologize. Would you mind signing it once again? Hopefully this time it will work.

@sandmannn
Copy link
Contributor Author

Hi @javanna , I just did it again. looks like that check is passed now.

@mayya-sharipova
Copy link
Contributor

@elasticmachine test this please

@elasticcla
Copy link

Hi @sandmannn, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in yout Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile?

@mayya-sharipova
Copy link
Contributor

@elasticmachine test this please

1 similar comment
@mayya-sharipova
Copy link
Contributor

@elasticmachine test this please

@mayya-sharipova
Copy link
Contributor

@elasticmachine test this please

@@ -47,14 +46,16 @@
public class DocumentField implements Streamable, ToXContentFragment, Iterable<Object> {

private String name;
private Boolean isMetadataField;
Copy link
Contributor

@mayya-sharipova mayya-sharipova Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be final? If you are adding a new member variable, you should modify all corresponding methods to incorporate it, methods such as readFrom, writeTo, equals, hashCode, toString. For readFrom and writeTo the serialization should be a version dependent (an example here)

Copy link
Contributor

@mayya-sharipova mayya-sharipova Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to think more how to organize DocumentField class. Usually fromXContent -> toXContent ->fromXContent should produce an equal object (and this is what we test in DocumentFieldTests). The way how you organized a DocumentField class, it doesn't happen as toXContent is using isMetadataField, and fromXContent is not using it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it requires a lot of changes to use isMetadataField in toXContent (we need to investigate this first), then we should document why we are not including it in toXContent, and also exclude this field from equals, hashCode - also documenting why isMetadataField is not participating in these functions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

Regarding consistency of fromXContent -> toXContent ->fromXContent we had a relevant discussion in the issue description. There was a suggestion of using different json structures depending when serializing from new versions, while still being able to deserialize json from old versions, as suggested in #24422 (comment) . In such case it would have been possible to have enough information in the json to be able to make the conversions consistent without context. Yet there was a strong argument to avoid as mentioned here #24422 (comment)
as changing such json structure may result in issues when serializing it in one version and parsing in another.

In short, looks like it is not a lot of changes, but a bwc concern that prevents us from storing the isMetadataField in toXContent. It would be great if you share your thoughts on the matter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we need to make a trade-off between a)extending serialized json schema for document field, additional coding to guarantee backward compatibility for serialized content and b)guessing the meaning of json fields depending on the context, which may potentially introduce some bugs if our assumptions are incorrect in some case, e.g. it is not completely obvious if it is a right approach here https://github.com/elastic/elasticsearch/pull/38373/files/314c58a64e5c4f2fb3fc9ec6e3366a0209597dd5#r253675646

what is the process for deciding between a and b here?

Copy link
Contributor

@mayya-sharipova mayya-sharipova Mar 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sandmannn Thanks for clarification. Ok, lets leave your code for json serialization for DocumentField at it is now, but just put comments why isMetadataField is a special field, why it is not participating in equals, hashCode, toString, and toXContent.
You still need to modify readFrom and writeTo methods to include this field depending on the version.

Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sandmannn thanks for your PR, here is the 1st round of requested changes.

@@ -132,7 +133,7 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
return builder;
}

public static DocumentField fromXContent(XContentParser parser) throws IOException {
public static DocumentField fromXContent(XContentParser parser, boolean inMetadataArea) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inMetadataArea why not call this parameter the same isMetadataField?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this parameter naming follows the intent to emphasize and make it explicit that we decide whether the field is metadata or not depending on whether it is located in metadata area of xcontent or not.

I don't really have a strong opinion here, we can replace it with isMetadataField and add some comments, if you prefer that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanation, isMetadataField sounds better for consistency with the rest of the code. And making a field being a part of medata makes it a metadata field.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking through this in more detail, I wonder if DocumentField should not have any isMetadataField() method at all. Currently we determine whether a field is a meta field or not through this static list in the MapperService. Yet we want this to be through the registration of the underlying Mapper (which we already have, it is just not exposed to most parts of the code). DocumentField itself is a POJO of sorts, having no knowledge of the node it is being serialized/deserialized on. Instead of trying to force this information into the DocumentField, I think we can modify the two callers of DocumentField.isMetadataField() to look elsewhere (ie the MapperService) to determine whether a field is a meta field or not. This would have the same underlying effect: the node the DocumentField is being used on would control where in the structure the field ends up. But this is no worse than the node level static list we have today, and matches with the node level registration we have.

@sandmannn
Copy link
Contributor Author

Thanks for the suggestion @rjernst , it might result in a cleaner solution. Yet currently it returns us to the old question from the issue discussion #24422 (comment) i.e. what can be the right way of getting the right instance of MapperService in the xContent serialization functions of GetResult and SearchHit where we need this information. Is there any way to directly access the current Node and/or MapperService ? Otherwise we may need to inject MapperService in all constructors of GetResult and SearchHit, which is also quite messy.

@rjernst
Copy link
Member

rjernst commented Mar 13, 2019

@sandmannn You are right my suggestion raises other concerns. I looked over this with @mayya-sharipova and we came up with a slightly different approach I think will be much cleaner. The idea is to change the places storing DocumentField to have two maps, one for fields and one for metadata fields. In all the DocumentField construction sites you changed here, the object is placed into a Map which later gets read when serializing xcontent in GetResult and SearchHit. By those two classes using two maps, the call sites you changed here can decide which map to place the newly constructed DocumentField into, rather than stashing this knowledge inside the DocumentField. I don't think it would be a very large change to this PR, but if it turns out to be, adding the new map could be a done as a first isolated PR.

@sandmannn
Copy link
Contributor Author

Thanks for aligning @rjernst , @mayya-sharipova ! Just to make sure I got the idea right, let me paraphrase it. Lets focus for a moment at GetResult as it already has a construct that looks somewhat similar. When we are serializing GetResult we are already separating the metadata fields from other fields (currently not in maps, but in lists) here https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/get/GetResult.java#L254-L257

If I understanding the suggestion correctly, the idea is to do this separation even earlier, i.e. when we are receiving the fields in the constructor, we would separate them into something like this.metadataFieldsMap and this.otherFieldsMap instead of filling the this.fields directly
https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/get/GetResult.java#L93-L95

In such case, I think one piece is still not completely clear for me: how will we separate between metadata and non-metadata fields in constructor of GetResult? Using the same field.isMetadataField() (which I believe is not our intent) or via something else? Once we have this separating information I think we will be able to keep it during serialization/deserialization, but getting this separation in the first place may be a bit tricky.

@rjernst
Copy link
Member

rjernst commented Mar 19, 2019

We wouldn't be doing the split within the GetResult constructor. It would be at the locations calling the constructor. In the case of GetResult, this is deserialization, and ShardGetService. For the first, we need to update the serialization format for GetResult to store two separate maps, and in deserialization we would use the existing static method (it will have to remain in the code until we no longer support rolling upgrades to whichever version this is added in, ie 8.0). For ShardGetService, we can have the MapperService.

* Added version dependent binary stream serialization

* Added comments for not used functions

* Updated constructor of GetResult

* Removed old changes in documentField

* clean tests

* Clean up document field changes

* more cleanup

* Addjusted dependent tests for GetResult

* Minor cleanup
@rjernst rjernst merged commit 8177e71 into elastic:master May 23, 2019
@rjernst
Copy link
Member

rjernst commented May 23, 2019

Thanks for pushing through on this @sandmannn!!

rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request May 23, 2019
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in elastic#24422.
@sandmannn
Copy link
Contributor Author

sandmannn commented May 23, 2019

thanks for merging! Now what is a process about the actual downport? Do we need a new PR against 7.x?

@sandmannn sandmannn deleted the documentfield branch May 23, 2019 20:42
rjernst added a commit that referenced this pull request May 23, 2019
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.
rjernst added a commit to rjernst/elasticsearch that referenced this pull request May 23, 2019
This commit reenables bwc tests now that the backport of elastic#38373 is
complete.
@rjernst rjernst mentioned this pull request May 23, 2019
rjernst added a commit that referenced this pull request May 23, 2019
This commit reenables bwc tests now that the backport of #38373 is
complete.
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in elastic#24422.
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
This commit reenables bwc tests now that the backport of elastic#38373 is
complete.
@rjernst
Copy link
Member

rjernst commented May 31, 2019

@sandmannn I have already backported, see a49bafc.

mayya-sharipova pushed a commit that referenced this pull request Mar 30, 2020
Refactor SearchHit to have separate document and meta fields.
This is a part of bigger refactoring of issue #24422 to remove
dependency on MapperService to check if a field is metafield.

Relates to PR: #38373
Relates to issue #24422
mayya-sharipova pushed a commit to mayya-sharipova/elasticsearch that referenced this pull request Apr 1, 2020
Refactor SearchHit to have separate document and meta fields.
This is a part of bigger refactoring of issue elastic#24422 to remove
dependency on MapperService to check if a field is metafield.

Relates to PR: elastic#38373
Relates to issue elastic#24422
mayya-sharipova added a commit that referenced this pull request Apr 1, 2020
Refactor SearchHit to have separate document and meta fields.
This is a part of bigger refactoring of issue #24422 to remove
dependency on MapperService to check if a field is metafield.

Relates to PR: #38373
Relates to issue #24422

Co-authored-by: sandmannn <[email protected]>
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request May 15, 2020
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

This PR also refactors DocumentField to add a new parameter
to the constructor of DocumentField isMetadataField
which describes whether this is metadata field or not.
This refactoring is necessary as there is no more statiic
method to check if a field is meta-field, but DocumentField
objects are always created from the contexts  where this
information is available, and will be passed to DocumentField
constructor for further usage.

Related elastic#38373, elastic#41656
Closes elastic#24422
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request May 29, 2020
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

Related elastic#38373, elastic#41656
Closes elastic#24422
mayya-sharipova added a commit that referenced this pull request Jun 5, 2020
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

Related #38373, #41656
Closes #24422
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this pull request Jun 5, 2020
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

Related elastic#38373, elastic#41656
Closes elastic#24422
mayya-sharipova added a commit that referenced this pull request Jun 8, 2020
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

Related #38373, #41656
Closes #24422
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search Foundations/Mapping Index mappings, including merging and defining field types v7.3.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants