Storing metadata information in GetResult #38373
@@ -350,15 +352,15 @@ public static GetResult fromXContentEmbedded(XContentParser parser, String index
             }
         } else if (FIELDS.equals(currentFieldName)) {
             while(parser.nextToken() != XContentParser.Token.END_OBJECT) {
-                DocumentField getField = DocumentField.fromXContent(parser);
+                DocumentField getField = DocumentField.fromXContent(parser, false);
@rjernst This DocumentField is not inside of _source, but I assumed that it is not a metadata field based on the way the FIELDS component is filled during serialization: https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/get/GetResult.java#L277-L282
Pinging @elastic/es-search
@sandmannn Thank you for your PR. Can you please sign the CLA, so that we can start the CI tests for this PR?
Hi, I actually signed it several times, but the check is still red. Should I provide a Transaction ID?
Sorry @sandmannn, it looks like we had some technical problems at the time that you signed the CLA; we apologize. Would you mind signing it once again? Hopefully this time it will work.
Hi @javanna, I just did it again. It looks like the check has passed now.
@elasticmachine test this please
Hi @sandmannn, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in your Git commit. Can you please add both of these e-mails to your GitHub profile (they can be hidden), so we can match your e-mails to your GitHub profile?
88c2a71 to e657574
@elasticmachine test this please
@@ -47,14 +46,16 @@
 public class DocumentField implements Streamable, ToXContentFragment, Iterable<Object> {

     private String name;
+    private Boolean isMetadataField;
Can this be final? If you are adding a new member variable, you should modify all corresponding methods to incorporate it: methods such as readFrom, writeTo, equals, hashCode, toString. For readFrom and writeTo the serialization should be version dependent (an example here).
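To illustrate what "version dependent" serialization means in this comment, here is a minimal, self-contained sketch. It does not use the Elasticsearch stream classes: SketchField, FLAG_ADDED_IN, and the integer wire version are hypothetical names invented for illustration, and plain java.io data streams stand in for the real transport streams. The idea is only that the writer emits the new flag when the other side is known to understand it, and the reader mirrors that check.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a field object; this is not the real DocumentField.
final class SketchField {
    // Hypothetical wire version that first carries the flag (assumption for this sketch).
    static final int FLAG_ADDED_IN = 7;

    final String name;
    final boolean isMetadataField;

    SketchField(String name, boolean isMetadataField) {
        this.name = name;
        this.isMetadataField = isMetadataField;
    }

    // Write the flag only when the receiver's wire version understands it.
    void writeTo(DataOutputStream out, int wireVersion) throws IOException {
        out.writeUTF(name);
        if (wireVersion >= FLAG_ADDED_IN) {
            out.writeBoolean(isMetadataField);
        }
    }

    // Read the flag only when the sender's wire version wrote it; otherwise default to false.
    static SketchField readFrom(DataInputStream in, int wireVersion) throws IOException {
        String name = in.readUTF();
        boolean flag = wireVersion >= FLAG_ADDED_IN && in.readBoolean();
        return new SketchField(name, flag);
    }

    // Helper for the sketch: serialize and deserialize at the same wire version.
    static SketchField roundTrip(SketchField field, int wireVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                field.writeTo(out, wireVersion);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return readFrom(in, wireVersion);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With an older wire version the flag is silently dropped and comes back as the default, which is the usual cost of this pattern and the reason the default has to be chosen carefully.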
We need to think more about how to organize the DocumentField class. Usually fromXContent -> toXContent -> fromXContent should produce an equal object (and this is what we test in DocumentFieldTests). The way you organized the DocumentField class, this doesn't happen, as toXContent is using isMetadataField, and fromXContent is not using it.
If it requires a lot of changes to use isMetadataField in toXContent (we need to investigate this first), then we should document why we are not including it in toXContent, and also exclude this field from equals and hashCode, again documenting why isMetadataField is not participating in these functions.
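As a rough sketch of the option raised in this comment (excluding the flag from equals and hashCode and documenting why), something like the following could work. SketchDocumentField is a hypothetical class written for illustration, not the real DocumentField, and the comment on the field reflects the reasoning in this thread rather than any actual source comment.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a document field; not the real DocumentField class.
final class SketchDocumentField {
    private final String name;
    private final List<Object> values;

    // Deliberately excluded from equals and hashCode: in this sketch the flag is not
    // written to the JSON form, so an object parsed back from JSON cannot recover it.
    // Including it would break "parse -> render -> parse yields an equal object".
    private final boolean isMetadataField;

    SketchDocumentField(String name, List<Object> values, boolean isMetadataField) {
        this.name = Objects.requireNonNull(name);
        this.values = List.copyOf(values);
        this.isMetadataField = isMetadataField;
    }

    boolean isMetadataField() {
        return isMetadataField;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchDocumentField)) return false;
        SketchDocumentField other = (SketchDocumentField) o;
        // Only name and values take part in equality; see the note on isMetadataField.
        return name.equals(other.name) && values.equals(other.values);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, values);
    }
}
```

The trade-off is that two objects which differ only in the flag compare as equal, so tests relying on equals cannot detect a wrong flag.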
Thanks for the review!
Regarding consistency of fromXContent -> toXContent -> fromXContent, we had a relevant discussion in the issue description. There was a suggestion to use a different JSON structure when serializing from new versions, while still being able to deserialize JSON from old versions, as suggested in #24422 (comment). In that case the JSON would have carried enough information to make the conversions consistent without context. Yet there was a strong argument to avoid this, as mentioned here #24422 (comment), because changing the JSON structure may result in issues when serializing it in one version and parsing it in another.
In short, it looks like it is not the amount of changes but a bwc concern that prevents us from storing isMetadataField in toXContent. It would be great if you could share your thoughts on the matter.
Here we need to make a trade-off between a) extending the serialized JSON schema for the document field, with additional coding to guarantee backward compatibility for serialized content, and b) guessing the meaning of JSON fields depending on the context, which may introduce bugs if our assumptions are incorrect in some case; e.g. it is not completely obvious whether it is the right approach here: https://github.com/elastic/elasticsearch/pull/38373/files/314c58a64e5c4f2fb3fc9ec6e3366a0209597dd5#r253675646
What is the process for deciding between a and b here?
@sandmannn Thanks for the clarification. OK, let's leave your code for JSON serialization of DocumentField as it is now, but just add comments on why isMetadataField is a special field and why it is not participating in equals, hashCode, toString, and toXContent.
You still need to modify the readFrom and writeTo methods to include this field depending on the version.
@sandmannn thanks for your PR, here is the 1st round of requested changes.
@@ -132,7 +133,7 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
         return builder;
     }

-    public static DocumentField fromXContent(XContentParser parser) throws IOException {
+    public static DocumentField fromXContent(XContentParser parser, boolean inMetadataArea) throws IOException {
inMetadataArea: why not give this parameter the same name, isMetadataField?
The parameter name is meant to make explicit that we decide whether the field is metadata depending on whether it is located in the metadata area of the xcontent.
I don't really have a strong opinion here; we can replace it with isMetadataField and add some comments, if you prefer that.
Thanks for the explanation; isMetadataField sounds better for consistency with the rest of the code. And a field being part of the metadata makes it a metadata field.
After looking through this in more detail, I wonder if DocumentField should not have any isMetadataField() method at all. Currently we determine whether a field is a meta field or not through this static list in the MapperService. Yet we want this to be through the registration of the underlying Mapper (which we already have, it is just not exposed to most parts of the code). DocumentField itself is a POJO of sorts, having no knowledge of the node it is being serialized/deserialized on. Instead of trying to force this information into the DocumentField, I think we can modify the two callers of DocumentField.isMetadataField() to look elsewhere (i.e. the MapperService) to determine whether a field is a meta field or not. This would have the same underlying effect: the node the DocumentField is being used on would control where in the structure the field ends up. But this is no worse than the node level static list we have today, and matches with the node level registration we have.
Thanks for the suggestion @rjernst, it might result in a cleaner solution. Yet currently it returns us to the old question from the issue discussion #24422 (comment), i.e. what can be the right way of getting the right instance of
@sandmannn You are right, my suggestion raises other concerns. I looked over this with @mayya-sharipova and we came up with a slightly different approach that I think will be much cleaner. The idea is to change the places storing DocumentField to have two maps, one for fields and one for metadata fields. In all the DocumentField construction sites you changed here, the object is placed into a Map which later gets read when serializing xcontent in GetResult and SearchHit. With those two classes using two maps, the call sites you changed here can decide which map to place the newly constructed DocumentField into, rather than stashing this knowledge inside the DocumentField. I don't think it would be a very large change to this PR, but if it turns out to be, adding the new map could be done as a first isolated PR.
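The two-map idea from this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: SketchGetResult and fromLoadedFields are hypothetical names, plain value lists stand in for DocumentField objects, and the isMetaField predicate represents whatever knowledge the call site has about metadata fields. It is not the signature that was eventually merged.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical result holder with separate maps; not the real GetResult or SearchHit.
final class SketchGetResult {
    private final Map<String, List<Object>> documentFields;
    private final Map<String, List<Object>> metaFields;

    SketchGetResult(Map<String, List<Object>> documentFields, Map<String, List<Object>> metaFields) {
        this.documentFields = documentFields;
        this.metaFields = metaFields;
    }

    Map<String, List<Object>> documentFields() {
        return documentFields;
    }

    Map<String, List<Object>> metaFields() {
        return metaFields;
    }

    // Call-site helper: the caller, which knows what counts as metadata on this node,
    // decides which map each field goes into. The field objects themselves carry no flag.
    static SketchGetResult fromLoadedFields(Map<String, List<Object>> loaded, Predicate<String> isMetaField) {
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        Map<String, List<Object>> meta = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> entry : loaded.entrySet()) {
            Map<String, List<Object>> target = isMetaField.test(entry.getKey()) ? meta : doc;
            target.put(entry.getKey(), entry.getValue());
        }
        return new SketchGetResult(doc, meta);
    }
}
```

Because the split happens where the fields are produced, the serialization code only has to iterate each map in the right place and never needs to ask whether a field is metadata.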
Thanks for aligning @rjernst, @mayya-sharipova! Just to make sure I got the idea right, let me paraphrase it. Let's focus for a moment at If I understand the suggestion correctly, the idea is to do this separation even earlier, i.e. when we are receiving the fields in the constructor, we would separate them into something like In such case, I think one piece is still not completely clear to me: how will we separate between metadata and non-metadata fields in the constructor of GetResult? Using the same
We wouldn't be doing the split within the
* Added version dependent binary stream serialization
* Added comments for not used functions
* Updated constructor of GetResult
* Removed old changes in documentField
* clean tests
* Clean up document field changes
* more cleanup
* Adjusted dependent tests for GetResult
* Minor cleanup
Thanks for pushing through on this @sandmannn!!
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in elastic#24422.
Thanks for merging! What is the process for the actual backport now? Do we need a new PR against 7.x?
This commit reenables bwc tests now that the backport of elastic#38373 is complete.
@sandmannn I have already backported, see a49bafc.
Refactor SearchHit to have separate document and meta fields. This is a part of bigger refactoring of issue #24422 to remove dependency on MapperService to check if a field is metafield. Relates to PR: #38373 Relates to issue #24422 Co-authored-by: sandmannn <[email protected]>
Previously, to determine whether a field is a meta-field, a static method of MapperService, isMetadataField, was used. This method was using an outdated static list of meta-fields. This PR instead changes this method to an instance method that is also aware of meta-fields in all registered plugins. This PR also refactors DocumentField to add a new constructor parameter, isMetadataField, which describes whether this is a metadata field or not. This refactoring is necessary because there is no longer a static method to check whether a field is a meta-field, but DocumentField objects are always created from contexts where this information is available, and it will be passed to the DocumentField constructor for further usage. Related elastic#38373, elastic#41656 Closes elastic#24422
Previously, to determine whether a field is a meta-field, a static method of MapperService, isMetadataField, was used. This method was using an outdated static list of meta-fields. This PR instead changes this method to an instance method that is also aware of meta-fields in all registered plugins. Related #38373, #41656 Closes #24422
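The difference between a static list and an instance method that knows about plugins can be sketched as follows. SketchMetadataRegistry and its constructor arguments are hypothetical and written only for illustration; they are not the MapperService API, and the field names in the usage note are examples.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical per-node registry; not the real MapperService.
final class SketchMetadataRegistry {
    private final Set<String> metadataFieldNames = new HashSet<>();

    // Built once per node from the built-in names plus whatever each installed plugin registers.
    SketchMetadataRegistry(List<String> builtIn, List<List<String>> pluginProvided) {
        metadataFieldNames.addAll(builtIn);
        for (List<String> names : pluginProvided) {
            metadataFieldNames.addAll(names);
        }
    }

    // Instance method: the answer depends on what this particular node has registered,
    // which a hard-coded static list cannot reflect.
    boolean isMetadataField(String fieldName) {
        return metadataFieldNames.contains(fieldName);
    }
}
```

For example, a registry built from the built-in names _id and _source plus a plugin that registers _size would report _size as a metadata field, while a node without that plugin would not.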
This PR adds a new parameter to the constructor of DocumentField, isMetadataField, which describes whether the DocumentField in question is a metadata field or not. It is part of a larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.