Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove support for 'external values' in document parsing? #56063

Open
2 of 5 tasks
jtibshirani opened this issue May 1, 2020 · 8 comments
Open
2 of 5 tasks

Remove support for 'external values' in document parsing? #56063

jtibshirani opened this issue May 1, 2020 · 8 comments
Labels
>refactoring :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch >tech debt

Comments

@jtibshirani
Copy link
Contributor

jtibshirani commented May 1, 2020

For fields whose value can be a JSON array or object like geopoints, the standard way of supporting multi-fields doesn’t work. When parsing a document, the parent mapper consumes the whole complex value, so when the parser is passed on to its subfields, they aren't able to parse the source value. To help support this case, parent fields can pass a parsed object through ParseContext#externalValue. Subfields can then use this parsed object instead of parsing the source value.

This support for 'external values' has a few downsides:

  • It makes it harder to understand what data ended up getting indexed for a field, since a parent field is allowed to supply literally anything as an 'external value'. This in turn makes it hard to consult the _source to find field values (as we hope to do in Search 'fields' option design + implementation #55363).
  • In FieldMapper#parseCreateField, each field mapper must check whether an external value is set, regardless of whether it makes sense to pass custom values to the mapper. Making sure to check for an external value is a silent contract and is easy to forget.

I wanted to raise the idea of removing support for external values. Within our own code, here are the mappers that would be affected:

  • ParentJoinFieldMapper passes the IDs to its internal ParentIdFieldMapper objects. This could be replaced by a custom method on ParentIdFieldMapper.
  • PercolatorFieldMapper passes the query builder to an internal BinaryFieldMapper. This could also be replaced by a custom method.
  • CompletionFieldMapper passes a parsed completion object to its sub-fields (Completion types with multi-fields support #34081). I wonder how important it is to support multi-fields for completion mappings, or if we could just drop support.
  • GeoPointFieldMapper passes a geohash to its sub-fields. This allows it to have multi-fields that are geopoints or even keywords. I also wonder about the use cases + importance of this functionality.

It looks like external values were also used in the attachment mapper, but this mapper was removed and migrated to an ingest processor.

Here is a proposed process for removing this support

@jtibshirani jtibshirani added :Search Foundations/Mapping Index mappings, including merging and defining field types :Core/Infra/Plugins Plugin API and infrastructure >refactoring labels May 1, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label May 1, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Plugins)

@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label May 1, 2020
@talevy talevy self-assigned this May 13, 2020
@rjernst rjernst added the needs:triage Requires assignment of a team area label label Dec 3, 2020
@williamrandolph
Copy link
Contributor

This issue showed up in the core/infra team's issue triage because it has the :Core/Infra/Plugins label. The ticket deals with mapper modules, so I can see why the label is here, but the label's definition is "Plugin API and Infrastructure," and I don't see this issue necessarily relating to plugins in that regard. I'm going to remove the :Core/Infra/Plugins and Team:Core/Infra labels in order to clarify issue ownership. But please let me know if I've missed something here.

@williamrandolph williamrandolph removed :Core/Infra/Plugins Plugin API and infrastructure Team:Core/Infra Meta label for core/infra team needs:triage Requires assignment of a team area label labels Jan 6, 2021
@jtibshirani
Copy link
Contributor Author

jtibshirani commented Jan 6, 2021

Thanks @williamrandolph, that makes sense to me. I just had a bad habit of using labels to ping teams for awareness, whereas they really indicate ownership.

@romseygeek
Copy link
Contributor

I had a thought on a nice way to implement this, which would be that rather that setting an external value on the parse context, we could instead switch out the XContentParser. Field mappers that consume object-type structures such as the completion mapper could create a new XContentParser over their parsed values and pass that on to any subfields. This could also allow us to add some guards on single-valued field mappers to prevent their subfields calling nextToken() and accidentally advancing the xcontent parser.

romseygeek added a commit that referenced this issue Apr 20, 2021
…1834)

We would like to remove the use of 'external values' in document parsing.
This commit simplifies two of the four places it is currently used, by adding
direct indexValue methods to BinaryFieldMapper and ParentIdFieldMapper.

Relates to #56063
romseygeek added a commit that referenced this issue Apr 20, 2021
…1834)

We would like to remove the use of 'external values' in document parsing.
This commit simplifies two of the four places it is currently used, by adding
direct indexValue methods to BinaryFieldMapper and ParentIdFieldMapper.

Relates to #56063
romseygeek added a commit that referenced this issue Apr 29, 2021
#72203)

The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call nextToken. There is a general
assumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept json objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their java class, we can also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.

This commit removes external values entirely, and replaces it with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.

Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use ParseContext.switchParser and
wrap the objects using MapXContentParser.wrapObject().

GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if textOrNull() is called without
the parser being advanced just returns its text input.

Relates to #56063
romseygeek added a commit to romseygeek/elasticsearch that referenced this issue Apr 29, 2021
elastic#72203)

The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call nextToken. There is a general
assumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept json objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their java class, we can also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.

This commit removes external values entirely, and replaces it with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.

Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use ParseContext.switchParser and
wrap the objects using MapXContentParser.wrapObject().

GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if textOrNull() is called without
the parser being advanced just returns its text input.

Relates to elastic#56063
romseygeek added a commit that referenced this issue Apr 29, 2021
#72203) (#72448)

The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call nextToken. There is a general
assumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept json objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their java class, we can also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.

This commit removes external values entirely, and replaces it with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.

Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use ParseContext.switchParser and
wrap the objects using MapXContentParser.wrapObject().

GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if textOrNull() is called without
the parser being advanced just returns its text input.

Relates to #56063
@javanna
Copy link
Member

javanna commented Jun 16, 2022

Question: given that we removed support for external values in #72448 , are we done with this issue? Where do we stand with the three items above that are not marked done, namely deprecate and remove multifields on geo_point, deprecate and remove nested completion mappers, and remove ParserContext.switchParser() ?

@jtibshirani
Copy link
Contributor Author

jtibshirani commented Jun 16, 2022

No we aren't done with this issue unfortunately. Here is the discussion on why #72203 doesn't close the issue and what remains to be done: #72203 (comment).

@javanna javanna added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 16, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>refactoring :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch >tech debt
Projects
None yet
Development

No branches or pull requests

8 participants