Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a simple 'fetch fields' phase. #55639

Merged
merged 9 commits into from
Apr 24, 2020

Conversation

jtibshirani
Copy link
Contributor

@jtibshirani jtibshirani commented Apr 23, 2020

Note: the PR is opened against the field-retrieval feature branch.

Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. This PR just lays out the initial class structure and tests -- much of the interesting functionality will come in subsequent PRs.

Known issues that will be addressed in future commits:

  • Fields like geopoints whose value can be an array are not handled correctly. Without consulting the mapping, it's not possible to know if it's a multi-valued field, or a single field with an array value.
  • We take a very simple approach to loading the values from _source. I've had discussions with the scripting team about a new SourceLookup method they have planned to efficiently load particular values: Partially parse source documents to speed up source access #52591. I plan to make use of this when it's available, but for now added a placeholder method extractValues.

Relates to #55363.

search:
index: test
body:
fields: [keyword, integer_range]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll no longer use a flat list of fields once we introduce support for formatting the values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just have this temporarily because it is simpler to parse?

import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.hasItems;

public class FetchFieldsPhaseTests extends ESSingleNodeTestCase {
Copy link
Contributor Author

@jtibshirani jtibshirani Apr 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried hard to write proper unit tests (in the style of FetchSourcePhaseTests), but it's currently challenging to stub out the required pieces like source lookup. I think some broader refactors are necessary before it's possible to unit test these classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you made something like:

public void hitsExecute(Function<SearchHit, Map<String, Object>> sourceLookup, SearchHit[] hits)

and unit tested that? Then and the only thing missing a unit test would be the map from search context to "simple" classes?

Copy link
Contributor Author

@jtibshirani jtibshirani Apr 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had been hesitant to do this, because I thought it would make FetchFieldsPhase harder to read... but I actually think it turned out pretty cleanly. I don't think I followed your suggestion exactly though, curious to hear what you think.

Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
@jtibshirani jtibshirani added :Search/Search Search-related issues that do not fall into other categories >feature and removed WIP labels Apr 23, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@nik9000
Copy link
Member

nik9000 commented Apr 23, 2020

* We take a very simple approach to loading the values from _source. I've had discussions with the scripting team about a new `SourceLookup` method they have planned to efficiently load particular values: #52591. I plan to make use of this when it's available, but for now added a placeholder method `SourceLookup#extractValues`.

👍

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a great start! I left some "for later" stuff and some "maybe now" stuff.

search:
index: test
body:
fields: [keyword, integer_range]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just have this temporarily because it is simpler to parse?

// all.
// This call doesn't really need to support writing any kind of object, since the values
// here are always serializable to xContent. Each value could be a leaf types like a string,
// number, or boolean, a list of such values, or a map of such values with string keys.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Except it won't be for stuff like geo_points, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder if it'd be nicer to have the "type" of the field available so we don't need all the nasty reflection in value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that is a problem for another time, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this is a really unfortunate part of the current PR. It's on the top of my mind in terms of what needs figuring out for a follow-up.

@@ -244,6 +247,12 @@ public SearchSourceBuilder(StreamInput in) throws IOException {
sliceBuilder = in.readOptionalWriteable(SliceBuilder::new);
collapse = in.readOptionalWriteable(CollapseBuilder::new);
trackTotalHitsUpTo = in.readOptionalInt();

if (in.getVersion().onOrAfter(Version.V_8_0_0)) {
if (in.readBoolean()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readOptionalStringList?

@@ -298,6 +307,13 @@ public void writeTo(StreamOutput out) throws IOException {
out.writeOptionalWriteable(sliceBuilder);
out.writeOptionalWriteable(collapse);
out.writeOptionalInt(trackTotalHitsUpTo);

if (out.getVersion().onOrAfter(Version.V_8_0_0)) {
out.writeBoolean(fetchFields != null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writeOptionalStringCollection?

@@ -1120,6 +1154,11 @@ public void parseXContent(XContentParser parser, boolean checkTrailingTokens) th
while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) {
docValueFields.add(FieldAndFormat.fromXContent(parser));
}
} else if (FETCH_FIELDS_FIELD.match(currentFieldName, parser.getDeprecationHandler())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder when it'll be time to convert this to ObjectParser.....

Object value = entry.getValue();
List<Object> values = value instanceof List
? (List<Object>) value
: List.of(value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see wanting to switch this to something a little less instanceof soon. But this'll get the job done for now!

@@ -201,6 +202,10 @@ public InnerHitsContext innerHits() {

public abstract SearchContext docValuesContext(FetchDocValuesContext docValuesContext);

public abstract FetchFieldsContext fetchFieldsContext();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have javadoc for this.

/**
* For each of the provided paths, return its value in the source. Note that
*/
public Map<String, Object> extractValues(Collection<String> paths) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this'd be better kept in the new fetch phase, especially if we're going to migrate away from it.

I wonder if a better interface for collecting the values is Map<String, Consumer<?>> or something shaped like that. I feel like it'd let you build flexible casting and formatting pretty naturally. But that can wait too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I'll move this up into FetchFieldsPhase. I'll wait to refine the interface until I have a better handle on the approach to value formatting.

@@ -188,7 +188,7 @@ public void testFromXContentLenientParsing() throws IOException {
XContentType xContentType = randomFrom(XContentType.values());
SearchHit searchHit = createTestItem(xContentType, true, true);
BytesReference originalBytes = toXContent(searchHit, xContentType, true);
Predicate<String> pathsToExclude = path -> (path.endsWith("highlight") || path.endsWith("fields") || path.contains("_source")
Predicate<String> pathsToExclude = path -> (path.endsWith("highlight") || path.contains("fields") || path.contains("_source")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

This on looks like it took a while to find!

import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.hasItems;

public class FetchFieldsPhaseTests extends ESSingleNodeTestCase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you made something like:

public void hitsExecute(Function<SearchHit, Map<String, Object>> sourceLookup, SearchHit[] hits)

and unit tested that? Then and the only thing missing a unit test would be the map from search context to "simple" classes?

@nik9000
Copy link
Member

nik9000 commented Apr 23, 2020

Looks like a great start! I left some "for later" stuff and some "maybe now" stuff.

And none of it really blocks merging, especially to a feature branch. But I do think now would be a simpler time to talk about the unit test and where to put the source extraction logic.

@jtibshirani
Copy link
Contributor Author

Thanks @nik9000 for the speedy review, it's ready for another look.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

/**
* The context needed to retrieve fields.
*/
public class FetchFieldsContext {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough!

@jtibshirani jtibshirani merged commit 1dcfdcc into elastic:field-retrieval Apr 24, 2020
@jtibshirani jtibshirani deleted the fetch-fields-phase branch April 24, 2020 16:27
jtibshirani added a commit that referenced this pull request Apr 24, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request Apr 27, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request Apr 28, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request Apr 29, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request May 5, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request May 7, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request May 18, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request May 20, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request May 28, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request May 28, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request Jun 8, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request Jun 17, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request Jun 26, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request Jun 30, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request Jun 30, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request Jul 8, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request Jul 14, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request Jul 15, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request Jul 18, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request Jul 21, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit that referenced this pull request Jul 23, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Jul 27, 2020
Currently the phase just looks up each field name in the _source and returns its
values in the 'fields' section of the response. There are several aspects that
need improvement -- this PR just lays out the initial class structure and tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants