Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QL: constant_keyword support #53241

Merged
merged 6 commits into from
Mar 16, 2020
Merged

QL: constant_keyword support #53241

merged 6 commits into from
Mar 16, 2020

Conversation

astefan
Copy link
Contributor

@astefan astefan commented Mar 6, 2020

This PR adds support for constant_keyword data type following its addition in Elasticsearch with #49713.

Addresses #53016

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/SQL)

@astefan
Copy link
Contributor Author

astefan commented Mar 6, 2020

@elasticmachine run elasticsearch-ci/2

@astefan astefan marked this pull request as ready for review March 6, 2020 21:12
Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Left some comments.

// To mute tests follow example in file: example.csv-spec

//
// Tests testing field alias (introduced in ES 6.4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover comment?

@@ -213,6 +214,7 @@ protected Object unwrapMultiValue(Object values) {
protected boolean isFromDocValuesOnly(DataType dataType) {
return dataType == KEYWORD // because of ignore_above.
|| dataType == DATETIME
|| dataType == CONSTANT_KEYWORD // because a non-existent value is considered the constant value itself
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain please, I don't get it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot extract the value of a constant_keyword from _source because if there is no source for the field, Elasticsearch will still consider the field as having the constant value.
Take a look at the example in our docs. No value at indexing time means level is debug. So, for the second document in the sample, there is no _source for level, but the value is still debug in case there is a query filtering on that field.

@@ -198,6 +195,9 @@ private static void loadEmpDatasetIntoEs(RestClient client, String index, String
}
hadLastItem = true;
bulk.append('"').append(titles.get(f)).append("\":\"").append(fields.get(f)).append('"');
if (titles.get(f).equals("gender") && extraFields) {
bulk.append(",\"extra_gender\":\"M\"");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would prefer it to be in tha employees.csv file and have a different value from gender like Male and Female.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo, it's a too impactful change to do it like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, I mixed up the feature with an enum keyword that can take a set of predefined values in the mapping. It's ok as is, but maybe check if null is also acceptable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe having is a Male instead of M can help avoid confusion in the future regarding tests?

;

aggWithNullFilter
SELECT COUNT(*) count FROM test_emp_copy WHERE extra_gender IS NOT NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constant_keyword cannot be null, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically, I think it can be null... haven't tested this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess any type can be null (missing field in the doc)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For null case, all the documents in the index, for that field, shouldn't have a value. As soon as one document is indexed with something in it for that field, all other documents will "inherit" that value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

;

functionOverAlias
SELECT BIT_LENGTH(extra_gender) bit FROM test_emp_copy ORDER BY extra_gender LIMIT 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have 2 values Male/Female that should be changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A constant_keyword field has only one value.

@@ -631,6 +632,7 @@ public void testCommonType() {
assertEquals(BOOLEAN, commonType(BOOLEAN, BOOLEAN));
assertEquals(NULL, commonType(NULL, NULL));
assertEquals(INTEGER, commonType(INTEGER, KEYWORD));
assertEquals(DOUBLE, commonType(DOUBLE, CONSTANT_KEYWORD));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numeric and constant_keyword not tasted in DataTypeConverterTests of QL.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great set of tests. Left some comments on removal of the types in case of conflict since I think it might remove index information leading to incorrect mapping.

@@ -344,6 +351,11 @@ public static IndexResolution mergedMappings(DataTypeRegistry typeRegistry, Stri
}
}

// if there are both a keyword and a constant_keyword type for this field, only keep the keyword as a common compatible type
if (hasCompatibleKeywords) {
types.remove(CONSTANT_KEYWORD.esType());
Copy link
Member

@costin costin Mar 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the type all together might remove useful information.
For example if a field a is mapped as keyword in index X and constant_keyword in index Y, the piece above will find the collision and remove the field from Y resulting in a mapping only in X.
On one hand it does indicate that there's only field a however each field has index information underneath so in this case it would be just for index X. I'm not sure though whether we take that into account or not.
What happens is there's no removal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scenario is the one where two indices have the same named field with both keyword and constant_keyword types and their mappings need to be merged. For other scenarios (where the same name is used for two fields with different types) we tell the user that the field cannot be shown (because of the different mapping types), but in this very specific case and since these two field types are so similar I chose to display the field with the "common" data type as keyword.

The controversy for what to do in this specific case came from the scenario described here and is specifically tested here.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@astefan astefan requested a review from matriv March 13, 2020 14:37
Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thank you @astefan!

@astefan astefan merged commit d6cd4ce into elastic:master Mar 16, 2020
@astefan astefan deleted the 53016_feature branch March 16, 2020 13:22
astefan added a commit to astefan/elasticsearch that referenced this pull request Mar 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants