QL: Wildcard support #59753

Closed
5 tasks done
costin opened this issue Jul 17, 2020 · 5 comments
Assignees
Labels
:Analytics/SQL SQL querying >enhancement Team:QL (Deprecated) Meta label for query languages team v7.10.0

Comments

@costin
Member

costin commented Jul 17, 2020

Now that wildcard support has landed in Elasticsearch, QL (EQL and SQL) should take advantage of it. This means:

  • Investigate if there's any special handling of the wildcard type in QL that is needed
  • Investigate whether a different type of query translation is needed in order to take advantage of this field. Currently it seems that it is not needed, as it will happen automatically.
  • Assess the impact it has on EQL, in particular regarding its regex functionality.
  • Investigate whether SQL could leverage this functionality. Since most of the functionality happens transparently and thus SQL cannot really dictate the behavior, this is mainly about setting the expectations: what type of regex functionality is supported and whether SQL can inform the user about it (since wildcard is reported as keyword, the engine has little insight).
  • Improve testing to verify the expected behavior
@costin costin added :Analytics/SQL SQL querying :Analytics/EQL EQL querying Team:QL (Deprecated) Meta label for query languages team v7.10.0 labels Jul 17, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-ql (:Query Languages/SQL)

@elasticmachine
Collaborator

Pinging @elastic/es-ql (:Query Languages/EQL)

@astefan
Contributor

astefan commented Sep 9, 2020

Wildcard field type support has been added in QL as part of c874e6c and, with it, tests, mainly for ES SQL. Since wildcard type discovery (like the discovery of all other field types) is done through the _field_caps API, which now returns a keyword type for a wildcard field, there is no need for any special handling in QL/SQL/EQL.

Handling query translation differently either in SQL or EQL is not needed for two reasons:

  • _field_caps reports the field as keyword, so it shouldn't matter whether the field is keyword (supported since SQL was built), wildcard or constant_keyword: the queries built for all three field types will be the same.
  • when wildcard field type support was added, one main aim of the ES team was to make its usage consistent in all queries, just as it is for a keyword field. There are exceptions, but I don't think these affect SQL or EQL at this point.
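
To make the first point concrete, here is a minimal sketch of type discovery from a _field_caps-style response. The response below is a hand-written sample in the shape the API returns, not captured output, and the index name, field names and the `resolve_types` helper are all hypothetical. It only illustrates that a caller reading the reported type sees keyword for both fields and so has nothing to branch on.

```python
# Hand-written sample shaped like a _field_caps response (not real output).
# Assumption: "process.name" is mapped as keyword and "process.command_line"
# as wildcard in the index; both field names are made up for illustration.
SAMPLE_FIELD_CAPS = {
    "indices": ["logs"],
    "fields": {
        "process.name": {
            "keyword": {"type": "keyword", "searchable": True, "aggregatable": True}
        },
        "process.command_line": {
            "keyword": {"type": "keyword", "searchable": True, "aggregatable": True}
        },
    },
}


def resolve_types(field_caps):
    """Return {field name: sorted list of type names reported for it}."""
    return {name: sorted(caps) for name, caps in field_caps["fields"].items()}


if __name__ == "__main__":
    for field, types in resolve_types(SAMPLE_FIELD_CAPS).items():
        print(field, types)
```

Because the reported type is the same, any translation logic keyed on that type takes the same path for both fields.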

The wildcard field type seems to be strong in queries involving regexes that start with a wildcard and end with a wildcard:

  • atm, we use a wildcard query for the endsWith and stringContains functions when a keyword, wildcard or constant_keyword field is available. These two functions can be seen as regexes of the form *abc and *abc* respectively, so they seem well suited to a wildcard field type.
  • we also have the function starts_with (in both SQL and EQL), which uses a prefix query under the hood; its pattern can be associated with a regex of the form abc*.
  • based on the rule of thumb at the end of https://www.elastic.co/blog/find-strings-within-strings-faster-with-the-new-elasticsearch-wildcard-field, a regex of the form abc* is probably better suited to a keyword field, whereas the other two types of regex could be better suited to a wildcard field, with some "it depends" disclaimers, as the blog post mentions.
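
A rough sketch of the mapping described in the bullets above, written as Elasticsearch query DSL bodies. This is not the actual QL translator: `to_query` is a hypothetical helper, the function names are just the ones used in this comment, and escaping of `*` and `?` inside the search text is deliberately left out.

```python
def to_query(function, field, text):
    """Build a query DSL body for one of the three string functions.

    Hypothetical helper for illustration only; wildcard characters that
    appear in `text` are not escaped here.
    """
    if function == "starts_with":
        # abc* : handled with a prefix query
        return {"prefix": {field: {"value": text}}}
    if function == "endsWith":
        # *abc : leading wildcard
        return {"wildcard": {field: {"value": "*" + text}}}
    if function == "stringContains":
        # *abc* : leading and trailing wildcard
        return {"wildcard": {field: {"value": "*" + text + "*"}}}
    raise ValueError("unsupported function: " + function)


if __name__ == "__main__":
    for fn in ("starts_with", "endsWith", "stringContains"):
        print(fn, to_query(fn, "process.command_line", "abc"))
```

The query body is identical whichever of the three field types backs the field; only the cost of running it differs.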

For SQL in particular, using a wildcard field (as opposed to a keyword field) seems to negatively impact sorting and aggregations in terms of execution time: according to the blog post, execution is somewhat slower in general for a wildcard field.

@astefan
Contributor

astefan commented Sep 9, 2020

Regarding testing, the EQL tests have been improved with #62166.
SQL already has some specific tests in https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/sql/qa/server/src/main/resources/wildcard.csv-spec.

@astefan
Contributor

astefan commented Sep 10, 2020

Note: since this field type has better performance in certain cases, I think ECS should update the schema for those relevant fields that could take advantage of the wildcard field type.

@astefan astefan closed this as completed Sep 10, 2020
@andreidan andreidan added >enhancement and removed :Analytics/EQL EQL querying labels Oct 8, 2020

4 participants