Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a limit for graph phrase query expansion #34031

Merged
merged 3 commits into from
Sep 25, 2018

Conversation

jimczi
Copy link
Contributor

@jimczi jimczi commented Sep 25, 2018

Today query parsers throw TooManyClauses exception when a query creates
too many clauses. However graph phrase queries do not respect this limit.
This change adds a protection against crazy expansions that can happen when
building a graph phrase query. This is a temporary copy of the fix available
in https://issues.apache.org/jira/browse/LUCENE-8479 but not merged yet.
This logic will be removed when we integrate the Lucene patch in a future
release.

Today query parsers throw TooManyClauses exception when a query creates
too many clauses. However graph phrase queries do not respect this limit.
This change adds a protection against crazy expansions that can happen when
building a graph phrase query. This is a temporary copy of the fix available
in https://issues.apache.org/jira/browse/LUCENE-8479 but not merged yet.
This logic will be removed when we integrate the Lucene patch in a future
release.
@jimczi jimczi added >enhancement :Search/Search Search-related issues that do not fall into other categories v7.0.0 v6.5.0 labels Sep 25, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Term[] terms = graph.getTerms(field, start);
assert terms.length > 0;
if (terms.length >= BooleanQuery.getMaxClauseCount()) {
throw new BooleanQuery.TooManyClauses();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the former check is triggered by the test, but this code path isn't. I haven't checked all the test setup logic, but would it be difficult to include a case that also triggers this code path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 5bc8a40 to test both paths

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimczi LGTM, I left two minor questions mostly for my own education, would be nice to hear your opinion those

TokenStream ts = it.next();
SpanQuery q = createSpanQuery(ts, field);
if (q != null) {
if (queries.size() >= BooleanQuery.getMaxClauseCount()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my own education, and it is certainly super minor: when reading this part I was wondering if it would make sense to get the maxClauseCount limit once at the beginning of this method since its unlikely to change to avoid method calls in each iteration). Maybe Java does some clever optimizations to avoid this though and the effect most likely negligible.

@jimczi jimczi merged commit 0f878ef into elastic:master Sep 25, 2018
@jimczi jimczi deleted the max_graph_phrase_expansion branch September 25, 2018 19:38
jimczi added a commit that referenced this pull request Sep 25, 2018
Today query parsers throw TooManyClauses exception when a query creates
too many clauses. However graph phrase queries do not respect this limit.
This change adds a protection against crazy expansions that can happen when
building a graph phrase query. This is a temporary copy of the fix available
in https://issues.apache.org/jira/browse/LUCENE-8479 but not merged yet.
This logic will be removed when we integrate the Lucene patch in a future
release.
@jimczi jimczi added the v6.4.2 label Sep 25, 2018
jimczi added a commit that referenced this pull request Sep 25, 2018
Today query parsers throw TooManyClauses exception when a query creates
too many clauses. However graph phrase queries do not respect this limit.
This change adds a protection against crazy expansions that can happen when
building a graph phrase query. This is a temporary copy of the fix available
in https://issues.apache.org/jira/browse/LUCENE-8479 but not merged yet.
This logic will be removed when we integrate the Lucene patch in a future
release.
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Sep 25, 2018
jimczi added a commit that referenced this pull request Oct 19, 2018
kcm pushed a commit that referenced this pull request Oct 30, 2018
Today query parsers throw TooManyClauses exception when a query creates
too many clauses. However graph phrase queries do not respect this limit.
This change adds a protection against crazy expansions that can happen when
building a graph phrase query. This is a temporary copy of the fix available
in https://issues.apache.org/jira/browse/LUCENE-8479 but not merged yet.
This logic will be removed when we integrate the Lucene patch in a future
release.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v6.4.2 v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants