Optimize sort on long field #48804

Merged · 18 commits · Nov 26, 2019

Commits:
567a739
Optimize sort on numeric long and date fields (#39770)
mayya-sharipova Jun 10, 2019
410e28f
Merge branch 'master' into long_sort_optimization
jimczi Jun 18, 2019
a17757f
Merge remote-tracking branch 'upstream/master' into long_sort_optimiz…
mayya-sharipova Jun 25, 2019
1a9deae
Merge remote-tracking branch 'upstream/master' into long_sort_optimiz…
mayya-sharipova Jul 3, 2019
3c29734
Skip optimization if the index has duplicate data (#43121)
mayya-sharipova Jul 3, 2019
b5cad4d
Merge branch 'master' into long_sort_optimization
jimczi Jul 4, 2019
04e5e41
Merge branch 'master' into long_sort_optimization
jimczi Jul 5, 2019
33c8275
Sort leaves on search according to the primary numeric sort field (#4…
jimczi Aug 19, 2019
e167fb9
Merge remote-tracking branch 'upstream/master' into long_sort_optimiz…
mayya-sharipova Aug 21, 2019
c4c3b66
Remove nested collector in docs response
mayya-sharipova Aug 21, 2019
81971ac
Use collector manager for search when necessary (#45829)
mayya-sharipova Aug 30, 2019
a261e4f
Merge remote-tracking branch 'upstream/master' into long_sort_optimiz…
mayya-sharipova Oct 30, 2019
37d44ad
Use shared TopFieldCollector manager
mayya-sharipova Oct 30, 2019
5ce9e33
Merge remote-tracking branch 'upstream/master' into long_sort_optimiz…
mayya-sharipova Nov 12, 2019
ddf165c
Correct calculation of avg value to avoid overflow
mayya-sharipova Nov 15, 2019
64c430c
Merge remote-tracking branch 'upstream/master' into long_sort_optimiz…
mayya-sharipova Nov 15, 2019
6a0a284
Optimize calculating if index has duplicate data
mayya-sharipova Nov 15, 2019
bf8e17a
Merge remote-tracking branch 'upstream/master' into long_sort_optimiz…
mayya-sharipova Nov 25, 2019
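Taken together, the commits teach the query phase to rewrite a search sorted on a long or date field into a query that can skip non-competitive documents, guarded by checks such as the duplicate-data test (3c29734) and the es.search.rewrite_sort escape hatch below. What follows is a minimal sketch of the underlying Lucene idea, assuming an ascending sort on an illustrative "timestamp" field; the PR's actual rewrite lives in QueryPhase and differs in detail:

import org.apache.lucene.document.LongPoint;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

import java.io.IOException;

final class SortRewriteSketch {
    // A distance-feature query scores each document by how close its value is
    // to `origin`, so with origin = minValue the best-scoring hits are exactly
    // the smallest timestamps, and Lucene can skip whole blocks of points
    // whose values cannot be competitive.
    static TopDocs topNByTimestampAsc(IndexSearcher searcher, long minValue, long maxValue,
                                      int numHits) throws IOException {
        // Half-width of the indexed range, written so (maxValue - minValue)
        // cannot overflow a long (cf. commit ddf165c, "Correct calculation of
        // avg value to avoid overflow"); the pivot must stay positive.
        long pivot = Math.max(1L, maxValue / 2 - minValue / 2);
        Query rewritten = LongPoint.newDistanceFeatureQuery("timestamp", 1.0f, minValue, pivot);
        return searcher.search(rewritten, numHits); // hits come back smallest-timestamp first
    }
}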
@@ -720,6 +720,9 @@ class BuildPlugin implements Plugin<Project> {
// TODO: remove this once ctx isn't added to update script params in 7.0
test.systemProperty 'es.scripting.update.ctx_in_params', 'false'

// TODO: remove this property in 8.0
test.systemProperty 'es.search.rewrite_sort', 'true'

// TODO: remove this once cname is prepended to transport.publish_address by default in 8.0
test.systemProperty 'es.transport.cname_in_publish_address', 'true'

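The optimization ships behind an escape hatch: the JVM system property es.search.rewrite_sort, which defaults to true and is slated for removal in 8.0 (the Gradle change above forces it on for tests). A hedged sketch of how such a flag is typically read on the Java side; the class and field names here are illustrative, not the PR's actual constant:

final class SortRewriteFlag {
    // Read once at class load; starting the JVM with -Des.search.rewrite_sort=false
    // disables the sort rewrite without any code change.
    static final boolean REWRITE_SORT_ENABLED =
            Boolean.parseBoolean(System.getProperty("es.search.rewrite_sort", "true"));
}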
61 changes: 20 additions & 41 deletions docs/reference/search/profile.asciidoc
@@ -153,16 +153,9 @@ The API returns the following result:
"rewrite_time": 51443,
"collector": [
{
"name": "SimpleTopScoreDocCollector",
"reason": "search_top_hits",
"time_in_nanos": "32273"
}
]
}
@@ -445,16 +438,9 @@ Looking at the previous example:
--------------------------------------------------
"collector": [
{
"name": "SimpleTopScoreDocCollector",
"reason": "search_top_hits",
"time_in_nanos": "32273"
}
]
--------------------------------------------------
@@ -657,33 +643,26 @@ The API returns the following result:
"rewrite_time": 7208,
"collector": [
{
"name": "MultiCollector",
"reason": "search_multi",
"time_in_nanos": 1820,
"children": [
{
"name": "FilteredCollector",
"reason": "search_post_filter",
"time_in_nanos": 7735,
"children": [
{
"name": "SimpleTopScoreDocCollector",
"reason": "search_top_hits",
"time_in_nanos": 1328
}
]
},
{
"name": "MultiBucketCollector: [[my_scoped_agg, my_global_agg]]",
"reason": "aggregation",
"time_in_nanos": 8273
}
]
}
@@ -27,6 +27,7 @@
import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.CollectionTerminatedException;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.CollectorManager;
import org.apache.lucene.search.ConjunctionDISI;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Explanation;
@@ -35,24 +36,31 @@
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryCache;
import org.apache.lucene.search.QueryCachingPolicy;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.search.TotalHits;
import org.apache.lucene.search.Weight;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.lucene.util.BitSet;
import org.apache.lucene.util.BitSetIterator;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.CombinedBitSet;
import org.apache.lucene.util.SparseFixedBitSet;
import org.elasticsearch.common.lucene.search.TopDocsAndMaxScore;
import org.elasticsearch.search.DocValueFormat;
import org.elasticsearch.search.dfs.AggregatedDfs;
import org.elasticsearch.search.profile.Timer;
import org.elasticsearch.search.profile.query.ProfileWeight;
import org.elasticsearch.search.profile.query.QueryProfileBreakdown;
import org.elasticsearch.search.profile.query.QueryProfiler;
import org.elasticsearch.search.profile.query.QueryTimingType;
import org.elasticsearch.search.query.QuerySearchResult;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
@@ -131,12 +139,86 @@ public Weight createWeight(Query query, ScoreMode scoreMode, float boost) throws
}
}

private void checkCancelled() {
if (checkCancelled != null) {
checkCancelled.run();
}
}

public void search(List<LeafReaderContext> leaves, Weight weight, CollectorManager manager,
QuerySearchResult result, DocValueFormat[] formats, TotalHits totalHits) throws IOException {
final List<Collector> collectors = new ArrayList<>(leaves.size());
for (LeafReaderContext ctx : leaves) {
final Collector collector = manager.newCollector();
searchLeaf(ctx, weight, collector);
collectors.add(collector);
}
TopFieldDocs mergedTopDocs = (TopFieldDocs) manager.reduce(collectors);
// Lucene sets shard indexes while merging topDocs from different collectors.
// We need to reset the shard index here; ES will set it later, during the reduce stage.
for (ScoreDoc scoreDoc : mergedTopDocs.scoreDocs) {
scoreDoc.shardIndex = -1;
}
if (totalHits != null) { // we have already precalculated totalHits for the whole index
mergedTopDocs = new TopFieldDocs(totalHits, mergedTopDocs.scoreDocs, mergedTopDocs.fields);
}
result.topDocs(new TopDocsAndMaxScore(mergedTopDocs, Float.NaN), formats);
}
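// Hedged usage sketch (not part of this diff): callers are expected to build a
// shared manager so that per-leaf collectors can share the bottom competitive
// value and terminate early, e.g. with Lucene 8's
//
//   CollectorManager<TopFieldCollector, TopFieldDocs> manager =
//       TopFieldCollector.createSharedManager(sort, numHits, null, totalHitsThreshold);
//   searcher.search(leaves, weight, manager, result, formats, totalHits);
//
// where sort, numHits, and totalHitsThreshold are illustrative names.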

@Override
protected void search(List<LeafReaderContext> leaves, Weight weight, Collector collector) throws IOException {
for (LeafReaderContext ctx : leaves) { // search each subreader
searchLeaf(ctx, weight, collector);
}
}

/**
* Lower-level search API.
*
* {@link LeafCollector#collect(int)} is called for every matching document in
* the provided <code>ctx</code>.
*/
private void searchLeaf(LeafReaderContext ctx, Weight weight, Collector collector) throws IOException {
checkCancelled();
weight = wrapWeight(weight);
final LeafCollector leafCollector;
try {
leafCollector = collector.getLeafCollector(ctx);
} catch (CollectionTerminatedException e) {
// there is no doc of interest in this reader context
// continue with the following leaf
return;
}
Bits liveDocs = ctx.reader().getLiveDocs();
BitSet liveDocsBitSet = getSparseBitSetOrNull(liveDocs);
if (liveDocsBitSet == null) {
BulkScorer bulkScorer = weight.bulkScorer(ctx);
if (bulkScorer != null) {
try {
bulkScorer.score(leafCollector, liveDocs);
} catch (CollectionTerminatedException e) {
// collection was terminated prematurely
// continue with the following leaf
}
}
} else {
// if the role query result set is sparse then we should use the SparseFixedBitSet for advancing:
Scorer scorer = weight.scorer(ctx);
if (scorer != null) {
try {
intersectScorerAndBitSet(scorer, liveDocsBitSet, leafCollector,
checkCancelled == null ? () -> { } : checkCancelled);
} catch (CollectionTerminatedException e) {
// collection was terminated prematurely
// continue with the following leaf
}
}
}
}

private Weight wrapWeight(Weight weight) {
if (checkCancelled != null) {
return new Weight(weight.getQuery()) {
@Override
public void extractTerms(Set<Term> terms) {
throw new UnsupportedOperationException();
@@ -168,48 +250,10 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
}
};
} else {
return weight;
}
}

private static BitSet getSparseBitSetOrNull(Bits liveDocs) {
if (liveDocs instanceof SparseFixedBitSet) {
@@ -49,8 +49,6 @@ public class CollectorResult implements ToXContentObject, Writeable {
public static final String REASON_SEARCH_POST_FILTER = "search_post_filter";
public static final String REASON_SEARCH_MIN_SCORE = "search_min_score";
public static final String REASON_SEARCH_MULTI = "search_multi";
public static final String REASON_SEARCH_TIMEOUT = "search_timeout";
public static final String REASON_SEARCH_CANCELLED = "search_cancelled";
public static final String REASON_AGGREGATION = "aggregation";
public static final String REASON_AGGREGATION_GLOBAL = "aggregation_global";


This file was deleted.

@@ -28,16 +28,13 @@
import org.elasticsearch.common.lucene.MinimumScoreCollector;
import org.elasticsearch.common.lucene.search.FilteredCollector;
import org.elasticsearch.search.profile.query.InternalProfileCollector;
import org.elasticsearch.tasks.TaskCancelledException;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.BooleanSupplier;

import static org.elasticsearch.search.profile.query.CollectorResult.REASON_SEARCH_CANCELLED;
import static org.elasticsearch.search.profile.query.CollectorResult.REASON_SEARCH_MIN_SCORE;
import static org.elasticsearch.search.profile.query.CollectorResult.REASON_SEARCH_MULTI;
import static org.elasticsearch.search.profile.query.CollectorResult.REASON_SEARCH_POST_FILTER;
@@ -150,18 +147,6 @@ protected InternalProfileCollector createWithProfiler(InternalProfileCollector i
};
}

/**
* Creates a collector that throws {@link TaskCancelledException} if the search is cancelled
*/
static QueryCollectorContext createCancellableCollectorContext(BooleanSupplier cancelled) {
return new QueryCollectorContext(REASON_SEARCH_CANCELLED) {
@Override
Collector create(Collector in) throws IOException {
return new CancellableCollector(cancelled, in);
}
};
}

/**
* Creates collector limiting the collection to the first <code>numHits</code> documents
*/
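With CancellableCollector gone (along with its search_cancelled profile entry and the search_timeout reason), cancellation is enforced inside ContextIndexSearcher itself: the checkCancelled runnable shown earlier runs before each leaf is searched. A hedged sketch of the wiring, assuming a setCheckCancelled setter on the searcher; only the runnable-based check itself appears in this diff:

import org.elasticsearch.search.internal.ContextIndexSearcher;
import org.elasticsearch.tasks.CancellableTask;
import org.elasticsearch.tasks.TaskCancelledException;

final class CancellationWiringSketch {
    // Assumed wiring in the query phase; `setCheckCancelled` is an assumption
    // based on the `checkCancelled` runnable the searcher consults above.
    static void armCancellation(ContextIndexSearcher searcher, CancellableTask task) {
        searcher.setCheckCancelled(() -> {
            if (task.isCancelled()) {
                throw new TaskCancelledException("cancelled");
            }
        });
    }
}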