Introduce a constant_keyword field. (elastic#49713)
This field is a specialization of the `keyword` field for the case when all
documents have the same value. It typically performs more efficiently than
keywords at query time by figuring out whether all or none of the documents
match at rewrite time, like `term` queries on `_index`.

The name is up for discussion. I liked including `keyword` in it, so that we
still have room for a `singleton_numeric` in the future. However I'm unsure
whether to call it `singleton`, `constant` or something else, any opinions?

For this field there is a choice between
 1. accepting values in `_source` when they are equal to the value configured
    in mappings, but rejecting mapping updates
 2. rejecting values in `_source` but then allowing updates to the value that
    is configured in the mapping
This commit implements option 1, so that it is possible to reindex from/to an
index that has the field mapped as a keyword with no changes to the source.

Backport of elastic#49713
jpountz committed Mar 2, 2020
1 parent a267849 commit 9a11f75
Showing 16 changed files with 1,184 additions and 1 deletion.
112 changes: 112 additions & 0 deletions docs/reference/how-to/search-speed.asciidoc
@@ -418,3 +418,115 @@ The <<text,`text`>> field has an <<index-prefixes,`index_prefixes`>> option that
indexes prefixes of all terms and is automatically leveraged by query parsers to
run prefix queries. If your use-case involves running lots of prefix queries,
this can speed up queries significantly.

[[faster-filtering-with-constant-keyword]]
=== Use <<constant-keyword,`constant_keyword`>> to speed up filtering

There is a general rule that the cost of a filter is mostly a function of the
number of matched documents. Imagine that you have an index containing cycles.
There are a large number of bicycles and many searches perform a filter on
`cycle_type: bicycle`. This very common filter is unfortunately also very costly
since it matches most documents. There is a simple way to avoid running this
filter: move bicycles to their own index and filter bicycles by searching this
index instead of adding a filter to the query.

Unfortunately this can make client-side logic tricky, which is where
`constant_keyword` helps. By mapping `cycle_type` as a `constant_keyword` with
value `bicycle` on the index that contains bicycles, clients can keep running
the exact same queries as they used to run on the monolithic index and
Elasticsearch will do the right thing on the bicycles index by ignoring filters
on `cycle_type` if the value is `bicycle` and returning no hits otherwise.
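
The all-or-nothing decision described above can be pictured with a minimal
Java sketch. It is only an illustration under stated assumptions: the class,
enum and method names are hypothetical and are not taken from the
Elasticsearch code base, and the filter value is assumed to be non-null.

```java
import java.util.Objects;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
final class ConstantKeywordRewriteSketch {

    /** Possible outcomes of rewriting a term filter on a constant field. */
    enum Rewrite { MATCH_ALL, MATCH_NONE }

    /**
     * Decides the fate of a term filter without looking at any document.
     *
     * @param constantValue the value configured in the mapping (null if none is set yet)
     * @param queriedValue  the value requested by the term filter (assumed non-null)
     */
    static Rewrite rewriteTermFilter(String constantValue, String queriedValue) {
        // All documents share constantValue, so the filter matches either
        // every document or none of them.
        return Objects.equals(constantValue, queriedValue) ? Rewrite.MATCH_ALL : Rewrite.MATCH_NONE;
    }

    public static void main(String[] args) {
        System.out.println(rewriteTermFilter("bicycle", "bicycle"));  // MATCH_ALL
        System.out.println(rewriteTermFilter("bicycle", "unicycle")); // MATCH_NONE
    }
}
```

A filter that rewrites to `MATCH_ALL` can simply be dropped from the query,
while `MATCH_NONE` lets the index return no hits without running the rest of
the query.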

Here is what mappings could look like:

[source,console]
--------------------------------------------------
PUT bicycles
{
  "mappings": {
    "properties": {
      "cycle_type": {
        "type": "constant_keyword",
        "value": "bicycle"
      },
      "name": {
        "type": "text"
      }
    }
  }
}

PUT other_cycles
{
  "mappings": {
    "properties": {
      "cycle_type": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      }
    }
  }
}
--------------------------------------------------

We are splitting our index in two: one that will contain only bicycles, and
another one that contains other cycles: unicycles, tricycles, etc. Then at
search time, we need to search both indices, but we don't need to modify
queries.


[source,console]
--------------------------------------------------
GET bicycles,other_cycles/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "description": "dutch"
        }
      },
      "filter": {
        "term": {
          "cycle_type": "bicycle"
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]

On the `bicycles` index, Elasticsearch will simply ignore the `cycle_type`
filter and rewrite the search request to the one below:

[source,console]
--------------------------------------------------
GET bicycles,other_cycles/_search
{
  "query": {
    "match": {
      "description": "dutch"
    }
  }
}
--------------------------------------------------
// TEST[continued]

On the `other_cycles` index, Elasticsearch will quickly figure out that
`bicycle` doesn't exist in the terms dictionary of the `cycle_type` field and
return a search response with no hits.

This is a powerful way of making queries cheaper by putting common values in a
dedicated index. This idea can also be combined across multiple fields: for
instance, if you track the color of each cycle and your `bicycles` index ends up
having a majority of black bikes, you could split it into a `bicycles-black`
index and a `bicycles-other-colors` index.

The `constant_keyword` field is not strictly required for this optimization: it
is also possible to update the client-side logic in order to route queries to
the relevant indices based on filters. However, `constant_keyword` makes this
transparent and decouples search requests from the index topology in exchange
for very little overhead.
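
For comparison, the client-side alternative could look roughly like the Java
sketch below. It is a hypothetical helper, not part of any Elasticsearch
client: the class and method names are invented for illustration, and the
index names are the ones assumed in the example above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical client-side router; index names follow the example above.
final class CycleIndexRouter {

    /**
     * Returns the indices a search has to target.
     *
     * @param cycleTypeFilter value of the cycle_type filter, or null when the search has no such filter
     */
    static List<String> indicesFor(String cycleTypeFilter) {
        if (cycleTypeFilter == null) {
            // No filter: every index may contain hits.
            return Arrays.asList("bicycles", "other_cycles");
        }
        if ("bicycle".equals(cycleTypeFilter)) {
            // Only the dedicated index holds bicycles; the filter itself becomes redundant.
            return Collections.singletonList("bicycles");
        }
        // Any other cycle type lives in the catch-all index and still needs its filter.
        return Collections.singletonList("other_cycles");
    }

    public static void main(String[] args) {
        System.out.println(indicesFor("bicycle"));  // [bicycles]
        System.out.println(indicesFor("tricycle")); // [other_cycles]
        System.out.println(indicesFor(null));       // [bicycles, other_cycles]
    }
}
```

Every client would have to carry and update such a routing table whenever the
index layout changes, which is the coupling that `constant_keyword` avoids.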
6 changes: 5 additions & 1 deletion docs/reference/mapping/types.asciidoc
@@ -59,6 +59,8 @@ string:: <<text,`text`>> and <<keyword,`keyword`>>

<<histogram>>:: `histogram` for pre-aggregated numerical values for percentiles aggregations.

<<constant-keyword>>:: Specialization of `keyword` for the case when all documents have the same value.

[float]
[[types-array-handling]]
=== Arrays
@@ -130,4 +132,6 @@ include::types/text.asciidoc[]

include::types/token-count.asciidoc[]

include::types/shape.asciidoc[]

include::types/constant-keyword.asciidoc[]
85 changes: 85 additions & 0 deletions docs/reference/mapping/types/constant-keyword.asciidoc
@@ -0,0 +1,85 @@
[role="xpack"]
[testenv="basic"]

[[constant-keyword]]
=== Constant keyword datatype
++++
<titleabbrev>Constant keyword</titleabbrev>
++++

Constant keyword is a specialization of the <<keyword,`keyword`>> field for
the case where all documents in the index have the same value.

[source,console]
--------------------------------
PUT logs-debug
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "message": {
        "type": "text"
      },
      "level": {
        "type": "constant_keyword",
        "value": "debug"
      }
    }
  }
}
--------------------------------

`constant_keyword` supports the same queries and aggregations as `keyword`
fields do, but takes advantage of the fact that all documents have the same
value per index to execute queries more efficiently.

It is allowed to submit documents that either have no value for the field or
have a value equal to the one configured in the mappings. The two indexing
requests below are equivalent:

[source,console]
--------------------------------
POST logs-debug/_doc
{
  "date": "2019-12-12",
  "message": "Starting up Elasticsearch",
  "level": "debug"
}

POST logs-debug/_doc
{
  "date": "2019-12-12",
  "message": "Starting up Elasticsearch"
}
--------------------------------
//TEST[continued]

However, providing a value that is different from the one configured in the
mapping is disallowed.

In case no `value` is provided in the mappings, the field will automatically
configure itself based on the value contained in the first indexed document.
While this behavior can be convenient, note that it means that a single
poisonous document with a wrong value can cause all other documents to be
rejected.

The `value` of the field cannot be changed after it has been set.
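
The rules above can be summarized with a small Java model. This is a sketch
under stated assumptions rather than the mapper's actual code: the class and
its methods are hypothetical, and a rejection is modeled as a `false` return
value because the text does not say how Elasticsearch reports it.

```java
// Hypothetical model of the indexing rules; the real mapper is not shown here.
final class ConstantValueSketch {

    // Null until a value is configured in the mapping or inferred from a document.
    private String value;

    ConstantValueSketch(String configuredValue) {
        this.value = configuredValue;
    }

    /** Returns true if a document carrying docValue (null when absent) is accepted. */
    boolean accept(String docValue) {
        if (docValue == null) {
            return true; // documents may omit the field
        }
        if (value == null) {
            value = docValue; // the first indexed value configures the field
            return true;
        }
        return value.equals(docValue); // any other value is rejected
    }

    String value() {
        return value;
    }

    public static void main(String[] args) {
        ConstantValueSketch level = new ConstantValueSketch("debug");
        System.out.println(level.accept("debug")); // true
        System.out.println(level.accept(null));    // true
        System.out.println(level.accept("info"));  // false
    }
}
```

Once `value` is set, whether from the mapping or from the first document, the
sketch never changes it again, which mirrors the rule that the value cannot be
updated.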

[[constant-keyword-params]]
==== Parameters for constant keyword fields

The following mapping parameters are accepted:

[horizontal]

<<mapping-field-meta,`meta`>>::

Metadata about the field.

`value`::

The value to associate with all documents in the index. If this parameter
is not provided, it is set based on the first document that gets indexed.

4 changes: 4 additions & 0 deletions docs/reference/rest-api/info.asciidoc
@@ -71,6 +71,10 @@ Example response:
"available" : true,
"enabled" : true
},
"constant_keyword" : {
"available" : true,
"enabled" : true
},
"enrich" : {
"available" : true,
"enabled" : true
@@ -91,6 +91,9 @@ public Collection<Accountable> getChildResources() {

    @Override
    public SortedSetDocValues getOrdinalsValues() {
        if (value == null) {
            return DocValues.emptySortedSet();
        }
        final BytesRef term = new BytesRef(value);
        final SortedDocValues sortedValues = new AbstractSortedDocValues() {

@@ -626,6 +626,10 @@ public boolean isAnalyticsAllowed() {
        return allowForAllLicenses();
    }

    public boolean isConstantKeywordAllowed() {
        return allowForAllLicenses();
    }

    /**
     * @return true if security is available to be used with the current license type
     */
@@ -0,0 +1,38 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License;
 * you may not use this file except in compliance with the Elastic License.
 */

package org.elasticsearch.xpack.core;

import org.elasticsearch.common.io.stream.StreamInput;

import java.io.IOException;
import java.util.Objects;

public class ConstantKeywordFeatureSetUsage extends XPackFeatureSet.Usage {

    public ConstantKeywordFeatureSetUsage(StreamInput input) throws IOException {
        super(input);
    }

    public ConstantKeywordFeatureSetUsage(boolean available, boolean enabled) {
        super(XPackField.CONSTANT_KEYWORD, available, enabled);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        // Cast to this class: the getClass() check above guarantees the type.
        ConstantKeywordFeatureSetUsage that = (ConstantKeywordFeatureSetUsage) o;
        return available == that.available && enabled == that.enabled;
    }

    @Override
    public int hashCode() {
        return Objects.hash(available, enabled);
    }

}
@@ -55,6 +55,8 @@ public final class XPackField {
    public static final String ANALYTICS = "analytics";
    /** Name constant for the enrich plugin. */
    public static final String ENRICH = "enrich";
    /** Name constant for the constant-keyword plugin. */
    public static final String CONSTANT_KEYWORD = "constant_keyword";

    private XPackField() {}

24 changes: 24 additions & 0 deletions x-pack/plugin/mapper-constant-keyword/build.gradle
@@ -0,0 +1,24 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License;
 * you may not use this file except in compliance with the Elastic License.
 */

evaluationDependsOn(xpackModule('core'))

apply plugin: 'elasticsearch.esplugin'

esplugin {
  name 'constant-keyword'
  description 'Module for the constant-keyword field type, which is a specialization of keyword for the case when all documents have the same value.'
  classname 'org.elasticsearch.xpack.constantkeyword.ConstantKeywordMapperPlugin'
  extendedPlugins = ['x-pack-core']
}
archivesBaseName = 'x-pack-constant-keyword'

dependencies {
  compileOnly project(path: xpackModule('core'), configuration: 'default')
  testCompile project(path: xpackModule('core'), configuration: 'testArtifacts')
}

integTest.enabled = false
@@ -0,0 +1,52 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License;
 * you may not use this file except in compliance with the Elastic License.
 */

package org.elasticsearch.xpack.constantkeyword;

import org.elasticsearch.action.ActionListener;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.license.XPackLicenseState;
import org.elasticsearch.xpack.core.ConstantKeywordFeatureSetUsage;
import org.elasticsearch.xpack.core.XPackFeatureSet;
import org.elasticsearch.xpack.core.XPackField;

import java.util.Map;

public class ConstantKeywordFeatureSet implements XPackFeatureSet {

    private final XPackLicenseState licenseState;

    @Inject
    public ConstantKeywordFeatureSet(XPackLicenseState licenseState) {
        this.licenseState = licenseState;
    }

    @Override
    public String name() {
        return XPackField.CONSTANT_KEYWORD;
    }

    @Override
    public boolean available() {
        return licenseState.isConstantKeywordAllowed();
    }

    @Override
    public boolean enabled() {
        return true;
    }

    @Override
    public Map<String, Object> nativeCodeInfo() {
        return null;
    }

    @Override
    public void usage(ActionListener<Usage> listener) {
        listener.onResponse(new ConstantKeywordFeatureSetUsage(available(), enabled()));
    }

}