Introduce a constant_keyword field. (#49713) #53024

Merged
merged 4 commits into from
Mar 3, 2020
112 changes: 112 additions & 0 deletions docs/reference/how-to/search-speed.asciidoc
@@ -418,3 +418,115 @@ The <<text,`text`>> field has an <<index-prefixes,`index_prefixes`>> option that
indexes prefixes of all terms and is automatically leveraged by query parsers to
run prefix queries. If your use-case involves running lots of prefix queries,
this can speed up queries significantly.

[[faster-filtering-with-constant-keyword]]
=== Use <<constant-keyword,`constant_keyword`>> to speed up filtering

There is a general rule that the cost of a filter is mostly a function of the
number of matched documents. Imagine that you have an index containing cycles.
There are a large number of bicycles and many searches perform a filter on
`cycle_type: bicycle`. This very common filter is unfortunately also very costly
since it matches most documents. There is a simple way to avoid running this
filter: move bicycles to their own index and filter bicycles by searching this
index instead of adding a filter to the query.

Unfortunately this can make client-side logic tricky, which is where
`constant_keyword` helps. By mapping `cycle_type` as a `constant_keyword` with
value `bicycle` on the index that contains bicycles, clients can keep running
the exact same queries as they used to run on the monolithic index and
Elasticsearch will do the right thing on the bicycles index by ignoring filters
on `cycle_type` if the value is `bicycle` and returning no hits otherwise.

Here is what mappings could look like:

[source,console]
--------------------------------------------------
PUT bicycles
{
  "mappings": {
    "properties": {
      "cycle_type": {
        "type": "constant_keyword",
        "value": "bicycle"
      },
      "name": {
        "type": "text"
      }
    }
  }
}

PUT other_cycles
{
  "mappings": {
    "properties": {
      "cycle_type": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      }
    }
  }
}
--------------------------------------------------

We are splitting our index in two: one that will contain only bicycles, and
another one that contains other cycles: unicycles, tricycles, etc. Then at
search time, we need to search both indices, but we don't need to modify
queries.


[source,console]
--------------------------------------------------
GET bicycles,other_cycles/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "description": "dutch"
        }
      },
      "filter": {
        "term": {
          "cycle_type": "bicycle"
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]

On the `bicycles` index, Elasticsearch will simply ignore the `cycle_type`
filter and rewrite the search request to the one below:

[source,console]
--------------------------------------------------
GET bicycles,other_cycles/_search
{
  "query": {
    "match": {
      "description": "dutch"
    }
  }
}
--------------------------------------------------
// TEST[continued]

On the `other_cycles` index, Elasticsearch will quickly figure out that
`bicycle` doesn't exist in the terms dictionary of the `cycle_type` field and
return a search response with no hits.
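
The per-index decision described above can be modelled in a few lines. This is only a sketch, not Elasticsearch's implementation: `ConstantKeywordFilterSketch`, `Rewrite`, and `rewriteTermFilter` are hypothetical names chosen for illustration. It shows why a term filter on a constant field can be resolved once per index instead of once per document.

```java
import java.util.Objects;

// Illustrative model only: the names below are hypothetical, not Elasticsearch APIs.
final class ConstantKeywordFilterSketch {

    /** Outcome of rewriting a term filter against a constant field. */
    enum Rewrite { MATCH_ALL, MATCH_NONE }

    /**
     * Decides, once per index, what a term filter on a constant field becomes.
     * indexValue is the value configured in that index's mapping.
     */
    static Rewrite rewriteTermFilter(String indexValue, String queriedValue) {
        Objects.requireNonNull(queriedValue, "queriedValue");
        // Every document in the index carries indexValue, so the filter
        // either keeps all documents or none of them.
        return queriedValue.equals(indexValue) ? Rewrite.MATCH_ALL : Rewrite.MATCH_NONE;
    }

    public static void main(String[] args) {
        System.out.println(rewriteTermFilter("bicycle", "bicycle"));
        System.out.println(rewriteTermFilter("bicycle", "tricycle"));
    }
}
```

In this model a filter that resolves to `MATCH_ALL` can simply be dropped from the query, which corresponds to the rewritten request shown for the `bicycles` index.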

This is a powerful way of making queries cheaper by putting common values in a
dedicated index. This idea can also be combined across multiple fields: for
instance, if you track the color of each cycle and your `bicycles` index ends up
having a majority of black bikes, you could split it into `bicycles-black`
and `bicycles-other-colors` indices.

The `constant_keyword` field is not strictly required for this optimization: it is
also possible to update the client-side logic to route queries to the
relevant indices based on filters. However, `constant_keyword` does this
transparently and decouples search requests from the index topology in
exchange for very little overhead.
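
For comparison, the client-side alternative might look roughly like the sketch below. Everything in it is an assumption made for illustration: `CycleIndexRouter` and `indicesFor` are hypothetical names, and the index names are taken from the example mappings in this section.

```java
import java.util.List;
import java.util.Map;

// Hypothetical client-side router; index names follow the example mappings.
final class CycleIndexRouter {

    private static final List<String> ALL_INDICES = List.of("bicycles", "other_cycles");

    // Values that have been moved to a dedicated index.
    private static final Map<String, String> DEDICATED_INDEX = Map.of("bicycle", "bicycles");

    /**
     * Returns the indices to search for an optional cycle_type filter.
     * A null filter means the query does not filter on cycle_type.
     */
    static List<String> indicesFor(String cycleTypeFilter) {
        if (cycleTypeFilter == null) {
            return ALL_INDICES;
        }
        String dedicated = DEDICATED_INDEX.get(cycleTypeFilter);
        // Any value without a dedicated index lives in the catch-all index.
        return List.of(dedicated != null ? dedicated : "other_cycles");
    }

    public static void main(String[] args) {
        System.out.println(indicesFor("bicycle"));
        System.out.println(indicesFor("unicycle"));
        System.out.println(indicesFor(null));
    }
}
```

Every client would have to carry a table like `DEDICATED_INDEX` and keep it in sync with the index layout, which is the coupling that `constant_keyword` avoids.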
6 changes: 5 additions & 1 deletion docs/reference/mapping/types.asciidoc
@@ -59,6 +59,8 @@ string:: <<text,`text`>> and <<keyword,`keyword`>>

<<histogram>>:: `histogram` for pre-aggregated numerical values for percentiles aggregations.

<<constant-keyword>>:: Specialization of `keyword` for the case when all documents have the same value.

[float]
[[types-array-handling]]
=== Arrays
@@ -130,4 +132,6 @@ include::types/text.asciidoc[]

include::types/token-count.asciidoc[]

include::types/shape.asciidoc[]
include::types/shape.asciidoc[]

include::types/constant-keyword.asciidoc[]
85 changes: 85 additions & 0 deletions docs/reference/mapping/types/constant-keyword.asciidoc
@@ -0,0 +1,85 @@
[role="xpack"]
[testenv="basic"]

[[constant-keyword]]
=== Constant keyword datatype
++++
<titleabbrev>Constant keyword</titleabbrev>
++++

Constant keyword is a specialization of the <<keyword,`keyword`>> field for
the case where all documents in the index have the same value.

[source,console]
--------------------------------
PUT logs-debug
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "message": {
        "type": "text"
      },
      "level": {
        "type": "constant_keyword",
        "value": "debug"
      }
    }
  }
}
--------------------------------

`constant_keyword` supports the same queries and aggregations as `keyword`
fields do, but takes advantage of the fact that all documents have the same
value per index to execute queries more efficiently.

A document may either omit the field or provide a value equal to the one
configured in the mappings. The two indexing requests below are equivalent:

[source,console]
--------------------------------
POST logs-debug/_doc
{
  "date": "2019-12-12",
  "message": "Starting up Elasticsearch",
  "level": "debug"
}

POST logs-debug/_doc
{
  "date": "2019-12-12",
  "message": "Starting up Elasticsearch"
}
--------------------------------
//TEST[continued]

However, providing a value that differs from the one configured in the
mapping is disallowed.

If no `value` is provided in the mappings, the field automatically
configures itself based on the value contained in the first indexed document.
While this behavior can be convenient, it also means that a single
poisonous document with the wrong value can cause all other documents to be
rejected.

The `value` of the field cannot be changed after it has been set.
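
The indexing rules above can be summarized with a small model. This is a sketch under stated assumptions, not the mapper added by this change: `ConstantFieldSketch` is a hypothetical class, and it assumes that a document which omits the field leaves an unconfigured value unset, so the first document that actually carries a value fixes it.

```java
// Illustrative model of the rules above; not the Elasticsearch mapper itself.
final class ConstantFieldSketch {

    // Null until configured in the mappings or taken from a document.
    private String value;

    ConstantFieldSketch(String configuredValue) {
        this.value = configuredValue;
    }

    /** Accepts a document's field value (null if the document omits the field). */
    void index(String docValue) {
        if (docValue == null) {
            return; // omitting the field is always allowed
        }
        if (value == null) {
            value = docValue; // assumption: first explicit value fixes the constant
        } else if (!value.equals(docValue)) {
            throw new IllegalArgumentException(
                "expected [" + value + "] but got [" + docValue + "]");
        }
    }

    String value() {
        return value;
    }

    public static void main(String[] args) {
        ConstantFieldSketch level = new ConstantFieldSketch(null);
        level.index(null);    // accepted, value still unset
        level.index("debug"); // sets the value
        level.index("debug"); // accepted, same value
        try {
            level.index("info");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The model has no way to change `value` once it is set, mirroring the rule that the value is fixed for the lifetime of the index.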

[[constant-keyword-params]]
==== Parameters for constant keyword fields

The following mapping parameters are accepted:

[horizontal]

<<mapping-field-meta,`meta`>>::

Metadata about the field.

`value`::

The value to associate with all documents in the index. If this parameter
is not provided, it is set based on the first document that gets indexed.

@@ -91,6 +91,9 @@ public Collection<Accountable> getChildResources() {

    @Override
    public SortedSetDocValues getOrdinalsValues() {
        if (value == null) {
            return DocValues.emptySortedSet();
        }
        final BytesRef term = new BytesRef(value);
        final SortedDocValues sortedValues = new AbstractSortedDocValues() {

@@ -626,6 +626,10 @@ public boolean isAnalyticsAllowed() {
        return allowForAllLicenses();
    }

    public boolean isConstantKeywordAllowed() {
        return allowForAllLicenses();
    }

    /**
     * @return true if security is available to be used with the current license type
     */
@@ -55,6 +55,8 @@ public final class XPackField {
    public static final String ANALYTICS = "analytics";
    /** Name constant for the enrich plugin. */
    public static final String ENRICH = "enrich";
    /** Name constant for the constant-keyword plugin. */
    public static final String CONSTANT_KEYWORD = "constant_keyword";

    private XPackField() {}

24 changes: 24 additions & 0 deletions x-pack/plugin/mapper-constant-keyword/build.gradle
@@ -0,0 +1,24 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License;
* you may not use this file except in compliance with the Elastic License.
*/

evaluationDependsOn(xpackModule('core'))

apply plugin: 'elasticsearch.esplugin'

esplugin {
  name 'constant-keyword'
  description 'Module for the constant-keyword field type, which is a specialization of keyword for the case when all documents have the same value.'
  classname 'org.elasticsearch.xpack.constantkeyword.ConstantKeywordMapperPlugin'
  extendedPlugins = ['x-pack-core']
}
archivesBaseName = 'x-pack-constant-keyword'

dependencies {
  compileOnly project(path: xpackModule('core'), configuration: 'default')
  testCompile project(path: xpackModule('core'), configuration: 'testArtifacts')
}

integTest.enabled = false
@@ -0,0 +1,29 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License;
* you may not use this file except in compliance with the Elastic License.
*/

package org.elasticsearch.xpack.constantkeyword;

import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.mapper.Mapper;
import org.elasticsearch.plugins.ActionPlugin;
import org.elasticsearch.plugins.MapperPlugin;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.xpack.constantkeyword.mapper.ConstantKeywordFieldMapper;

import java.util.Map;

import static java.util.Collections.singletonMap;

public class ConstantKeywordMapperPlugin extends Plugin implements MapperPlugin, ActionPlugin {

    public ConstantKeywordMapperPlugin(Settings settings) {}

    @Override
    public Map<String, Mapper.TypeParser> getMappers() {
        return singletonMap(ConstantKeywordFieldMapper.CONTENT_TYPE, new ConstantKeywordFieldMapper.TypeParser());
    }

}