
Performance regression for search query in single node cluster #31877

Closed

ctrlaltdel opened this issue Jul 6, 2018 · 9 comments

Labels
:Search/Search Search-related issues that do not fall into other categories

@ctrlaltdel

Elasticsearch version (bin/elasticsearch --version): 6.3.1

Plugins installed: []

JVM version (java -version): OpenJDK Runtime Environment (build 10.0.1+10-Ubuntu-3ubuntu1)

OS version (uname -a if on a Unix-like system): Ubuntu 18.04

Linux es-perf 4.13.0-25-generic #29-Ubuntu SMP Mon Jan 8 21:14:41 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

A single search query running on a single-node Elasticsearch cluster is much slower with 6.3.1 than it was with 5.5.3, while it would be expected to take roughly the same time.

Inefficient use of the CPU, due to the limit on concurrent shard requests per search request (implemented in #25632), is the likely culprit.

Steps to reproduce:

The following steps were tested in a virtual machine with 8 cores and 16 GB of RAM.

  1. Install elasticsearch 5.5.3 from deb package
  2. Create indices with test data using populate.sh (see below)
  3. Run a single match_all search query while showing CPU usage
$ ./query.sh
Elasticsearch version 5.5.3

%Cpu(s): 55.7 us,  4.0 sy,  0.0 ni, 31.0 id,  9.1 wa,  0.0 hi,  0.0 si,  0.1 st
%Cpu(s):  1.1 us,  0.0 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  0.0 us,  1.1 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

GET /_search
{"query":{"match_all":{}}}

%Cpu(s):  1.1 us,  2.2 sy,  0.0 ni, 96.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 76.5 us,  0.0 sy,  0.0 ni, 23.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 85.9 us,  1.2 sy,  0.0 ni, 12.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
{
  "hits": 670830085,
  "took": 334,
  "_shards": {
    "total": 100,
    "successful": 100,
    "failed": 0
  }
}
%Cpu(s): 24.7 us,  1.1 sy,  0.0 ni, 74.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  0.0 us,  1.1 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 96.7 id,  1.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  0.0 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
$
  4. Upgrade to 6.3.1 by installing the new package and restarting the service
  5. Run the same query again
$ ./query.sh
Elasticsearch version 6.3.1

%Cpu(s): 54.8 us,  4.0 sy,  0.0 ni, 32.0 id,  9.1 wa,  0.0 hi,  0.0 si,  0.1 st
%Cpu(s):  2.2 us,  1.1 sy,  0.0 ni, 96.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

GET /_search
{"query":{"match_all":{}}}

%Cpu(s):  0.0 us,  1.1 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 25.6 us,  1.1 sy,  0.0 ni, 72.2 id,  0.0 wa,  0.0 hi,  1.1 si,  0.0 st
%Cpu(s): 52.8 us,  1.1 sy,  0.0 ni, 46.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 62.1 us,  0.0 sy,  0.0 ni, 37.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 59.8 us,  1.1 sy,  0.0 ni, 39.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 58.8 us,  0.0 sy,  0.0 ni, 41.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 56.5 us,  0.0 sy,  0.0 ni, 43.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 44.3 us,  2.3 sy,  0.0 ni, 53.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
{
  "hits": 670830085,
  "took": 727,
  "_shards": {
    "total": 100,
    "successful": 100,
    "skipped": 0,
    "failed": 0
  }
}
%Cpu(s):  1.1 us,  2.3 sy,  0.0 ni, 96.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  0.0 us,  1.1 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 97.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 97.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
$
  6. Run the same query again, this time setting max_concurrent_shard_requests to 16
$ ./query.sh ?max_concurrent_shard_requests=16
Elasticsearch version 6.3.1

%Cpu(s): 53.4 us,  3.9 sy,  0.0 ni, 33.8 id,  8.8 wa,  0.0 hi,  0.0 si,  0.1 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 97.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  2.3 sy,  0.0 ni, 96.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

GET /_search?max_concurrent_shard_requests=16
{"query":{"match_all":{}}}

%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s): 78.2 us,  2.3 sy,  0.0 ni, 19.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
{
  "hits": 670830085,
  "took": 309,
  "_shards": {
    "total": 100,
    "successful": 100,
    "skipped": 0,
    "failed": 0
  }
}
%Cpu(s):  5.7 us,  2.3 sy,  0.0 ni, 92.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  0.0 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  0.0 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu(s):  1.1 us,  1.1 sy,  0.0 ni, 97.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
$

Summary:

Version   Query took (ms)   Peak user CPU usage                               Extra query parameter
5.5.3     334               100 %                                             -
6.3.1     727               62 % (only 5 threads running on an 8-core CPU)    -
6.3.1     309               100 %                                             max_concurrent_shard_requests=16
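
For reference, outside the query.sh wrapper the workaround shown above boils down to passing the parameter directly on the search request:

# Raw form of the workaround used in step 6: raise the per-request limit so that
# more than the default 5 shard requests run concurrently on this single node.
curl -s -X GET "localhost:9200/_search?max_concurrent_shard_requests=16" \
     -H 'Content-Type: application/json' \
     -d '{"query":{"match_all":{}}}' | jq '{took: .took, _shards: ._shards}'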

Test scripts:

populate.sh:

#!/bin/bash

# Create index-1 with ~10M dummy documents via the bulk API

for j in $(seq 0 10); do
        for i in $(seq 0 1000000); do
                echo -e '{ "index" : { "_index" : "index-1", "_type" : "doc" } }\n{ "message" : "Hello World!" }'
        done | curl -s -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' --data-binary @- | jq '{errors: .errors, took: .took}'
done

# Clone index-1 into index-2 .. index-20 (the reindex requests run in parallel)

for index in $(seq 2 20); do
        curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d"{\"source\": {\"index\": \"index-1\"}, \"dest\": {\"index\": \"index-$index\"}}" &
done

# Wait for the background reindex requests to finish before the script exits
wait

query.sh:

#!/bin/bash

echo -n "Elasticsearch version "
curl -s localhost:9200 | jq -r .version.number
echo

curl -s -X POST "localhost:9200/_cache/clear" > /dev/null

QUERY='{"query":{"match_all":{}}}'

top -d 0.1 -b | grep Cpu &

sleep 0.5
echo
echo "GET /_search$1"
echo "$QUERY"
echo
curl -s -X GET "localhost:9200/_search$1" -H 'Content-Type: application/json' -d "$QUERY" | jq "{hits: .hits.total, took: .took, _shards: ._shards}"
sleep 0.5

kill $(jobs -p)
wait
@ctrlaltdel changed the title from "Performance regression for search query in single node cluster since 5.6.0" to "Performance regression for search query in single node cluster" Jul 6, 2018
@colings86 added the :Search/Search label Jul 9, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@ctrlaltdel
Author

Pinging @s1monw @jpountz @jimczi

@jpountz
Contributor

jpountz commented Jul 13, 2018

@ctrlaltdel I see this as a feature rather than a bug. With that many shards on a single node, having this protection is important to keep things reasonable with many concurrent users. To me the problem is more the number of shards per node here.

@ctrlaltdel
Author

@jpountz thanks for the comment.

My typical use case here is an ELK stack running on a single machine and serving as a search engine for syslog messages. There's usually only a single query running at a given time, coming from a Kibana dashboard.

In this case, running on a 24-core machine, a typical query is now about 5 times slower with the latest version of Elasticsearch than it was with 5.5.3.

Could you please elaborate on how changing the number of shards here would improve performance? AFAIK, since #25632 was merged, a single query can only run on 5 threads (so use at most 5 cores) per node by default. The only way to improve this would be to actually increase the number of shards per index.

Unfortunately there's no way to tell Kibana to set the max_concurrent_shard_requests parameter by default, and it uses _msearch queries that don't support that parameter anyway.
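
For context, the kind of request Kibana sends is a multi-search: the _msearch body is NDJSON with a header line and a query line per sub-search (the index patterns below are illustrative, not what Kibana actually sends), and as of 6.3 this endpoint accepts no parameter to raise the shard-request concurrency:

# Minimal _msearch sketch; each sub-search is a header line followed by a query line.
cat <<'EOF' > msearch.ndjson
{"index": "index-*"}
{"query": {"match_all": {}}}
{"index": "index-1"}
{"query": {"match_all": {}}, "size": 0}
EOF

curl -s -X POST "localhost:9200/_msearch" \
     -H 'Content-Type: application/x-ndjson' \
     --data-binary @msearch.ndjson | jq '[.responses[].took]'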

@s1monw
Contributor

s1monw commented Aug 14, 2018

> My typical use case here is an ELK stack running on a single machine and serving as a search engine for syslog messages. There's usually only a single query running at a given time, coming from a Kibana dashboard.

I agree it's a shame you can't use all your resources here. This use case isn't typical, in the sense that you don't gain concurrency from multiple requests. If 5 requests hit your server at the same time, you will have at most 25 concurrent shard requests on your node, which is good protection. Yet your case wants to maximize resource utilization per node, and it's a shame that Kibana can't trigger it. I wonder if we should allow multi search to override this and then make Kibana expose it? @jpountz WDYT?

@jpountz
Contributor

jpountz commented Aug 14, 2018

+1

@s1monw self-assigned this Aug 20, 2018
s1monw added a commit to s1monw/elasticsearch that referenced this issue Aug 21, 2018
Today `_msearch` doesn't allow modifying the `max_concurrent_shard_requests`
per sub search request. This change adds support for setting this parameter on
all sub-search requests in an `_msearch`.

Relates to elastic#31877
@s1monw
Contributor

s1monw commented Aug 21, 2018

I created a PR to expose this setting on _msearch and opened a Kibana issue to expose it there. I will keep this issue open until we have code committed in both Elasticsearch and Kibana.
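
Assuming the parameter is exposed on _msearch the same way it is on _search, i.e. as a query-string option applied to every sub-search in the request (the exact syntax is whatever the PR settles on), usage would look roughly like this, reusing the msearch.ndjson body sketched earlier:

# Rough sketch: raise the per-request shard concurrency for all sub-searches.
curl -s -X POST "localhost:9200/_msearch?max_concurrent_shard_requests=16" \
     -H 'Content-Type: application/x-ndjson' \
     --data-binary @msearch.ndjson | jq '[.responses[].took]'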

@ctrlaltdel
Author

@s1monw looks good, thanks a lot for taking care of this issue :)

s1monw added a commit that referenced this issue Aug 22, 2018
Today `_msearch` doesn't allow modifying the `max_concurrent_shard_requests`
per sub search request. This change adds support for setting this parameter on
all sub-search requests in an `_msearch`.

Relates to #31877
@s1monw
Contributor

s1monw commented Sep 11, 2018

This has been integrated in Kibana. I am closing this.

@s1monw closed this as completed Sep 11, 2018