Cluster state update takes a long time due to lots of aliases #11100

Closed
msimos opened this issue May 11, 2015 · 2 comments
Labels
:Data Management/Indices APIs APIs to create and manage indices and templates

Comments

msimos commented May 11, 2015

Taking a snapshot or creating an index takes a very long time; for example, one snapshot took 18 hours. The cluster has 23 nodes and 655 indices with 400 aliases per index. Each user is an alias (to an index); the application tracks the current index (in ZooKeeper) and, once the current index has 400 aliases, it creates a new index. When looking at the pending tasks while the snapshot is running we see a lot of long-running tasks like:

{
"insert_order" : 30872,
"priority" : "NORMAL",
"source" : "update snapshot state",
"executing" : false,
"time_in_queue_millis" : 62424008,
"time_in_queue" : "17.3h"
}
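
For reference, output like the above can be retrieved with the cluster pending tasks API. A minimal example, assuming a node reachable on localhost:9200:

# List the tasks waiting in the master's cluster state update queue
curl -XGET 'http://localhost:9200/_cluster/pending_tasks?pretty'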

This is an example of the alias that is created:

"63e5786c2bea7e51a2d2359de2bc42419bf1c8e96d14c40336e7a0aa594dfb80" : {
    "filter" : {
        "term" : {
            "user_id" : "63e5786c2bea7e51a2d2359de2bc42419bf1c8e96d14c40336e7a0aa594dfb80"
        }
    },
    "index_routing" : "63e5786c2bea7e51a2d2359de2bc42419bf1c8e96d14c40336e7a0aa594dfb80",
    "search_routing" : "63e5786c2bea7e51a2d2359de2bc42419bf1c8e96d14c40336e7a0aa594dfb80"
},
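
For context, aliases like the one above are added with the indices aliases API. A minimal sketch of such a call; the index name users-000123 is hypothetical, since the real index names are not shown in this issue:

curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
    {
      "add" : {
        "index" : "users-000123",
        "alias" : "63e5786c2bea7e51a2d2359de2bc42419bf1c8e96d14c40336e7a0aa594dfb80",
        "filter" : { "term" : { "user_id" : "63e5786c2bea7e51a2d2359de2bc42419bf1c8e96d14c40336e7a0aa594dfb80" } },
        "index_routing" : "63e5786c2bea7e51a2d2359de2bc42419bf1c8e96d14c40336e7a0aa594dfb80",
        "search_routing" : "63e5786c2bea7e51a2d2359de2bc42419bf1c8e96d14c40336e7a0aa594dfb80"
      }
    }
  ]
}'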

According to Igor, publishing a cluster state update always takes about 14-15 seconds, no matter what type of update it is. Here is one example from an alias update:

[03:23:37,861][DEBUG][cluster.service ] [Mad Dog Rassitano] processing [index-aliases]: execute
[03:23:40,144][DEBUG][cluster.service ] [Mad Dog Rassitano] cluster state updated, version [39957], source [index-aliases]
[03:23:40,144][DEBUG][cluster.service ] [Mad Dog Rassitano] publishing cluster state version 39957
[03:23:55,545][DEBUG][cluster.service ] [Mad Dog Rassitano] set local cluster state to version 39957
[03:23:55,554][DEBUG][cluster.service ] [Mad Dog Rassitano] processing [index-aliases]: done applying updated cluster_state (version: 39957)

Here is another example of a node leaving the cluster:

[07:37:21,822][DEBUG][cluster.service ] [Mad Dog Rassitano] processing [zen-disco-node_failed([Death-Stalker][l045BOE8RWC3Wm6DDZJERA][polaris-prod-135-w-esnode-9t65fuj427hgh][inet[/10.0.0.67:9300]]{master=false}), reason failed to ping, tried [3] times, each with maximum [30s] timeout]: execute
[07:37:58,920][DEBUG][cluster.service ] [Mad Dog Rassitano] cluster state updated, version [40189], source [zen-disco-node_failed([Death-Stalker][l045BOE8RWC3Wm6DDZJERA][polaris-prod-135-w-esnode-9t65fuj427hgh][inet[/10.0.0.67:9300]]{master=false}), reason failed to ping, tried [3] times, each with maximum [30s] timeout]
[07:37:58,921][INFO ][cluster.service ] [Mad Dog Rassitano] removed {[Death-Stalker][l045BOE8RWC3Wm6DDZJERA][polaris-prod-135-w-esnode-9t65fuj427hgh][inet[/10.0.0.67:9300]]{master=false},}, reason: zen-disco-node_failed([Death-Stalker][l045BOE8RWC3Wm6DDZJERA][polaris-prod-135-w-esnode-9t65fuj427hgh][inet[/10.0.0.67:9300]]{master=false}), reason failed to ping, tried [3] times, each with maximum [30s] timeout
[07:37:58,921][DEBUG][cluster.service ] [Mad Dog Rassitano] publishing cluster state version 40189
[07:38:12,901][DEBUG][cluster.service ] [Mad Dog Rassitano] set local cluster state to version 40189
[07:38:12,942][DEBUG][cluster.service ] [Mad Dog Rassitano] processing [zen-disco-node_failed([Death-Stalker][l045BOE8RWC3Wm6DDZJERA][polaris-prod-135-w-esnode-9t65fuj427hgh][inet[/10.0.0.67:9300]]{master=false}), reason failed to ping, tried [3] times, each with maximum [30s] timeout]: done applying updated cluster_state (version: 40189)

The slow cluster state propagation causes problems with index creation and makes snapshots slow.

The hot threads output provided shows a lot of activity reading the cluster state from the network, with the majority of the hot threads busy reading aliases:

threads-0007: 79.9% (399.7ms out of 500ms) cpu usage by thread 'elasticsearch[Dormammu][transport_client_worker][T#2]{New I/O worker #2}'
threads-0007- 4/10 snapshots sharing following 31 elements
threads-0007- org.elasticsearch.common.hppc.ObjectObjectOpenHashMap.get(ObjectObjectOpenHashMap.java:564)
threads-0007- org.elasticsearch.common.collect.ImmutableOpenMap$Builder.get(ImmutableOpenMap.java:255)
threads-0007- org.elasticsearch.cluster.metadata.MetaData.<init>(MetaData.java:222)
threads-0007- org.elasticsearch.cluster.metadata.MetaData$Builder.build(MetaData.java:1245)
threads-0007- org.elasticsearch.cluster.metadata.MetaData$Builder.readFrom(MetaData.java:1382)
threads-0007- org.elasticsearch.cluster.ClusterState$Builder.readFrom(ClusterState.java:620)
threads-0007- org.elasticsearch.action.admin.cluster.state.ClusterStateResponse.readFrom(ClusterStateResponse.java:58)
threads-0007- org.elasticsearch.transport.netty.MessageChannelHandler.handleResponse(MessageChannelHandler.java:150)
threads-0007- org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:127)
threads-0007- org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
threads-0007- org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
threads-0007- org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
threads-0007- org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
threads-0007- org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
threads-0007- org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
threads-0007- org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
threads-0007- org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
threads-0007- org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
threads-0007- org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)

threads-0008: 88.7% (443.7ms out of 500ms) cpu usage by thread 'elasticsearch[Tower][clusterService#updateTask][T#1]'
threads-0008- 2/10 snapshots sharing following 11 elements
threads-0008- org.elasticsearch.common.hppc.ObjectObjectOpenHashMap.get(ObjectObjectOpenHashMap.java:564)
threads-0008- org.elasticsearch.common.collect.ImmutableOpenMap$Builder.get(ImmutableOpenMap.java:255)
threads-0008- org.elasticsearch.cluster.metadata.MetaData.<init>(MetaData.java:222)
threads-0008- org.elasticsearch.cluster.metadata.MetaData$Builder.build(MetaData.java:1245)
threads-0008- org.elasticsearch.cluster.ClusterState$Builder.metaData(ClusterState.java:525)
threads-0008- org.elasticsearch.discovery.zen.ZenDiscovery$8.execute(ZenDiscovery.java:690)
threads-0008- org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:328)
threads-0008- org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
threads-0008- java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
threads-0008- java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
threads-0008- java.lang.Thread.run(Unknown Source)
threads-0008- 4/10 snapshots sharing following 12 elements
threads-0008- org.elasticsearch.common.hppc.AbstractIterator.hasNext(AbstractIterator.java:34)
threads-0008- org.elasticsearch.common.hppc.AbstractObjectCollection.toArray(AbstractObjectCollection.java:86)
threads-0008- org.elasticsearch.common.hppc.ObjectArrayList.toArray(ObjectArrayList.java:44)
threads-0008- org.elasticsearch.cluster.metadata.MetaData.<init>(MetaData.java:232)
threads-0008- org.elasticsearch.cluster.metadata.MetaData$Builder.build(MetaData.java:1245)
threads-0008- org.elasticsearch.cluster.ClusterState$Builder.metaData(ClusterState.java:525)
threads-0008- org.elasticsearch.discovery.zen.ZenDiscovery$8.execute(ZenDiscovery.java:690)

threads-0008: 100.0% (500ms out of 500ms) cpu usage by thread 'elasticsearch[Freak][transport_server_worker][T#6]{New I/O worker #15}'
threads-0008- 6/10 snapshots sharing following 33 elements
threads-0008- org.elasticsearch.cluster.metadata.MetaData.<init>(MetaData.java:205)
threads-0008- org.elasticsearch.cluster.metadata.MetaData$Builder.build(MetaData.java:1245)
threads-0008- org.elasticsearch.cluster.metadata.MetaData$Builder.readFrom(MetaData.java:1382)
threads-0008- org.elasticsearch.cluster.ClusterState$Builder.readFrom(ClusterState.java:620)
threads-0008- org.elasticsearch.discovery.zen.publish.PublishClusterStateAction$PublishClusterStateRequestHandler.messageReceived(PublishClusterStateAction.java:175)
threads-0008- org.elasticsearch.discovery.zen.publish.PublishClusterStateAction$PublishClusterStateRequestHandler.messageReceived(PublishClusterStateAction.java:156)
threads-0008- org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:217)
threads-0008- org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:111)
threads-0008- org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
threads-0008- org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
threads-0008- org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
threads-0008- org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
threads-0008- org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
threads-0008- org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
threads-0008- org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
threads-0008- org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
threads-0008- org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
threads-0008- org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
threads-0008- org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
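
For reference, hot threads output like the above can be collected with the nodes hot threads API (the 500ms interval and 10 snapshots in the output match the defaults):

curl -XGET 'http://localhost:9200/_nodes/hot_threads'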

I realize this issue is related to these two issues:

#10212
#10295

However, I am looking for suggestions on how to improve the performance of cluster state propagation.

@msimos msimos changed the title Snapshot & Index creation takes a long time Cluster state update takes a long time due to lots of aliases May 11, 2015
@clintongormley

Hi @msimos

I've marked this for discussion, but I think the outcome will be: you're using too many aliases. These aliases are consuming a big chunk of memory on every node in your cluster and making cluster state updates very heavy. Even with cluster state diffs, it's still going to be slow. I'd really consider adopting another approach which doesn't involve that many aliases (e.g. just adding the filter and routing values dynamically onto each request via your application).
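
To illustrate that suggestion: instead of one filtered and routed alias per user, the application can attach the same term filter and routing value to each search request directly. A minimal sketch against a 1.x cluster, using the hypothetical index name users-000123 (the real index names are not shown in this issue):

# Per-request routing replaces the alias's search_routing,
# and a filtered query replaces the alias's filter
curl -XGET 'http://localhost:9200/users-000123/_search?routing=63e5786c2bea7e51a2d2359de2bc42419bf1c8e96d14c40336e7a0aa594dfb80' -d '{
  "query" : {
    "filtered" : {
      "query" : { "match_all" : {} },
      "filter" : { "term" : { "user_id" : "63e5786c2bea7e51a2d2359de2bc42419bf1c8e96d14c40336e7a0aa594dfb80" } }
    }
  }
}'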


msimos commented May 15, 2015

Hi @clintongormley,

This is what we recommended. So I'll close this now.

Mike

@msimos msimos closed this as completed May 15, 2015
@clintongormley clintongormley added :Data Management/Indices APIs APIs to create and manage indices and templates and removed :Aliases labels Feb 13, 2018