
Add global and index level blocks to IndexSettings #35695

Closed
wants to merge 1 commit into from

Conversation

tlrx
Member

@tlrx tlrx commented Nov 19, 2018

This pull request adds a new getIndexBlocks() method to the IndexSettings class. The method returns a ClusterBlocks object that can be used to retrieve the current global-level and index-level blocks that apply to the IndexShard to which the index settings belong.
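The PR diff is not shown in this conversation, but the idea can be sketched with a simplified, self-contained model: a snapshot that distinguishes global blocks from per-index blocks. The names below (ClusterBlocksSnapshot, hasGlobalBlock, indexBlocked, the string block ids) are illustrative assumptions, not the actual Elasticsearch ClusterBlocks API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: a snapshot of global and index-level blocks,
// roughly the shape of information a getIndexBlocks() method would expose.
final class ClusterBlocksSnapshot {
    private final Set<String> globalBlocks;               // e.g. "no_master"
    private final Map<String, Set<String>> indexBlocks;   // index name -> block ids

    ClusterBlocksSnapshot(Set<String> globalBlocks, Map<String, Set<String>> indexBlocks) {
        this.globalBlocks = Collections.unmodifiableSet(new HashSet<>(globalBlocks));
        this.indexBlocks = Collections.unmodifiableMap(new HashMap<>(indexBlocks));
    }

    boolean hasGlobalBlock(String blockId) {
        return globalBlocks.contains(blockId);
    }

    boolean indexBlocked(String index, String blockId) {
        return indexBlocks.getOrDefault(index, Collections.emptySet()).contains(blockId);
    }
}
```

With such a snapshot stored on the IndexSettings, a shard-level action could consult the blocks from the cluster state that actually produced the shard's current state, rather than sampling the ClusterService.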

While the purpose of such a method had been discussed via another channel a few weeks ago, it resurfaced recently after the merge of #35332, in which we added a check for global/index blocks in the primary action of transport replication actions. That change caused some tests to fail on CI (see #35597): the TransportResyncReplicationAction failed and the replica was never promoted to primary before the test timed out. The resync failed because the primary action in TransportResyncReplicationAction checks blocks using the cluster state from the ClusterService, which is not yet updated and, in the case of this test, still contains a global "no master" block. It should instead check against the blocks from the incoming cluster state that has not been applied yet.

This pull request changes the IndicesClusterStateService so that blocks are updated and propagated to the IndexSettings. Once #35332 has been recommitted (it was reverted to allow CI to pass), a follow-up PR will change how blocks are checked in TransportReplicationAction so that it uses the blocks from indexShard.indexSettings().getIndexBlocks().

@tlrx tlrx added >enhancement v7.0.0 :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. v6.6.0 labels Nov 19, 2018
@tlrx tlrx requested review from bleskes and ywelsch November 19, 2018 15:15
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@bleskes
Contributor

bleskes commented Nov 20, 2018

@tlrx can you elaborate more on the failure? Sampling the cluster state may indeed not reflect the cluster state that was used to update the shards, but what is the sequence of steps that makes this a problem? I looked at the linked issue and I don't see a clear description there either.

@tlrx
Member Author

tlrx commented Nov 20, 2018

@bleskes sorry, I should have written down what Yannick and I discussed via another channel.

The test that failed in #35597 is a rolling upgrade test of a 3-node cluster where each node is upgraded sequentially and tests are executed after each node upgrade. While a node is being upgraded the cluster has no master, and the cluster state contains a global "no master" block.

At some point during the test, a replica shard is promoted to primary. The incoming cluster state is processed by the IndicesClusterStateService, which updates the internal data structures: IndicesClusterStateService.createOrUpdateShards() updates the shard state using IndexShard.updateShardState(). The replica-to-primary promotion is detected and a primary term bump is executed, which in turn triggers a primary-replica synchronization (PrimaryReplicaSyncer).

The primary-replica synchronization is executed through TransportResyncReplicationAction, a transport replication write action that skips the reroute phase and directly executes the primary action. Because #35332 added a block check in the primary action that uses the ClusterService's state, the visible cluster state at that point still contains the "no master" block: the new incoming cluster state which removes this block (the one being processed by IndicesClusterStateService) is not applied/visible yet. Since a "no master" block is present in the ClusterService's state, the resync action is blocked and fails, and as a consequence the whole shard is failed too.
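The failure mode described above can be sketched with a toy block check (the class name and block id below are hypothetical, not Elasticsearch code): the very same check passes or fails depending on which cluster state it samples.

```java
import java.util.Set;

// Toy model of the #35332-style block check in the primary action:
// it refuses to run while a global "no_master" block is present.
final class ResyncBlockCheck {
    static boolean resyncAllowed(Set<String> globalBlocks) {
        return !globalBlocks.contains("no_master");
    }

    public static void main(String[] args) {
        // The ClusterService's applied state still carries the block ...
        Set<String> appliedState = Set.of("no_master");
        // ... while the incoming state that promoted the replica has dropped it.
        Set<String> incomingState = Set.of();

        System.out.println(resyncAllowed(appliedState));   // false: resync is rejected
        System.out.println(resyncAllowed(incomingState));  // true: resync would proceed
    }
}
```

The shard was promoted because of the incoming state, so checking blocks against the older applied state rejects a resync that should be allowed.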

Before #35332, and because the resync action skips the reroute phase, these blocks were simply never checked.

I talked to @ywelsch about this and it reminded us of a conversation we had weeks ago in which we discussed the need to make the cluster blocks from the incoming cluster state visible to the IndexShard data structure for this exact purpose. We decided to make them visible through the IndexSettings, as is already done for IndexMetaData.

@bleskes
Contributor

bleskes commented Nov 20, 2018

Thank you @tlrx. I agree that it would be good to expose the blocks from something the shard owns. That said, I think we need to be careful about what exactly they mean and when we expose them. For example, I think index settings are updated before the shard updates its internal state when a setting is changed. We want to be careful not to reintroduce the cluster-state-from-the-future problem we used to have, where we had to worry about the cluster state not being applied yet after we sampled it.

Did you guys discuss these semantics?

Finally - regarding the specific issue here - I wonder whether resync should honour blocks at all. It's a tricky thing: we are making sure shards are in sync, which is a good thing even if the user marks the index as read only. We don't really need the master block since we validate the primary terms. Am I missing something?

@tlrx
Member Author

tlrx commented Nov 21, 2018

We talked via another channel and decided to keep the current behavior for now, in order not to reintroduce a cluster-state-from-the-future problem or break the current semantics. We decided instead to make it more obvious that the resync action should not be blocked at all, given its internal nature, and I opened #35795 for this.

@tlrx tlrx closed this Nov 21, 2018
@tlrx tlrx deleted the inject-blocks-in-indexsettings branch February 12, 2019 09:45