Some combination of total_shards_per_node and allow_rebalance blocks allocation of unassigned shards with DesiredBalanceAllocator
#108594
Elasticsearch Version
8.12.1
Installed Plugins
No response
Java Version
bundled
OS Version
Linux 5.15.133+
Problem Description
Some combinations of total_shards_per_node and allow_rebalance (e.g. total_shards_per_node = 2 and allow_rebalance = indices_all_active) block allocation of unassigned shards with DesiredBalanceAllocator. Some replica shards remain unassigned even though there is room to allocate them. The cluster allocation explain API reports that the unassigned shards can be allocated to that room, which makes the situation really confusing.
The response of the explain API.
The response of _internal/desired_balance. It says the number of unassigned shards is 0 even though there actually are unassigned shards.
I found that this is due to a difference in the order of relocating shards and assigning unassigned shards between computing the desired balance and the actual allocation. While computing the desired balance, the order is relocating -> assigning unassigned. But during actual allocation, the order is assigning unassigned -> relocating (balance).
Detail
This is the log. Here total_shards_per_node = 2. The node Ox3uTG_uTX6RPFToXcNk5g has 2 shards (9 and 20). One replica of shard 0 is unassigned.
While computing the desired balance, shard 20 was relocated from Ox3uTG_uTX6RPFToXcNk5g to another node.
Then shard 0 was allocated to node Ox3uTG_uTX6RPFToXcNk5g, because shard 20 had been relocated and there was room.
Then delegatedAllocater assigned shard 0 to node Ox3uTG_uTX6RPFToXcNk5g.
Here is the computed desired balance. All primary and replica copies of shard 0 are assigned to nodes.
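The computation phase described above can be sketched with a toy model. This is not Elasticsearch code: the function `compute_desired_balance`, the node name `node-b`, and the relocation rule (move the highest-numbered shard off any full node) are assumptions made for illustration only. It only shows how "relocate first, then assign" can produce a desired balance with no unassigned shards.

```python
LIMIT = 2  # stands in for total_shards_per_node


def compute_desired_balance(current, unassigned, limit=LIMIT):
    """Toy desired-balance computation: relocate first, then assign unassigned.

    current:    dict mapping node name -> set of shard ids on that node
    unassigned: list of shard ids that have an unassigned copy
    """
    # Work on a copy: the computation is a simulation, the cluster is untouched.
    desired = {node: set(shards) for node, shards in current.items()}

    # Phase 1: simulated relocation. Nodes that are full at the start give up
    # one shard to the emptiest node that has room and no copy of that shard.
    full_nodes = [node for node, shards in desired.items() if len(shards) >= limit]
    for source in full_nodes:
        shard = max(desired[source])
        targets = [n for n, s in desired.items()
                   if n != source and len(s) < limit and shard not in s]
        if targets:
            target = min(targets, key=lambda n: len(desired[n]))
            desired[source].remove(shard)
            desired[target].add(shard)

    # Phase 2: assign unassigned copies to the first node that now has room.
    still_unassigned = []
    for shard in unassigned:
        targets = [n for n, s in desired.items() if len(s) < limit and shard not in s]
        if targets:
            desired[targets[0]].add(shard)
        else:
            still_unassigned.append(shard)
    return desired, still_unassigned


# "node-b" is a hypothetical node holding the other copy of shard 0.
current = {"Ox3uTG_uTX6RPFToXcNk5g": {9, 20}, "node-b": {0}}
desired, still_unassigned = compute_desired_balance(current, [0])
print(desired, still_unassigned)
```

In this toy run the simulated move of shard 20 frees a slot on Ox3uTG_uTX6RPFToXcNk5g, so the replica of shard 0 is placed there and the computed balance reports nothing unassigned, even though the real cluster state has not changed.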
But during reconciliation, ShardsLimitAllocationDecider returns NO because shard 20 has not actually been relocated yet. And the relocation is blocked by ClusterRebalanceAllocationDecider because there are still unassigned shards.
As a result, the unassigned shards will never be assigned.
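The deadlock during reconciliation can be illustrated with a second toy model. Again, this is a sketch and not Elasticsearch code: the two decider functions are heavily simplified stand-ins for ShardsLimitAllocationDecider and ClusterRebalanceAllocationDecider (with allow_rebalance = indices_all_active), and `reconcile`, `node-b`, and the assumption that an unassigned shard is only tried on its desired node are illustrative.

```python
LIMIT = 2  # stands in for total_shards_per_node


def shards_limit_allows(node_shards, limit=LIMIT):
    """Simplified shard-limit check: NO once the node already holds `limit` shards."""
    return len(node_shards) < limit


def rebalance_allowed(unassigned):
    """Simplified indices_all_active check: rebalance only if nothing is unassigned."""
    return not unassigned


def reconcile(current, unassigned, desired_node, desired_moves, limit=LIMIT):
    """One toy reconciliation pass: assign unassigned first, then relocate."""
    nodes = {n: set(s) for n, s in current.items()}

    # Step 1: try to assign each unassigned shard to its desired node.
    pending = []
    for shard in unassigned:
        target = desired_node[shard]
        if shards_limit_allows(nodes[target], limit):
            nodes[target].add(shard)
        else:
            pending.append(shard)

    # Step 2: relocate toward the desired balance, only if rebalancing is allowed.
    moved = []
    for shard, source, target in desired_moves:
        if rebalance_allowed(pending) and shards_limit_allows(nodes[target], limit):
            nodes[source].remove(shard)
            nodes[target].add(shard)
            moved.append(shard)
    return nodes, pending, moved


# "node-b" is a hypothetical node; the desired balance wants shard 20 moved
# there and the replica of shard 0 placed on Ox3uTG_uTX6RPFToXcNk5g.
current = {"Ox3uTG_uTX6RPFToXcNk5g": {9, 20}, "node-b": {0}}
desired_node = {0: "Ox3uTG_uTX6RPFToXcNk5g"}
desired_moves = [(20, "Ox3uTG_uTX6RPFToXcNk5g", "node-b")]

state = current
for _ in range(3):  # repeated passes never make progress
    state, pending, moved = reconcile(state, [0], desired_node, desired_moves)
print(state, pending, moved)
```

Each pass ends in the same state: the assignment is refused because the node is still full, and the move that would free the slot is refused because a shard is still unassigned.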
Steps to Reproduce
Set total_shards_per_node=2 on an index that has many shards (in our case 24).
Logs (if relevant)
No response
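For the reproduction step above, the following sketch lists the REST requests that would apply the two settings and fetch the two diagnostic responses mentioned in the problem description. It only builds the request descriptions and sends nothing. The index name `my-index` is hypothetical, and mapping the short names used in this issue to the full setting keys `index.routing.allocation.total_shards_per_node` and `cluster.routing.allocation.allow_rebalance` is an assumption of this sketch.

```python
import json


def reproduction_requests(index_name, shards_per_node=2):
    """Return (method, path, body) tuples for the reproduction; nothing is sent."""
    return [
        # Allow rebalancing only when all shards of all indices are active.
        ("PUT", "/_cluster/settings",
         {"persistent": {"cluster.routing.allocation.allow_rebalance": "indices_all_active"}}),
        # Limit the number of shards of this index per node.
        ("PUT", f"/{index_name}/_settings",
         {"index.routing.allocation.total_shards_per_node": shards_per_node}),
        # Ask why a shard is still unassigned.
        ("GET", "/_cluster/allocation/explain", None),
        # Inspect the computed desired balance.
        ("GET", "/_internal/desired_balance", None),
    ]


steps = reproduction_requests("my-index")  # "my-index" is a hypothetical name
for method, path, body in steps:
    print(method, path, json.dumps(body) if body is not None else "")
```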