
Some combination of total_shards_per_node and allow_rebalance blocks allocation of unassigned shards with DesiredBalanceAllocator #108594

Closed
mrkm4ntr opened this issue May 14, 2024 · 2 comments

@mrkm4ntr
Contributor

Elasticsearch Version

8.12.1

Installed Plugins

No response

Java Version

bundled

OS Version

Linux 5.15.133+

Problem Description

Certain combinations of total_shards_per_node and allow_rebalance (e.g. total_shards_per_node = 2 and allow_rebalance = indices_all_active) block allocation of unassigned shards with the DesiredBalanceAllocator. Some replica shards remain unassigned even though there is room to allocate them, and the cluster allocation explain API even reports that the unassigned shards can be placed into that room. This is a really confusing situation.
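
For concreteness, a sketch of the settings combination described above (the index name item-all is the one from this report; note that indices_all_active is also the default value of allow_rebalance):

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.allow_rebalance": "indices_all_active"
  }
}

PUT item-all/_settings
{
  "index.routing.allocation.total_shards_per_node": 2
}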

The response of the cluster allocation explain API (truncated):

{
  "index": "item-all",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "REPLICA_ADDED",
    "at": "2024-05-05T13:07:16.494Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "yes",
  "allocation_explanation": "Elasticsearch can allocate the shard.",
  "target_node": {
    ...
  }
}
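
For reference, the output above comes from a request along these lines (index and shard values are the ones from this report):

GET _cluster/allocation/explain
{
  "index": "item-all",
  "shard": 0,
  "primary": false
}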

The response of GET _internal/desired_balance reports unassigned_shards as 0 even though there actually are unassigned shards:

{
  "stats": {
    ...
    "unassigned_shards": 0,
    ...
  }
}

I found that this is due to a difference in the order of relocating shards and assigning unassigned shards between desired-balance computation and actual allocation. While computing the desired balance, the order is: relocate, then assign unassigned. But during actual allocation (reconciliation), the order is: assign unassigned, then relocate (balance).

Details

Here is the log. total_shards_per_node = 2, node Ox3uTG_uTX6RPFToXcNk5g holds 2 shards (9 and 20), and 1 replica of shard 0 is unassigned.

While computing the desired balance, shard 20 was relocated from Ox3uTG_uTX6RPFToXcNk5g to another node.

T0509 05:22:58.000889 1 [elasticsearch-bench8-es-masters-z-b-1] [item-all][20] marked shard as started (routing: [item-all][20], node[gszxwYrMRUik4tvLE9ONTA], relocating [Ox3uTG_uTX6RPFToXcNk5g], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=m_lHqXcaQRe05AXDoOgqmA, rId=K5b7AIUXTI-QZakqyeDJZg], failed_attempts[0], expected_shard_size[0])

Then shard 0 was allocated to node Ox3uTG_uTX6RPFToXcNk5g, because shard 20 had been relocated away and there was room.

T0509 05:22:58.000895 1 [elasticsearch-bench8-es-masters-z-b-1] [item-all][0] marked shard as started (routing: [item-all][0], node[Ox3uTG_uTX6RPFToXcNk5g], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=UUfVDOQJRB2GuZE2GhShVw], unassigned_info[[reason=REPLICA_ADDED], at[2024-05-09T04:44:18.973Z], delayed=false, allocation_status[no_attempt]], failed_attempts[0], expected_shard_size[0])

Then the delegate allocator assigned shard 0 to node Ox3uTG_uTX6RPFToXcNk5g.

T0509 05:22:58.000929 1 [elasticsearch-bench8-es-masters-z-b-1] Assigned shard [[item-all][0], node[Ox3uTG_uTX6RPFToXcNk5g], [R], s[STARTED], a[id=UUfVDOQJRB2GuZE2GhShVw], failed_attempts[0], expected_shard_size[0]] to node [Ox3uTG_uTX6RPFToXcNk5g]

Here is the computed desired balance: the primary and all replicas of shard 0 are assigned to nodes.

T0509 05:22:59.000043 1 [elasticsearch-bench8-es-masters-z-b-1] Desired balance updated: ... [item-all][0]=ShardAssignment[nodeIds=[NkF1_R8nR_Kiym-LxTVMIA, EmeN_FSPSaGca_-zq858LA, Ox3uTG_uTX6RPFToXcNk5g], total=3, unassigned=0, ignored=0]

But during reconciliation, ShardsLimitAllocationDecider returns NO because shard 20 has not actually been relocated yet.

T0509 05:22:59.000051 1 [elasticsearch-bench8-es-masters-z-b-1] Reconciler#allocateUnassigned
...
T0509 05:22:59.000051 1 [elasticsearch-bench8-es-masters-z-b-1] Can not allocate [[item-all][0], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=REPLICA_ADDED], at[2024-05-09T04:44:18.973Z], delayed=false, allocation_status[no_attempt]], failed_attempts[0]] on node [{elasticsearch-bench8-es-item-all-1}{Ox3uTG_uTX6RPFToXcNk5g}{7Skx_KKdTyWmVjXlNTEWJQ}{elasticsearch-bench8-es-item-all-1}{10.34.1.230}{10.34.1.230:9300}{d}{8.12.1}{7000099-8500010}{transform.config_version=10.0.0, xpack.installed=true, k8s_node_name=gke-citadel-2g-dev-t-d-mercari-eaas-t-f669dce3-k79p, k8s_pod_name=elasticsearch-bench8-es-item-all-1, group=item-all, ml.config_version=12.0.0}]. [ShardsLimitAllocationDecider]: NO()
...
D0509 05:22:59.000051 1 [elasticsearch-bench8-es-masters-z-b-1] Couldn't assign shard [[item-all][0]] to [Ox3uTG_uTX6RPFToXcNk5g]: NO()
...
D0509 05:22:59.000051 1 [elasticsearch-bench8-es-masters-z-b-1] No eligible node found to assign shard [[item-all][0], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=REPLICA_ADDED], at[2024-05-09T04:44:18.973Z], delayed=false, allocation_status[no_attempt]], failed_attempts[0]]

And the relocation itself was blocked by ClusterRebalanceAllocationDecider because there were still unassigned shards.

T0509 05:22:59.000058 1 [elasticsearch-bench8-es-masters-z-b-1] Can not rebalance. [ClusterRebalanceAllocationDecider]: NO(the cluster has unassigned shards and cluster setting [cluster.routing.allocation.allow_rebalance] is set to [indices_all_active])

As a result, the unassigned shards will never be assigned: the replica cannot be allocated until shard 20 relocates, and shard 20 cannot relocate while the cluster has unassigned shards, so allocation is deadlocked.

Steps to Reproduce

  1. Set total_shards_per_node = 2 on an index that has many shards (24 in our case).
  2. Increase the number of nodes.
  3. Increase the number of replicas as soon as possible.
  4. Some of the new replica shards remain unassigned (see the sketch after this list).
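
A minimal sketch of these steps as API calls, assuming a fresh index named item-all (adding nodes happens out of band, e.g. by scaling the cluster):

PUT item-all
{
  "settings": {
    "index.number_of_shards": 24,
    "index.number_of_replicas": 1,
    "index.routing.allocation.total_shards_per_node": 2
  }
}

# ... add data nodes, then immediately:

PUT item-all/_settings
{
  "index.number_of_replicas": 2
}

# some new replicas stay UNASSIGNED; check with:
GET _cat/shards/item-all?v&h=index,shard,prirep,state,node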

Logs (if relevant)

No response

@mrkm4ntr added the >bug and needs:triage labels May 14, 2024
@DaveCTurner
Contributor

I believe #98710 would address the confusing output of the allocation explain API (ES cannot in fact allocate the shards to their desired nodes). However, total_shards_per_node ultimately leads to unassigned shards sometimes, as mentioned in its docs. See also #12273.

Since this is a known issue and is tracked elsewhere, I'm going to close this as a duplicate. It's a valid observation; we just don't need another issue to track it.

@DaveCTurner closed this as not planned May 14, 2024
@mrkm4ntr
Contributor Author

mrkm4ntr commented May 14, 2024

@DaveCTurner We used total_shards_per_node for several years before switching to the DesiredBalanceAllocator and never faced unassigned shards. As my explanation above shows, the cause is clearly the DesiredBalanceAllocator. If you don't plan to fix this soon, I'd at least like you to add a note to the docs saying that total_shards_per_node requires allow_rebalance to be indices_primaries_active or always.
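
For anyone else hitting this, a sketch of the workaround implied above, i.e. relaxing the rebalance constraint so reconciliation can make progress (note this is a cluster-wide setting, so weigh the tradeoff):

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.allow_rebalance": "indices_primaries_active"
  }
}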
