Describe the bug
As of today, closing an index temporarily turns the cluster red until its shard has started again. I can reproduce this issue on both conventional document replication clusters and remote store enabled clusters.
Logs on a remote store enabled cluster
opensearch-master1 | [2024-09-20T08:55:03,100][INFO ][o.o.p.PluginsService ] [opensearch-master1] PluginService:onIndexModule index:[index1/h71N_-WHQcWNEjqbFctFJQ]
opensearch-master1 | [2024-09-20T08:55:03,138][INFO ][o.o.c.m.MetadataCreateIndexService] [opensearch-master1] [index1] creating index, cause [api], templates [], shards [1]/[0]
opensearch-master1 | [2024-09-20T08:55:03,141][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-master1 | [2024-09-20T08:55:03,148][INFO ][o.o.g.G.RemotePersistedState] [opensearch-master1] codec version is 4
opensearch-node1 | [2024-09-20T08:55:03,219][INFO ][o.o.p.PluginsService ] [opensearch-node1] PluginService:onIndexModule index:[index1/h71N_-WHQcWNEjqbFctFJQ]
opensearch-node1 | [2024-09-20T08:55:03,357][INFO ][o.o.i.t.RemoteFsTranslog ] [opensearch-node1] [index1][0] Downloaded data from remote translog till maxSeqNo = -1
opensearch-node1 | [2024-09-20T08:55:03,381][INFO ][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] [index1][0] Skipped syncing segments with primaryMode=false indexShardState=RECOVERING engineType=InternalEngine recoverySourceType=EMPTY_STORE primary=true
opensearch-node1 | [2024-09-20T08:55:03,381][INFO ][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] [index1][0] Skipped syncing segments with primaryMode=false indexShardState=RECOVERING engineType=InternalEngine recoverySourceType=EMPTY_STORE primary=true
opensearch-node1 | [2024-09-20T08:55:03,382][INFO ][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] [index1][0] Skipped syncing segments with primaryMode=false indexShardState=RECOVERING engineType=InternalEngine recoverySourceType=EMPTY_STORE primary=true
opensearch-node1 | [2024-09-20T08:55:03,382][INFO ][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] [index1][0] Skipped syncing segments with primaryMode=false indexShardState=RECOVERING engineType=InternalEngine recoverySourceType=EMPTY_STORE primary=true
opensearch-master1 | [2024-09-20T08:55:03,385][INFO ][o.o.c.r.a.AllocationService] [opensearch-master1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[index1][0]]]).
opensearch-master1 | [2024-09-20T08:55:03,388][INFO ][o.o.g.G.RemotePersistedState] [opensearch-master1] codec version is 4
opensearch-node1 | [2024-09-20T08:55:03,457][INFO ][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] [index1][0] Skipped syncing segments with primaryMode=false indexShardState=STARTED engineType=InternalEngine recoverySourceType=EMPTY_STORE primary=true
opensearch-node1 | [2024-09-20T08:55:03,457][INFO ][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] [index1][0] Skipped syncing segments with primaryMode=false indexShardState=STARTED engineType=InternalEngine recoverySourceType=EMPTY_STORE primary=true
opensearch-node1 | [2024-09-20T08:55:03,458][INFO ][o.o.i.s.RemoteStoreRefreshListener] [opensearch-node1] [index1][0] Scheduled retry with didRefresh=true
opensearch-master1 | [2024-09-20T08:55:03,482][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-master1 | [2024-09-20T08:55:08,198][INFO ][o.o.p.PluginsService ] [opensearch-master1] PluginService:onIndexModule index:[index1/h71N_-WHQcWNEjqbFctFJQ]
opensearch-master1 | [2024-09-20T08:55:08,229][INFO ][o.o.c.m.MetadataMappingService] [opensearch-master1] [index1/h71N_-WHQcWNEjqbFctFJQ] create_mapping
opensearch-master1 | [2024-09-20T08:55:08,230][INFO ][o.o.g.G.RemotePersistedState] [opensearch-master1] codec version is 4
opensearch-master1 | [2024-09-20T08:55:22,939][INFO ][o.o.c.m.MetadataIndexStateService] [opensearch-master1] closing indices [index1/h71N_-WHQcWNEjqbFctFJQ]
opensearch-master1 | [2024-09-20T08:55:22,940][INFO ][o.o.g.G.RemotePersistedState] [opensearch-master1] codec version is 4
opensearch-master1 | [2024-09-20T08:55:23,006][INFO ][o.o.c.m.MetadataIndexStateService] [opensearch-master1] completed closing of indices [index1]
opensearch-master1 | [2024-09-20T08:55:23,007][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-master1 | [2024-09-20T08:55:23,010][INFO ][o.o.g.G.RemotePersistedState] [opensearch-master1] codec version is 4
opensearch-master1 | [2024-09-20T08:55:23,073][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-master1 | [2024-09-20T08:55:23,075][INFO ][o.o.g.G.RemotePersistedState] [opensearch-master1] codec version is 4
opensearch-node1 | [2024-09-20T08:55:23,140][INFO ][o.o.p.PluginsService ] [opensearch-node1] PluginService:onIndexModule index:[index1/h71N_-WHQcWNEjqbFctFJQ]
opensearch-node1 | [2024-09-20T08:55:23,183][INFO ][o.o.i.s.IndexShard ] [opensearch-node1] [index1][0] Downloaded translog and checkpoint files from=8 to=10
opensearch-node1 | [2024-09-20T08:55:23,207][INFO ][o.o.i.t.RemoteFsTranslog ] [opensearch-node1] [index1][0] Downloaded translog and checkpoint files from=8 to=10
opensearch-node1 | [2024-09-20T08:55:23,209][INFO ][o.o.i.t.RemoteFsTranslog ] [opensearch-node1] [index1][0] Downloaded data from remote translog till maxSeqNo = -1
opensearch-master1 | [2024-09-20T08:55:23,231][INFO ][o.o.c.r.a.AllocationService] [opensearch-master1] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[index1][0]]]).
Logs on a doc rep cluster
opensearch-master1 | [2024-09-20T09:00:23,777][INFO ][o.o.p.PluginsService ] [opensearch-master1] PluginService:onIndexModule index:[index1/G9Qow6fDROaCVD65DX-n0w]
opensearch-master1 | [2024-09-20T09:00:23,821][INFO ][o.o.c.m.MetadataCreateIndexService] [opensearch-master1] [index1] creating index, cause [api], templates [], shards [1]/[0]
opensearch-master1 | [2024-09-20T09:00:23,825][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-node1 | [2024-09-20T09:00:23,882][INFO ][o.o.p.PluginsService ] [opensearch-node1] PluginService:onIndexModule index:[index1/G9Qow6fDROaCVD65DX-n0w]
opensearch-master1 | [2024-09-20T09:00:24,033][INFO ][o.o.c.r.a.AllocationService] [opensearch-master1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[index1][0]]]).
opensearch-master1 | [2024-09-20T09:00:24,096][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-master1 | [2024-09-20T09:00:26,347][INFO ][o.o.p.PluginsService ] [opensearch-master1] PluginService:onIndexModule index:[index1/G9Qow6fDROaCVD65DX-n0w]
opensearch-master1 | [2024-09-20T09:00:26,378][INFO ][o.o.c.m.MetadataMappingService] [opensearch-master1] [index1/G9Qow6fDROaCVD65DX-n0w] create_mapping
opensearch-master1 | [2024-09-20T09:00:42,889][INFO ][o.o.c.m.MetadataIndexStateService] [opensearch-master1] closing indices [index1/G9Qow6fDROaCVD65DX-n0w]
opensearch-master1 | [2024-09-20T09:00:42,949][INFO ][o.o.c.m.MetadataIndexStateService] [opensearch-master1] completed closing of indices [index1]
opensearch-master1 | [2024-09-20T09:00:42,949][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-master1 | [2024-09-20T09:00:43,008][WARN ][o.o.c.r.a.AllocationService] [opensearch-master1] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
opensearch-node1 | [2024-09-20T09:00:43,061][INFO ][o.o.p.PluginsService ] [opensearch-node1] PluginService:onIndexModule index:[index1/G9Qow6fDROaCVD65DX-n0w]
opensearch-master1 | [2024-09-20T09:00:43,097][INFO ][o.o.c.r.a.AllocationService] [opensearch-master1] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[index1][0]]]).
This problem may be aggravated in remote store enabled clusters because of the existing behaviour where the translog is downloaded from the remote store during recovery. That behaviour, however, is being fixed separately.
Related component
Cluster Manager
To Reproduce
1. Create an index
2. Ingest some docs
3. Close the index (a minimal reproduction sketch follows below)
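
A minimal reproduction sketch of the steps above, assuming a local cluster reachable at http://localhost:9200 with the security plugin disabled; the index name, settings, and documents are only illustrative:

import time
import requests

BASE = "http://localhost:9200"  # assumed local endpoint; adjust host/port and auth as needed

# 1. Create an index with one primary shard and no replicas (matches the logs above)
requests.put(f"{BASE}/index1",
             json={"settings": {"number_of_shards": 1, "number_of_replicas": 0}})

# 2. Ingest some docs
for i in range(10):
    requests.post(f"{BASE}/index1/_doc", params={"refresh": "true"},
                  json={"field": f"value-{i}"})

# 3. Close the index, then poll cluster health to catch the transient RED status
requests.post(f"{BASE}/index1/_close")
for _ in range(50):
    status = requests.get(f"{BASE}/_cluster/health").json()["status"]
    print(status)  # briefly prints "red" before settling back to "green"
    time.sleep(0.1)
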
Expected behavior
I am not entirely sure whether the cluster should really turn red here, but it gives a false impression that some underlying issue is causing a red cluster. IMHO the cluster should remain green while the close-index operation is in progress.
Additional Details
NA
opensearch-master1 | [2024-09-20T09:00:43,097][INFO ][o.o.c.r.a.AllocationService] [opensearch-master1] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[index1][0]]]).
Curious why the shard was started after the index was closed. Maybe there is a race in shard allocation.
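
If it helps the investigation: as far as I understand, shards of a closed index remain allocated, so the close re-initializes the shard, and the window where the primary is unassigned is presumably what dips health to red. A small diagnostic sketch to watch that window, assuming the same local endpoint as in the sketch above (an observation aid, not a fix):

import time
import requests

BASE = "http://localhost:9200"  # assumed local endpoint

# Close the index and sample cluster status plus per-shard state,
# so the closed index's primary can be seen going UNASSIGNED/INITIALIZING
# before it is STARTED again and health returns to green.
requests.post(f"{BASE}/index1/_close")
for _ in range(50):
    status = requests.get(f"{BASE}/_cluster/health").json()["status"]
    shards = requests.get(f"{BASE}/_cat/shards/index1",
                          params={"h": "index,shard,prirep,state", "format": "json"}).json()
    print(status, shards)
    time.sleep(0.1)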