Describe the bug
In a situation where a replication group has at least one docrep shard copy, failover from a remote primary to a remote replica fails with `no retention lease for tracked shard`.
During the dual replication phase, retention leases generated on the primary shard are synced to the docrep copy through `RetentionLeaseBackgroundSyncAction`, but we block the replication call to remote-enabled replica copies. When the primary shard copy fails over to another remote-enabled replica, the `invariant()` check fails.
This is how the code flows. During a failover, the `activatePrimaryMode()` method of `ReplicationTracker` is invoked:
OpenSearch/server/src/main/java/org/opensearch/index/shard/IndexShard.java
Lines 784 to 790 in 7103e56
This enables the `primaryMode` flag for the `ReplicationTracker` instance, updates the global and local checkpoints, creates a retention lease for itself, and runs the `invariant()` checks:
OpenSearch/server/src/main/java/org/opensearch/index/seqno/ReplicationTracker.java
Lines 1359 to 1364 in 7103e56
The `invariant()` method checks for retention leases against all `replicated` shard copies. During dual replication, all docrep shard copies are marked as replicated:
OpenSearch/server/src/main/java/org/opensearch/index/seqno/ReplicationTracker.java
Lines 958 to 975 in 7103e56
Since retention leases weren't copied over from the primary shard instance, the assertion trips here.
We need to re-create retention leases for docrep shard copies and defer this assertion until those leases exist.
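To make the failure mode concrete, here is a minimal toy model of the flow described above. It is a sketch under stated assumptions, not the OpenSearch implementation: `ToyTracker`, `track`, `checkInvariant`, `failoverSucceeds`, and the `recreateMissingLeases` flag are hypothetical names, and the real check is an assertion rather than a thrown exception. It only illustrates that a promoted remote replica that never received the old primary's leases trips the check for a replicated docrep copy, and that re-creating those leases before the check avoids it.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these classes or methods are the OpenSearch ones.
public class RetentionLeaseInvariantSketch {

    static final class ToyTracker {
        private final String ownId;
        // Other tracked copies: allocation id -> whether the copy is "replicated"
        // (true for docrep copies during dual replication, in this model).
        private final Map<String, Boolean> trackedCopies = new LinkedHashMap<>();
        // Allocation ids that currently hold a retention lease on this tracker.
        private final Set<String> leaseHolders = new HashSet<>();

        ToyTracker(String ownId) {
            this.ownId = ownId;
        }

        void track(String allocationId, boolean replicated) {
            trackedCopies.put(allocationId, replicated);
        }

        // Models promotion of this copy to primary.
        void activatePrimaryMode(boolean recreateMissingLeases) {
            leaseHolders.add(ownId); // the new primary creates a lease for itself
            if (recreateMissingLeases) {
                // Assumed fix: rebuild leases for replicated copies before checking.
                trackedCopies.forEach((id, replicated) -> {
                    if (replicated) {
                        leaseHolders.add(id);
                    }
                });
            }
            checkInvariant();
        }

        // Every replicated copy must hold a lease. This sketch throws
        // IllegalStateException so that it runs without the -ea JVM flag.
        void checkInvariant() {
            for (Map.Entry<String, Boolean> copy : trackedCopies.entrySet()) {
                if (copy.getValue() && !leaseHolders.contains(copy.getKey())) {
                    throw new IllegalStateException(
                        "no retention lease for tracked shard [" + copy.getKey() + "]");
                }
            }
        }
    }

    // Promotes a remote-enabled replica that never received the old primary's leases.
    static boolean failoverSucceeds(boolean recreateMissingLeases) {
        ToyTracker promoted = new ToyTracker("remote-replica-1");
        promoted.track("docrep-replica", true);    // docrep copy: replicated
        promoted.track("remote-replica-2", false); // remote copy: not replicated
        try {
            promoted.activatePrimaryMode(recreateMissingLeases);
            return true;
        } catch (IllegalStateException e) {
            System.out.println("failover failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(failoverSucceeds(false)); // leases missing at check time
        System.out.println(failoverSucceeds(true));  // leases re-created first
    }
}
```

In this model, `failoverSucceeds(false)` returns `false` because the docrep copy has no lease when the check runs, while `failoverSucceeds(true)` returns `true`. How the real fix orders lease creation relative to the assertion is left to the actual change.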
Related component
Storage:Remote
To Reproduce
N/A
Expected behavior
Failover from a remote primary to either a docrep or a remote replica should work seamlessly during the dual replication phase.