Use index for peer recovery instead of translog #45137

This creates a peer-recovery retention lease for every shard during recovery, ensuring that the replication group retains history for future peer recoveries. It also ensures that leases for active shard copies do not expire, and leases for inactive shard copies expire immediately if the shard is fully-allocated. Relates elastic#41536

This commit adjusts the behaviour of the retention lease sync to first renew any peer-recovery retention leases where either: - the corresponding shard's global checkpoint has advanced, or - the lease is older than half of its expiry time Relates elastic#41536

This commit updates the version in which PRRLs are expected to exist to 7.4.0.

If the primary performs a file-based recovery to a node that has (or recently had) a copy of the shard then it is possible that the persisted global checkpoint of the new copy is behind that of the old copy since file-based recoveries are somewhat destructive operations. Today we leave that node's PRRL in place during the recovery with the expectation that it can be used by the new copy. However this isn't the case if the new copy needs more history to be retained, because retention leases may only advance and never retreat. This commit addresses this by removing any existing PRRL during a file-based recovery: since we are performing a file-based recovery we have already determined that there isn't enough history for an ops-based recovery, so there is little point in keeping the old lease in place. Caught by [a failure of `RecoveryWhileUnderLoadIT.testRecoverWhileRelocating`](https://scans.gradle.com/s/wxccfrtfgjj3g/console-log?task=:server:integTest#L14) Relates elastic#41536

Today we perform `TransportReplicationAction` derivatives during recovery, and these actions call their response handlers on the transport thread. This change moves the continued execution of the recovery back onto the generic threadpool.

Today when renewing PRRLs we assert that any invalid "backwards" renewals must be because we are recovering the shard. In fact it's also possible to have `checkpointState.globalCheckpoint == SequenceNumbers.UNASSIGNED_SEQ_NO` on a tracked shard copy if the primary was just promoted and hasn't received checkpoints from all of its peers too. This commit weakens the assertion to match. Caught by a [failure of the full cluster restart tests](https://scans.gradle.com/s/5lllzgqtuegty/console-log#L8605) Relates elastic#41536

In elastic#44000 we introduced some calls to `assertNotTransportThread` that are executed whether assertions are enabled or not. Although they have no effect if assertions are disabled, we should have done it like this instead.

Today peer recovery retention leases (PRRLs) are created when starting a replication group from scratch and during peer recovery. However, if the replication group was migrated from nodes running a version which does not create PRRLs (e.g. 7.3 and earlier) then it's possible that the primary was relocated or promoted without first establishing all the expected leases. It's not possible to establish these leases before or during primary activation, so we must create them as soon as possible afterwards. This gives weaker guarantees about history retention, since there's a possibility that history will be discarded before it can be used. In practice such situations are expected to occur only rarely. This commit adds the machinery to create missing leases after primary activation, and strengthens the assertions about the existence of such leases in order to ensure that once all the leases do exist we never again enter a state where there's a missing lease. Relates elastic#41536

The cluster in the full-cluster restart test only has 2 nodes, so we cannot fully allocate an index with 2 replicas.

The full cluster restart tests run against versions prior to 7.0 in which soft deletes are disabled by default, and against versions prior to 6.5 in which soft deletes are not even supported. This commit adjusts the PRRL full cluster restart test to handle such old clusters properly.

…43463) Today we use the local checkpoint of the safe commit on replicas as the starting sequence number of operation-based peer recovery. While this is a good choice due to its simplicity, we need to share this information between copies if we use retention leases in peer recovery. We can avoid this extra work if we use the global checkpoint as the starting sequence number. With this change, we will try to recover replica locally up to the global checkpoint before performing peer recovery. This commit should also increase the chance of operation-based recovery.

… step (elastic#44781) If we force allocate an empty or stale primary, the global checkpoint on replicas might be higher than the primary's as the local recovery step (introduced in elastic#43463) loads the previous (stale) global checkpoint into ReplicationTracker. There's no issue with the retention leases for a new lease with a higher term will supersede the stale one. Relates elastic#43463

For closed and frozen indices, we should not recover shard locally up to the global checkpoint before performing peer recovery for that copy might be offline when the index was closed/frozen. Relates elastic#43463 Closes elastic#44855

Thanks to peer recovery retention leases we now retain the history needed to perform peer recoveries from the index instead of from the translog. This commit adjusts the peer recovery process to do so, and also adjusts it to use the existence of a retention lease to decide whether or not to attempt an operations-based recovery. Reverts elastic#38904 and elastic#42211 Relates elastic#41536

Previously, if the metadata snapshot is empty (either no commit found or error), we won't compute the starting sequence number and use -2 to opt out the operation-based recovery. With elastic#43463, we have a starting sequence number before reading the last commit. Thus, we need to reset it if we fail to snapshot the store. Closes elastic#45072

Relates elastic#44887

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use index for peer recovery instead of translog #45137

Use index for peer recovery instead of translog #45137

Commits on Jun 19, 2019

Commits on Jun 21, 2019

Commits on Jun 24, 2019

Commits on Jun 26, 2019

Commits on Jul 1, 2019

Commits on Jul 4, 2019

Commits on Jul 5, 2019

Commits on Jul 8, 2019

Commits on Jul 9, 2019

Commits on Jul 11, 2019

Commits on Jul 15, 2019

Commits on Jul 20, 2019

Commits on Jul 23, 2019

Commits on Jul 24, 2019

Commits on Jul 29, 2019

Commits on Jul 30, 2019

Commits on Jul 31, 2019

Commits on Aug 1, 2019

Commits on Aug 2, 2019