[Backport 2.x] Fix test testDropPrimaryDuringReplication and clean up ReplicationCheckpoint validation #9095

Merged: 1 commit into opensearch-project:2.x on Aug 3, 2023

Conversation

mch2
Member

@mch2 mch2 commented Aug 3, 2023

Manual backport of #8889

@opensearch-trigger-bot
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git]
Components skipped due to git failures: [https://github.com/opensearch-project/opensearch-oci-object-storage.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

BUILD SUCCESSFUL in 24m 25s

@opensearch-trigger-bot
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git]
Components skipped due to git failures: [https://github.com/opensearch-project/opensearch-oci-object-storage.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

BUILD SUCCESSFUL in 25m 9s

@mch2
Member Author

mch2 commented Aug 3, 2023

Gradle Check (Jenkins) Run Completed with:

A test that is missing on main, testUploadMetadataMissingSegment, is now failing on 2.x.

…ckpoint validation (opensearch-project#8889)

* Fix test testDropPrimaryDuringReplication and clean up ReplicationCheckpoint validation.

This test now occasionally fails with replicas having 0 documents. This occurs in a couple of ways:
1. After the old primary is dropped, the new primary does not publish a checkpoint to replicas unless it indexes docs from the translog after flipping to primary mode.
If there is nothing to index, it will not publish a checkpoint, but the other replica may never have synced with the original primary and is left out of date.
- This PR fixes this by force publishing a checkpoint after the new primary flips to primary mode.
2. The replica receives a checkpoint post failover and cancels its sync with the former primary that is still active, recognizing a primary term bump.
However, this cancellation is async, and immediately starting a new replication event could fail because the replica is still replicating.
- This PR fixes this by attempting to process the latest received checkpoint on failure, if the shard is not failed and still behind.
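
The second fix can be pictured with a small model. This is only a sketch under stated assumptions: `CheckpointRetrySketch`, `Checkpoint`, `Replica`, and every method name below are hypothetical and are not the OpenSearch classes touched by this PR. It shows a replica that rejects a new checkpoint while an earlier round is still winding down, then picks up the latest received checkpoint once that round reports failure.

```java
/** Hypothetical, simplified model; none of these names are OpenSearch APIs. */
final class CheckpointRetrySketch {

    /** Stand-in for a replication checkpoint: primary term plus a segment version. */
    static final class Checkpoint {
        final long primaryTerm;
        final long version;

        Checkpoint(long primaryTerm, long version) {
            this.primaryTerm = primaryTerm;
            this.version = version;
        }

        /** A higher primary term wins; within a term, a higher version wins. */
        boolean isAheadOf(Checkpoint other) {
            return primaryTerm > other.primaryTerm
                || (primaryTerm == other.primaryTerm && version > other.version);
        }
    }

    static final class Replica {
        Checkpoint current = new Checkpoint(1, 0);
        Checkpoint latestReceived = current;
        Checkpoint inFlight;   // non-null while a replication round is running
        boolean shardFailed;

        /** Returns true if a replication round was started for cp. */
        boolean onNewCheckpoint(Checkpoint cp) {
            if (cp.isAheadOf(latestReceived)) {
                latestReceived = cp;
            }
            if (inFlight != null) {
                return false; // earlier round (or its async cancellation) is still running
            }
            inFlight = cp;
            return true;
        }

        /** Called when a round fails or its cancellation completes. */
        boolean onReplicationFailure() {
            inFlight = null;
            if (!shardFailed && latestReceived.isAheadOf(current)) {
                return onNewCheckpoint(latestReceived); // retry with the newest checkpoint
            }
            return false;
        }

        /** Called when the in-flight round completes successfully. */
        void onReplicationDone() {
            if (inFlight != null) {
                current = inFlight;
                inFlight = null;
            }
        }
    }

    public static void main(String[] args) {
        Replica replica = new Replica();
        replica.onNewCheckpoint(new Checkpoint(1, 5));                    // sync with old primary starts
        boolean started = replica.onNewCheckpoint(new Checkpoint(2, 5)); // rejected, still replicating
        boolean retried = replica.onReplicationFailure();                // cancellation done, retry
        replica.onReplicationDone();
        System.out.println(started + " " + retried + " term=" + replica.current.primaryTerm);
    }
}
```

In this model the checkpoint from the new primary is remembered even though it could not start a round immediately, so the replica does not stay behind once the cancelled round finishes.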

This PR also introduces a few changes to ensure the accuracy of the ReplicationCheckpoint tracked on primary & replicas.
- Ensure the checkpoint stored in SegmentReplicationTarget is the checkpoint passed from the primary and not locally computed. This ensures checks for primary term are accurate and do not use a locally computed operationPrimaryTerm.
- Introduce a refresh listener for both primary & replica that updates the ReplicationCheckpoint and stores it in replicationTracker post refresh, rather than redundantly computing it when accessed.
- Remove the unnecessary onCheckpointPublished method used to start replication timers manually. This will happen automatically on primaries once the local checkpoint is updated.
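
The refresh listener idea in the list above amounts to computing the checkpoint once per refresh and serving cached reads afterwards. A minimal sketch follows, assuming a hypothetical `Tracker` and `CheckpointRefreshListener` (these are illustrative names, not the classes in this PR), and assuming the checkpoint is only recomputed when the refresh actually changed something.

```java
import java.util.function.LongSupplier;

/** Hypothetical, simplified model; none of these names are OpenSearch APIs. */
final class CheckpointRefreshSketch {

    /** Stand-in for the tracker that holds the latest checkpoint. */
    static final class Tracker {
        private volatile long latestCheckpointVersion = -1L;

        void setLatestCheckpointVersion(long version) {
            latestCheckpointVersion = version;
        }

        /** Cheap read: returns the stored value, never recomputes. */
        long getLatestCheckpointVersion() {
            return latestCheckpointVersion;
        }
    }

    /** Recomputes the checkpoint after a refresh and stores it in the tracker. */
    static final class CheckpointRefreshListener {
        private final LongSupplier computeCheckpoint;
        private final Tracker tracker;

        CheckpointRefreshListener(LongSupplier computeCheckpoint, Tracker tracker) {
            this.computeCheckpoint = computeCheckpoint;
            this.tracker = tracker;
        }

        void afterRefresh(boolean didRefresh) {
            if (didRefresh) { // assumption: skip the work when nothing changed
                tracker.setLatestCheckpointVersion(computeCheckpoint.getAsLong());
            }
        }
    }

    public static void main(String[] args) {
        int[] computations = {0};
        Tracker tracker = new Tracker();
        CheckpointRefreshListener listener = new CheckpointRefreshListener(() -> {
            computations[0]++;
            return 7L; // pretend this was derived from the latest segment infos
        }, tracker);

        listener.afterRefresh(true);
        long a = tracker.getLatestCheckpointVersion();
        long b = tracker.getLatestCheckpointVersion();
        System.out.println(a + " " + b + " computed " + computations[0] + " time(s)");
    }
}
```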

Signed-off-by: Marc Handalian <[email protected]>

* Handle NoSuchFileException when attempting to delete decref'd files.

To avoid divergent logic with remote store, we always incref/decref the segmentinfos.files(true), which includes the segments_n file.
Decref to 0 will attempt to delete the file from the store, and it's possible this _n file does not yet exist. This change ignores a NoSuchFileException thrown while attempting to delete.
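
The behavior described in this commit message can be sketched with plain `java.nio.file`. The class `RefCountedFilesSketch` and its methods are hypothetical and only model the idea; they are not the store code changed here. `Files.delete` and `NoSuchFileException` are standard JDK APIs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of ref-counted segment files. */
final class RefCountedFilesSketch {
    private final Path dir;
    private final Map<String, Integer> refCounts = new HashMap<>();

    RefCountedFilesSketch(Path dir) {
        this.dir = dir;
    }

    void incRef(String name) {
        refCounts.merge(name, 1, Integer::sum);
    }

    /** Returns true when the count reached zero and a delete was attempted. */
    boolean decRef(String name) throws IOException {
        Integer count = refCounts.get(name);
        if (count == null) {
            throw new IllegalStateException("not referenced: " + name);
        }
        if (count > 1) {
            refCounts.put(name, count - 1);
            return false;
        }
        refCounts.remove(name);
        try {
            Files.delete(dir.resolve(name));
        } catch (NoSuchFileException e) {
            // The file (for example a segments_n that was never written) does not exist; nothing to delete.
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("refcount-sketch");
        RefCountedFilesSketch refs = new RefCountedFilesSketch(dir);
        refs.incRef("segments_2");                   // referenced, but never written to disk
        System.out.println(refs.decRef("segments_2")); // delete attempted, missing file ignored
    }
}
```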

Signed-off-by: Marc Handalian <[email protected]>

* Add more unit tests.

Signed-off-by: Marc Handalian <[email protected]>

* Clean up IndexShardTests.testCheckpointReffreshListenerWithNull

Signed-off-by: Marc Handalian <[email protected]>

* Remove unnecessary catch for NoSuchFileException.

Signed-off-by: Marc Handalian <[email protected]>

* Add another test for non segrep.

Signed-off-by: Marc Handalian <[email protected]>

* PR Feedback.

Signed-off-by: Marc Handalian <[email protected]>

* Re-compute replication checkpoint on primary promotion.

Signed-off-by: Marc Handalian <[email protected]>

---------

Signed-off-by: Marc Handalian <[email protected]>
(cherry picked from commit c3acf47)
@opensearch-trigger-bot
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git]
Components skipped due to git failures: [https://github.com/opensearch-project/opensearch-oci-object-storage.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

BUILD SUCCESSFUL in 31m 15s

@github-actions
Contributor

github-actions bot commented Aug 3, 2023

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.snapshots.CloneSnapshotIT.testCloneShallowSnapshotIndex

@codecov

codecov bot commented Aug 3, 2023

Codecov Report

Merging #9095 (e714442) into 2.x (ae9373a) will increase coverage by 0.21%.
The diff coverage is 69.23%.

@@             Coverage Diff              @@
##                2.x    #9095      +/-   ##
============================================
+ Coverage     70.66%   70.88%   +0.21%     
- Complexity    57341    57486     +145     
============================================
  Files          4754     4754              
  Lines        271650   271673      +23     
  Branches      40067    40070       +3     
============================================
+ Hits         191975   192576     +601     
+ Misses        63167    62605     -562     
+ Partials      16508    16492      -16     
| Files Changed | Coverage Δ |
|---|---|
| ...search/index/shard/RemoteStoreRefreshListener.java | 82.60% <0.00%> (-2.01%) ⬇️ |
| ...ckpoint/SegmentReplicationCheckpointPublisher.java | 100.00% <ø> (ø) |
| ...s/replication/SegmentReplicationTargetService.java | 60.00% <57.69%> (+0.74%) ⬆️ |
| .../opensearch/index/engine/NRTReplicationEngine.java | 77.71% <66.66%> (-0.40%) ⬇️ |
| .../indices/replication/SegmentReplicationTarget.java | 89.71% <85.71%> (+0.39%) ⬆️ |
| ...in/java/org/opensearch/index/shard/IndexShard.java | 68.60% <86.95%> (-0.30%) ⬇️ |

... and 491 files with indirect coverage changes

@mch2 mch2 merged commit b3c8a94 into opensearch-project:2.x Aug 3, 2023
8 of 9 checks passed
3 participants