[Backport 2.x] Fix flaky test SegmentReplicationWithNodeToNodeIndexShardTests#testReplicaClosesWhileReplicating_AfterGetCheckpoint #12741

Merged
merged 1 commit into from
Mar 22, 2024


opensearch-trigger-bot[bot]
Contributor

Backport 5e2034c from #12695.

Contributor

github-actions bot commented Mar 18, 2024

Compatibility status:

Checks if related components are compatible with change 9078210

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/sql.git]

Contributor

❌ Gradle check result for 215be52: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@andrross
Member

@mch2 This test failed in https://build.ci.opensearch.org/job/gradle-check/35236/consoleText with the following error:

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.index.shard.SegmentReplicationWithNodeToNodeIndexShardTests.testReplicaClosesWhileReplicating_AfterGetCheckpoint" -Dtests.seed=4060F9D0B1F3D9D8 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=en-US -Dtests.timezone=America/Argentina/Salta -Druntime.java=21

org.opensearch.index.shard.SegmentReplicationWithNodeToNodeIndexShardTests > testReplicaClosesWhileReplicating_AfterGetCheckpoint FAILED
    java.lang.AssertionError: Should have resolved listener with failure expected:<0> but was:<1>
        at __randomizedtesting.SeedInfo.seed([4060F9D0B1F3D9D8:DD0E4A2AD34C80FC]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.opensearch.index.shard.SegmentReplicationIndexShardTests.startReplicationAndAssertCancellation(SegmentReplicationIndexShardTests.java:1067)
        at org.opensearch.index.shard.SegmentReplicationWithNodeToNodeIndexShardTests.testReplicaClosesWhileReplicating_AfterGetCheckpoint(SegmentReplicationWithNodeToNodeIndexShardTests.java:141)

Does that mean this fix doesn't actually fix the flaky test?

@mch2
Member

mch2 commented Mar 20, 2024

This fails because #12043, which includes changes to make cancel synchronous on replicas, was not backported to 2.x. Triggering a backport of that first, then I will rebase this.

…plicaClosesWhileReplicating_AfterGetCheckpoint (#12695)

This fixes a race condition in the test where the primary shard still has an open file ref while shutting down.
This happens because we fetch file refs inside the resolveCheckpointInfoResponseListener method right after calling beforeIndexShardClosed.
beforeIndexShardClosed resolves replication listeners immediately, which leaves a window in which the primary
can attempt to shut down before those refs are closed. We could resolve this using latches, but this test doesn't need to simulate a primary response at all, so that simulation has been removed entirely.

Signed-off-by: Marc Handalian <[email protected]>
(cherry picked from commit 5e2034c)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
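
The commit message mentions latches as the alternative that was not taken. For readers unfamiliar with that pattern, here is a minimal sketch of what a latch-based guard might look like. It is a toy model, not code from this PR or from OpenSearch: `LatchSketch`, `openRefsAtClose`, and the counter standing in for file refs are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: none of these names come from OpenSearch.
class LatchSketch {

    // Returns the number of "file refs" still open at the moment the "primary" closes.
    static int openRefsAtClose() throws InterruptedException {
        AtomicInteger openRefs = new AtomicInteger();             // stands in for open file refs
        CountDownLatch refAcquired = new CountDownLatch(1);       // responder has taken its ref
        CountDownLatch listenerResolved = new CountDownLatch(1);  // listener has been resolved
        CountDownLatch refsReleased = new CountDownLatch(1);      // responder has released its ref

        Thread responder = new Thread(() -> {
            openRefs.incrementAndGet();          // fetch a file ref while building the response
            refAcquired.countDown();
            try {
                // Hold the ref until after the listener is resolved, which
                // reproduces the window described in the commit message.
                listenerResolved.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                openRefs.decrementAndGet();      // release the ref
                refsReleased.countDown();
            }
        });
        responder.start();

        refAcquired.await();
        listenerResolved.countDown();            // the listener resolves immediately

        // The guard: without this await, the read below could still see one open ref.
        if (!refsReleased.await(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("ref was not released in time");
        }
        int refsSeenByClose = openRefs.get();    // "close the primary": all refs must be gone
        responder.join();
        return refsSeenByClose;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("open refs at close: " + openRefsAtClose());
    }
}
```

The merged change avoids this extra synchronization altogether by dropping the simulated primary response from the test, which is the simpler option when the test does not depend on that response.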
@mch2 mch2 force-pushed the backport/backport-12695-to-2.x branch from 215be52 to 9078210 Compare March 20, 2024 23:29
Contributor

❌ Gradle check result for 9078210: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@mch2
Member

mch2 commented Mar 21, 2024

FAILURE

#12836

Contributor

❕ Gradle check result for 9078210: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=indices.get_field_mapping/20_missing_field/Return empty object if field doesn't exist, but index does}

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@mch2 mch2 merged commit d172f3f into 2.x Mar 22, 2024
27 checks passed
@github-actions github-actions bot deleted the backport/backport-12695-to-2.x branch March 22, 2024 17:07