Fix flaky test IndexShardTests.testCommitLevelRestoreShardFromRemoteStore #14418

Merged
merged 1 commit into from
Jun 18, 2024

@sachinpkale sachinpkale commented Jun 18, 2024

Description

  • In IndexShardTests.testCommitLevelRestoreShardFromRemoteStore, we index some docs, trigger a refresh, and induce a failure in the local store by deleting segment files.
  • If a subsequent refresh, or a retry of an earlier refresh, is triggered while segment deletion or shard.close is in progress, RemoteStoreRefreshListener may still hold a reference to a segment file.
  • To avoid this issue, this PR makes sure to acquire permits on RemoteStoreRefreshListener.
  • With this change, the flaky tests mentioned under Related Issues passed in more than 5,000 local runs.
  • Sample stack trace from the test failures:
java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still 1 open files: {_0.cfs=1}
	at __randomizedtesting.SeedInfo.seed([87AFB01F83317A5B:97D1B6F179D0CFDE]:0)
	at org.apache.lucene.tests.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:876)
	at org.apache.lucene.store.FilterDirectory.close(FilterDirectory.java:111)
	at org.apache.lucene.store.FilterDirectory.close(FilterDirectory.java:111)
	at org.opensearch.index.store.Store$StoreDirectory.innerClose(Store.java:952)
	at org.opensearch.index.store.Store.closeInternal(Store.java:571)
	at org.opensearch.index.store.Store$1.closeInternal(Store.java:194)
	at org.opensearch.common.util.concurrent.AbstractRefCounted.decRef(AbstractRefCounted.java:78)
	at org.opensearch.index.store.Store.decRef(Store.java:546)
	at org.opensearch.index.store.Store.close(Store.java:553)
	at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:89)
	at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:131)
	at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:81)
	at org.opensearch.index.shard.IndexShardTestCase.closeShard(IndexShardTestCase.java:986)
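
The permit-based guard described in the bullets above can be illustrated with a small standalone sketch. This is not the code changed in this PR and not the RemoteStoreRefreshListener API: the class `PermitGuardedListener` and its methods `tryRefresh`, `drain`, and `resume` are hypothetical names, and a single-permit `java.util.concurrent.Semaphore` is assumed. The idea shown is that whoever deletes files or closes the store first takes the listener's permits, so no refresh can be holding a file open at that point.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a refresh listener; NOT the OpenSearch class. */
class PermitGuardedListener {
    private static final int TOTAL_PERMITS = 1; // assumption: one refresh at a time
    private final Semaphore permits = new Semaphore(TOTAL_PERMITS);
    private final AtomicInteger refreshesRun = new AtomicInteger();

    /** Runs the refresh work only if a permit is free; returns false otherwise. */
    boolean tryRefresh(Runnable work) {
        if (!permits.tryAcquire()) {
            return false; // drained or busy: skip, so no file handle is opened
        }
        try {
            work.run();
            refreshesRun.incrementAndGet();
            return true;
        } finally {
            permits.release();
        }
    }

    /** Waits for in-flight refreshes, then keeps all permits until resume(). */
    boolean drain(long timeout, TimeUnit unit) {
        try {
            return permits.tryAcquire(TOTAL_PERMITS, timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    /** Gives the permits back; call only after a successful drain(). */
    void resume() {
        permits.release(TOTAL_PERMITS);
    }

    int refreshesRun() {
        return refreshesRun.get();
    }
}

class PermitGuardDemo {
    public static void main(String[] args) {
        PermitGuardedListener listener = new PermitGuardedListener();
        System.out.println(listener.tryRefresh(() -> {}));        // true
        System.out.println(listener.drain(1, TimeUnit.SECONDS));  // true
        // Segment deletion or store close would happen here, with refreshes blocked.
        System.out.println(listener.tryRefresh(() -> {}));        // false
        listener.resume();
        System.out.println(listener.tryRefresh(() -> {}));        // true
    }
}
```

In this sketch a refresh that arrives while the permits are held is skipped rather than queued, which is one possible design choice; a real listener might instead schedule a retry.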

Related Issues

Check List

  • [ ] Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


❕ Gradle check result for 5e96d49: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testReplicaAlreadyAtCheckpoint

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov bot commented Jun 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.65%. Comparing base (b15cb0c) to head (5e96d49).
Report is 442 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #14418      +/-   ##
============================================
+ Coverage     71.42%   71.65%   +0.23%     
- Complexity    59978    62040    +2062     
============================================
  Files          4985     5118     +133     
  Lines        282275   291833    +9558     
  Branches      40946    42180    +1234     
============================================
+ Hits         201603   209114    +7511     
- Misses        63999    65510    +1511     
- Partials      16673    17209     +536     


@sachinpkale sachinpkale added the backport 2.x Backport to 2.x branch label Jun 18, 2024
@gbbafna gbbafna merged commit 3a0c0c0 into opensearch-project:main Jun 18, 2024
57 of 60 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jun 18, 2024
…tore (#14418)

Signed-off-by: Sachin Kale <[email protected]>
(cherry picked from commit 3a0c0c0)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
gbbafna pushed a commit that referenced this pull request Jun 18, 2024
…tore (#14418) (#14422)

(cherry picked from commit 3a0c0c0)

Signed-off-by: Sachin Kale <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
harshavamsi pushed a commit to harshavamsi/OpenSearch that referenced this pull request Jul 12, 2024
kkewwei pushed a commit to kkewwei/OpenSearch that referenced this pull request Jul 24, 2024
…tore (opensearch-project#14418) (opensearch-project#14422)

(cherry picked from commit 3a0c0c0)

Signed-off-by: Sachin Kale <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: kkewwei <[email protected]>
wdongyu pushed a commit to wdongyu/OpenSearch that referenced this pull request Aug 22, 2024
Labels
backport 2.x (Backport to 2.x branch), skip-changelog
3 participants