[BUG] org.opensearch.repositories.s3.S3BlobStoreRepositoryTests.testRequestStats is flaky #10735

Closed
gbbafna opened this issue Oct 19, 2023 · 11 comments · Fixed by #10736, #13814 or #13887
Labels
bug Something isn't working flaky-test Random test failure that succeeds on second run Storage Issues and PRs relating to data and metadata storage


@gbbafna
Collaborator

gbbafna commented Oct 19, 2023

Describe the bug
https://build.ci.opensearch.org/job/gradle-check/28391/

Many gradle-check runs are failing because of this test.

@gbbafna gbbafna added bug Something isn't working untriaged labels Oct 19, 2023
@gbbafna gbbafna reopened this Oct 19, 2023
@gbbafna
Collaborator Author

gbbafna commented Oct 19, 2023

We have muted this test for now. It still needs to be fixed, because it is critical for verifying that repository stats work.

@gbbafna gbbafna added the Storage Issues and PRs relating to data and metadata storage label Oct 19, 2023
@mch2
Member

mch2 commented Oct 19, 2023

FYI, I have a reproducible seed here: #10730

@akolarkunnu
Contributor

akolarkunnu commented May 21, 2024

I analyzed this bug in detail and found a solution.
@rramachand21, can you please assign this bug to me if you are not working on it?

@peternied
Member

@akolarkunnu Thanks for jumping on this issue!

@akolarkunnu
Contributor

akolarkunnu commented May 22, 2024

The relevant part of the error log is:
java.lang.NullPointerException
at __randomizedtesting.SeedInfo.seed([F562E26F95091BFA:EAE65676A0DA30C5]:0)
at org.opensearch.repositories.s3.S3BlobStore.extendedStats(S3BlobStore.java:243)
at org.opensearch.repositories.blobstore.BlobStoreRepository.stats(BlobStoreRepository.java:859)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:558)
at org.opensearch.repositories.s3.S3BlobStoreRepositoryTests.testRequestStats(S3BlobStoreRepositoryTests.java:210)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)

It's a NullPointerException thrown from the S3BlobStore.extendedStats() method, where 'genericStatsMetricPublisher' is null. This parameter is passed in through the S3Repository constructor, and this test passes null for it. This is the root cause of the issue.

@akolarkunnu
Contributor

akolarkunnu commented May 22, 2024

As a solution, the test can pass a proper GenericStatsMetricPublisher object instead of null.
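A minimal, self-contained Java sketch of the failure mode and the proposed fix, for readers who want to see the pattern in isolation. `StatsPublisher` and `BlobStore` are hypothetical stand-ins for GenericStatsMetricPublisher and S3BlobStore; their fields and methods are illustrative assumptions, not the real OpenSearch API. The point it shows: a dependency injected through the constructor is only dereferenced later (in `extendedStats()`), so passing null from the test fails at stats time rather than at construction time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the OpenSearch classes named in this issue.
// Names and signatures are illustrative assumptions, not the real API.
public class NullPublisherDemo {

    /** Hypothetical stand-in for GenericStatsMetricPublisher. */
    static class StatsPublisher {
        private long uploads = 0;

        void recordUpload() {
            uploads++;
        }

        Map<String, Long> snapshot() {
            Map<String, Long> stats = new HashMap<>();
            stats.put("uploads", uploads);
            return stats;
        }
    }

    /** Hypothetical stand-in for S3BlobStore: the publisher is injected via the constructor. */
    static class BlobStore {
        private final StatsPublisher publisher;

        BlobStore(StatsPublisher publisher) {
            // No null check here, so a null publisher is accepted silently.
            this.publisher = publisher;
        }

        Map<String, Long> extendedStats() {
            // Dereferences the injected publisher; throws NullPointerException if it is null.
            return publisher.snapshot();
        }
    }

    public static void main(String[] args) {
        // Mirrors the failing test setup: null is passed for the publisher.
        BlobStore broken = new BlobStore(null);
        try {
            broken.extendedStats();
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the reported stack trace");
        }

        // Mirrors the proposed fix: the test supplies a real publisher instance.
        StatsPublisher publisher = new StatsPublisher();
        publisher.recordUpload();
        BlobStore fixed = new BlobStore(publisher);
        System.out.println("uploads = " + fixed.extendedStats().get("uploads"));
    }
}
```

Running `main` prints the NullPointerException message first and then `uploads = 1`. Supplying a real publisher from the test (rather than adding a null guard in production code) keeps the test exercising the same extended-stats path that real repositories use; a `Objects.requireNonNull` check in the constructor would be an alternative way to make such a mistake fail fast.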

@rramachand21 rramachand21 removed their assignment May 23, 2024
@rramachand21
Member

@akolarkunnu thanks for your contribution on this. Let me know if you are blocked on any of the processes up until your fix is merged and I can help follow up.

akolarkunnu pushed a commit to akolarkunnu/OpenSearch that referenced this issue May 24, 2024
…Stats fails with NullPointerException

It's a NullPointerException from S3BlobStore.extendedStats() method, where 'genericStatsMetricPublisher' is null. This parameter sets through S3Repository constructor from test and in this test it sets as null. This is the root cause of the issue. If we set valid a GenericStatsMetricPublisher, test works fine without any issue. This was a consistent failure, not a random failure.
Resolves opensearch-project#10735
Signed-off-by: Abdul Muneer Kolarkunnu <[email protected]>
akolarkunnu pushed a commit to akolarkunnu/OpenSearch that referenced this issue May 24, 2024
…Stats fails with NullPointerException

It's a NullPointerException from S3BlobStore.extendedStats() method, where 'genericStatsMetricPublisher' is null. This parameter sets through S3Repository constructor from test and in this test it sets as null. This is the root cause of the issue. If we set valid a GenericStatsMetricPublisher, test works fine without any issue. This was a consistent failure, not a random failure.

Resolves opensearch-project#10735

Signed-off-by: akolarkunnu <[email protected]>
jed326 pushed a commit that referenced this issue May 25, 2024
…Stats fails with NullPointerException (#13814)

It's a NullPointerException from S3BlobStore.extendedStats() method, where 'genericStatsMetricPublisher' is null. This parameter sets through S3Repository constructor from test and in this test it sets as null. This is the root cause of the issue. If we set valid a GenericStatsMetricPublisher, test works fine without any issue. This was a consistent failure, not a random failure.

Resolves #10735

Signed-off-by: akolarkunnu <[email protected]>
Co-authored-by: akolarkunnu <[email protected]>
@reta
Collaborator

reta commented May 28, 2024

The issue is not fixed. The test still fails, now with an AssertionError in RepositoryStats.merge():

java.lang.AssertionError
	at __randomizedtesting.SeedInfo.seed([1B7179E203DFF64:1E33A38715EED45B]:0)
	at org.opensearch.repositories.RepositoryStats.merge(RepositoryStats.java:88)
	at java.base/java.util.stream.ReduceOps$2ReducingSink.accept(ReduceOps.java:123)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:662)
	at org.opensearch.repositories.s3.S3BlobStoreRepositoryTests.testRequestStats(S3BlobStoreRepositoryTests.java:209)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at org.opensearch.test.OpenSearchTestClusterRule$1.evaluate(OpenSearchTestClusterRule.java:369)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1583)

@reta reta reopened this May 28, 2024
akolarkunnu pushed a commit to akolarkunnu/OpenSearch that referenced this issue May 29, 2024
…Stats fails with NullPointerException

It's a NullPointerException from S3BlobStore.extendedStats() method, where 'genericStatsMetricPublisher' is null. This parameter sets through S3Repository constructor from test and in this test it sets as null. This is the root cause of the issue. If we set valid a GenericStatsMetricPublisher, test works fine without any issue. This was a consistent failure, not a random failure.

Resolves opensearch-project#10735

Signed-off-by: Abdul Muneer Kolarkunnu <[email protected]>
Signed-off-by: akolarkunnu <[email protected]>
@peternied peternied removed good first issue Good for newcomers untriaged labels May 29, 2024
@peternied
Member

[Triage - attendees 1 2 3 4 5 6]
@reta, thanks for reopening this issue.

@prudhvigodithi
Contributor

prudhvigodithi commented May 29, 2024

From the metrics dashboard, I can see that S3BlobStoreRepositoryTests is a top hitter and has failed on multiple PRs.

[Screenshot 2024-05-29 at 10:58:19 AM]

@msfroh
Collaborator

msfroh commented May 30, 2024

I added an extra check that seems to fix a seed that reproduces reliably on my machine: #13887

msfroh pushed a commit that referenced this issue May 30, 2024
…Stats fails with NullPointerException (#13866)

It's a NullPointerException from S3BlobStore.extendedStats() method, where 'genericStatsMetricPublisher' is null. This parameter sets through S3Repository constructor from test and in this test it sets as null. This is the root cause of the issue. If we set valid a GenericStatsMetricPublisher, test works fine without any issue. This was a consistent failure, not a random failure.

Resolves #10735

Signed-off-by: Abdul Muneer Kolarkunnu <[email protected]>
Signed-off-by: akolarkunnu <[email protected]>
Co-authored-by: akolarkunnu <[email protected]>
parv0201 pushed a commit to parv0201/OpenSearch that referenced this issue Jun 10, 2024
…Stats fails with NullPointerException (opensearch-project#13814)

It's a NullPointerException from S3BlobStore.extendedStats() method, where 'genericStatsMetricPublisher' is null. This parameter sets through S3Repository constructor from test and in this test it sets as null. This is the root cause of the issue. If we set valid a GenericStatsMetricPublisher, test works fine without any issue. This was a consistent failure, not a random failure.

Resolves opensearch-project#10735

Signed-off-by: akolarkunnu <[email protected]>
Co-authored-by: akolarkunnu <[email protected]>
kkewwei pushed a commit to kkewwei/OpenSearch that referenced this issue Jul 24, 2024
…Stats fails with NullPointerException (opensearch-project#13866)

It's a NullPointerException from S3BlobStore.extendedStats() method, where 'genericStatsMetricPublisher' is null. This parameter sets through S3Repository constructor from test and in this test it sets as null. This is the root cause of the issue. If we set valid a GenericStatsMetricPublisher, test works fine without any issue. This was a consistent failure, not a random failure.

Resolves opensearch-project#10735

Signed-off-by: Abdul Muneer Kolarkunnu <[email protected]>
Signed-off-by: akolarkunnu <[email protected]>
Co-authored-by: akolarkunnu <[email protected]>
Signed-off-by: kkewwei <[email protected]>
Projects
Status: ✅ Done
9 participants