
[BUG] flaky test index/80_geo_point/Single point test #4852

Closed
peternied opened this issue Oct 20, 2022 · 11 comments
Labels
bug (Something isn't working), distributed framework, flaky-test (Random test failure that succeeds on second run)

Comments

@peternied
Member

peternied commented Oct 20, 2022

Describe the bug
Looks like there is a flaky test case index/80_geo_point/Single point test inside of org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test

java.lang.AssertionError: Failure at [index/80_geo_point:60]: hits.total didn't match expected value:
hits.total: expected Integer [6] but was Integer [5]

To Reproduce
Steps to reproduce the behavior:

  1. REPRODUCE WITH: ./gradlew ':qa:mixed-cluster:v2.4.0#mixedClusterTest' --tests "org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=index/80_geo_point/Single point test}" -Dtests.seed=9E49FBE4E90B87AB -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=es-AR -Dtests.timezone=Asia/Tehran -Druntime.java=17

Expected behavior
There should be no flaky tests in the gradle check

Host/Environment (please complete the following information):

  • Jenkins build environment

Additional context

Please react with 🤔 if you see further instances and paste a link to the impacted pull request in this issue.

peternied added the bug, untriaged, and flaky-test labels on Oct 20, 2022
@ketanv3
Contributor

ketanv3 commented Oct 21, 2022

Another PR impacted by this - #4805

@kotwanikunal
Member

Test consistently failing for #4870

@heemin32
Contributor

Test succeeded in #4805. #4870 will succeed once it is rebased.

@andrross
Member

Fixed by #4860

@peternied
Member Author

It looks like the test has been disabled - yay for not failing gradle checks! - but it has not been fixed. I think this issue should stay open until the test is fixed or deleted.

peternied reopened this on Nov 11, 2022
@heemin32
Contributor

The test is not disabled; the confusion comes from the test's name. The test was newly added along with the new GeoJSON format support for the point type in version 2.4. Therefore, it fails against earlier versions, which do not support the GeoJSON format for the point type.

Let me separate the two cases out instead of combining them...
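
For reference, here is a minimal sketch of the difference between the two cases (the field name and coordinate values are illustrative, not the actual test bodies):

```yaml
# Older geo_point formats, accepted by every node version in the mixed cluster
# (field name and coordinate values are illustrative):
- location: { lat: 40.71, lon: -74.0 }   # object with lat/lon
- location: "40.71,-74.0"                # "lat,lon" string
- location: [ -74.0, 40.71 ]             # [lon, lat] array

# GeoJSON object form, understood only by 2.4.0+ nodes:
- location:
    type: Point
    coordinates: [ -74.0, 40.71 ]
```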

@peternied
Member Author

@heemin32 Good detail!

I am curious: if the test was running against versions of OpenSearch that did not support the feature, I would expect gradle check to have been blocked ever since the test was merged. Do you know why it presented as an intermittent failure instead?

@andrross
Member

@peternied My understanding is that in the mixed cluster test case the requests may go to any node. If the requests requiring the new feature happen to hit only the nodes with the newer version of OpenSearch, then the tests will pass. Hence in the case of a compatibility issue you might see intermittent successes and failures.

Let me separate the two cases out instead of combining them...

@heemin32 What action are you taking here?

@heemin32
Contributor

@andrross I am creating one test case covering all the previous formats, and another test case covering the GeoJSON format that runs only on version 2.4 and above. Do you think we need that?
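
Something like this for the new case, assuming the standard YAML REST test skip block (the test name, index name, and values below are illustrative, not taken from the actual suite):

```yaml
---
"Single point in geojson format":
  - skip:
      version: " - 2.3.99"
      reason: "geojson format for geo_point is only supported from 2.4.0"

  # Index a document whose geo_point field uses the GeoJSON object form.
  - do:
      index:
        index: test_geo
        id: "1"
        body:
          location:
            type: Point
            coordinates: [ -74.0, 40.71 ]
```

With a skip block like this, mixed clusters that still contain pre-2.4 nodes would be expected to skip the GeoJSON case, while the legacy-format case continues to run everywhere.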

@peternied
Member Author

I am going to close out this issue since it is tracking the test failure itself. It is up to @heemin32 whether to re-work the test case to make its version dependencies more obvious.

If the requests requiring the new feature happen to hit only the nodes with the newer version of OpenSearch, then the tests will pass.

This sounds like the tests are inherently non-deterministic, which seems like something we should try to fix, as it might be the root cause of many other intermittent failures. @andrross could you open an issue to track this separately?

@andrross
Member

@peternied Opened an issue here #5257
