-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing flaky tests #3098
Fixing flaky tests #3098
Conversation
WalkthroughThe recent updates focus on adjusting timing and test behavior across different components. A test case in Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (3)
- testng-core/src/test/java/test/listeners/issue2916/ElaborateSampleTestCase.java (1 hunks)
- testng-core/src/test/java/test/thread/issue188/Issue188TestSample.java (1 hunks)
- testng-test-osgi/src/test/java/org/testng/test/osgi/PlainOsgiTest.java (1 hunks)
Additional comments: 3
testng-core/src/test/java/test/listeners/issue2916/ElaborateSampleTestCase.java (1)
- 26-29: The modifications to the
timingOutTest
method, including changing the sleep duration to 10 seconds and addingAssert.fail()
, are understood in the context of addressing flaky tests. However, it would be beneficial to add a comment explaining the purpose of theAssert.fail()
call, especially for future maintainers or developers who might be examining this test to understand its intended behavior.testng-core/src/test/java/test/thread/issue188/Issue188TestSample.java (1)
- 40-40: Increasing the sleep duration in the
sleepSilently
method introduces variability and may not be the most effective way to address test flakiness related to timing. Consider exploring more deterministic approaches, such as using mock objects or simulating conditions, to achieve consistent test outcomes without relying on actual sleep durations.testng-test-osgi/src/test/java/org/testng/test/osgi/PlainOsgiTest.java (1)
- 28-30: Disabling the
versionShouldStartWithDigit
test with a comment indicating it should be re-enabled after specific conditions are met is a clear and understandable approach. To ensure this test is not forgotten, consider adding a tracking mechanism, such as a TODO with a specific issue number or creating a task in the project's issue tracker, linked to the conditions mentioned in the comment.
testng-core/src/test/java/test/listeners/issue2916/ElaborateSampleTestCase.java
Show resolved
Hide resolved
@@ -37,7 +37,7 @@ public void anotherSampleTest() { | |||
|
|||
private void sleepSilently() { | |||
try { | |||
TimeUnit.MILLISECONDS.sleep(500 * random.nextInt(10)); | |||
TimeUnit.MILLISECONDS.sleep(5000 * random.nextInt(5)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously the sleep was "0..5 seconds", and now the timeout is "0..25 seconds". What is the reason for raising the upper bound only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just increasing the sleep value so that the sleep happens. I noticed that this test also fails sporadically when the stress parameters are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but I don't understand what is fixed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Julien - Neither do I to be honest. These failures are very hard for me to reproduce. I am hoping that a bump up in the timeout value may perhaps fix them when running in LCM/GCM mode. Until theres a reliable way to reproduce the issue, I could ONLY do trial and error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incase you have any ideas on how to reproduce the problem please let me know. Alternatively we can merge this PR and see if it helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just increasing the sleep value so that the sleep happens
@krmahadevan , nextInt
can return 0
, and in those cases the sleep
would be 0
.
If you want to increase the sleep here, you should ensure the code does not generate 0
sleeps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Julien - Neither do I to be honest. These failures are very hard for me to reproduce. I am hoping that a bump up in the timeout value may perhaps fix them when running in LCM/GCM mode. Until theres a reliable way to reproduce the issue, I could ONLY do trial and error.
Why not having both test cases? 2x more probabilities to produce the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@juherr I am not sure how that is going to help in reproducing the problem. Having a smaller thread sleep is perhaps not giving enough time for the system to behave as intended when those flags are being passed. I am not sure as to what exactly is getting changed when those flags are being passed. From a functional stand point the code seems to work as re-iterated by the passing test. But under the conditions enforced by LCM/GCM it becomes flaky. Rather than merely throwing more tests and hoping that the issue would show up, I would like to be able to reproduce the problem so that it can be fixed.
@vlsi - Ok. Let me try to fix that by altering the test such that we atleast wait for 5 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
942605c
to
26a57a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (4)
- testng-core/src/test/java/test/listeners/issue2916/ElaborateSampleTestCase.java (1 hunks)
- testng-core/src/test/java/test/thread/issue188/Issue188TestSample.java (1 hunks)
- testng-core/src/test/java/test/thread/issue188/IssueTest.java (1 hunks)
- testng-test-osgi/src/test/java/org/testng/test/osgi/PlainOsgiTest.java (1 hunks)
Files skipped from review as they are similar to previous changes (3)
- testng-core/src/test/java/test/listeners/issue2916/ElaborateSampleTestCase.java
- testng-core/src/test/java/test/thread/issue188/Issue188TestSample.java
- testng-test-osgi/src/test/java/org/testng/test/osgi/PlainOsgiTest.java
Additional comments: 2
testng-core/src/test/java/test/thread/issue188/IssueTest.java (2)
- 43-43: The introduction of the
permissibleLag
variable to control the lag threshold is a good practice as it makes the test more adaptable to different execution environments. However, it's important to document the rationale behind choosing the value of40 ms
forpermissibleLag
. Understanding the basis for this value can help maintainers adjust it in the future if the test environment changes.- 43-56: The logic to check the difference in timestamps to ensure test methods start within the permissible lag is correctly implemented. However, consider adding a comment explaining the significance of this check, especially for future maintainers or contributors who might not be familiar with the context of this flakiness fix. Explaining why a certain lag is acceptable and how it relates to the test's parallel execution could provide valuable insights.
Summary by CodeRabbit
ElaborateSampleTestCase
to improve test reliability.Issue188TestSample
to enhance test coverage.versionShouldStartWithDigit
test in preparation for future updates.