Ruthlessly mark tests that fail frequently as flaky #43955

Closed
mcollina opened this issue Jul 23, 2022 · 12 comments

@mcollina
Member

The status of our CI is deteriorating to the point where it is impossible to land anything.

Here is my proposal: declare bankruptcy and mark all tests flagged in https://github.com/nodejs/reliability as flaky.
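
For reference, a minimal sketch of what marking a test as flaky looks like in practice: an entry in the relevant .status file under test/ (e.g. test/parallel/parallel.status), using the test runner's expectation syntax. The test name and platform condition below are illustrative only:

```
# test/parallel/parallel.status (illustrative entry)
prefix parallel

# A [true] section applies on all platforms; conditions narrow it down.
[$system==win32]
# Hypothetical test name; real entries list the test file name without ".js".
test-some-frequently-failing-case: PASS,FLAKY
```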

mcollina added the tsc-agenda (Issues and PRs to discuss during the meetings of the TSC) label on Jul 23, 2022
@mcollina
Member Author

cc @nodejs/tsc

@tniessen
Member

Context: #43754 (comment) and #43929 (comment)

@lpinca
Member

lpinca commented Jul 23, 2022

To be honest, I don't think this is a good idea. More time should be spent on fixing flaky tests before adding new features. If this is not done, the number of flaky tests will only grow over time, defeating the whole point of testing.

@devsnek
Member

devsnek commented Jul 23, 2022

Do we have any context on whether the CI run was intended to pass? Sometimes I will be running on battery, so instead of building/testing/linting on my laptop I will let the CI do it, which generally comes up red.

@lpinca I agree with you in principle but that hinges on getting actual humans to spend actual time fixing the tests. Do we have the assurance that this can be done?

@lpinca
Member

lpinca commented Jul 23, 2022

Do we have the assurance that this can be done?

I don't know, but we should at least try. I only remember a handful of contributors who took the time to do this. I also think we should help new contributors avoid creating flaky tests, for example by discouraging the use of timers unless strictly needed.
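
As a hedged illustration of the timer point, a minimal sketch contrasting a delay-based assertion with an event-driven one (it assumes the file lives under test/parallel/ so that ../common resolves to Node's test helpers):

```js
'use strict';
// Assumption: this file sits under test/parallel/, where Node's common
// test helpers are available.
const common = require('../common');
const assert = require('assert');
const { EventEmitter } = require('events');

const emitter = new EventEmitter();

// Flaky pattern: wait "long enough" and hope the event has fired.
//   setTimeout(() => assert.ok(received), 100);

// Deterministic pattern: common.mustCall() fails the test if the listener
// is never invoked, without involving any timer.
emitter.on('done', common.mustCall((value) => {
  assert.strictEqual(value, 42);
}));

emitter.emit('done', 42);
```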

@tniessen
Member

More time should be spent on fixing flaky tests before adding new features. If this is not done the number of flaky tests will only grow over time defeating the whole point of testing.

@lpinca I agree with you. However, as you said, these tests are flaky, so marking them as flaky seems like the right approach. If it is not, then we should use different terms for tests that are flaky and tests that should be marked as flaky.

@lpinca
Member

lpinca commented Jul 24, 2022

However, as you said these tests are flaky

Sometimes flakiness is caused by real bugs. If we simply mark tests flaky with no investigation, we might hide real issues and make it harder to discover and fix bugs. Here are two recent examples:

and the likely underlying bug

@tniessen
Member

Sometimes flakiness is caused by real bugs.

@lpinca I am not disagreeing, I'd even go as far as to say that almost all flaky tests are caused by bugs -- either in node, in the test itself, or in the test environment.

If we simply mark tests flaky with no investigation, we might hide real issues and make it harder to discover and fix bugs.

That's not the goal; we should treat the list of flaky tests as an urgent TODO list. My point is: flaky tests should not make contributing a worse experience for everyone.

@mcollina
Member Author

Note that close to 50% of our failures are obscure in nature: either Jenkins or build-related issues.

@bnoordhuis
Member

Looking at the latest reliability report, nearly half are infrastructural failures; marking things flaky won't help there.

Some of the real flakes, like parallel/test-heapsnapshot-near-heap-limit-worker, are probably impossible to make reliable except in a statistical sense ("on average, didn't fail more than 1 out of 10 times during the last 100 runs").

For tests like that, we could either disable/remove/mark them flaky, or build out the CI to track swings in statistical flakiness. That's what e.g. Mozilla does for its test suites.
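
As a hedged sketch of the statistical idea (a hypothetical helper, not existing tooling; the 10%-over-100-runs threshold mirrors the example above):

```js
'use strict';
// Hypothetical helper: flag a test as statistically flaky when its failure
// rate over the most recent window of CI runs exceeds a threshold.
function isStatisticallyFlaky(results, { window = 100, maxFailureRate = 0.1 } = {}) {
  // results: array of booleans (true = passed), ordered oldest to newest.
  const recent = results.slice(-window);
  const failures = recent.filter((passed) => !passed).length;
  return failures / recent.length > maxFailureRate;
}

// Example: 12 failures in the last 100 runs exceeds a 10% threshold.
const history = Array.from({ length: 100 }, (_, i) => i % 9 !== 0);
console.log(isStatisticallyFlaky(history)); // true
```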

@mhdawson
Member

I'd agree that we should remove tests that we cannot make reliable.

In the bigger picture, I don't think we should just automatically mark failed tests as flaky, but I am OK with making people feel more comfortable doing that when tests are flaky and time is needed to investigate.

I think we should also try something like "flaky test fix days/weeks". We tried some of those within RH and I think it helped make some progress. If we did it as a project, I think we should make it an "only fix flaky tests day/week", i.e. nothing else should be tested or landed during that period.

tniessen changed the title from "Ruthlessly mark tests that fails frequently as flaky" to "Ruthlessly mark tests that fail frequently as flaky" on Jul 27, 2022
mcollina removed the tsc-agenda (Issues and PRs to discuss during the meetings of the TSC) label on Aug 10, 2022
@mcollina
Member Author

This can be closed; the CI is mostly better now.
