
[wptrunner] Reject tests that almost time out in stability check #11570

Merged: 4 commits merged into master on Sep 5, 2018

Conversation

@zcorpan (Member) commented Jun 19, 2018

Fixes #9972.

TODO: allow longer time for timeout=long tests.

@ricea (Contributor) commented Jun 19, 2018

Looks okay, but I am not a Python expert.

Maybe turn it on in warning mode only until timeout=long is supported?

@zcorpan (Member, Author) commented Jun 19, 2018

I was hoping @gsnedders or someone could help me figure out how to check for timeout=long

@foolip (Member) commented Jun 19, 2018

The numbers in #9972 (comment) were from a very beefy workstation, and we don't yet know what the typical execution time on Travis is.

Can we start with a very high limit, like 25 seconds, that would probably catch no tests at all? In parallel, it would be great to start collecting the wptreport.json from all PRs, so that we can look at the distribution of execution times and compare it to execution times on other infrastructures.

@zcorpan (Member, Author) commented Jun 19, 2018

Let's use 30s for now.

@zcorpan changed the title [wptrunner] Reject tests that take >5s to run → [wptrunner] Reject tests that take >30s to run, Jun 19, 2018
@gsnedders (Member) commented:

> I was hoping @gsnedders or someone could help me figure out how to check for timeout=long

I'm not sure there's any easy way to get at that from here, because if I'm not mistaken we've not got as far as loading the manifest at the point of running the stability entry point.

@Hexcles (Member) commented Jun 19, 2018

We almost want to count the longest duration separately for tests with a normal timeout and tests with a long timeout (instead of using a different threshold depending on what the slowest test is), because otherwise the longest-running test is very likely one with a long timeout, and we would miss tests that have a normal timeout but run close to it.

As for the implementation, @gsnedders is right; it's very hard to consult the manifest here. I'm thinking perhaps the wptrunner can add the timeout of the test as an additional field to its test_end messages. This field should be ignored by the other log handlers (to avoid bloating the log or report), but can be used by the stability check.

@zcorpan (Member, Author) commented Jun 20, 2018

> I'm thinking perhaps the wptrunner can add the timeout of the test as an additional field to its test_end messages.

I've tried to poke around a bit, but can't figure out how to do this. Any pointers?

@Hexcles (Member) commented Jun 20, 2018

@zcorpan alright this needs some plumbing; sit tight.

  1. self.logger.test_end(test.id,
                          status,
                          message=file_result.message,
                          expected=expected,
                          extra=file_result.extra)

     Here you can pass test.timeout as a field in extra (i.e. merge it with file_result.extra). FWIW, test is an instance of wpttest.Test.

  2. Then in the mozlog formatter for the stability check, you can find the information in data["extra"] in the test_end handler.

Caveats:

  1. This doesn't consider timeout_multiplier. You can probably find how to handle it with some grepping.
  2. The bigger caveat is that step 1 might be a bit hacky. Not sure what @gsnedders & @jgraham think about it, but a cleaner way might be to go one step deeper, into the executors, and add the timeout information to file_result from there:

     result = self.do_test(test)
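
Putting the two steps together, a minimal sketch under the assumptions above (the wrapper names log_test_end and on_test_end are hypothetical; only test.timeout, the test_end call, and data["extra"] come from this thread):

    def log_test_end(logger, test, status, expected, file_result):
        # Step 1 (hypothetical wrapper): merge the test timeout into
        # "extra" before emitting test_end. Note caveat 1 above: this
        # does not apply timeout_multiplier.
        extra = dict(file_result.extra or {})
        extra["test_timeout"] = test.timeout  # from wpttest.Test
        logger.test_end(test.id,
                        status,
                        message=file_result.message,
                        expected=expected,
                        extra=extra)

    def on_test_end(data):
        # Step 2 (hypothetical handler): in the stability check's
        # test_end handler, the timeout travels under data["extra"].
        return (data.get("extra") or {}).get("test_timeout")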

@zcorpan (Member, Author) commented Jun 20, 2018

Thanks, I think I got it working (but it doesn't do anything with timeout_multiplier unless that's already baked in).

@Hexcles (Member) left a review comment:

Cool! I'm giving my r+, but please also wait for opinions from others.

@zcorpan (Member, Author) commented Jun 21, 2018

@Hexcles can you merge this if there has been no review by June 29? Also, if there are review comments, can you address them, please?

@zcorpan changed the title [wptrunner] Reject tests that take >30s to run → [wptrunner] Reject tests that take >8s to run (or >48s for timeout=long), Jun 21, 2018
@zcorpan mentioned this pull request Jun 21, 2018
@gsnedders (Member) commented:

I'd like to see what @jgraham has to say, but he's on holiday till after then IIRC.

@zcorpan (Member, Author) commented Jun 21, 2018

Ok to wait for James.

@Hexcles (Member) commented Jun 21, 2018

@zcorpan alright, so let's wait for more reviews. If it's something easy to fix, I'll help you address the comments and land the PR when you're away. Otherwise, I'll wait till you're back.

Meanwhile, could you take a look at the high-level discussions in #9972 first? (Sorry, you already noticed.)

@jgraham (Contributor) commented Jun 22, 2018

I haven't reviewed this because I'm away, but as noted on the original issue, I'm pretty skeptical of failing Travis because of long-running tests. In particular, it's hostile to tests for features that aren't implemented in any browser we run in CI, which are more likely to time out.

I think we should discuss some mechanism to warn for this kind of thing rather than hard-failing.

@ricea (Contributor) commented Jun 22, 2018

Indeed, "Failing by timeout" is sometimes the most natural way to write a test (for example, a fetch abort test). I was assuming that reviewers would be able to apply their judgement in these cases.

@zcorpan (Member, Author) commented Jun 22, 2018

Hmm, I didn't intend to fail the check for TIMEOUT.

@Hexcles (Member) commented Jun 22, 2018

So to clarify (based on my understanding):

  • Tests that consistently time out won't fail the check.
  • Tests that time out sometimes will fail the check (which is already the case).
  • (NEW) Tests that never time out (during the 10 reruns) but come very close to the timeout will now fail, as they are likely to be flaky.

And I think this is a reasonable approach to prevent flakiness at minimal cost to productivity.
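
To make those three cases concrete, here's a hypothetical sketch (the function and the 80% threshold, which the patch later settles on, are illustrative rather than actual wptrunner code):

    def classify(results, timeout_ms, threshold=0.8):
        # results: (status, duration_ms) pairs from the 10 stability reruns.
        statuses = {status for status, _ in results}
        if statuses == {"TIMEOUT"}:
            return "consistent timeout"  # passes the check
        if "TIMEOUT" in statuses:
            return "flaky"               # fails (already the case)
        if any(d > timeout_ms * threshold for _, d in results):
            return "slow"                # NEW: fails, likely flaky
        return "stable"                  # passes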

@jgraham (Contributor) commented Jun 23, 2018

Oh, if that's how this works then I agree that the approach makes sense :) Sorry for not twigging that the patch wasn't treating "timeout" and "nearly times out" the same. (I still haven't reviewed the code, but based on the discussion I would be more inclined to log the actual test timeout, including any timeout multiplier, on test_start rather than test_end, since that seems like a more natural place to look for static properties of the test.)

if not file_result.extra:
    file_result.extra = { "test_timeout": test.timeout }
else:
    file_result.extra.test_timeout = test.timeout
Inline review comment (Contributor):

Does this work? I would expect it to be file_result.extra["test_timeout"] unless extra isn't a dictionary for some reason.

Reply (Member, Author):

You're probably right. @Hexcles ?
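
For reference, the dict-style fix the reviewer suggests would presumably look like this (a sketch, not the merged code):

    if not file_result.extra:
        file_result.extra = {}
    # Item assignment, since "extra" is a dictionary:
    file_result.extra["test_timeout"] = test.timeout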

@wpt-pr-bot added the wptrunner label (The automated test runner, commonly called through ./wpt run) on Jul 31, 2018
@Hexcles (Member) commented Aug 29, 2018

@jgraham @gsnedders I've pushed a few new commits to this PR. PTAL again.

The PR now calculates the longest duration per test and checks whether it exceeds 80% of that test's timeout (taking the timeout multiplier into account), instead of only checking the longest duration across all tests, since tests can have different timeouts.

Besides, consistently timing out is now allowed.

@jgraham I didn't move the logic into test_start as the doc for mozlog says test_start isn't supposed to take extra parameters.

@zcorpan my apologies for the long delay; I was busy fixing some high-priority issues on wpt.fyi. Apparently it's been over a month and you've already come back; time does fly... Sigh.
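
A sketch of the per-test check described above (FLAKY_THRESHOLD and the "timeout"/"longest_duration" fields match the hunks below; folding the multiplier into the recorded timeout is an assumption):

    FLAKY_THRESHOLD = 0.8  # flag tests using more than 80% of their timeout

    def exceeds_threshold(test):
        # test["timeout"] is assumed to already reflect timeout=long
        # and any timeout-multiplier scaling.
        threshold = test["timeout"] * FLAKY_THRESHOLD
        # (The merged version additionally filters by result status;
        # see the hunks below.)
        return any(duration > threshold
                   for duration in test["longest_duration"].values())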

@Hexcles (Member) commented Aug 29, 2018

To test:

  • ./wpt run --verify firefox infrastructure/expected-fail/timeout.html should NOT fail.
  • ./wpt run --verify firefox infrastructure/expected-fail/slow.html should fail with output like this:
0:31.51 INFO ## Slow tests ##
0:31.51 INFO |                    Test                    | Longest duration (ms) | Timeout (ms) |
0:31.51 INFO |--------------------------------------------|-----------------------|--------------|
0:31.51 INFO | `/infrastructure/expected-flaky/slow.html` | `9074`                | `10000`      |

@Hexcles force-pushed the zcorpan/fail-stability-checker-on-too-long-duration branch from 69fd109 to 35cd861 on August 29, 2018 21:26
def is_slow(test):
    # We are interested in tests that almost time out (i.e. likely to be
    # flaky). Tests that consistently time out are not considered "slow".
    if test["status"].keys() == ['TIMEOUT']:
Inline review comment (Contributor):

I think CRASH should also be excluded, because we can change the status from TIMEOUT to CRASH if we detect evidence of a crash after the test finishes. In general I wonder if testing …

Also, it seems arguable that the handling of a test that sometimes times out and sometimes doesn't is wrong; if we get PASS, PASS, PASS, TIMEOUT, PASS then we'll call the test slow. But if the passes take 1s and the timeout is 10s, we are probably seeing a race condition that causes the test to hang sometimes, and the obvious solution for a slow test (add a longer timeout) isn't going to help. So I think we should either store the duration per result and filter here, or only update the duration when we see a non-TIMEOUT, non-CRASH result. The latter is easier, but the former could be used to provide more detailed logging if we want.

Reply (Member):

You're right about CRASH.

As for the second example (inconsistent results), it won't actually happen, as we call is_inconsistent before is_slow, so such tests will be considered flaky instead of slow. But I do see that the code can be confusing and error-prone, as the prerequisite isn't obvious. I'll tweak it a bit.
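
The tweak presumably amounts to per-status bookkeeping along these lines (a sketch; the real field names appear in the hunk below):

    from collections import defaultdict

    # Longest observed duration per result status for one test; runs
    # that end in TIMEOUT (or CRASH) can then be ignored when judging
    # slowness.
    longest_duration = defaultdict(float)  # status -> duration (ms)

    def record_run(status, duration_ms):
        longest_duration[status] = max(longest_duration[status], duration_ms)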

@Hexcles force-pushed the zcorpan/fail-stability-checker-on-too-long-duration branch from c427051 to 18f3d3d on August 30, 2018 21:24
@wpt-pr-bot added the ci label Aug 30, 2018
@Hexcles changed the title [wptrunner] Reject tests that take >8s to run (or >48s for timeout=long) → [wptrunner] Reject tests that almost time out in stability check, Aug 30, 2018
@Hexcles (Member) commented Aug 30, 2018

@jgraham I think I addressed your comment this time. PTAL at find_slow_status (the new version of is_slow) in particular.

And you can see stability checks rejecting the example slow test in action:
https://travis-ci.org/web-platform-tests/wpt/jobs/422774773#L764
https://travis-ci.org/web-platform-tests/wpt/jobs/422774775#L686

tools/wptrunner/wptrunner/stability.py (outdated review thread):
threshold = test["timeout"] * FLAKY_THRESHOLD
for status in ['PASS', 'FAIL', 'OK', 'ERROR', 'INTERNAL-ERROR']:
    if (status in test["longest_duration"] and
            test["longest_duration"][status] > threshold):
Inline review comment (Contributor):

Weird indentation (yeah I know pyflakes likes it, but I think it's a bad rule which we don't enforce).

zcorpan and others added 4 commits September 5, 2018 11:46
In the stability check (wpt run --verify), reject tests that almost time out, i.e. take more than 80% of the allowed timeout to run. These tests will be listed in a new section, "Slow tests", in the output.

Fixes #9972.
Previously we recorded the longest duration across all tests. However, each test can have a different timeout. Besides, test runs with different results should be considered differently; runs that time out should be ignored.

Therefore, we now keep track of the longest duration for each result status of each test.

Also add a unit test.
@Hexcles force-pushed the zcorpan/fail-stability-checker-on-too-long-duration branch from 17ac857 to 8503633 on September 5, 2018 15:46
@Hexcles merged commit 69717e2 into master Sep 5, 2018
@zcorpan deleted the zcorpan/fail-stability-checker-on-too-long-duration branch September 6, 2018 07:13
Hexcles added a commit that referenced this pull request Sep 6, 2018
In #11570, we added a
check to reject slow tests in stability checks. The code didn't consider
an edge case where tests are skipped and don't contain "extra" info.

This change silently skips a test in this check if it does not have
necessary extra information.
Labels: ci, infra, wptrunner
8 participants