[wptrunner] Reject tests that almost time out in stability check #11570
Conversation
Looks okay, but I am not a Python expert. Maybe turn it on in warning mode only until timeout=long is supported?
I was hoping @gsnedders or someone could help me figure out how to check for …
The numbers in #9972 (comment) were from a very beefy workstation, and we don't yet know what typical execution times on Travis are. Can we start with a very high limit, like 25 seconds, that would probably catch no tests at all? In parallel, it would be great to start collecting the wptreport.json from all PRs, so that we can look at the distribution of execution times and compare it to execution times on other infrastructure.
Let's use 30s for now.
I'm not sure there's any easy way to get at that from here, because if I'm not mistaken we haven't got as far as loading the manifest at the point of running the stability entry point.
We almost want to count the longest duration separately for tests with normal timeout and tests with long timeout (instead of using a different threshold depending on what the slowest test is), because otherwise the longest-running test is very likely one with a long timeout, and we would miss tests with a normal timeout that are closer to timing out. As for the implementation, @gsnedders is right; it's very hard to consult the manifest here. I'm thinking perhaps wptrunner can add the timeout of the test as an additional field to its …
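(A purely illustrative sketch of the per-timeout-class idea above; NORMAL_TIMEOUT and the results iterable are assumptions, not part of the patch.)

# Sketch only: track the longest duration separately for normal-timeout
# and long-timeout tests, so the slowest long test does not hide normal
# tests that are close to their shorter limit.
NORMAL_TIMEOUT = 10  # assumed default timeout, in seconds
longest = {"normal": 0.0, "long": 0.0}
for result in results:  # 'results' is an assumed iterable of run results
    kind = "long" if result.timeout > NORMAL_TIMEOUT else "normal"
    longest[kind] = max(longest[kind], result.duration)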
I've tried to poke around a bit, but can't figure out how to do this. Any pointers?
@zcorpan Alright, this needs some plumbing; sit tight.
Caveats:
Thanks, I think I got it working (but it doesn't do anything with …).
Cool! I'm giving my r+, but please also wait for opinions from others.
@Hexcles can you merge this if there has been no review by June 29? Also, if there are review comments, can you address them, please?
I'd like to see what @jgraham has to say, but he's on holiday till after then, IIRC.
OK to wait for James.
I haven't reviewed this because I'm away, but as noted on the original issue, I'm pretty skeptical of failing Travis because of long-running tests. In particular, it's hostile to tests for features that aren't implemented in any browser we run in CI, which are more likely to time out. I think we should discuss some mechanism to warn about this kind of thing rather than hard-failing.
Indeed, "Failing by timeout" is sometimes the most natural way to write a test (for example, a fetch abort test). I was assuming that reviewers would be able to apply their judgement in these cases. |
Hmm, I didn't intend to fail the check for TIMEOUT.
So to clarify (based on my understanding):
And I think this is a reasonable approach to prevent flakiness with minimal cost to productivity.
Oh, if that's how this works then I agree that the approach makes sense :) Sorry for not twigging that the patch wasn't treating "timeout" and "nearly time out" the same (I still haven't reviewed the code, but based on the discussion I would be more inclined to log the actual test timeout, including any timeout multiplier, on …).
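As a rough sketch of that suggestion (logger, base_timeout, timeout_multiplier, and duration are assumed names, not the actual wptrunner API):

# Sketch: the effective timeout is the test's base timeout scaled by
# the harness-wide multiplier; logging both makes near-timeout runs
# easy to spot in the output.
effective_timeout = base_timeout * timeout_multiplier
logger.info("test took %.1fs of %.1fs allowed" % (duration, effective_timeout))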
if not file_result.extra:
    file_result.extra = { "test_timeout": test.timeout }
else:
    file_result.extra.test_timeout = test.timeout
Does this work? I would expect it to be file_result.extra["test_timeout"], unless extra isn't a dictionary for some reason.
You're probably right. @Hexcles?
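For reference, the dictionary-style access the review suggests would look something like this (assuming extra is a plain dict):

# Corrected sketch: always store the timeout under a string key rather
# than as an attribute on the dict.
if not file_result.extra:
    file_result.extra = {}
file_result.extra["test_timeout"] = test.timeout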
@jgraham @gsnedders I've pushed a few new commits to this PR. PTAL again. The PR now calculates the longest duration per test and checks whether it exceeds 80% of the timeout for each test individually (taking the timeout multiplier into account), instead of only checking the longest duration across all tests, since tests can have different timeouts. Besides, consistent timeouts are now allowed. @jgraham I didn't move the logic into … @zcorpan my apologies for the long delay; I was busy fixing some high-priority issues on wpt.fyi. Apparently it's been over a month and you've already come back; time does fly... Sigh.
To test:
Force-pushed from 69fd109 to 35cd861.
def is_slow(test):
    # We are interested in tests that almost time out (i.e. likely to be
    # flaky). Tests that consistently time out are not considered "slow".
    if test["status"].keys() == ['TIMEOUT']:
I think CRASH should also be excluded, because we can change the status from TIMEOUT to CRASH if we detect evidence of a crash after the test finished. In general I wonder if testing …

Also, it seems arguable that the handling of a test that sometimes times out and sometimes doesn't is wrong; if we get PASS, PASS, PASS, TIMEOUT, PASS then we'll call the test slow. But if the passes take 1s and the timeout is 10s then we are probably seeing a race condition that causes the test to hang sometimes, and the obvious solution to a slow test (add a longer timeout) isn't going to help. So I think we should either store the duration per result and filter here, or only update the duration when we see a non-TIMEOUT, non-CRASH result. The latter is easier, but the former could be used to provide more detailed logging if we want.
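A minimal sketch of the first option (run_results is an assumed iterable of (test id, status, duration) tuples, not the actual data structure):

from collections import defaultdict

# Keep the longest observed duration per (test, status), so TIMEOUT and
# CRASH runs can be filtered out when judging slowness.
longest_duration = defaultdict(lambda: defaultdict(float))
for test_id, status, duration in run_results:
    if duration > longest_duration[test_id][status]:
        longest_duration[test_id][status] = duration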
You're right about CRASH.

As for the second example (inconsistent results), it won't actually happen, as we call is_inconsistent before is_slow, so such tests will be considered flaky instead of slow. But I do see that the code can be confusing and error-prone, as the prerequisite isn't obvious. I'll tweak it a bit.
Force-pushed from c427051 to 18f3d3d.
@jgraham I think I addressed your comment this time. PTAL at … And you can see stability checks rejecting the example slow test in action: …
threshold = test["timeout"] * FLAKY_THRESHOLD
for status in ['PASS', 'FAIL', 'OK', 'ERROR', 'INTERNAL-ERROR']:
    if (status in test["longest_duration"] and
        test["longest_duration"][status] > threshold):
Weird indentation (yeah I know pyflakes likes it, but I think it's a bad rule which we don't enforce).
In the stability check (wpt run --verify), reject tests that almost time out, i.e. take more than 80% of the allowed timeout to run. These tests will be listed in a new section, "slow tests", in the output. Fixes #9972.
Previously we recorded the longest duration across all tests. However, each test can have a different timeout. Besides, test runs with different results should be considered differently; runs that time out should be ignored. Therefore, we now keep track of the longest duration for each result status of each test. Also add a unit test.
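Put together, the check described here might look roughly like the following sketch (consistent with the snippets above, but not the exact patch):

FLAKY_THRESHOLD = 0.8  # flag runs that use more than 80% of the timeout

def is_slow(test):
    # Consistent timeouts are allowed; only completed runs that came
    # close to the per-test limit count as slow.
    threshold = test["timeout"] * FLAKY_THRESHOLD
    return any(duration > threshold
               for status, duration in test["longest_duration"].items()
               if status not in ("TIMEOUT", "CRASH"))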
Force-pushed from 17ac857 to 8503633.
In #11570, we added a check to reject slow tests in stability checks. The code didn't consider an edge case where tests are skipped and don't contain "extra" info. This change silently skips a test in this check if it does not have necessary extra information.
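The guard for skipped tests could be as simple as the following sketch (field names follow the earlier snippets; 'tests' is an assumed iterable of result dicts):

for test in tests:
    extra = test.get("extra")
    if not extra or "test_timeout" not in extra:
        continue  # skipped tests carry no timing info; ignore them here
    # ... apply the slow-test check ...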
…ed, a=testonly. Automatic update from web-platform-tests: Fix stability check when tests are skipped. In web-platform-tests/wpt#11570, we added a check to reject slow tests in stability checks. The code didn't consider an edge case where tests are skipped and don't contain "extra" info. This change silently skips a test in this check if it does not have the necessary extra information. -- wpt-commits: eb145fc244f183d7c3aab7a93560330e666159e9, wpt-pr: 12884
Fixes #9972.
TODO: allow longer time for timeout=long tests.