
7660 Avoid CI stability checks timing out #32202

Merged

Conversation

Contributor

@DanielRyanSmith DanielRyanSmith commented Dec 27, 2021

addresses issue #7660

This is implemented based on the proposed solution described by @stephenmcgruer.

Changes:

  • Adds a new --repeat-max-time flag that sets how many minutes wptrunner should spend running iterations of the test suite for the stability checks. Defaults to 100 minutes.
  • When an additional iteration is anticipated to exceed this max time, the iterations of the test suite terminate early.
  • Test results are evaluated based on the number of iterations actually run rather than the number requested (10 by default), as long as at least 2 iterations completed.
  • wptrunner.run_tests has been refactored to trim the function down and make it more readable.
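The early-termination behaviour described above can be sketched roughly as follows. This is an illustrative sketch, not the actual wptrunner implementation; the function and variable names (run_repeats, run_suite, longest) are hypothetical.

```python
import time

def run_repeats(run_suite, max_iterations=10, repeat_max_time_min=100):
    """Run up to max_iterations of the test suite, stopping early when the
    next iteration is expected to exceed the overall time budget."""
    budget = repeat_max_time_min * 60  # budget in seconds
    start = time.monotonic()
    longest = 0.0  # duration of the slowest iteration so far
    completed = 0
    for _ in range(max_iterations):
        elapsed = time.monotonic() - start
        # If even a repeat of the slowest iteration so far would exceed the
        # budget, stop now rather than let CI kill the job mid-run.
        if completed and elapsed + longest > budget:
            break
        iter_start = time.monotonic()
        run_suite()
        longest = max(longest, time.monotonic() - iter_start)
        completed += 1
    return completed
```

The returned count of completed iterations is what the stability evaluation would then use, rather than the requested iteration count.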

@DanielRyanSmith DanielRyanSmith marked this pull request as ready for review December 29, 2021 01:03
@wpt-pr-bot wpt-pr-bot added CSS2 infra wg-css wptrunner The automated test runner, commonly called through ./wpt run labels Dec 29, 2021
@DanielRyanSmith DanielRyanSmith requested review from stephenmcgruer and removed request for fantasai, kojiishi and svgeesus December 29, 2021 01:06
Contributor

@stephenmcgruer stephenmcgruer left a comment


This generally LGTM as an implementation of my old design. If ok, I would rather leave it to current WPT maintainers to do a 'real' review of the code, as they will be the ones maintaining it long term.

Side-note: there's a bunch of changes here that seem to be the result of some autoformatting happening in your editor. It would be great to revert those changes, as (a) they're inconsistent with WPT styling elsewhere I think (though we don't have a real style guide afaik...) but more importantly (b) they make it much harder to spot which deltas are new and which are just reformats.

tools/wptrunner/wptrunner/stability.py (outdated review thread; resolved)
# Use the number of repeated test suites that were run
# to process the results if the runs were stopped to
# avoid hitting the maximum run time.
if kwargs["avoided_timeout"]["did_avoid"]:
Contributor


Using kwargs as an output mechanism from wptrunner.run_tests is clever, but potentially confusing. I leave it to other reviewers (who are still involved with WPT and will have to maintain this ;)) to decide if they want to suggest a different method for communicating this information back to stability.py from the runner.

Contributor Author


Yes, I was hesitant about taking this approach. We definitely want access to the number of iterations that actually ran, but returning that information from wptrunner's run_tests when it is only used by the stability checks didn't feel right either. I am totally open to changing this implementation if anyone has suggestions or thinks a rework of run_tests is worth the change.

tools/wptrunner/wptrunner/stability.py (2 outdated review threads; resolved)
tools/wptrunner/wptrunner/wptrunner.py (outdated review thread; resolved)
Contributor

@jgraham jgraham left a comment


Some general comments:

  • If you're refactoring code it's really helpful to have that in a separate commit first and then the actual changes in subsequent commits, so that the commits are reviewable one at a time. Of course that doesn't always work out, but in this case I think it could have done.
  • It looks like some kind of automated code formatting was applied. In general we've relied on pyflakes and not adopted automatic formatting (we couldn't enforce it anyway, since the project is used across multiple repos, and asking them all to use a specific version of a specific formatter is implausible). Also we're not aiming for 80-column lines; breaking before 100 is preferred.

In terms of the actual change; I think it's mostly good, but @stephenmcgruer is right (as usual) that using the kwargs as an in/out parameter feels like a bad idea. Even though we do a reasonable amount of in-place manipulation of the kwargs before passing them to run_tests, they aren't currently used for output. I think it will work out OK to update the function signature to return the extra data we need, and just drop it in start so we maintain compatibility with tools using that entry point.

Thanks for working on this!

tools/wptrunner/wptrunner/wptrunner.py (8 outdated review threads; resolved)

return unexpected_total == 0
return evaluate_runs(counts, iteration_timeout, kwargs)
Contributor


I think we should make run_tests return counts directly. Then start can discard the extra information and we can avoid putting the output in a kwarg.
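The suggested shape of that change might look roughly like this: run_tests returns both the pass/fail boolean and the run counts, while start (the compatibility entry point) discards the extra value. This is a hypothetical sketch; the function bodies are placeholders, not the actual wptrunner code.

```python
def run_tests(**kwargs):
    """Run the test iterations and return (success, counts)."""
    counts = {"repeat": 0, "unexpected": 0, "total_tests": 0}
    # ... run the iterations here, updating counts as tests complete ...
    success = counts["unexpected"] == 0
    # Callers that need the run statistics (e.g. stability.py) unpack both.
    return success, counts

def start(**kwargs):
    # External tools call start() and expect a plain boolean, so the
    # counts are dropped here to avoid a breaking API change.
    success, _counts = run_tests(**kwargs)
    return success
```

This keeps stability.py able to read the iteration count without threading an output value through kwargs.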

Contributor Author


Yes, looking at this possible solution, it's a relatively simple fix that can be implemented easily and will not affect other tests using wptrunner. I made a change to return only counts["repeat"], giving back just the number of iterations of the test suite that were run rather than all of counts. Do you think this is a good idea? Or would it be functionally better to return all of counts in case other uses arise that need access to the other counts?

Now that wptrunner's run_tests returns more than one value, the return type is a tuple, so callers that expect a single boolean need to pull the expected first value out of that tuple.
wptrunner's run_tests would previously return a tuple only if no issues arose while running, and only a boolean in the case of some expected issue. Now a tuple is returned in all cases, as expected.
@DanielRyanSmith
Contributor Author

Thank you @stephenmcgruer and @jgraham for the reviews - I tried to remove some of the reformatting and I'll be sure to avoid any unnecessary auto-formatting changes and keep code refactoring in separate commits from now on.
Also, I would like to say that I appreciate how thorough the critiques of the code are, as it's helping me learn how to approach these issues moving forward!

Contributor

@jgraham jgraham left a comment


This is looking pretty reasonable now, but I think at least returning the whole of counts (or even a new Status class like a static version of counts) as the second argument is preferable to just returning the number of iterations, since it gives us a way to return richer data in the future without making breaking changes to the API.

run_tests now returns a new TestStatus object rather than only the number of iterations run. This will allow more robust statistics to be shown in the future.
Reduce some comments and logs so they take less vertical space.
Add a TestStatus docstring.
@DanielRyanSmith
Contributor Author

@jgraham, I tried to implement this as you suggested. Normally I would create a dataclass for this, but since we need to support Python 3.6 and dataclasses were added in 3.7, I just created a class with attributes to return. Maybe there is a better way to handle this?

class TestStatus:
    """Class that stores information on the results of test runs for later reference"""

    def __init__(self, counts):
        self.total_tests = counts["total_tests"]
        self.skipped = counts["skipped"]
        self.unexpected = counts["unexpected"]
        self.unexpected_pass = counts["unexpected_pass"]
        self.repeated_runs = counts["repeat"]
        self.expected_repeated_runs = counts["expected_repeat"]

What would @jgraham do in this scenario? (I imagine I will be asking this often, if only to myself sometimes 😄 )
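For reference, once Python 3.7+ can be assumed, the same container could become a dataclass with default values, removing the need to pass a counts dict at all. This is an illustrative sketch based on the field names above, not code from the PR.

```python
from dataclasses import dataclass

@dataclass
class TestStatus:
    """Dataclass version of the status container (requires Python 3.7+)."""
    total_tests: int = 0
    skipped: int = 0
    unexpected: int = 0
    unexpected_pass: int = 0
    repeated_runs: int = 0
    expected_repeated_runs: int = 0
```

Fields default to zero, so counters can be incremented directly (e.g. status.skipped += 1) and __init__, __repr__, and __eq__ come for free.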

@jgraham
Contributor

jgraham commented Jan 11, 2022

What would @jgraham do in this scenario?

I apologise in advance for any kind of distress caused by thinking along those lines ;)

In this case I wonder if we could just have the TestStatus class and remove counts entirely, since we're not using the ability to add arbitrary keys for anything other than convenience. So we'd change e.g. counts["skipped"] += 1 to status.skipped += 1. I agree that using a dataclass will be nice once we can depend on 3.7+.
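That suggestion might look roughly like this: the defaultdict of counters is replaced by attribute updates on a TestStatus instance. This is a hypothetical sketch; record_result and its arguments are illustrative, not the actual wptrunner code.

```python
class TestStatus:
    """Stores counters for a stability run, replacing the old counts dict."""
    def __init__(self):
        self.total_tests = 0
        self.skipped = 0
        self.unexpected = 0
        self.unexpected_pass = 0

def record_result(status, outcome, expected):
    # Where the old code did counts["skipped"] += 1, the new code
    # increments an attribute, so a misspelled counter raises
    # AttributeError instead of silently creating a new dict key.
    status.total_tests += 1
    if outcome == "SKIP":
        status.skipped += 1
    elif outcome != expected:
        status.unexpected += 1
        if outcome == "PASS":
            status.unexpected_pass += 1
```

A side benefit of the fixed attribute set is that the available statistics are documented in one place rather than scattered across dict-key usages.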

Forego the use of the defaultdict counts for keeping track of test info/results and instead use the custom class TestStatus.
@jgraham jgraham closed this Jan 11, 2022
@jgraham jgraham reopened this Jan 11, 2022
Contributor

@jgraham jgraham left a comment


Thanks! I think this looks good now. If the CI passes feel free to squash and merge with an appropriate commit message. If the decision task fails again we'll need to investigate what's going on; it seems unlikely to be this specific changeset.

@DanielRyanSmith DanielRyanSmith merged commit 0a269ce into web-platform-tests:master Jan 11, 2022
@DanielRyanSmith DanielRyanSmith deleted the 7660-stability-timeout branch January 11, 2022 23:26