
7660 Avoid CI stability checks timing out #32202

Merged

Conversation

Contributor

@DanielRyanSmith DanielRyanSmith commented Dec 27, 2021

addresses issue #7660

This is implemented based on the proposed solution described by @stephenmcgruer.

Changes:

  • Adds a new --repeat-max-time flag that sets how many minutes wptrunner should spend running iterations of the test suite for the stability checks. Defaults to 100 minutes.
  • When an additional iteration is anticipated to exceed this max time, the iterations of the test suite terminate early.
  • Test results are evaluated based on the number of iterations actually run rather than the number requested (10 by default), as long as at least 2 iterations completed.
  • wptrunner.run_tests has been refactored to trim the function down and make it more readable.
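The early-termination behaviour described above can be sketched roughly as follows. This is an illustrative sketch, not the actual wptrunner implementation; the function and variable names (run_repeats, run_suite, longest) are hypothetical.

```python
import time

def run_repeats(run_suite, max_iterations=10, repeat_max_time_min=100):
    """Run up to max_iterations of the test suite, stopping early when the
    next iteration is expected to exceed the overall time budget."""
    budget = repeat_max_time_min * 60  # budget in seconds
    start = time.monotonic()
    longest = 0.0  # duration of the slowest iteration so far
    completed = 0
    for _ in range(max_iterations):
        elapsed = time.monotonic() - start
        # If even a repeat of the slowest iteration so far would exceed the
        # budget, stop now rather than let CI kill the job mid-run.
        if completed and elapsed + longest > budget:
            break
        iter_start = time.monotonic()
        run_suite()
        longest = max(longest, time.monotonic() - iter_start)
        completed += 1
    return completed
```

The returned count of completed iterations is what the stability evaluation would then use, rather than the requested iteration count.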

@DanielRyanSmith DanielRyanSmith marked this pull request as ready for review December 29, 2021 01:03
@wpt-pr-bot wpt-pr-bot added CSS2 infra wg-css wptrunner The automated test runner, commonly called through ./wpt run labels Dec 29, 2021
@DanielRyanSmith DanielRyanSmith requested review from stephenmcgruer and removed request for fantasai, kojiishi and svgeesus December 29, 2021 01:06
Contributor

@stephenmcgruer stephenmcgruer left a comment


This generally LGTM as an implementation of my old design. If ok, I would rather leave it to current WPT maintainers to do a 'real' review of the code, as they will be the ones maintaining it long term.

Side-note: there's a bunch of changes here that seem to be the result of some autoformatting happening in your editor. It would be great to revert those changes, as (a) they're inconsistent with WPT styling elsewhere I think (though we don't have a real style guide afaik...) but more importantly (b) they make it much harder to spot which deltas are new and which are just reformats.

tools/wptrunner/wptrunner/stability.py (outdated review thread; resolved)
# Use the number of repeated test suites that were run
# to process the results if the runs were stopped to
# avoid hitting the maximum run time.
if kwargs["avoided_timeout"]["did_avoid"]:
Contributor


Using kwargs as an output mechanism from wptrunner.run_tests is clever, but potentially confusing. I leave it to other reviewers (who are still involved with WPT and will have to maintain this ;)) to decide if they want to suggest a different method for communicating this information back to stability.py from the runner.

Contributor Author


Yes, I was hesitant about taking this approach. We definitely want access to the number of iterations that actually ran, but returning that information from wptrunner's run_tests when it is only used by the stability checks didn't feel right either. I am totally open to changing this implementation if anyone has suggestions or thinks a rework of run_tests is worth the change.

tools/wptrunner/wptrunner/stability.py (2 outdated review threads; resolved)
tools/wptrunner/wptrunner/wptrunner.py (outdated review thread; resolved)
Contributor

@jgraham jgraham left a comment


Some general comments:

  • If you're refactoring code it's really helpful to have that in a separate commit first and then the actual changes in subsequent commits, so that the commits are reviewable one at a time. Of course that doesn't always work out, but in this case I think it could have done.
  • It looks like some kind of automated code formatting was applied. In general we've relied on pyflakes and not adopted automatic formatting (we couldn't enforce it anyway, since the project is used across multiple repos, and asking them all to use a specific version of a specific formatter is implausible). Also we're not aiming for 80-column lines; breaking before 100 is preferred.

In terms of the actual change; I think it's mostly good, but @stephenmcgruer is right (as usual) that using the kwargs as an in/out parameter feels like a bad idea. Even though we do a reasonable amount of in-place manipulation of the kwargs before passing them to run_tests, they aren't currently used for output. I think it will work out OK to update the function signature to return the extra data we need, and just drop it in start so we maintain compatibility with tools using that entry point.

Thanks for working on this!

tools/wptrunner/wptrunner/wptrunner.py (8 outdated review threads; resolved)

return unexpected_total == 0
return evaluate_runs(counts, iteration_timeout, kwargs)
Contributor


I think we should make run_tests return counts directly. Then start can discard the extra information and we can avoid putting the output in a kwarg.
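The suggested shape of that change might look roughly like this: run_tests returns both the pass/fail boolean and the run counts, while start (the compatibility entry point) discards the extra value. This is a hypothetical sketch; the function bodies are placeholders, not the actual wptrunner code.

```python
def run_tests(**kwargs):
    """Run the test iterations and return (success, counts)."""
    counts = {"repeat": 0, "unexpected": 0, "total_tests": 0}
    # ... run the iterations here, updating counts as tests complete ...
    success = counts["unexpected"] == 0
    # Callers that need the run statistics (e.g. stability.py) unpack both.
    return success, counts

def start(**kwargs):
    # External tools call start() and expect a plain boolean, so the
    # counts are dropped here to avoid a breaking API change.
    success, _counts = run_tests(**kwargs)
    return success
```

This keeps stability.py able to read the iteration count without threading an output value through kwargs.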

Contributor Author


Yes, looking at this possible solution, it's a relatively simple fix that can be implemented easily and will not affect other tests using wptrunner. I made a change to return only counts["repeat"], giving back just the number of iterations of the test suite that were run rather than all of counts. Do you think this is a good idea? Or would it be functionally better to return all of counts in case other uses arise that need access to the other counts?

Now that wptrunner's run_tests returns more than one value, the return type is a tuple, so callers that expect a single boolean need to pull the expected first value out of that tuple.
wptrunner's run_tests would previously return a tuple only if no issues arose while running, and only a boolean in the case of some expected issue. Now a tuple is returned in all cases, as expected.
@DanielRyanSmith
Contributor Author

Thank you @stephenmcgruer and @jgraham for the reviews - I tried to remove some of the reformatting and I'll be sure to avoid any unnecessary auto-formatting changes and keep code refactoring in separate commits from now on.
Also, I would like to say that I appreciate how thorough the critiques of the code are, as it's helping me learn how to approach these issues moving forward!

Contributor

@jgraham jgraham left a comment


This is looking pretty reasonable now, but I think at least returning the whole of counts (or even a new Status class like a static version of counts) as the second argument is preferable to just returning the number of iterations, since it gives us a way to return richer data in the future without making breaking changes to the API.

run_tests now returns a new TestStatus object rather than only the number of iterations run. This will allow more robust statistics to be shown in the future.
Reduce some comments and logs so they take less vertical space.
Add a TestStatus docstring.
@DanielRyanSmith
Contributor Author

@jgraham, I tried to implement this as you suggested. Normally I would create a dataclass for this, but since we need to support Python 3.6 and dataclasses were added in 3.7, I just created a class with attributes to return. Maybe there is a better way to handle this?

class TestStatus:
    """Class that stores information on the results of test runs for later reference"""

    def __init__(self, counts):
        self.total_tests = counts["total_tests"]
        self.skipped = counts["skipped"]
        self.unexpected = counts["unexpected"]
        self.unexpected_pass = counts["unexpected_pass"]
        self.repeated_runs = counts["repeat"]
        self.expected_repeated_runs = counts["expected_repeat"]

What would @jgraham do in this scenario? (I imagine I will be asking this often, if only to myself sometimes 😄 )
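For reference, once Python 3.7+ can be assumed, the same container could become a dataclass with default values, removing the need to pass a counts dict at all. This is an illustrative sketch based on the field names above, not code from the PR.

```python
from dataclasses import dataclass

@dataclass
class TestStatus:
    """Dataclass version of the status container (requires Python 3.7+)."""
    total_tests: int = 0
    skipped: int = 0
    unexpected: int = 0
    unexpected_pass: int = 0
    repeated_runs: int = 0
    expected_repeated_runs: int = 0
```

Fields default to zero, so counters can be incremented directly (e.g. status.skipped += 1) and __init__, __repr__, and __eq__ come for free.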

@jgraham
Contributor

jgraham commented Jan 11, 2022

What would @jgraham do in this scenario?

I apologise in advance for any kind of distress caused by thinking along those lines ;)

In this case I wonder if we could just have the TestStatus class and remove counts entirely, since we're not using the ability to add arbitrary keys for anything other than convenience. So we'd change e.g. counts["skipped"] += 1 to status.skipped += 1. I agree that using a dataclass will be nice once we can depend on 3.7+.
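That suggestion might look roughly like this: the defaultdict of counters is replaced by attribute updates on a TestStatus instance. This is a hypothetical sketch; record_result and its arguments are illustrative, not the actual wptrunner code.

```python
class TestStatus:
    """Stores counters for a stability run, replacing the old counts dict."""
    def __init__(self):
        self.total_tests = 0
        self.skipped = 0
        self.unexpected = 0
        self.unexpected_pass = 0

def record_result(status, outcome, expected):
    # Where the old code did counts["skipped"] += 1, the new code
    # increments an attribute, so a misspelled counter raises
    # AttributeError instead of silently creating a new dict key.
    status.total_tests += 1
    if outcome == "SKIP":
        status.skipped += 1
    elif outcome != expected:
        status.unexpected += 1
        if outcome == "PASS":
            status.unexpected_pass += 1
```

A side benefit of the fixed attribute set is that the available statistics are documented in one place rather than scattered across dict-key usages.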

Forego the use of the defaultdict counts for keeping track of test info/results and instead use the custom class TestStatus.
@jgraham jgraham closed this Jan 11, 2022
@jgraham jgraham reopened this Jan 11, 2022
Contributor

@jgraham jgraham left a comment


Thanks! I think this looks good now. If the CI passes feel free to squash and merge with an appropriate commit message. If the decision task fails again we'll need to investigate what's going on; it seems unlikely to be this specific changeset.

@DanielRyanSmith DanielRyanSmith merged commit 0a269ce into web-platform-tests:master Jan 11, 2022
@DanielRyanSmith DanielRyanSmith deleted the 7660-stability-timeout branch January 11, 2022 23:26