This repository has been archived by the owner on Nov 6, 2019. It is now read-only.

Test every commit of web-platform-tests within 1 hour #164

Closed
foolip opened this issue Oct 18, 2017 · 11 comments

Comments

@foolip
Member

foolip commented Oct 18, 2017

A more aggressive goal than #108.

I have found myself recently making changes and waiting for them to show up on the dashboard, as in web-platform-tests/wpt#7758 (comment). Given that Chromium and Gecko run most of the tests as part of their own CI and waterfalls, running all of the tests for every commit should be well within reach for the web-platform-tests dashboard.

In the past 6 months, there have been on average ~9 commits per day. (Using --first-parent, because we would only test things that master has pointed to.)
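For reference, a rough sketch of how that number can be reproduced (the branch name and the 182-day window are assumptions):

```sh
# Count first-parent commits on master over the last ~6 months and divide by
# the number of days to get a rough commits/day figure.
commits=$(git rev-list --count --first-parent --since="6 months ago" master)
echo "~$((commits / 182)) commits/day"
```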

This would require some kind of sharding to make it always run fast enough.

If we can get runs down to <50 minutes, it means that we could also use the same running infra for Travis CI and deal with the worst case of having to run every test.

@mattl @lukebjerring @jgraham

@jgraham
Collaborator

jgraham commented Oct 18, 2017

This is totally achievable by sharding the runs across multiple instances. In Gecko CI, Linux opt builds are arbitrarily sharded into 19 chunks (12 testharness, 6 reftests, 1 wdspec), and the longest run from a recent m-c commit took 37 minutes (most were under 25; unfortunately there isn't a good way to even out the timings if one directory happens to be particularly slow running). It's literally just about having enough machine resources. Gecko could probably run in 15 chunks without affecting the e2e time too much, because reftests are fast; so you are looking at ~60 machines (15 chunks × 4 browsers) to do Firefox + Chrome + Safari + Edge within this time limit.

Having said that, I have no idea how this would interact with Sauce; I don't know if they would like us using 30 simultaneous connections, and that's presumably slower than local runs anyway.

I don't really understand the travis comment. What's the intent there?

@foolip
Member Author

foolip commented Oct 18, 2017

I have no idea how this would interact with Sauce

Me neither, and I'm not taking for granted that we can keep using Sauce, or keep using the same account. If we need to maintain our own infrastructure to achieve fast-enough runs, then that's probably what we'll do. Just using many connections would be the first thing to investigate though.

I don't really understand the travis comment. What's the intent there?

I think it will look increasingly silly that we have two setups for running tests, which may be subtly different. More importantly, in web-platform-tests/wpt#7073, web-platform-tests/wpt#7475 and web-platform-tests/wpt#7660, what we have is mostly a capacity problem. If we had a way to do full runs in <50 minutes, then Travis could use that.

For web-platform-tests/wpt#7475 specifically, if we had very fast results for each commit, then we could possibly use those instead of running the tests without the changes. But that's a bit more speculative.

@jgraham
Collaborator

jgraham commented Oct 18, 2017

For Travis we don't want to run all the tests for each PR; we want to run each modified test in a way that exposes stability issues. So while I agree that having a way for Travis to delegate that work to a larger pool of machines under the wpt.fyi banner would make sense, I'm not sure it's precisely related to this issue, because the main blocker would be a way of doing that delegation rather than a fast e2e time for the full run.

For web-platform-tests/wpt#7475, I think that using the day-old wpt.fyi results is better than adding extra Travis load on each push, since it will only make a difference in edge cases (where tests are changing rapidly).

@jeffcarp
Contributor

Suggestion: we could set up a sharded Travis CI or Circle CI run as a non-blocking builder for all WPT PRs.

Travis CI has a build matrix limit of 200, so hypothetically, if we needed 60 machines as @jgraham said, we could fit under that limit. I'd lean toward Circle CI though, so we have the option of expanding to more browsers & builds. Circle CI also natively supports sharding.*

Looking at how some other large OSS projects with big test suites on GitHub solve this issue, here's a sorta random probably biased sample:

Slightly related: imho it's a prerequisite for all of these options that we containerize the builds. I've worked on this in #153 and have already moved the Firefox cron job over to using the container. One benefit of containerization once we start sharding is that shard startup time should be super quick, since the shards won't have to generate the manifest from scratch or clone the whole repo.

*It looks like for parallelism Circle CI exposes the env vars CIRCLE_NODE_INDEX and CIRCLE_NODE_TOTAL, which we could feed directly into wpt run --this-chunk $CIRCLE_NODE_INDEX --total-chunks $CIRCLE_NODE_TOTAL.
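A minimal sketch of what one shard's invocation could look like, assuming CIRCLE_NODE_INDEX is 0-based while wpt's --this-chunk is 1-based (the product and report filename here are just placeholders):

```sh
# One shard of a sharded run; offset the 0-based CircleCI node index to wpt's
# 1-based chunk numbering. Product ("chrome") and report filename are placeholders.
THIS_CHUNK=$((CIRCLE_NODE_INDEX + 1))
./wpt run --this-chunk "$THIS_CHUNK" --total-chunks "$CIRCLE_NODE_TOTAL" \
  --log-wptreport "wptreport-${THIS_CHUNK}.json" chrome
```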

@jgraham
Collaborator

jgraham commented Oct 18, 2017

On Travis, at least, there's a limit to how many concurrent machines we get; setting up 60 instances wouldn't help if only one job ran at a time. I don't know what the situation with other providers is, but it seems unlikely anyone is going to give us that kind of resources without a special arrangement, likely involving money. I think the three options are probably:

  • Run on some big cloud provider with an agreement to get the level of resources we want.
  • Run on some browser-vendor-supplied infrastructure (e.g. Taskcluster) where we have the necessary contacts to get the level of resources we need.
  • Run our own hardware.

The last is particularly unappealing ;)

Independent of that, containerizing the builds seems reasonable, but whether it cuts down on setup time is at least a little unclear and depends on how caching works. Taskcluster uses Docker for everything on Linux, but whenever a new instance is provisioned there's a noticeable setup time to download the image. And the VCS checkout happens inside the container; it's not a static part of the container. So I'm not sure exactly what you are imagining, but it's not entirely clear that e.g. generating a new container per run is viable (and I have other plans to make the manifest in particular faster to generate by downloading a cached copy).

@mattl mattl self-assigned this Oct 25, 2017
@jeffcarp
Contributor

jeffcarp commented Nov 1, 2017

Update: I've been doing some thinking along these lines, and as an intermediate solution and a step up from what we currently have, I set up a Jenkins cluster on GKE: https://ci.wpt.fyi. I migrated Edge yesterday (see the currently running build) and hope to migrate Safari and Firefox soon.

This solves the following problems:

@foolip
Member Author

foolip commented Nov 4, 2017

web-platform-tests/wpt#8063 is a good example of why we need full runs to be fast enough to be done in Travis. Currently the only way I have to be confident in such a change is to run it through Chromium's bots, and that was rather a lot of work. And it still wouldn't catch it if the change broke everything in Safari, for example.

@foolip
Member Author

foolip commented Nov 4, 2017

@jeffcarp, that Jenkins work looks amazing. Would it also help with sharding Sauce or BrowserStack in the end? @mattl, you and Jeff should be talking, a lot :)

@jgraham
Collaborator

jgraham commented Nov 4, 2017

I'm not sure that's a good example of why we need full runs in Travis, as opposed to an example of a missing rule in the logic that detects relevant changes to test for a build. We certainly aren't going to be able to stability-check with full runs, and for most things running every test is massive overkill.

I agree that the ability to request full runs for PRs where we think the changes are substantial would be a good improvement.

@foolip
Member Author

foolip commented Nov 4, 2017

Filed a bug for that. But as we keep improving those rules, more and more PRs will (correctly) run so many tests that they'll time out and fail. I don't know if the sum of the IDL tests is past that threshold.

@foolip
Member Author

foolip commented Apr 17, 2018

Closing this, just like #108. Having wpt.fyi results available within 1 hour is still our objective, but it doesn't make sense to track it as a monolithic goal here.

(In this repo we should track cycle time; a 30-minute cycle time would be required to consistently keep latency below 1 hour, while a 1-hour cycle time should give a mean latency of 90 minutes.)
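To spell out the arithmetic behind those numbers, assuming runs start back-to-back and a commit lands at a random point within a cycle:

```sh
# Latency = wait for the next run to start (half a cycle on average, a full
# cycle at worst) plus the run itself (one cycle).
CYCLE=60
echo "mean latency:  $((CYCLE / 2 + CYCLE)) minutes"  # 90 for a 1-hour cycle
echo "worst latency: $((CYCLE * 2)) minutes"          # 120; a 30-min cycle stays at 60 or under
```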

@foolip foolip closed this as completed Apr 17, 2018