This repository has been archived by the owner on Nov 6, 2019. It is now read-only.

Test every commit of web-platform-tests within 1 hour #164

Closed
foolip opened this issue Oct 18, 2017 · 11 comments

Comments

@foolip
Member

foolip commented Oct 18, 2017

A more aggressive goal than #108.

I have found myself recently making changes and waiting for them to show up on the dashboard, as in web-platform-tests/wpt#7758 (comment). Given that Chromium and Gecko run most of the tests as part of their own CI and waterfalls, running all of the tests for every commit should be well within reach for the web-platform-tests dashboard.

In the past 6 months, there have been on average ~9 commits per day. (Using --first-parent, because we would only test things that master has pointed to.)
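For reference, a rough sketch of how that number can be reproduced (the branch name and the 182-day window are assumptions):

```sh
# Count first-parent commits on master over the last ~6 months and divide by
# the number of days to get a rough commits/day figure.
commits=$(git rev-list --count --first-parent --since="6 months ago" master)
echo "~$((commits / 182)) commits/day"
```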

This would require some kind of sharding to make it always run fast enough.

If we can get runs down to <50 minutes, it means that we could also use the same running infra for Travis CI and deal with the worst case of having to run every test.

@mattl @lukebjerring @jgraham

@jgraham
Collaborator

jgraham commented Oct 18, 2017

This is totally achievable by sharding the runs across multiple instances. In Gecko CI, Linux opt builds are arbitrarily sharded into 19 chunks (12 testharness, 6 reftests, 1 wdspec), and the longest run from a recent m-c commit took 37 minutes (most were under 25; unfortunately there isn't a good way to even out the timings if one directory happens to be particularly slow running). It's literally just about having enough machine resources. Gecko could probably run in 15 chunks without affecting the e2e time too much, because reftests are fast; so you are looking at ~60 machines (15 chunks × 4 browsers) to do Firefox + Chrome + Safari + Edge within this time limit.

Having said that, I have no idea how this would interact with Sauce; I don't know if they would like us using 30 simultaneous connections, and that's presumably slower than local runs anyway.

I don't really understand the travis comment. What's the intent there?

@foolip
Member Author

foolip commented Oct 18, 2017

I have no idea how this would interact with Sauce

Me neither, and I'm not taking for granted that we can keep using Sauce, or keep using the same account. If we need to maintain our own infrastructure to achieve fast-enough runs, then that's probably what we'll do. Just using many connections would be the first thing to investigate though.

I don't really understand the travis comment. What's the intent there?

I think it will look increasingly silly that we have two setups for running tests, which may be subtly different. More importantly, in web-platform-tests/wpt#7073, web-platform-tests/wpt#7475 and web-platform-tests/wpt#7660, what we have is mostly a capacity problem. If we had a way to do full runs in <50 minutes, then Travis could use that.

For web-platform-tests/wpt#7475 specifically, if we had very fast results for each commit, then we could possibly use those instead of running the tests without the changes. But that's a bit more speculative.

@jgraham
Collaborator

jgraham commented Oct 18, 2017

For Travis we don't want to run all the tests for each PR; we want to run each modified test in a way that exposes stability issues. So while I agree that having a way for Travis to delegate that work to a larger pool of machines under the wpt.fyi banner would make sense, I'm not sure it's precisely related to this issue, because the main blocker would be a way of doing that delegation rather than a fast e2e time for the full run.

For web-platform-tests/wpt#7475, I think that using the day-old wpt.fyi results is better than adding extra Travis load on each push, since it will only make a difference in edge cases (where tests are changing rapidly).

@jeffcarp
Contributor

Suggestion: we could set up a sharded Travis CI or Circle CI run as a non-blocking builder for all WPT PRs.

Travis CI has a build matrix limit of 200, so hypothetically, if we needed 60 machines as @jgraham said, we could fit under that limit. I'd lean toward Circle CI though, so we have the option of expanding to more browsers & builds. Circle CI also natively supports sharding.*

Looking at how some other large OSS projects with big test suites on GitHub solve this issue, here's a sorta random probably biased sample:

Slightly related: imho it's a prerequisite for all of these options that we containerize the builds. I've worked on this in #153 and have already moved the Firefox cron job over to using the container. One benefit of containerization once we start sharding is that shard startup time should be super quick, since the shards won't have to generate the manifest from scratch or clone the whole repo.

*It looks like for parallelism Circle CI exposes the env vars CIRCLE_NODE_INDEX and CIRCLE_NODE_TOTAL, which we could feed directly into wpt run --this-chunk $CIRCLE_NODE_INDEX --total-chunks $CIRCLE_NODE_TOTAL.
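A minimal sketch of what one shard's invocation could look like, assuming CIRCLE_NODE_INDEX is 0-based while wpt's --this-chunk is 1-based (the product and report filename here are just placeholders):

```sh
# One shard of a sharded run; offset the 0-based CircleCI node index to wpt's
# 1-based chunk numbering. Product ("chrome") and report filename are placeholders.
THIS_CHUNK=$((CIRCLE_NODE_INDEX + 1))
./wpt run --this-chunk "$THIS_CHUNK" --total-chunks "$CIRCLE_NODE_TOTAL" \
  --log-wptreport "wptreport-${THIS_CHUNK}.json" chrome
```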

@jgraham
Collaborator

jgraham commented Oct 18, 2017

On Travis, at least, there's a limit to how many concurrent machines we get; setting up 60 instances wouldn't help if only one job ran at a time. I don't know what the situation with other providers is, but it seems unlikely anyone is going to give us that kind of resources without a special arrangement, likely involving money. I think the three options are probably:

  • Run on some big cloud provider with an agreement to get the level of resources we want.
  • Run on some browser-vendor-supplied infrastructure (e.g. Taskcluster) where we have the necessary contacts to get the level of resources we need.
  • Run our own hardware.

The last is particularly unappealing ;)

Independent of that, containerizing the builds seems reasonable, but whether it cuts down on setup time is at least a little unclear and depends on how caching works. Taskcluster uses Docker for everything on Linux, but whenever a new instance is provisioned there's a noticeable setup time to download the image. And the VCS checkout happens inside the container; it's not a static part of the container. So I'm not sure exactly what you are imagining, but it's not entirely clear that e.g. generating a new container per run is viable (and I have other plans to make the manifest in particular faster to generate by downloading a cached copy).

@mattl mattl self-assigned this Oct 25, 2017
@jeffcarp
Contributor

jeffcarp commented Nov 1, 2017

Update: I've been doing some thinking along these lines, and as an intermediate solution and a step up from what we currently have, I set up a Jenkins cluster on GKE: https://ci.wpt.fyi. I migrated Edge yesterday (see the currently running build) and hope to migrate Safari and Firefox soon.

This solves the following problems:

@foolip
Member Author

foolip commented Nov 4, 2017

web-platform-tests/wpt#8063 is a good example of why we need full runs to be fast enough to be done in Travis. Currently the only way I have to be confident in such a change is to run it through Chromium's bots, and that was rather a lot of work. And it still wouldn't catch it if the change broke everything in Safari, for example.

@foolip
Member Author

foolip commented Nov 4, 2017

@jeffcarp, that Jenkins work looks amazing. Would it also help with sharding Sauce or BrowserStack in the end? @mattl, you and Jeff should be talking, a lot :)

@jgraham
Collaborator

jgraham commented Nov 4, 2017

I'm not sure that's a good example of why we need full runs in Travis, as opposed to an example of a missing rule in the logic that detects relevant changes to test for a build. We certainly aren't going to be able to stability-check with full runs, and for most things running every test is massive overkill.

I agree that the ability to request full runs for PRs where we think the changes are substantial would be a good improvement.

@foolip
Member Author

foolip commented Nov 4, 2017

Filed a bug for that. But as we keep improving those rules, more and more PRs will (correctly) run so many tests that they'll time out and fail. I don't know if the sum of the IDL tests is past that threshold.

@foolip
Member Author

foolip commented Apr 17, 2018

Closing this, just like #108. Having wpt.fyi results available within 1 hour is still our objective, but it doesn't make sense to track it as a monolithic goal here.

(In this repo we should track cycle time; a 30-minute cycle time would be required to consistently keep latency below 1 hour, while a 1-hour cycle time should give a mean latency of 90 minutes.)
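To spell out the arithmetic behind those numbers, assuming runs start back-to-back and a commit lands at a random point within a cycle:

```sh
# Latency = wait for the next run to start (half a cycle on average, a full
# cycle at worst) plus the run itself (one cycle).
CYCLE=60
echo "mean latency:  $((CYCLE / 2 + CYCLE)) minutes"  # 90 for a 1-hour cycle
echo "worst latency: $((CYCLE * 2)) minutes"          # 120; a 30-min cycle stays at 60 or under
```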

@foolip foolip closed this as completed Apr 17, 2018