Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WPT CLI Feature proposal: --experimental #10452

Open
jugglinmike opened this issue Apr 12, 2018 · 23 comments
Open

WPT CLI Feature proposal: --experimental #10452

jugglinmike opened this issue Apr 12, 2018 · 23 comments

Comments

@jugglinmike
Copy link
Contributor

We're reporting WPT results for major browsers via https://wpt.fyi, and we'd like those results to match the results that project contributors experience (via local testing and via CI) as much as possible. To that end, we're discussing how to promote parity between environments. A new flag for the wpt run command might be a good way to do that.

By way of example, executing the command ./wpt run --experimental chrome would cause the CLI to:

  1. Download the dev build of Chrome
    • Skipped in the presence of --binary
  2. Infer the browser version (via google-chrome --version)
  3. Download the appropriate build of ChromeDriver
    • according to an internally-managed Chrome-version-to-Chromedriver-version
      mapping
    • Skipped in the presence of --webdriver-binary
  4. Set all known-good experimental Chrome feature flags
    • according to an internally-managed Chrome-version-to-Chrome-flags mapping
    • merged with any options specified via --binary-arg
  5. Set all known-good experimental ChromeDriver feature flags (though I don't
    think there are any at the moment)
    • according to an internally-managed
      ChromeDriver-version-to-Chromedriver-flags mapping
    • merged with any options specified via --webdriver-arg
  6. Set any appropriate environment variables

All of those internally-managed mappings may seems like a new maintenance burden. The truth is, we've already had to address these issues. It's just that previously, we've simply deleted configuration from WPT as it became obsolete. With this implementation of the --experimental flag, we'd just be archiving that configuration rather than deleting it.

Regardless, since the WPT CLI already offers automatic installation of a WebDriver implementation, step 3 could be framed as a bug fix rather than a feature. I've opened gh-10451 to discuss that separately.

@jgraham
Copy link
Contributor

jgraham commented Apr 12, 2018

How does this work for Chrome developers and Chrome CI?

My initial reaction is that I'm reluctant to maintain these internal lists that will need constant maintainance (every time a new feature is added behind a flag). For unstable/dev channel in particular it seems problematic, but that's by far the most interesting version.

@foolip
Copy link
Member

foolip commented Apr 12, 2018

How does this work for Chrome developers and Chrome CI?

Where there are command line flags, they are maintained in the Chromium source code itself, for example:
https://chromium.googlesource.com/chromium/src/+/cf0c1ae7701e8892554f511da7dff02bf5fd6a84/third_party/WebKit/LayoutTests/VirtualTestSuites

Some of Chrome's CI infra isn't in the Chromium repo, of course, but I'd actually be surprised if those are the bits that directly invoke the chrome binary. (Aside: wpt is run using the content_shell binary, but it doesn't matter in this context.) @Hexcles, do you have a clearer picture?

@foolip
Copy link
Member

foolip commented Apr 12, 2018

For the proposal itself, I think we're going to eventually also care about upgrading stable versions of browsers in a more principled manner, and crucially we may want some overlap in the wpt revisions tested by the old and new version, so that they may be compared directly without "test material" noise.

It's a different use case from getting wpt.fyi and Travis on the same page, but the solution must also involve pinning versions of browsers and drivers someone, but can't be tracked in the main repo if we want overlap, at least not without an extra layer of... something.

So... something like a configuration file that can be provided to wpt run but is maintained separately, perhaps? There are already lots of ini files, maybe this one even works already:
https://github.com/w3c/web-platform-tests/blob/master/tools/wptrunner/wptrunner.default.ini

@jugglinmike if that works or could be made to work, how do you think such ini files would have to be maintained for this to work? And given a browser and driver version, where would wpt run find them?

@foolip
Copy link
Member

foolip commented Apr 12, 2018

Worth noting is that getting the command line arguments in sync could be done without solving the versioning problems.

@jgraham
Copy link
Contributor

jgraham commented Apr 12, 2018

For stable versions we shouldn't really be passing in any flags apart from those required to run tests. I appreciate that gecko isn't quite there yet since there are some feature prefs in the prefs_general.js file we apply; it could be worth thinking about how to solve that, but it seems low priority. In terms of drivers, I really think this data should be available externally or I don't see how web developers are supposed to cope.

So am I right in thinking that Chrome CI only invokes experimenal features on a per-test (directory) basis, and there isn't a general concept of "the set of experimental features that are suitable for use in the dev channel"?

@foolip
Copy link
Member

foolip commented Apr 12, 2018

@jgraham CLs (~PRs) are tested before landing and then there's the https://ci.chromium.org/p/chromium waterfall builders, but there's no testing that happens using binaries from the dev channel, AFAIK, those are already released and deemed to have passed testing.

So no, I don't think we maintain configurations or command line arguments anywhere outside of the chromium src repo itself, or at least I can't see why we would.

I imagine the situation is very similar for Firefox?

@foolip
Copy link
Member

foolip commented Apr 12, 2018

For stable versions we shouldn't really be passing in any flags apart from those required to run tests. I appreciate that gecko isn't quite there yet since there are some feature prefs in the prefs_general.js file we apply; it could be worth thinking about how to solve that, but it seems low priority.

prefs_general.js is quite substantial :) For wpt.fyi we'll continue to work around some especially pressing things to unblock Chrome teams, as in web-platform-tests/results-collection#125, where we have a fair idea of how we're going to undo it and are working on it. (@kereliuk plans to implement the permissions WebDriver API soon, that's one part of it.)

But this isn't related to stable versions vs. nightlies at all, really, as far as possible I think we'll want the configuration to be the same. The only exception we're deliberately going to make it to enable experimental web platform features for Chrome Dev and disable it for stable, so that people can see the results of things not yet shipped, and point to them in blink intent threads. There I don't think it makes sense to do the same on stable, because we want that to stay as close as possible to the world that web developers live in.

Do you think the same makes sense for Firefox stable vs. nightly? If not, tell @jugglinmike, because the work is happening now :)

@jugglinmike
Copy link
Contributor Author

My initial reaction is that I'm reluctant to maintain these internal lists
that will need constant maintainance (every time a new feature is added
behind a flag). For unstable/dev channel in particular it seems problematic,
but that's by far the most interesting version.

I think getting these details right is important for WPT. The wpt.fyi project has received a few special handling instructions from Chromium developers, e.g. web-platform-tests/results-collection#81 and web-platform-tests/results-collection#125. The Chromium team relies on their own infrastructure for validating WPT patches, so I suppose that's why they're satisfied with inaccurate results in the WPT CI and why they're only reporting problems with the more publicly-visible results.

However, this inconsistency tends to subvert WPT's stability checks. Tests that might be unstable in a properly-configured Chrome would present as stable (though failing) in a misconfigured instance. People contributing from other contexts (whether from another browser's source tree or directly to WPT itself) would also benefit from an accurate view of how their tests hold up across implementations.

By including the configuration within WPT, we're aligning the interests of folks who want accurate published results with folks who want to contribute strong interoperable tests. Because of this, we might even expect that the most of the maintenance would be done by the browser vendors themselves.

And the maintenance burden could be pretty light if we rely on pattern matching in the definition of this information. @foolip asked for an example, so here goes:

[chrome:arguments]
64..* --use-fake-ui-for-media-stream # (still relevant)
59..62 # (no longer used)

[firefox:preferences]
50..* media.navigator.streams.fake=true

There would be no urgency behind the introduction of new flags. With a syntax like this, the only potential blocker for WPT would be when a browser removes a flag. This would be easy to diagnose, and the fix would amount to pinning a * to the prior release. I really don't know how often that would be necessary, though.

For stable versions we shouldn't really be passing in any flags apart from
those required to run tests

There I don't think it makes sense to do the same on stable, because we want
that to stay as close as possible to the world that web developers live in.

Agree completely

Do you think the same makes sense for Firefox stable vs. nightly? If not,
tell @jugglinmike, because the work is happening now :)

James knows this well because he's been helping out :)

@jgraham
Copy link
Contributor

jgraham commented Apr 12, 2018

OK, I'm finding this pretty hard to follow. Let my try to summarise my understanding

  • For stable channel browsers we want to use the released configuration, plus prefs required to actually make the tests work (e.g. disabling popup blockers). For Chrome this is basically already true, for Firefox we are currently applying the prefs_general.js file which contains both kinds of change.
  • For unstable versions, we want to enable features that are still in development. How this works depends on the browser:
    • For Chrome there is no default set of flags, but the CI is configured to apply certain flags in certain directories using a JSON file. We currently apply none of these.
    • For Firefox there are a default set of nightly-only prefs, plus the prefs_general.js prefs, plus per directory prefs that, for wpt, we store in the expectation metadata files in-tree.
  • The configuration in wpt.fyi for nightly builds should match the configuration in browser's CI systems since this is most likely to be actively maintained by developers (true for both Gecko and Blink).
  • For Chrome there is an additional wrinkle that Chromedriver may break at random due to protocol changes.

If that understanding is correct, I am still failing to see why we want to maintain our own list of prefs/flags or why we would maintain version mappings. Ignoring the Chromedriver stuff, it seems like we really want to be pulling the nightly configuration options from the vendor repo, and eliminating configuration options where possible for stable releases. Any prefs that should be indiscriminantly set for all tests in stable releases (like popup blocking) can be hardcoded in the wptrunner files.

I guess there is a set of prefs that might change reasonably often but have to be applied to stable releases to run tests (e.g. the prefs to disable external network connections in Firefox). But I really really don't want to maintain those; if we are serious about e.g. getting rid of prefs_general.js for stable builds we should find some way to maintain that minimal set of prefs in the vendor repos and have vendor builds fail when the prefs are missing (e.g. I would probably add a gecko CI task to run the infrastructure tests with a prefs_minimal.js version of the gecko prefs, so that the people updating pref names are forced to maintain it for me).

@Hexcles
Copy link
Member

Hexcles commented Apr 12, 2018

I know how these flags are handled in the testing setup, but I'm not sure if it's a good idea to do the same thing in WPT. As @jgraham said, it's a very tedious and laborious task, relying on the owner of the flags (or features).

However, we can (and perhaps should) use the --enable-experimental-web-platform-features flag with Chrome Dev if we want to have an "experimental" setup for Chrome. This is an umbrella flag for a (moving) set of experimental features, and is enabled by default in Blink when we run layout tests.

The rest of the special flags (found in VirtualTestSuite) might not even be suitable for WPT to test (they might not be "mature" enough to be considered as experimental features, or they might be test-only and not available outside content_shell, etc. I'm actually not very familiar with feature control; @foolip is my understanding correct?)

@foolip
Copy link
Member

foolip commented Apr 12, 2018

@jgraham, that understanding is correct AFAICT. Note that @jugglinmike's OP didn't say anything about pinning the browser version, I threw that into the mix.

I agree that pinning the version of experimental browsers doesn't make a whole lot of sense, even if we could occasionally be confused when we thought the versions were the same but they weren't.

This business of overlapping the tested commits for stable version N and N+1 we can leave aside, I'd like to have it, but the most important bit here is really getting Travis and wpt.fyi in sync, and Travis doesn't run stable.

@Hexcles
Copy link
Member

Hexcles commented Apr 12, 2018

A few minor corrections regarding @jgraham 's latest comment:

  • Chrome CI always sets --enable-experimental-web-platform-features for all layout tests including WPT
  • VirtualTestSuite doesn't apply specific flags to certain directories. Instead, it creates a virtual suite from a certain directory by running the same tests with additional flags. Therefore, the directory will be tested once without any special flag (but still with --enable-experimental-web-platform-features), and once with the additional flags. AFAIK, this feature is used for the two purposes I mentioned above (testing super early-stage features, and enabling special test mocks).

@foolip
Copy link
Member

foolip commented Apr 12, 2018

Pointing to VirtualTestSuite may have been misleading, I'm not suggesting we maintain a per-directory mapping of command line arguments, that's was just a place involving some interesting command line arguments that I was aware of.

@Hexcles
Copy link
Member

Hexcles commented Apr 12, 2018

And my point is, based on my understanding, almost none of the flags in VirtualTestSuite is intended to be used outside content_shell (either too premature, or content_shell testing-only). Yet I have to admit I'm not terribly familiar with those flags (or the overall philosophy).

@foolip
Copy link
Member

foolip commented Apr 12, 2018

FWIW, the last example in http://web-platform-tests.org/running-tests/chrome.html is what I think we need for Chrome: ./wpt run --binary `which google-chrome-unstable` --binary-arg=--enable-experimental-web-platform-features chrome

(Maybe not the which google-chrome-unstable bit.)

@jugglinmike
Copy link
Contributor Author

We've also had a request for --enable-blink-features=MojoJS,MojoJSTest. I'm following up there to see if that is subsumed by --enable-experimental-web-platform-features

@jgraham
Copy link
Contributor

jgraham commented Apr 13, 2018

So it seems like the high impact items here might just be

  • Version detect the Chrome binary and run dev/canary versions with the --enable-experimental-web-platform-features flag
  • Produce a minimal version of the Firefox prefs file with no feature additions (in Mozilla central) and use that for stable Firefox releases.

If we need more prefs for specific tests (e.g. to enable mocks), we could consider just adding some expectation ini files that enable those prefs for those tests similar to how we do it in the gecko infrastructure. That's not ideal because it means maintaining those values (but maybe there's some way to scrape it from vendors if the same values are being applied in their CI).

@jugglinmike
Copy link
Contributor Author

I reported gh-10823 as a separate feature because that use case (enabling automation) seemed distinct from this one (enabling experimental functionality). That may have been wrong, though. Are there any use cases where one would want one of those two things but not the other?

For example, would someone testing Chrome want to specify the --autoplay-policy=no-user-gesture-required option, but not the --use-fake-device-for-media-stream option?

If not, then the WPT CLI only needs to offer a single interface for enabling both kinds of features. If that seems right, then I'll close the second issue.

@lukebjerring
Copy link
Contributor

Chiming in late to echo an offline conversation I've had with @Hexcles

I'd like to do regular runs of both (with and without experimental flags) on the unstable binaries, potentially with significantly less frequency on the runs which don't have the experimental feature flags. There's regression detection value in comparing latest stable vs unstable, in an apples to apples manner.

@foolip
Copy link
Member

foolip commented May 17, 2018

Yeah, that would be nice. At the same cadence as stable would make sense, I guess weekly?

@foolip
Copy link
Member

foolip commented Oct 17, 2018

I've recently discovered we already have --channel=experimental which combined with --install-browser should do the right thing. @jugglinmike, is there still something in this issue you'd like to pursue?

@Hexcles
Copy link
Member

Hexcles commented Oct 17, 2018

@foolip FWIW, right now we don't differentiate between experimental and dev channels. Both will lead to extra flags/perfs in browsers that enable experimental features.

@foolip
Copy link
Member

foolip commented Oct 17, 2018

Yep. Another bug in the code surrounding this is that it doesn't know about Chrome Canary existing on macOS and Windows, which arguably would be the right thing to use for --channel=experimental on those platforms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants