-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WPT CLI Feature proposal: --experimental
#10452
Comments
How does this work for Chrome developers and Chrome CI? My initial reaction is that I'm reluctant to maintain these internal lists that will need constant maintainance (every time a new feature is added behind a flag). For unstable/dev channel in particular it seems problematic, but that's by far the most interesting version. |
Where there are command line flags, they are maintained in the Chromium source code itself, for example: Some of Chrome's CI infra isn't in the Chromium repo, of course, but I'd actually be surprised if those are the bits that directly invoke the chrome binary. (Aside: wpt is run using the content_shell binary, but it doesn't matter in this context.) @Hexcles, do you have a clearer picture? |
For the proposal itself, I think we're going to eventually also care about upgrading stable versions of browsers in a more principled manner, and crucially we may want some overlap in the wpt revisions tested by the old and new version, so that they may be compared directly without "test material" noise. It's a different use case from getting wpt.fyi and Travis on the same page, but the solution must also involve pinning versions of browsers and drivers someone, but can't be tracked in the main repo if we want overlap, at least not without an extra layer of... something. So... something like a configuration file that can be provided to @jugglinmike if that works or could be made to work, how do you think such ini files would have to be maintained for this to work? And given a browser and driver version, where would |
Worth noting is that getting the command line arguments in sync could be done without solving the versioning problems. |
For stable versions we shouldn't really be passing in any flags apart from those required to run tests. I appreciate that gecko isn't quite there yet since there are some feature prefs in the So am I right in thinking that Chrome CI only invokes experimenal features on a per-test (directory) basis, and there isn't a general concept of "the set of experimental features that are suitable for use in the dev channel"? |
@jgraham CLs (~PRs) are tested before landing and then there's the https://ci.chromium.org/p/chromium waterfall builders, but there's no testing that happens using binaries from the dev channel, AFAIK, those are already released and deemed to have passed testing. So no, I don't think we maintain configurations or command line arguments anywhere outside of the chromium src repo itself, or at least I can't see why we would. I imagine the situation is very similar for Firefox? |
But this isn't related to stable versions vs. nightlies at all, really, as far as possible I think we'll want the configuration to be the same. The only exception we're deliberately going to make it to enable experimental web platform features for Chrome Dev and disable it for stable, so that people can see the results of things not yet shipped, and point to them in blink intent threads. There I don't think it makes sense to do the same on stable, because we want that to stay as close as possible to the world that web developers live in. Do you think the same makes sense for Firefox stable vs. nightly? If not, tell @jugglinmike, because the work is happening now :) |
I think getting these details right is important for WPT. The wpt.fyi project has received a few special handling instructions from Chromium developers, e.g. web-platform-tests/results-collection#81 and web-platform-tests/results-collection#125. The Chromium team relies on their own infrastructure for validating WPT patches, so I suppose that's why they're satisfied with inaccurate results in the WPT CI and why they're only reporting problems with the more publicly-visible results. However, this inconsistency tends to subvert WPT's stability checks. Tests that might be unstable in a properly-configured Chrome would present as stable (though failing) in a misconfigured instance. People contributing from other contexts (whether from another browser's source tree or directly to WPT itself) would also benefit from an accurate view of how their tests hold up across implementations. By including the configuration within WPT, we're aligning the interests of folks who want accurate published results with folks who want to contribute strong interoperable tests. Because of this, we might even expect that the most of the maintenance would be done by the browser vendors themselves. And the maintenance burden could be pretty light if we rely on pattern matching in the definition of this information. @foolip asked for an example, so here goes:
There would be no urgency behind the introduction of new flags. With a syntax like this, the only potential blocker for WPT would be when a browser removes a flag. This would be easy to diagnose, and the fix would amount to pinning a
Agree completely
James knows this well because he's been helping out :) |
OK, I'm finding this pretty hard to follow. Let my try to summarise my understanding
If that understanding is correct, I am still failing to see why we want to maintain our own list of prefs/flags or why we would maintain version mappings. Ignoring the Chromedriver stuff, it seems like we really want to be pulling the nightly configuration options from the vendor repo, and eliminating configuration options where possible for stable releases. Any prefs that should be indiscriminantly set for all tests in stable releases (like popup blocking) can be hardcoded in the wptrunner files. I guess there is a set of prefs that might change reasonably often but have to be applied to stable releases to run tests (e.g. the prefs to disable external network connections in Firefox). But I really really don't want to maintain those; if we are serious about e.g. getting rid of |
I know how these flags are handled in the testing setup, but I'm not sure if it's a good idea to do the same thing in WPT. As @jgraham said, it's a very tedious and laborious task, relying on the owner of the flags (or features). However, we can (and perhaps should) use the The rest of the special flags (found in |
@jgraham, that understanding is correct AFAICT. Note that @jugglinmike's OP didn't say anything about pinning the browser version, I threw that into the mix. I agree that pinning the version of experimental browsers doesn't make a whole lot of sense, even if we could occasionally be confused when we thought the versions were the same but they weren't. This business of overlapping the tested commits for stable version N and N+1 we can leave aside, I'd like to have it, but the most important bit here is really getting Travis and wpt.fyi in sync, and Travis doesn't run stable. |
A few minor corrections regarding @jgraham 's latest comment:
|
Pointing to |
And my point is, based on my understanding, almost none of the flags in |
FWIW, the last example in http://web-platform-tests.org/running-tests/chrome.html is what I think we need for Chrome: (Maybe not the |
We've also had a request for |
So it seems like the high impact items here might just be
If we need more prefs for specific tests (e.g. to enable mocks), we could consider just adding some expectation ini files that enable those prefs for those tests similar to how we do it in the gecko infrastructure. That's not ideal because it means maintaining those values (but maybe there's some way to scrape it from vendors if the same values are being applied in their CI). |
I reported gh-10823 as a separate feature because that use case (enabling automation) seemed distinct from this one (enabling experimental functionality). That may have been wrong, though. Are there any use cases where one would want one of those two things but not the other? For example, would someone testing Chrome want to specify the If not, then the WPT CLI only needs to offer a single interface for enabling both kinds of features. If that seems right, then I'll close the second issue. |
Chiming in late to echo an offline conversation I've had with @Hexcles I'd like to do regular runs of both (with and without experimental flags) on the unstable binaries, potentially with significantly less frequency on the runs which don't have the experimental feature flags. There's regression detection value in comparing latest stable vs unstable, in an apples to apples manner. |
Yeah, that would be nice. At the same cadence as stable would make sense, I guess weekly? |
I've recently discovered we already have |
@foolip FWIW, right now we don't differentiate between |
Yep. Another bug in the code surrounding this is that it doesn't know about Chrome Canary existing on macOS and Windows, which arguably would be the right thing to use for |
We're reporting WPT results for major browsers via https://wpt.fyi, and we'd like those results to match the results that project contributors experience (via local testing and via CI) as much as possible. To that end, we're discussing how to promote parity between environments. A new flag for the
wpt run
command might be a good way to do that.By way of example, executing the command
./wpt run --experimental chrome
would cause the CLI to:--binary
google-chrome --version
)mapping
--webdriver-binary
--binary-arg
think there are any at the moment)
ChromeDriver-version-to-Chromedriver-flags mapping
--webdriver-arg
mapping
All of those internally-managed mappings may seems like a new maintenance burden. The truth is, we've already had to address these issues. It's just that previously, we've simply deleted configuration from WPT as it became obsolete. With this implementation of the
--experimental
flag, we'd just be archiving that configuration rather than deleting it.Regardless, since the WPT CLI already offers automatic installation of a WebDriver implementation, step
3
could be framed as a bug fix rather than a feature. I've opened gh-10451 to discuss that separately.The text was updated successfully, but these errors were encountered: