Triage Safari differences between Azure Pipelines and Buildbot setup #646

foolip · 2019-01-31T11:22:51Z

With web-platform-tests/wpt#14836 we've begun running Safari Technology Preview on Azure Pipelines, currently every 12 hours.

The first aligned run between Azure Pipelines and Buildbot has now occurred:
https://wpt.fyi/results/?diff&run_id=5992310068215808&run_id=6012154763280384

The latest aligned run at any time will be:
https://wpt.fyi/results/?label=master&label=experimental&product=safari%5Bbuildbot%5D&product=safari%5Bazure%5D&aligned&diff

And all of the runs that could be compared are:
https://wpt.fyi/runs?label=master&label=experimental&max-count=100&product=safari%5Bbuildbot%5D&product=safari%5Bazure%5D&aligned

There are some differences in results, and the differences in webdriver/ are the largest.

We should understand the differences before committing to relying on only Azure Pipelines.

@mariestaver we'll need to discuss when to do this work, vs. the cost of keeping Buildbot running. Waiting has its benefits because it gives us more data to compare, which could help filter out flakiness.

foolip · 2019-01-31T11:24:53Z

One important difference is that Azure Pipelines runs 4 shards with --chunk-type hash. That combined with --no-restart-on-unexpected will probably result in some differences that are tricky to understand.
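To illustrate why the chunking strategy matters here, the following is a minimal sketch contrasting hash-based and contiguous chunking. It is not wptrunner's actual implementation: the choice of MD5, the helper names `hash_chunk` and `contiguous_chunks`, and the test paths are all assumptions made for illustration.

```python
import hashlib


def hash_chunk(test_path: str, total_chunks: int) -> int:
    """Return a 1-based chunk number from a stable hash of the test path.

    Assumption: MD5 is used only as a stable, well-distributed hash for this
    sketch; the real runner may hash differently.
    """
    digest = hashlib.md5(test_path.encode("utf-8")).hexdigest()
    return int(digest, 16) % total_chunks + 1


def contiguous_chunks(tests, total_chunks):
    """Split an ordered list of tests into consecutive, near-equal runs."""
    size, extra = divmod(len(tests), total_chunks)
    chunks, start = [], 0
    for i in range(total_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(tests[start:end])
        start = end
    return chunks


# Hypothetical test paths, all from one directory.
tests = [f"/dom/nodes/test-{n:02d}.html" for n in range(12)]

# Hash chunking: each test's shard depends only on its own path.
by_hash = {n: [] for n in range(1, 5)}
for path in tests:
    by_hash[hash_chunk(path, 4)].append(path)

# Contiguous chunking: neighbouring tests stay together in one shard.
by_position = contiguous_chunks(tests, 4)

for shard, paths in by_hash.items():
    print(f"hash shard {shard}: {paths}")
for index, paths in enumerate(by_position, start=1):
    print(f"contiguous shard {index}: {paths}")
```

With hash chunking, tests from the same directory are typically scattered across shards, so the set of tests that run before any given test in the same browser instance differs from a single unsharded run. Without a restart after an unexpected result, state left behind by an earlier test can then affect a different set of later tests in each setup.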

jugglinmike · 2019-01-31T17:15:46Z

Thanks for calling that out. Past experience has demonstrated that WPT isn't nearly hygienic enough to rule out test interaction. The discrepancies between collections for this revision seem tractable, but I think any investigation will be much more efficient if it starts from datasets which control for this variation.

@foolip I can replicate that configuration in the Buildbot environment. Sound good to you?

jgraham · 2019-01-31T17:25:41Z

The webdriver differences look like we're getting the wrong version of SafariDriver.

foolip · 2019-02-01T23:06:08Z

@jugglinmike aligning the two setups makes sense, but if you'd rather change the Azure Pipelines setup we could do that. I sort of expect the hash chunking to have some unwanted side effects, which would show up in this triaging.

gsnedders · 2019-02-02T11:24:16Z

> The webdriver differences look like we're getting the wrong version of SafariDriver.

That's weird, given it's very much tied to a single version of Safari (and bundled with it). Given safaridriver starts a constant build, that's very weird.

foolip changed the title ~~Triage differences between Azure Pipelines and Buildbot setup~~ Triage Safari differences between Azure Pipelines and Buildbot setup Jan 31, 2019

jugglinmike mentioned this issue Feb 1, 2019

Switch from testing Edge 17 to 18 #647

Merged
