
5000+ runs have neither master, pr_base or pr_head labels #1175

Closed
foolip opened this issue Feb 24, 2019 · 46 comments

foolip (Member) commented Feb 24, 2019

When the master label was introduced, some old runs were labeled retroactively, but it seems not all of them were.

In https://github.com/foolip/ad-hoc-wpt-results-analysis/blob/master/audit.js I listed runs with none of the master, pr_base, or pr_head labels, and there were more than 5000. Some probably aren't master runs, but some definitely are:
https://wpt.fyi/results/?run_id=5632908932939776

Some are probably from commits on master but incomplete, so fixing this might take some work.

Hexcles (Member) commented Feb 24, 2019

This was how old runs were retroactively tagged: https://github.com/web-platform-tests/data-migration/blob/master/tagger/master.go

We used the commit log of WPT as the ground truth. I think the logic is sound, but there might be some gaps because of timing (e.g. the old runs were tagged before the prod release, or against an out-of-date local WPT checkout). However, the linked example is a very old run, so there might be bugs in the migration code, or it wasn't run to completion.
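
For illustration, the ground-truth check boils down to set membership against the commit log. A minimal Go sketch, assuming a local WPT checkout and a hypothetical TestRun shape (this is not the actual tagger code):

package tagger

import (
	"os/exec"
	"strings"
)

// TestRun is a hypothetical stand-in for the wpt.fyi run metadata.
type TestRun struct {
	FullRevisionHash string
	Labels           []string
}

// masterSHAs lists every commit hash on master in a local WPT checkout.
func masterSHAs(wptDir string) (map[string]bool, error) {
	cmd := exec.Command("git", "rev-list", "master")
	cmd.Dir = wptDir
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	shas := make(map[string]bool)
	for _, sha := range strings.Fields(string(out)) {
		shas[sha] = true
	}
	return shas, nil
}

// tagMaster adds the master label to runs whose revision is on master.
func tagMaster(runs []*TestRun, onMaster map[string]bool) {
	for _, run := range runs {
		if onMaster[run.FullRevisionHash] {
			run.Labels = append(run.Labels, "master")
		}
	}
}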

foolip (Member) commented Mar 4, 2019

@Hexcles, in #1195 (comment) I wrote a script to find some mislabeled Safari runs. Those would come up in an analysis like this.

I feel like all runs should have one of the three known labels, and if anything useful remains, perhaps we should come up with a label for those too, so that it's easier to be sure that nothing has gone wrong like this.

foolip (Member) commented Apr 1, 2019

This is a problem in practice now. In trying to dust off my scripts for measuring results and interop over time, I see that some of the aligned SHAs I used last time around aren't found now, because some of the runs are missing the master label.

https://wpt.fyi/results/?sha=a516e7edca&label=stable is the earliest such run, where the Firefox run from Taskcluster is missing the master label.

foolip (Member) commented Apr 1, 2019

Simply not requiring the master label would probably work, but there might still be one-off runs with stable browsers that someone has done, and those would then influence the results.

Hexcles self-assigned this Apr 4, 2019
Hexcles (Member) commented Apr 4, 2019

Backfilling labels using web-platform-tests/data-migration@ea07520

Hexcles (Member) commented Apr 4, 2019

Note that the backfiller is being tested on staging at the moment; prod isn't affected yet.

Hexcles (Member) commented Apr 5, 2019

The heuristics aren't strong enough: the script is incorrectly labelling some supposedly pr_base runs as master. I'll use the test duration as another signal: only consider runs >10min (the rationale is that a PR run can't take >10min, as otherwise the stability check would time out). web-platform-tests/data-migration@423adc8
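
The duration signal itself is tiny; a hedged sketch of the cutoff (the function name and field layout are assumptions, not the real data-migration code):

package tagger

import "time"

// likelyMasterRun filters out probable PR runs: a pr_base/pr_head run
// can't exceed ~10 minutes without the stability check timing out, so
// only longer runs are considered master candidates.
func likelyMasterRun(timeStart, timeEnd time.Time) bool {
	return timeEnd.Sub(timeStart) > 10*time.Minute
}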

Hexcles (Member) commented Apr 5, 2019

The new heuristics seem to work. I don't see any false positives, and the possibility of false negatives (true master runs taking <10min) is low. ~3000 runs are retroactively labelled as master on prod.

foolip (Member) commented Apr 5, 2019

How much is left with none of the three labels, and is the origin of all of those runs clear?

Hexcles (Member) commented Apr 5, 2019

We still have a couple of thousand, most of which are from Taskcluster and Azure, and I'm fairly certain they are pr_head or pr_base runs based on the short duration (test execution time should be trustworthy on these two runners). There was a short period when we did not add any labels because of a bug in the webhook.

I don't think backfilling the pr_head and pr_base labels has high value. WDYT?

foolip (Member) commented Apr 5, 2019

Nah, adding those labels doesn't itself seem valuable; I was more curious about what would remain. I'd expect at most 100 or so runs triggered by folks trying stuff out by tweaking CI config, but if it's more than that I'd wonder if we've still missed some master runs.

@stephenmcgruer (Contributor)

@foolip This seems to be the issue you mentioned regarding old runs not marked as master.

@Hexcles When digging up foolip's https://github.com/foolip/ad-hoc-wpt-results-analysis scripts, I found that we didn't seem to have any runs before 2018-06 coming in as 'master', which included many in the aligned-shas.txt file (which I was using to try to align my new graphs with his old ones).

https://wpt.fyi/api/runs?sha=b0c75cd8857f9f9596a744fac65c3be5e526aede is an example; I notice that the time_end is totally mangled (0001-01-01T00:00:00Z). Maybe that's why our earlier heuristic-based attempts missed these?

I'm not sure if we can find a different way to backfill labels into 2017, but re-opening to discuss.

@stephenmcgruer (Contributor)

Note that this probably causes #1537

Hexcles (Member) commented Oct 29, 2019

Stephen, you are correct. This is a problem caused by the additional heuristic "only consider runs >10min". There were no false positives, but a lot of false negatives, because back in 2017 we did not record timestamps in the wptreport JSON. The TimeStart field for those runs was backfilled with CreatedAt, and TimeEnd was left empty (the time you're seeing is the zero value of time.Time in Go).

I could try a more robust heuristic (only apply the >10min rule to runs created after we set up pr_base) if you think those historical runs are valuable -- note that the infra was not very reliable back then, and we went through some massive renaming in the WPT repo as well (especially css).
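
A sketch of that refinement (the cutoff date and field names are illustrative assumptions): apply the >10min rule only when TimeEnd was actually recorded and the run postdates the pr_base setup.

package tagger

import "time"

// prBaseSetup approximates when pr_base labelling (and reliable
// wptreport timestamps) came into existence; the exact date here is
// an assumption for illustration.
var prBaseSetup = time.Date(2018, time.June, 1, 0, 0, 0, 0, time.UTC)

// passesDurationRule applies the >10min rule only where it means something.
func passesDurationRule(createdAt, timeStart, timeEnd time.Time) bool {
	// 2017-era runs had TimeStart backfilled from CreatedAt and TimeEnd
	// left as the zero value of time.Time, so their duration is noise.
	if timeEnd.IsZero() || createdAt.Before(prBaseSetup) {
		return true
	}
	return timeEnd.Sub(timeStart) > 10*time.Minute
}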

foolip (Member) commented Oct 29, 2019

Given that we have manifests now for all master commits since the beginning, can we just check the results against the manifests and label ones that are complete enough? Or just pick a constant number of results to expect?
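
Either variant reduces to a completeness check; a sketch, where the 90% ratio and the fixed floor are arbitrary assumptions rather than anything measured:

package tagger

// completeEnough reports whether a run looks like a full master run,
// either relative to the manifest's test count for that commit or,
// failing that, against a constant floor.
func completeEnough(resultCount, manifestCount int) bool {
	if manifestCount > 0 {
		return float64(resultCount) >= 0.9*float64(manifestCount)
	}
	return resultCount >= 30000 // illustrative constant, not a measured value
}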

Hexcles (Member) commented Oct 29, 2019

Yes, but either way would require us to download the full JSON and deserialize it. It'd be very slow, and I'd probably need to write some goroutine magic so that it won't take weeks (currently the labeller processes runs one at a time).
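
The "goroutine magic" would be an ordinary worker pool; a sketch, with labelOne standing in for a hypothetical download-deserialize-label step:

package tagger

import "sync"

// labelAll fans the runs out to a bounded pool of workers instead of
// labelling them one at a time.
func labelAll(runIDs []int64, workers int, labelOne func(int64) error) {
	jobs := make(chan int64)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				// Errors are ignored in this sketch; a real backfill
				// would log them and carry on.
				_ = labelOne(id)
			}
		}()
	}
	for _, id := range runIDs {
		jobs <- id
	}
	close(jobs)
	wg.Wait()
}

Bounding the worker count keeps the backfill from hammering the results storage while still cutting the wall-clock time by roughly the pool size.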

foolip (Member) commented Oct 29, 2019

How about if I just ingest the runs into my wpt-results repo and sort them by number of results from there? It'd still be slow, but almost all of the code already exists and would just have to churn away for a few days.

foolip added a commit to Ecosystem-Infra/wpt-results-analysis that referenced this issue Oct 29, 2019
foolip (Member) commented Oct 30, 2019

There are far fewer than 5000 of these runs now. After Ecosystem-Infra/wpt-results-analysis@b41047e and some local edits, I see this output counting the runs in question:

Iterated 729 chrome runs
Iterated 136 edge runs
Iterated 715 firefox runs
Iterated 347 safari runs
Iterated 4 webkitgtk runs
Iterated 1931 runs in total

@stephenmcgruer (Contributor)

This has bitten me for some analysis work I've been doing (before 2018-06, I basically have to drop the 'master' tag to get runs in). I think we should fix it, and I'm happy to own doing so since I'm the one mainly impacted. (And @Hexcles has enough on their plate!)

@stephenmcgruer (Contributor)

I've begun looking at this. Thanks to ad-hoc-wpt-results-analysis, I have a promising approach that revolves around checking the results diff for the run. For example:

run/321670007/results has 36208 insertions
...
run/324880027/results has 62 insertions

What surprises me, however, is how recent a lot of these runs are. Both of the above runs are from this month! There's other weird stuff too: run 321670007 has a created_at that appears to be after its time_end... (Perhaps I just don't know what created_at means.)

For now, however, I'm going to look at the 2017 and 2018 runs (since we know there should be missing master runs from then). The plan is to generate run IDs via https://github.com/foolip/ad-hoc-wpt-results-analysis/pull/14/files, then use a Datastore fixup to add the master label.
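
The insertion counts above can be read straight off git; a sketch against a hypothetical wpt-results checkout, assuming each run's results land in a single commit:

package audit

import (
	"os/exec"
	"regexp"
	"strconv"
)

var insertionsRe = regexp.MustCompile(`(\d+) insertion`)

// insertionsFor counts the inserted lines in the commit that added a
// run's results; tiny counts suggest an incomplete (non-master) run.
func insertionsFor(repoDir, commit string) (int, error) {
	cmd := exec.Command("git", "show", "--shortstat", "--format=", commit)
	cmd.Dir = repoDir
	out, err := cmd.Output()
	if err != nil {
		return 0, err
	}
	m := insertionsRe.FindSubmatch(out)
	if m == nil {
		return 0, nil // no insertions reported for this commit
	}
	return strconv.Atoi(string(m[1]))
}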

Hexcles (Member) commented Nov 24, 2019 via email

@stephenmcgruer (Contributor)

Ah, that would make sense then.

We should figure out how we're getting recent results with no pr_base/pr_head/master label though, if possible.

@stephenmcgruer (Contributor)

2017 runs were just tagged in Ecosystem-Infra/wpt-results-analysis#14; there were 251 of them.

foolip (Member) commented Dec 4, 2019

Does that resolve this issue?

@stephenmcgruer (Contributor)

Sorry, no, should have been clearer. I am now working on 2018 runs and there are a lot more of them :).

@stephenmcgruer (Contributor)

Jan-June 2018 have been tagged in Ecosystem-Infra/wpt-results-analysis#17 (comment); there were 1162 of them.

The remaining work is just to check what is left over. The expectation is that we will be down to <100 runs that we don't care about.

stephenmcgruer (Contributor) commented Dec 5, 2019

Well, I was wrong.

$ wc -l tagless-runs.txt 
948 tagless-runs.txt
$ grep 2019-01 tagless-runs.txt | wc -l
603

Any idea what happened in January, @Hexcles?

Hexcles (Member) commented Dec 5, 2019

Are those SHAs actually on the master branch?

@stephenmcgruer (Contributor)

I haven't checked them all for being in the repo in general, but at least they mostly do not appear in the list produced by git tag --list 'merge_pr_*' --format="%(objectname:short)". I get 89 hits against that list from the 948 runs in total.
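
That cross-check is set membership against the tag list; a sketch (the runs file is assumed to hold one SHA per line, and short-SHA lengths are assumed to match on both sides):

package audit

import (
	"bufio"
	"os"
	"os/exec"
	"strings"
)

// countHits reports how many SHAs in runsFile appear among the
// merge_pr_* tag objects of the WPT checkout at wptDir.
func countHits(wptDir, runsFile string) (int, error) {
	out, err := exec.Command("git", "-C", wptDir, "tag", "--list",
		"merge_pr_*", "--format=%(objectname:short)").Output()
	if err != nil {
		return 0, err
	}
	tagged := make(map[string]bool)
	for _, sha := range strings.Fields(string(out)) {
		tagged[sha] = true
	}
	f, err := os.Open(runsFile)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	hits := 0
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if tagged[strings.TrimSpace(scanner.Text())] {
			hits++
		}
	}
	return hits, scanner.Err()
}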

@stephenmcgruer (Contributor)

They're all junk. One has 30k tests, the rest have <50 (many have 1-10).

2019-01-runs-log.txt

Hexcles (Member) commented Dec 5, 2019

I didn't find any major fire during that time. And since they're not on master, I'd say we can safely ignore them, in which case this issue is officially fixed (again)!

Thanks so much, @stephenmcgruer !

Hexcles closed this as completed Dec 5, 2019
foolip (Member) commented Dec 5, 2019

Yay, thanks @stephenmcgruer!

Now, I wonder if this resulted in many more aligned runs for historical analysis?

@stephenmcgruer (Contributor)

Hrm. I expected it to have a substantial impact, but I'm only seeing some more data for stable (2017-08 to 2017-10 and 2018-03 to 2018-06 have suddenly appeared) and no new data for experimental (it still starts at 2019-01).

I'm going to do some random sampling of the runs I labelled master and try to figure out what is happening.

@stephenmcgruer (Contributor)

I remembered I am a programmer.

$ python check_labels.py 2017-runs-tagged-master.txt
251 total
251 stable
0 experimental
0 other
$ python check_labels.py 2018-runs-tagged-master.txt
1162 total
694 stable
468 experimental
0 other

I guess we just didn't have STP in 2018; https://wpt.fyi/api/runs?label=experimental&product=safari&from=2018-01-01&to=2018-12-31 (not even including the master label here).

foolip (Member) commented Dec 11, 2019

web-platform-tests/results-collection#112 was the original tracking issue for that and I think there should be some STP runs in 2018.

@stephenmcgruer (Contributor)

Yes, web-platform-tests/results-collection#112 (comment) implies there should be runs from mid-August 2018 onwards...

@stephenmcgruer (Contributor)

Flipping through https://wpt.fyi/api/runs?labels=safari&from=2018-08-15&to=2018-09-15 from month to month, however, I'm seeing no traces of them. There are runs from Safari 11 and 12, but both are marked stable.

Interestingly, in that thread you actually linked a SHA and said there was a TP run for it, and there isn't one any longer... https://wpt.fyi/results/?sha=eea0b54014&label=experimental

I wonder if we purged runs at some point for some reason? @Hexcles any ideas?

foolip (Member) commented Dec 11, 2019

https://wpt.fyi/results/?sha=eea0b54014&max-count=10 is suspicious. It shows Safari 11.1 and 12.1 both as stable.

Before safaridriver --version existed, the version reported was 12.1, so I'm pretty sure that 12.1 run, and probably many others, were actually STP. However, figuring out which version of STP they were would be very hard at this point. We could still relabel them as experimental if we can find a rule that slices them out without capturing any real 12.1 stable runs.

Hexcles (Member) commented Dec 11, 2019

I concur.

According to https://en.wikipedia.org/wiki/Safari_version_history#Safari_12, Safari 12.1 wasn't released until March 2019, so 12.1 back in 2018 was definitely a technology preview.

Most of the history can be found in web-platform-tests/wpt#14465. Tl;dr: we went back and forth on how to best handle and represent STP versions (STP has, or at least had, both a future Safari version and an STP version, e.g. Safari 12.1 Technology Preview 70) and where to do it (we had various parsing and overriding in the wpt CLI, results-collection, and wpt.fyi), complicated by the fact that there was no easy way to extract the version from the CLI.

I think we might have accidentally labelled those runs as stable.

foolip (Member) commented Dec 11, 2019

web-platform-tests/results-collection#617 has a bunch of history too. It looks like we fixed the version problem before Safari 12.1 was released, so just listing all 12.1 runs, checking that there's a gap, and relabeling the first group of runs should be correct.

@stephenmcgruer (Contributor)

Ack, will look at doing that. Re-opening to track (though the issue is somewhat misleadingly named now; there's enough history here, and this shouldn't be open long). Thanks for all the history, folks!

@stephenmcgruer (Contributor)

Right, so listing all Safari stable runs whose browser version starts with 12.1, I see:

2019-10-08T00:02:10.175Z
...
2019-04-10T18:19:59.884Z
2019-04-10T12:16:44.96Z
2019-04-10T06:16:52.8Z
2019-04-09T12:17:00.64Z
2019-01-05T00:12:16.423Z
2019-01-04T00:12:18.446Z
2019-01-03T00:12:13.941Z
...
2018-08-07T12:34:25.644Z

My plan is to change every run whose browser_version starts with 12.1 and whose start_time is before 2019-02 from stable to experimental. Lmk if anyone disagrees.
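
That plan translates to a simple filter-and-swap; a sketch, with the struct fields and cutoff named after the plan above rather than the real Datastore schema:

package fixup

import (
	"strings"
	"time"
)

// TestRun is a hypothetical stand-in for the stored run entity.
type TestRun struct {
	BrowserName    string
	BrowserVersion string
	TimeStart      time.Time
	Labels         []string
}

// cutoff reflects the gap observed above: 12.1 runs before 2019-02
// predate the real Safari 12.1 release and must have been STP.
var cutoff = time.Date(2019, time.February, 1, 0, 0, 0, 0, time.UTC)

// relabelSTP flips a mislabelled Safari Technology Preview run from
// stable to experimental, reporting whether the run was changed.
func relabelSTP(run *TestRun) bool {
	if run.BrowserName != "safari" ||
		!strings.HasPrefix(run.BrowserVersion, "12.1") ||
		!run.TimeStart.Before(cutoff) {
		return false
	}
	for i, label := range run.Labels {
		if label == "stable" {
			run.Labels[i] = "experimental"
			return true
		}
	}
	return false
}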

@stephenmcgruer (Contributor)

The script has been run on staging + prod (logs on the PR). https://wpt.fyi/api/runs?label=experimental&product=safari&from=2018-01-01&to=2018-12-31 now shows a bunch of runs; let's see what this has done to the available data for experimental.

@stephenmcgruer (Contributor)

We now have experimental data from 2018-08 to 2019-01. Probably as good as we're going to get :)

foolip (Member) commented Jan 3, 2020

Running the audit.js script again, there are now 955 runs remaining, for 558 unique commits. Looks like only 77 of those commits are merge_pr_* commits. That seems like a plausible number of runs that might have been triggered for various verification purposes, and it puts a low ceiling on how many real and useful master runs could be among these.

Thanks @stephenmcgruer!

Hexcles (Member) commented Jan 3, 2020

Great job, Stephen!
