-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Travis manifest_upload job occasionally fails to create merge_pr_* tags or upload manifest #10572
Comments
For #10445, because no tag was created, untagged-9aa6616bd18782ef19ad was created instead. tag_master.py would have generated the URL https://api.github.com/search/issues?q=type:pr+is:merged+repo:w3c/web-platform-tests+a5bb6d23ac757fcb5e8aef6fd7ed7587324982db which now returns one result. However, at the time, the "No PR found for master" line was reached, so it must have returned zero results. Possibly there is a race here, that the GitHub API isn't guaranteed to return the results of recent changes, or there was an internal problem which manifested as no results instead of a non-OK HTTP status code. |
Oh, that would explain it indeed. @jgraham, in this situation, it seems like rather than treat the PR as merged, it would be better to close it? In any case it's appropriate that no merge_pr_* tag was created, because nothing on master actually happened, or at least commit 0628e2e is no better a place to put the tag than some of the previous commits. |
Oh, if we ended up with an empty set of changes after rebase, it seems like everything is fine and we don't need to worry. What am I missing? |
@jgraham the problem is that the PR claims to be merged at commit 0628e2e, but that's just what master happened to be when the rebase happened. And https://github.com/w3c/web-platform-tests/pull/10543/files still lists changes that weren't made as part of that PR. Seems to me that the rebase must have resulted in no commits on the branch (not an empty commit), and that case could instead be handled by abandoning the PR, that would be less confusing to understand after the fact. |
It just happened again. untagged-f360b6391e175d60f813 was created for commit 4f39716 which is from #10663, and the build was https://travis-ci.org/w3c/web-platform-tests/builds/371818852. This time the log says:
Don't know why that is, maybe a GitHub API problem. |
This way, the Travis job will fail, and ci_manifest.sh will not continue to upload a manifest for an untagged-* tag. Related to #10572 but doesn't fix it, rather it'll make the Travis jobs fail when this problem happens, making it easier to notice.
#10673 is to make the Travis job fail if no merge_pr_* tag could be created, but it won't help with the problem just reported in #10572 (comment). |
I restarted https://travis-ci.org/w3c/web-platform-tests/jobs/371818853 but that didn't fix the problem, now it says "ERROR:__main__:No PR found for master" instead. |
I have deleted untagged-f360b6391e175d60f813 and created merge_pr_10663 instead. |
This way, the Travis job will fail, and ci_manifest.sh will not continue to upload a manifest for an untagged-* tag. Related to #10572 but doesn't fix it, rather it'll make the Travis jobs fail when this problem happens, making it easier to notice. Drive-by: use the sha instead of "master" in log output
#10673 worked, because we just had a broken build from master: https://travis-ci.org/w3c/web-platform-tests/jobs/372015780, which is for #10678. That PR doesn't show up in https://api.github.com/search/issues?q=type:pr+is:merged+repo:w3c/web-platform-tests+438f2694a95cb9c7f329ef89c1d37fa7140106f0 even now, so restarting it can't help. @annevk, I presume you did nothing unusual when merging that, just used the buttons? |
The PR merged immediately before it is #10675 and https://api.github.com/search/issues?q=type:pr+is:merged+repo:w3c/web-platform-tests+cc29a15b5c00d90149d8f60f96ca4f77a989a2d7 does find that, but I can't see what the difference could be. The temptation to blame everything odd on GitHub is strong, should we call this "search indexing is broken"? dunno |
Right, per https://github.com/w3c/web-platform-tests/commits/master it's been broken for a while. |
@annevk most of the red you see there will be because of #10653, https://travis-ci.org/w3c/web-platform-tests/builds is what I'm looking at to find broken Travis builds on master. |
But https://travis-ci.org/w3c/web-platform-tests/jobs/372023058 also failed. The run for #10673 and the four master pushes that followed passed, otherwise I'd strongly suspect that I just broke it with my changes. But https://api.github.com/search/issues?q=type:pr+is:merged+repo:w3c/web-platform-tests+420757ba95a3b59dbb5d66e23fa2f9380f20280b does return a PR, and now actually so does the search from #10572 (comment). So it definitely looks like there's a race here. It takes some amount of time after the PR is merged (updating master) until the API will find PRs by searching for commit, and the Travis job might run before that. |
I'll try to restart those two jobs now to see if they succeed or if the tags end up in the wrong place. |
Note that |
OK, job restart worked and created merge_pr_10678 + merge_pr_10671, I didn't do those manually. |
…y fails, a=testonly Automatic update from web-platform-testsUse a non-zero exit code if tag_master.py fails (#10673) This way, the Travis job will fail, and ci_manifest.sh will not continue to upload a manifest for an untagged-* tag. Related to web-platform-tests/wpt#10572 but doesn't fix it, rather it'll make the Travis jobs fail when this problem happens, making it easier to notice. Drive-by: use the sha instead of "master" in log output -- wpt-commits: 1e5d263997a3cd3f09471ab642774eb3bb150e84 wpt-pr: 10673
…s wrong in tag_master.py, a=testonly Automatic update from web-platform-testsLog the response body when something goes wrong in tag_master.py (#10681) Towards debugging web-platform-tests/wpt#10572. -- wpt-commits: ae34fb2b39f8b2cf9c3989a566ab938cde45dec0 wpt-pr: 10681
However, it's run long enough that I can tell that the latest trouble was just 15 days ago with #13187. That was tagged but there's no release, because Travis failed due to There are more like that. The most recent with no tags is #12824, where the Travis build associated with commit d7627ae (https://travis-ci.org/web-platform-tests/wpt/jobs/432494382) oddly thinks itself to be for commit 00c1fd9 and a PR build. This is weird, possibly the result of GitHub or Travis backend bugs. #12854 also got no tag, but for network error reasons: As far as my script has reached, there are problems with ~1.5% (36/2214) of merged PRs. That's too much, so I'll leave this open. I'll also manually add some missing tags. |
Ah, wait, I've made a mistake! #12824 wasn't targeting the master branch, so that explains the weirdness. My script should skip PRs that target anything but master. |
#11452 is like #10572 (comment) so that case still happens. |
I've run a tweaked version of my script over all PRs and have the output here: At total of merged 5377 PRs after 2017-07-01. Upload the manifest as MANIFEST-.json.gz was on 2017-10-30, so PRs before that don't have releases, just tags. There's a lot of manifest backfilling we'd could do, before and after that point. Good news is that only two PRs show up as missing tags, and that's #10543 and #11452, already explained in this issue. |
#10605 had no manifest, again because of a network error in the deploy step. It was recent, so I restarted https://travis-ci.org/web-platform-tests/wpt/jobs/440405376, but that then fails with "HTTP Error 422: Unprocessable Entity" because the tag already exists... |
Note that there are some tags for which releases exist but the assets (manifest file) do not. Those are merge_pr_10605 and merge_pr_10247, and I'll just delete the releases to put them in the same category as 'no release'. |
Alright. So far in 2018 it looks like we have 53 merge_pr_ tags that don't have the manifest assets. We could try to improve the reliability and maybe even backfill for all tagged commit. The question is if it matters. @lukebjerring, if this were perfectly reliable, would that allow us to do something cool on wpt.fyi that we currently cannot do? |
Briefly discussed this with Philip today. We need some better deploy mechanisms. We can keep pushing the limits of the CI system, by adding more retries to each step, and making sure later steps only happen when earlier steps succeed (e.g. make sure we have "set -e"). Or rather, it'd be good if there's a better system to define such workflow or pipeline. GitHub just announced Actions which looks interesting and could be simpler than square pegging the workflow into Travis, but I haven't looked into the details yet. |
There was a problem today again: #13638 |
A novel type of failure in https://travis-ci.org/web-platform-tests/wpt/jobs/454430391, where the deploy step failed with "No output has been received in the last 10m0s". I wonder if anyone is actually using Travis to deploy stuff at any significant scale :/ As @Hexcles says, this seems like a thing we should eventually migrate to the new Actions. |
I clicked some buttons as soon as Actions was announced, and the enrollment page now says "This account is already on the waitlist for the GitHub Actions Beta!" |
@foolip that's new for this particular task, but not really novel for Travis in general. Travis often has large-scale task timeouts like this, which seems like transient infra issues. In fact, a bunch of stability checks failed yesterday for the same reason. |
friendly ping. If this is to stay |
Yep, until this is fixed we won't be able to depend on having a manifest for wpt.fyi, which would be useful. @lukebjerring do you want to own this, or what's the priority of having the manifest available reliably? |
No, I don't want to own this. |
@lukebjerring glad to hear it's not a blocker for wpt.fyi :) This is continuing to fail intermittently on Travis CI, https://github.com/web-platform-tests/wpt/runs/50028082 being a recent case. I've played around with GitHub Actions enough to be confident we can do this using GitHub actions when it exits beta, but until then let's leave it alone. |
https://travis-ci.com/web-platform-tests/wpt/jobs/177122640 might be another case of this issue.
I restarted the job but it errored again. I don't know how to access the log of the first attempt. The PR was green. |
After a restart the logs of previous runs are gone I think. I've seen Error 422 before when working on this, but I don't remember when it happens. |
This was moved to GitHub Actions in #15781, so I'll close this. GitHub Actions is probably not 100% reliable either, but I'll file another issue when it becomes a problem. |
…y fails, a=testonly Automatic update from web-platform-testsUse a non-zero exit code if tag_master.py fails (#10673) This way, the Travis job will fail, and ci_manifest.sh will not continue to upload a manifest for an untagged-* tag. Related to web-platform-tests/wpt#10572 but doesn't fix it, rather it'll make the Travis jobs fail when this problem happens, making it easier to notice. Drive-by: use the sha instead of "master" in log output -- wpt-commits: 1e5d263997a3cd3f09471ab642774eb3bb150e84 wpt-pr: 10673 UltraBlame original commit: eacd312dd4c62c177e30f73d2e1c204cf8b7ce82
…s wrong in tag_master.py, a=testonly Automatic update from web-platform-testsLog the response body when something goes wrong in tag_master.py (#10681) Towards debugging web-platform-tests/wpt#10572. -- wpt-commits: ae34fb2b39f8b2cf9c3989a566ab938cde45dec0 wpt-pr: 10681 UltraBlame original commit: a21d7c9772895dd9881ee9d88e63a04f88517b18
…y fails, a=testonly Automatic update from web-platform-testsUse a non-zero exit code if tag_master.py fails (#10673) This way, the Travis job will fail, and ci_manifest.sh will not continue to upload a manifest for an untagged-* tag. Related to web-platform-tests/wpt#10572 but doesn't fix it, rather it'll make the Travis jobs fail when this problem happens, making it easier to notice. Drive-by: use the sha instead of "master" in log output -- wpt-commits: 1e5d263997a3cd3f09471ab642774eb3bb150e84 wpt-pr: 10673 UltraBlame original commit: eacd312dd4c62c177e30f73d2e1c204cf8b7ce82
…s wrong in tag_master.py, a=testonly Automatic update from web-platform-testsLog the response body when something goes wrong in tag_master.py (#10681) Towards debugging web-platform-tests/wpt#10572. -- wpt-commits: ae34fb2b39f8b2cf9c3989a566ab938cde45dec0 wpt-pr: 10681 UltraBlame original commit: a21d7c9772895dd9881ee9d88e63a04f88517b18
…y fails, a=testonly Automatic update from web-platform-testsUse a non-zero exit code if tag_master.py fails (#10673) This way, the Travis job will fail, and ci_manifest.sh will not continue to upload a manifest for an untagged-* tag. Related to web-platform-tests/wpt#10572 but doesn't fix it, rather it'll make the Travis jobs fail when this problem happens, making it easier to notice. Drive-by: use the sha instead of "master" in log output -- wpt-commits: 1e5d263997a3cd3f09471ab642774eb3bb150e84 wpt-pr: 10673 UltraBlame original commit: eacd312dd4c62c177e30f73d2e1c204cf8b7ce82
…s wrong in tag_master.py, a=testonly Automatic update from web-platform-testsLog the response body when something goes wrong in tag_master.py (#10681) Towards debugging web-platform-tests/wpt#10572. -- wpt-commits: ae34fb2b39f8b2cf9c3989a566ab938cde45dec0 wpt-pr: 10681 UltraBlame original commit: a21d7c9772895dd9881ee9d88e63a04f88517b18
Running a variation of the script in Ecosystem-Infra/ecosystem-infra-stats#6 (comment) to make sure that all PRs merged after 2017-07-01 have been tagged, in particular ones created well before that point, I noticed that some recently merged PRs have also not been tagged.
#10543 is the most recent. The PR and GitHub API says the PR was merged in commit 0628e2e, but that's #10124. There are two Travis jobs appear to have been run with that as checked out revision: https://travis-ci.org/w3c/web-platform-tests/jobs/369319256 and https://travis-ci.org/w3c/web-platform-tests/jobs/369365460. The second says "ERROR:__main__:Tag creation failed", but the first does not. At the very least something seems to have gone wrong on GitHub's side here.
#10445 was processed in https://travis-ci.org/w3c/web-platform-tests/jobs/365663222 but log says "ERROR:__main__:No PR found for master". Why?
For #10194 it seems like no build was triggered, in https://travis-ci.org/w3c/web-platform-tests/builds I see only the one before and after: https://travis-ci.org/w3c/web-platform-tests/builds/361106934 + https://travis-ci.org/w3c/web-platform-tests/builds/361263818
Those three are the only that are not tagged since #8005 was merged, the rest were merged before that, the latest of those being #4974.
The text was updated successfully, but these errors were encountered: