Taskcluster tasks sometimes hang at the end #10842
I reviewed TaskCluster's behavior for all the builds between July 6 and July 12:
As noted, not all of those hang at the end. Many pause in the middle of test execution. After some trial-and-error, I was able to reproduce the problem locally using a generic Ubuntu 16.04 container and zero additional dependencies:
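(The exact reproduction command is not preserved in this extract; the following is a hedged sketch of the kind of invocation described: a stock Ubuntu 16.04 container writing a large volume of output through a TTY.)

```sh
# Hypothetical reproduction sketch; the exact command from the report is not
# preserved here. The idea: allocate a TTY (-t) and flood standard output from
# a stock Ubuntu 16.04 image. On the affected Docker releases, a run like this
# can eventually stall instead of exiting cleanly.
docker run --rm -t ubuntu:16.04 \
  bash -c 'for i in $(seq 1 1000000); do echo "line $i of filler output"; done'
```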
That process will eventually stall. If you run the command with […]

I wrote up a very professional and impressive bug report for Docker, but I just now found the problem already being discussed in the context of the Moby project, so you'll have to take my word for it.
And the response is relevant:
Fortunately, someone has come through with a patch to containerd/console (yet another component of the project; Docker is complex!), so we can hope that this will resolve itself in time. I don't know what the time frame is, though: the change will need to be released in Docker, and TaskCluster will need to migrate to the new version. That's a lot of moving parts. In the meantime, we could experiment with throttling standard output and standard error. What do you think, @jgraham?
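To make the throttling idea concrete, here is a rough sketch (not part of the original comment; the `wpt run` invocation and the rate value are placeholders) that caps how quickly combined stdout/stderr reaches the terminal using `pv`:

```sh
# Illustrative only: merge stderr into stdout and limit the stream to roughly
# 100 KB/s before it reaches Docker's TTY. The wpt command shown is a
# placeholder for whatever the task actually runs.
./wpt run firefox 2>&1 | pv --quiet --rate-limit 100K
```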
Here's some discussion about the issue on the TaskCluster bug tracker: https://bugzilla.mozilla.org/show_bug.cgi?id=1457694
The bug fix has been published in Docker version 18.06, and the folks at TaskCluster have updated to that release. With that change live in TaskCluster, we just need to wait and see if stability improves on WPT's […]
It's been a week since the folks at TaskCluster deployed the fix, and of the 56 TaskCluster builds that have run, zero have failed. Although this evidence is not as conclusive as, say, a passing unit test, the nature of the problem makes it difficult to be more precise. @jgraham, are you satisfied by these results? Do you think it's fair to call this issue resolved?
I am very pleased to do so.
It looks like we have tasks that do all the work but don't exit properly. I guess it's some race condition during shutdown that may be exacerbated by printing a lot of logs about unexpected failures at the end. For example:
https://tools.taskcluster.net/groups/HbE1r6PHR-K6dVfjVBCyyg/tasks/EyHx_c0rTKeGvgSTyLeJBw/runs/0/logs/public%2Flogs%2Flive.log