[BUG]: High rate of "We stopped hearing from agent" errors for web-platform-tests. #4313
Comments
Hi @jgraham, thank you for the feedback. Based on the error message, the issue is not related to the agent itself but to the Microsoft-hosted pool. Could you please create an issue in the runner-images repository? To speed up the process, you could also create a ticket on Developer Community.
Hi @jgraham, did you manage to get any resolution for this? We are having the same issue on Windows-2022. It's very annoying because it's inconsistent and a re-run doesn't always fix it.
Also experiencing this issue with the Microsoft-hosted Ubuntu pools (I've tried them all).
We had this problem occurring for several months, and it was fixed by simply turning off auto-updates for agents. I caught the agent trying to download and install a previous version (the one packaged with its corresponding Azure DevOps version). There seems to be an undocumented behaviour around failing tasks that triggers a backup if the agent was downloaded from a source other than Azure (such as GitHub). Hope it fixes your issue too ;)
How do you manage to turn off auto-updates on Azure DevOps Server 2022?
What happened?
Since approximately May 16th, we've been experiencing a high failure rate for web-platform-tests jobs running on macOS 13. This appears to be an infrastructure issue, as we get a message indicating that the agent stopped responding. It affects some, but not all, jobs, and it appears to be random within a set of jobs running similar workloads (chunks of the test suite) on macOS. It doesn't appear to be tied to a specific part of the workload (e.g. a specific test case).
One of the first affected builds is: https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=100660. A recent one is https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=101901
Manually rerunning the failed jobs does work, but some jobs require multiple reruns, since the problem can also happen during the rerun.
We've tried to resolve the problem in the following ways:
(cc @gsnedders who did most of the diagnosis work to date)
web-platform-tests/wpt#40085 is the corresponding wpt repository issue
Versions
macOS-13
Environment type (Please select at least one environment where you face this issue)
Azure DevOps Server type
dev.azure.com (formerly visualstudio.com)
Azure DevOps Server Version (if applicable)
No response
Operating system
No response
Version control system
No response
Relevant log output
##[error]We stopped hearing from agent Azure Pipelines 11. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error. For more information, see: https://go.microsoft.com/fwlink/?linkid=846610 Pool: Azure Pipelines
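To separate these infrastructure failures from genuine test failures when triaging many jobs, it may help to match the error text programmatically. The following is only a minimal sketch: the function name `parse_lost_agent` and both patterns are hypothetical, derived solely from the single log line quoted above, and the message wording may differ across Azure DevOps versions.

```python
import re

# Both patterns are assumptions based on the one log line quoted in this
# report; they are not taken from any Azure DevOps specification.
LOST_AGENT_RE = re.compile(r"We stopped hearing from agent (?P<agent>.+?)\. Verify")
POOL_RE = re.compile(r"Pool: (?P<pool>.+?)\s*$")


def parse_lost_agent(line):
    """Return (agent, pool) if the line is a lost-agent error, else None.

    pool is None when the line carries no trailing "Pool: ..." field.
    """
    agent_match = LOST_AGENT_RE.search(line)
    if agent_match is None:
        return None
    pool_match = POOL_RE.search(line)
    pool = pool_match.group("pool") if pool_match else None
    return (agent_match.group("agent"), pool)
```

Jobs whose logs yield a non-`None` result could then be counted per agent or per pool, or queued for a rerun, without being mistaken for test regressions.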