Releases failing randomly with agent error "We stopped hearing from agent" #3994
Comments
Hi @BenH-Puregym! |
Hi @vmapetr, yes, that's right, I haven't seen this |
@BenH-Puregym so it seems the issue is not coming from the communication between the agent and AzDO this time. Could you please clarify where the cleanup logs that you mentioned in the issue description are coming from? AFAIK, Azure DevOps itself does not manage agent orchestration to the point of deleting the agent completely, so it seems you got these from the AKS scaler or KEDA, right? |
We have enrolled the cluster into New Relic to get these logs. I have added the |
@BenH-Puregym From what it seems, the agent has some intermittent network issues, which is expected, but while working with the |
@vmapetr that's really great to know that it's a problem with ADO rather than us. |
Hi, we are seeing this behavior on an AKS/KEDA setup as well. For us it's easily provoked by querying 20+ pipeline runs |
Hi @vmapetr, have you had much luck in finding the cause? We're still getting the error multiple times a day every so often. |
Hi @vmapetr has there been any progress on this? |
Any update on a fix for this? |
We also randomly experienced this issue when we used the self-hosted container app build agent. |
@vmapetr is this still being looked at? Any solution coming for this? |
This issue has had no activity in 180 days. Please comment if it is not actually stale |
This is not resolved yet |
I also encounter this randomly with ACA Container App Jobs. I thought that perhaps AKS would be the solution as there would be a lot more flexibility in troubleshooting this problem, but it seems as if the same would be the case on AKS as @lkt82 has pointed out. |
#4313 (comment) |
This issue still happens with a self-hosted Windows agent. I just logged my bug here: #4813 |
As per this issue, which is now closed: #3855
This is still happening, I'm afraid. It didn't happen for a couple of weeks, and now we've had multiple instances of it in the last week. I've been able to capture some more information this time.
When a job finishes, DevOps should delete the agent so that we have short-lived agents. Judging by the logs for this occurrence, when the job finished the agent was not deleted immediately, as it is on other agents (I can see this in their logs). Instead, the deletion failed over and over before eventually succeeding. In the meantime, a job on a totally different pipeline picked up the same agent, and of course once the agent was deleted we got the "We stopped hearing from agent" error on the new job.
Logs from failed agent: