[BUG] ContainerD deadlock present in 1.7.15, fixed in 1.7.17 or newer #4426

Closed
ben-childs-docusign opened this issue Jul 22, 2024 · 4 comments

Comments

@ben-childs-docusign

Describe the bug

We are seeing our AKS nodes running Kubernetes 1.29.5 go into a NotReady state. Looking at the logs, it appears that containerd is hanging and becoming unresponsive.

There are two deadlock bugs fixed in containerd 1.7.16 and 1.7.17:
containerd/ttrpc#168
containerd/nri#79

When can we expect containerd to be upgraded to 1.7.17 or newer to address these deadlock issues?

To Reproduce

We see this issue most reliably when we enable Istio native sidecars (https://learn.microsoft.com/en-us/azure/aks/istio-native-sidecar) on our test cluster, where a large number of cron jobs run to execute various tests. This is blocking us from adopting Istio native sidecars in any production environment.

Expected behavior

Our cluster nodes remain in a ready state

Screenshots

[Screenshot: unnamed]

Environment (please complete the following information):

  • CLI Version : N/A
  • Kubernetes version: 1.29.5
  • CLI Extension version [e.g. 1.7.5] if applicable: N/A
  • Browser [e.g. chrome, safari] if applicable: N/A

UtheMan commented Jul 25, 2024

We are working on bumping containerd to the 1.7.20 patch version. It will be available with one of the upcoming node image versions. I will share an update in this thread once the rollout starts. Thank you for bringing this up.

UtheMan commented Aug 7, 2024

A new node image version with containerd 1.7.20 is now being released: 202407.29.0. You can track the progress of the release here (AKS Node Images tab on the left side). It will take a couple of weeks before this version reaches all regions. Closing the issue for now; feel free to re-open as needed.
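
For anyone checking whether their nodes have picked up the update, here is a minimal sketch. It assumes the runtime version string looks like `containerd://1.7.15-1` (the shape Kubernetes reports in a node's `containerRuntimeVersion` field) and uses 1.7.17 as the threshold, per this issue's title. The function names and sample node names are hypothetical, not part of any AKS tooling.

```python
import re

# Assumption: per this issue's title, the deadlock fixes are in 1.7.17 or newer.
FIXED_VERSION = (1, 7, 17)


def parse_containerd_version(runtime_version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as 'containerd://1.7.15-1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", runtime_version)
    if match is None:
        raise ValueError(f"no version found in {runtime_version!r}")
    return tuple(int(part) for part in match.groups())


def has_deadlock_fixes(runtime_version: str) -> bool:
    """True if the reported containerd version is at or above FIXED_VERSION."""
    return parse_containerd_version(runtime_version) >= FIXED_VERSION


def nodes_needing_upgrade(nodes: dict) -> list:
    """Return the sorted names of nodes whose containerd is older than FIXED_VERSION."""
    return sorted(
        name for name, version in nodes.items() if not has_deadlock_fixes(version)
    )


if __name__ == "__main__":
    # Hypothetical node names and versions, for illustration only.
    sample = {
        "aks-pool-0": "containerd://1.7.15-1",
        "aks-pool-1": "containerd://1.7.20",
    }
    print(nodes_needing_upgrade(sample))
```

The comparison is only meaningful for nodes on the containerd 1.7 line; images on a different release line would need their own threshold.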

UtheMan closed this as completed Aug 7, 2024
ben-childs-docusign commented Aug 9, 2024

Thank you, we are testing the fixes now. FYI, we also tested the Azure Linux image, which has containerd 1.6.20; that version also has a deadlock bug, which was fixed in 1.6.25:
containerd/containerd#9210

Edit: Actually, the latest Azure Linux images have containerd 1.6.26, so we are continuing to test with Azure Linux.

ben-childs-docusign commented Aug 9, 2024

@UtheMan

Unfortunately, it looks like the deadlock issue is still happening for us even with the new version of containerd. We will continue investigating.
