Unable to mount a volume on VMSS #3838
Comments
Can you do a describe on a pod that is failing with the disk attach?
Looks similar to some of the output in kubernetes/kubernetes#90749 and kubernetes/kubernetes#81266. @AndyZhang, any thoughts? It seems the volume is not being cleaned up properly.
Thanks @jsturtevant, I work with the original bug poster. Here is an example of a pod description, taken from a log from today: `Name: sde-prometheus-server-0 Warning FailedAttachVolume 31m (x10 over 37m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-97b4d2ff-2e72-45cb-a9ee-84dcd0075214" : disk(/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-97b4d2ff-2e72-45cb-a9ee-84dcd0075214) already attached to node(/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/virtualMachineScaleSets/k8s-node-11577350-vmss/virtualMachines/k8s-node-11577350-vmss_62), could not be attached to node(k8s-node-11577350-vmss00001o)`
I agree, this looks a lot like those two bugs you linked. Thanks! Kevin
Also, I see the fix for this is in 1.15.4; we are just now upgrading to 1.15.12, so we are a little ways from that. Is there anything we can do manually to get the disks cleaned up to unblock us here? Is there any way we can manually detach the "dangling" disks to fix the state of the cluster?
@andyzhangx is it safe to perform this manual operation for every dangling disk that is still detached?
It looks like kubernetes/kubernetes#90749 did not make it into 1.15 because it is out of support: kubernetes/kubernetes#90800
@jackfrancis and @jsturtevant Thank you both for your responses. I did end up taking down the kube-controller-manager pods, working through the logs to find all mentions of dangling disks, and manually detaching them from their respective nodes. This mitigated the issue: kube-controller-manager was able to do its thing and attach the disks where they needed to go, and all pods on the cluster are now running.
@kebeckwith Thank you so much for reporting back and sharing your mitigation steps to help other users! :)
Here is the dangling error fix on VMSS. Manually detaching the disk always works.
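For anyone repeating the mitigation described above (working through the kube-controller-manager logs to find every dangling disk), here is a minimal sketch of how that search might be scripted. It assumes the logs contain the `found dangling volume ... attached to node ...` message seen in this issue; the function name `dangling_disks_by_node` is hypothetical, and the script only reports what it finds. It does not detach anything.

```python
import re
from collections import defaultdict

# Matches the "found dangling volume <disk URI> attached to node <node>"
# message that appears in the controller-manager logs quoted in this issue.
DANGLING = re.compile(r"found dangling volume (\S+) attached to node (\S+)")


def dangling_disks_by_node(log_lines):
    """Group dangling disk URIs by the node that still holds them.

    `log_lines` is any iterable of strings (an open log file works).
    Repeated log entries for the same disk are collapsed.
    """
    by_node = defaultdict(set)
    for line in log_lines:
        match = DANGLING.search(line)
        if match:
            disk_uri, node = match.groups()
            by_node[node].add(disk_uri)
    # Sort the disks so the report is stable between runs.
    return {node: sorted(disks) for node, disks in by_node.items()}


def print_report(by_node):
    """Print each node followed by the short names of its dangling disks."""
    for node in sorted(by_node):
        print(node)
        for disk_uri in by_node[node]:
            print("  " + disk_uri.rsplit("/", 1)[-1])
```

The resulting mapping gives one list of disks to detach per VMSS instance, which is the information needed to do the manual detach in the portal or CLI.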
Unable to attach the volume to VMSS node. During an upgrade of a cluster from 1.15.11 to 1.15.12 using aks-engine v0.54.1, Controller Manager logs show that disk attach is failing and causing the Azure RP to throttle:
I0917 03:48:27.340351       1 attacher.go:89] Attach volume "/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-38a21327-af94-11ea-8b23-00224803698a" to instance "k8s-node-11577350-vmss00001c" failed with disk(/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-38a21327-af94-11ea-8b23-00224803698a) already attached to node(/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/virtualMachineScaleSets/k8s-node-11577350-vmss/virtualMachines/k8s-node-11577350-vmss_47), could not be attached to node(k8s-node-11577350-vmss00001c)
E0917 03:48:27.340721       1 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/azure-disk//subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-38a21327-af94-11ea-8b23-00224803698a podName: nodeName:}" failed. No retries permitted until 2020-09-17 03:48:27.840627696 +0000 UTC m=+34.330095570 (durationBeforeRetry 500ms). Error: "AttachVolume.Attach failed for volume "pvc-38a21327-af94-11ea-8b23-00224803698a" (UniqueName: "kubernetes.io/azure-disk//subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-38a21327-af94-11ea-8b23-00224803698a") from node "k8s-node-11577350-vmss00001c" : disk(/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-38a21327-af94-11ea-8b23-00224803698a) already attached to node(/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/virtualMachineScaleSets/k8s-node-11577350-vmss/virtualMachines/k8s-node-11577350-vmss_47), could not be attached to node(k8s-node-11577350-vmss00001c)"
I0917 03:48:27.354677       1 azure_controller_common.go:120] found dangling volume /subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-5bdf4361-a17d-11ea-b922-00224803698a attached to node k8s-node-11577350-vmss_62
I0917 03:48:27.355091       1 attacher.go:89] Attach volume "/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-5bdf4361-a17d-11ea-b922-00224803698a" to instance "k8s-node-11577350-vmss00001n" failed with disk(/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-5bdf4361-a17d-11ea-b922-00224803698a) already attached to node(/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/virtualMachineScaleSets/k8s-node-11577350-vmss/virtualMachines/k8s-node-11577350-vmss_62), could not be attached to node(k8s-node-11577350-vmss00001n)
E0917 03:48:27.357435       1 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/azure-disk//subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-5bdf4361-a17d-11ea-b922-00224803698a podName: nodeName:}" failed. No retries permitted until 2020-09-17 03:48:27.857387257 +0000 UTC m=+34.346855231 (durationBeforeRetry 500ms). Error: "AttachVolume.Attach failed for volume "pvc-5bdf4361-a17d-11ea-b922-00224803698a" (UniqueName: "kubernetes.io/azure-disk//subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-5bdf4361-a17d-11ea-b922-00224803698a") from node "k8s-node-11577350-vmss00001n" : disk(/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-5bdf4361-a17d-11ea-b922-00224803698a) already attached to node(/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/virtualMachineScaleSets/k8s-node-11577350-vmss/virtualMachines/k8s-node-11577350-vmss_62), could not be attached to node(k8s-node-11577350-vmss00001n)"
I0917 03:48:27.359909       1 azure_controller_common.go:120] found dangling volume /subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-facbf821-9ad0-4afe-a012-1ca2e2470628 attached to node k8s-node-11577350-vmss_62
I0917 03:48:27.360047       1 attacher.go:89] Attach volume "/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-facbf821-9ad0-4afe-a012-1ca2e2470628" to instance "k8s-node-11577350-vmss00001o" failed with disk(/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-facbf821-9ad0-4afe-a012-1ca2e2470628) already attached to node(/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/virtualMachineScaleSets/k8s-node-11577350-vmss/virtualMachines/k8s-node-11577350-vmss_62), could not be attached to node(k8s-node-11577350-vmss00001o)
E0917 03:48:27.361412       1 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/azure-disk//subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-facbf821-9ad0-4afe-a012-1ca2e2470628 podName: nodeName:}" failed. No retries permitted until 2020-09-17 03:48:27.860236367 +0000 UTC m=+34.349704241 (durationBeforeRetry 500ms). Error: "AttachVolume.Attach failed for volume "pvc-facbf821-9ad0-4afe-a012-1ca2e2470628" (UniqueName: "kubernetes.io/azure-disk//subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-facbf821-9ad0-4afe-a012-1ca2e2470628") from node "k8s-node-11577350-vmss00001o" : disk(/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-facbf821-9ad0-4afe-a012-1ca2e2470628) already attached to node(/subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/virtualMachineScaleSets/k8s-node-11577350-vmss/virtualMachines/k8s-node-11577350-vmss_62), could not be attached to node(k8s-node-11577350-vmss00001o)"
I0917 03:48:27.365744       1 azure_controller_common.go:120] found dangling volume /subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-2a257211-abfd-11ea-8b23-00224803698a attached to node k8s-node-11577350-vmss_62
[Yesterday 9:36 PM] Aman Kohli (NetApp)
I0917 03:48:27.297812       1 event.go:258] Event(v1.ObjectReference{Kind:"Pod", Namespace:"monitoring", Name:"kube-alertmanager-0", UID:"6466aa64-6656-4154-93a6-3197cd8bd9ac", APIVersion:"v1", ResourceVersion:"106026856", FieldPath:""}): type: 'Warning' reason: 'FailedAttachVolume' AttachVolume.Attach failed for volume "pvc-aab21384-2120-41bd-94fb-1368e062528b" : azure - cloud provider rate limited(read) for operation:GetDisk
I0917 03:48:27.340252       1 azure_controller_common.go:120] found dangling volume /subscriptions/2f495c46-73b1-463c-ae90-dae28e3880ef/resourceGroups/anf.dc.mgmt.eastus2euap.rg/providers/Microsoft.Compute/disks/k8seastus2euapdc-dynamic-pvc-38a21327-af94-11ea-8b23-00224803698a attached to node k8s-node-11577350-vmss_47
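The "already attached to node(...), could not be attached to node(...)" errors in the logs above name both the VMSS instance that still holds the disk and the node the pod was scheduled on. The following is a rough sketch of pulling those pieces apart, which may help when deciding what to detach manually. The function name `parse_attach_conflict` and the returned keys are hypothetical, and the instance ID is derived on the assumption that VMSS VM names follow the `<scale set name>_<instance id>` pattern seen in these logs.

```python
import re

# Shape of the error text seen in the attacher.go and pod event output above.
ATTACHED = re.compile(
    r"disk\((?P<disk>[^)]+)\) already attached to node\((?P<holder>[^)]+)\), "
    r"could not be attached to node\((?P<target>[^)]+)\)"
)

# Resource ID of a VMSS instance, as it appears inside node(...).
VMSS_ID = re.compile(
    r"/resourceGroups/(?P<rg>[^/]+)/providers/Microsoft\.Compute"
    r"/virtualMachineScaleSets/(?P<vmss>[^/]+)/virtualMachines/(?P<vm>[^/]+)$"
)


def parse_attach_conflict(message):
    """Return details of an attach conflict, or None if the text does not match."""
    match = ATTACHED.search(message)
    if match is None:
        return None
    info = {
        "disk_name": match["disk"].rsplit("/", 1)[-1],
        "target_node": match["target"],
    }
    holder = VMSS_ID.search(match["holder"])
    if holder:
        vm_name = holder["vm"]
        info.update(
            resource_group=holder["rg"],
            vmss=holder["vmss"],
            holder_vm=vm_name,
            # Assumption: the text after the last "_" is the instance ID.
            instance_id=vm_name.rsplit("_", 1)[-1],
        )
    return info
```

Applied to the events in this issue, the holder and the target differ (for example `k8s-node-11577350-vmss_62` versus `k8s-node-11577350-vmss00001o`), which is the signature of a dangling attachment.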
Expected behavior
The VMSS nodes should mount the volume
AKS Engine version
aks-engine v0.54.1
Kubernetes version
1.15.11
Additional context