Volume stuck in attaching state when using multiple PersistentVolumeClaim #36450
I am experiencing this too on GKE 1.4.5. The pod mounts the PV initially; then after some time, ostensibly after the pod is moved to a new node, the pod can't remount the PV because it is stuck on the last node.
@willis7 what do you see when you do …?
@gnufied I don't have this available any more, as I took another approach, but there were no errors beyond what I shared above.
We get this frequently with EBS PVC/PV volumes and see plenty of similar reports. It usually starts when recreating a Pod ("Recreate" strategy): the old Pod is torn down and the PV unmounted, then the PV is mounted for the new Pod (on the same or a different worker). It seems a quick unmount/mount can trigger the 'stuck attaching' issue, which AWS blames on reusing device names (or perhaps reusing them too quickly).

A temporary fix is to tell AWS to force-detach the EBS volume and then wait; the new Pod will attach and recover within a few minutes. However, the next time you recreate that particular Pod you almost certainly hit the same stuck problem. Once an instance+PV combo starts doing this, it seems to happen almost every time. The only long-term fix I have found/seen is to reboot the worker node or to delete and recreate the PVC/PV. It is a major hassle, and we're looking to switch away from EBS to something more reliable for mounting, like EFS, NFS, or GlusterFS. I wondered about scaling the deployment to 0 instances first and waiting a while before redeploying, but that is not an attractive option.
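For reference, the force detach can be issued with the AWS CLI. A minimal sketch, assuming a configured AWS CLI and using a placeholder volume ID:

```sh
# Force-detach the stuck EBS volume (vol-0123456789abcdef0 is a placeholder),
# then wait a few minutes for the new Pod to attach and recover.
aws ec2 detach-volume --volume-id vol-0123456789abcdef0 --force
```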
Thanks @gnufied.
Sorry for the crappy experience! What version of Kubernetes are you running? I know @justinsb and @jingxu97 have worked on a number of fixes to improve the AWS EBS experience. A big fix, #34859, went into 1.4.6, and there are already fixes pending for 1.4.7: #36840. CC @kubernetes/sig-storage
@willis7
@saad-ali upgraded to 1.4.6 today, I'll keep an eye on it...
@eggie5 Thanks!
@saad-ali no need to apologize; even before k8s, 'stuck attaching' was a known EBS condition (hence the AWS FAQ). It just came up less often before k8s, because it was much less common to unmount and remount EBS volumes between instances every few minutes, as happens whenever a k8s CD deployment runs :-) Thanks for the tip about the upcoming patches by @justinsb and @jingxu97. We create clusters using coreos kube-aws; the latest release is 1.4.3 and master is 1.4.6, I think. I might test a 1.4.6 cluster if I can. I see AWS EFS or similar as a more natural fit for smallish disk volumes for k8s anyway.
Unfortunately, EFS is taking its own sweet time to get to the southern hemisphere, about as slow as the Java committee process :-P
Hey gang, @whereisaaron has summed up my scenario perfectly in his first post. I shall fire up another cluster and see if this is resolved with the latest patches. Many thanks!
I tried reproducing this with the latest version and I think the situation has definitely improved. Here is my deployment file:

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx2
spec:
  replicas: 1
  strategy:
    type: "Recreate"
  template:
    metadata:
      labels:
        run: nginx2
    spec:
      containers:
      - name: nginx2
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: "/opt1"
          name: pvol1
        - mountPath: "/opt2"
          name: pvol2
        - mountPath: "/opt3"
          name: pvol3
        - mountPath: "/opt4"
          name: pvol4
      volumes:
      - name: pvol1
        persistentVolumeClaim:
          claimName: "gnufied-vol1"
      - name: pvol2
        persistentVolumeClaim:
          claimName: "gnufied-vol2"
      - name: pvol3
        persistentVolumeClaim:
          claimName: "gnufied-vol3"
      - name: pvol4
        persistentVolumeClaim:
          claimName: "gnufied-vol4"
```

I bumped the nginx image version to trigger a new deployment and couldn't reproduce it, so I think the situation here has definitely improved in the latest version. If those who are still seeing this problem can attach …
So, can we use EFS as the …? I see the docs mention not supporting nodes in different availability zones. My nodes are in the same region but in different availability zones, so I'm guessing I can't use …
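For context, EFS is consumed over NFS, so a node-side mount looks roughly like the sketch below (the file system ID and region are placeholders; this assumes an EFS mount target is reachable from the node):

```sh
# Mount an EFS file system over NFSv4.1 (fs-0123abcd and us-east-1 are placeholders)
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 \
  fs-0123abcd.efs.us-east-1.amazonaws.com:/ /mnt/efs
```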
Unfortunately, I don't think the situation has improved. After a pod gets rescheduled, its volume detaches correctly but then gets stuck in the attaching state. I have checked with AWS support and got a response.
The issue seems to be that k8s is not waiting for the unmount to fully finish before issuing a detach command to AWS, so the device name is not released yet. It also looks like a duplicate of #31891
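One way to see this race from the AWS side is to watch the attachment state while the pod is rescheduled. A sketch with a placeholder volume ID:

```sh
# Shows 'attaching', 'attached', 'detaching', or 'detached' for each attachment
aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 \
  --query 'Volumes[*].Attachments[*].State'
```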
It looks like stuck volumes are just an intrinsic, unavoidable risk when using EBS to mount/unmount on a running instance, but in addition to rotating device names, there is further mitigation in place for k8s 1.9.
I'm using Kube 1.4.5 and AWS storage.
When I try to attach multiple volumes using PVCs, one of the volumes consistently gets stuck in the attaching state whilst the other is successful.
Below are the definitions that I used.
sonar-persistence.yml
sonar-claim.yml
sonar-deployment.yml
The data volume always appears to be successful and, maybe coincidentally, is first in the list. I have tried this multiple times, but the result is always the same.
The error message is as follows: