Volume stuck in attaching state when using multiple PersistentVolumeClaim #36450

Closed
willis7 opened this issue Nov 8, 2016 · 21 comments
Labels: lifecycle/rotten, sig/storage

Comments

willis7 commented Nov 8, 2016

I'm using Kubernetes 1.4.5 with AWS EBS storage.

When I try to attach multiple volumes using PVCs, one of the volumes consistently gets stuck in the attaching state whilst the other attaches successfully.

Below are the definitions that I used.

sonar-persistence.yml

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sonarqube-data
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: aws://eu-west-1a/vol-XXXXXXX
    fsType: ext4

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sonarqube-extensions
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: aws://eu-west-1a/vol-XXXXXX
    fsType: ext4

sonar-claim.yml

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: sonarqube-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: sonarqube-extensions
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

sonar-deployment.yml

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: sonar
spec:
  replicas: 1
  template:
    metadata:
      name: sonar
      labels:
        name: sonar
    spec:
      containers:
        - image: sonarqube:lts
          args:
            - -Dsonar.web.context=/sonar
          name: sonar
          env:
            - name: SONARQUBE_JDBC_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-pwd
                  key: password
            - name: SONARQUBE_JDBC_URL
              value: jdbc:postgresql://sonar:5432/sonar
          ports:
            - containerPort: 9000
              name: sonar
          volumeMounts:
          - name: sonarqube-data
            mountPath: /opt/sonarqube/data
          - name: sonarqube-extensions
            mountPath: /opt/sonarqube/extensions
      volumes:
        - name: sonarqube-data
          persistentVolumeClaim:
            claimName: sonarqube-data
        - name: sonarqube-extensions
          persistentVolumeClaim:
            claimName: sonarqube-extensions

The data volume always attaches successfully, and perhaps coincidentally it is first in the list. I have tried this multiple times, but the result is always the same.

The error message is as follows:

Unable to mount volumes for pod "sonar-3504269494-tnzwo_default(2cc5292c-a5d4-11e6-bd99-0a82a8a86ebf)": timeout expired waiting for volumes to attach/mount for pod "sonar-3504269494-tnzwo"/"default". list of unattached/unmounted volumes=[sonarqube-data sonarqube-extensions]
Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "sonar-3504269494-tnzwo"/"default". list of unattached/unmounted volumes=[sonarqube-data sonarqube-extensions]
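
The relevant attach/mount events can usually be pulled with commands like the following (a sketch; the pod and claim names are the ones from the manifests and error above):

# pod events show which volumes the kubelet is still waiting on
kubectl describe pod sonar-3504269494-tnzwo
# PVC status shows whether the claims are bound to the PVs at all
kubectl describe pvc sonarqube-data sonarqube-extensions
# cluster events often include the attach errors reported by the controller
kubectl get events -n default
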
eggie5 commented Nov 9, 2016

I am experiencing this too on GKE 1.4.5.

The pod mounts the PV initially; then after some time, ostensibly after the pod is moved to a new node, it can't remount the PV because the volume is still stuck on the last node.

justinsb added the sig/storage label and removed the area/kubectl label on Nov 15, 2016
gnufied commented Nov 16, 2016

@willis7 what do you see when you run kubectl describe pod sonar-3504269494-tnzwo -n default? Do you see any particular errors in the output?

willis7 commented Nov 16, 2016

@gnufied I don't still have this available, as I took another approach, but there were no errors beyond what I shared above.

gnufied commented Nov 18, 2016

@justinsb or @saad-ali I will take a stab at this. Do assign this to me, if it is not a problem.

whereisaaron commented Nov 23, 2016

We get this frequently with EBS PVC/PV volumes and see plenty of similar reports. It usually starts when recreating a Pod (the "Recreate" strategy). The old Pod is torn down and the PV unmounted, then the PV is mounted for the new Pod (on the same or a different worker). It seems like a quick unmount/mount cycle can trigger the 'stuck attaching' issue, which AWS blames on reusing device names (or perhaps reusing them too quickly):
https://aws.amazon.com/premiumsupport/knowledge-center/ebs-stuck-attaching/

A temporary fix is to tell AWS to force-detach the EBS volume, then wait; the new Pod will attach and recover within a few minutes. However, the next time you recreate that particular Pod you will almost certainly get the same stuck problem; once an instance+PV combo starts doing this, it seems to happen almost every time. The only long-term fix I have found/seen is to reboot the worker node or to delete and recreate the PVC/PV.
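
The force-detach itself can be done from the console or with the AWS CLI, roughly like this (the volume ID is a placeholder):

# forcibly detach the stuck EBS volume so it can be re-attached to the new node
aws ec2 detach-volume --volume-id vol-XXXXXXX --force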

It is a major hassle and we're looking to switch away from EBS to something more reliable for mounting, like EFS, NFS or GlusterFS.

I wondered about scaling the deployment to 0 instances first and waiting a while before redeploying. Not an attractive option though.
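
For the record, that would look something like this (deployment name taken from the example above, purely illustrative):

# stop the pod so the EBS volume detaches cleanly
kubectl scale deployment sonar --replicas=0
# wait until the volume shows as "available" in AWS, then bring the pod back
kubectl scale deployment sonar --replicas=1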

@saad-ali

> @justinsb or @saad-ali I will take a stab at this. Do assign this to me, if it is not a problem.

Thanks @gnufied

> We get this frequently with EBS PVC/PV volumes and see plenty of similar reports.

Sorry for the crappy experience! What version of kubernetes are you running?

I know @justinsb @jingxu97 have worked on a number of fixes to improve the AWS EBS experience. A big fix, #34859, went in to 1.4.6 and there are already fixes pending for 1.4.7: #36840

CC @kubernetes/sig-storage

rootfs commented Nov 23, 2016

@willis7
Can you provide kubectl describe pvc output and is it possible to share your controller log and kubelet log?

@saad-ali

@willis7 and @eggie5 Could you also try 1.4.6+ if you get a chance and see if you can reproduce it there?

eggie5 commented Nov 23, 2016

@saad-ali upgraded to 1.4.6 today, I'll keep an eye on it...

@saad-ali

@eggie5 Thanks!

@whereisaaron

@saad-ali no need to apologize; even before k8s, 'stuck attaching' was a known EBS condition (hence the AWS FAQ). It just came up less often before k8s, because it was much less common to unmount and remount EBS volumes between instances every few minutes the way a k8s CD deployment does :-)

Thanks for the tip about the upcoming patches by @justinsb and @jingxu97. We create clusters using CoreOS kube-aws; the latest release is 1.4.3 and master is 1.4.6, I think. I might test a 1.4.6 cluster if I can.

I see AWS EFS or similar as a more natural fit for smallish disk volumes for k8s anyway:

  • no need to decide/estimate volume sizes ahead of time or resize later
  • you can mount it on multiple nodes, allowing more options for rolling deployments

Unfortunately, EFS is taking its own sweet time to get to the southern hemisphere; Java-committee-process slow :-P
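
For what it's worth, once EFS is available in-region it can be mounted through the generic NFS volume plugin rather than awsElasticBlockStore; a minimal PV sketch (the filesystem DNS name below is a made-up placeholder):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: sonarqube-data-efs
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  nfs:
    # placeholder: replace with your EFS filesystem's mount target DNS name
    server: fs-12345678.efs.eu-west-1.amazonaws.com
    path: /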

willis7 commented Nov 23, 2016

Hey gang, @whereisaaron has summed up my scenario perfectly in his first post. I shall fire up another cluster, and see if this is resolved with the latest patches. Many thanks!

gnufied commented Nov 24, 2016

I tried reproducing this with the latest version and I think the situation has definitely improved. Here is my deployment file:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx2
spec:
  replicas: 1
  strategy:
    type: "Recreate"
  template:
    metadata:
      labels:
        run: nginx2
    spec:
      containers:
      - name: nginx2
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: "/opt1"
          name: pvol1
        - mountPath: "/opt2"
          name: pvol2
        - mountPath: "/opt3"
          name: pvol3
        - mountPath: "/opt4"
          name: pvol4
      volumes:
      - name: pvol1
        persistentVolumeClaim:
          claimName: "gnufied-vol1"
      - name: pvol2
        persistentVolumeClaim:
          claimName: "gnufied-vol2"
      - name: pvol3
        persistentVolumeClaim:
          claimName: "gnufied-vol3"
      - name: pvol4
        persistentVolumeClaim:
          claimName: "gnufied-vol4"

and I bumped the nginx image version to trigger a new deployment. I couldn't reproduce the issue, so I think the situation has definitely improved in the latest version.
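
The image bump was done with something along these lines (the exact tag is arbitrary):

# changing the image tag is enough to trigger the Recreate rollout
kubectl set image deployment/nginx2 nginx2=nginx:1.11.5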

If those who are still seeing this problem can attach kubelet.log and kube-controller-manager.log, that would be very helpful for making this area more robust.
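
Assuming kubelet and the controller-manager run as systemd units (unit names vary by install; on static-pod control planes use kubectl logs against the kube-system pod instead), something like:

# on the affected worker node
journalctl -u kubelet --since "1 hour ago" > kubelet.log
# on the master
journalctl -u kube-controller-manager --since "1 hour ago" > kube-controller-manager.log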

@craigwillis85

So, can we use EFS instead of awsElasticBlockStore?

I see the docs mention that awsElasticBlockStore doesn't support nodes in different availability zones. My nodes are in the same region, but in different availability zones.

I'm guessing I can't use awsElasticBlockStore in this case? Or can I?

pajel commented Dec 14, 2016

Unfortunately, I don't think the situation has improved.
Our setup:
Ubuntu 16.04.1, kernel 4.4.0-47-generic
K8s: 1.4.6

After a pod gets rescheduled, its volume detaches correctly but then gets stuck in the attaching state. I checked with AWS support and got this response:

Unfortunately, the issue is on the underlying host side and not an Ubuntu problem. Restarting your instance causes the relevant information on our side to reset, so that makes the device available for use again. You can also achieve the same result by stopping the instance and starting it again, which moves your instance to a new underlying host. Without a restart, you can work around the issue by choosing a different device, or avoid the problem by making sure the volume is fully unmounted and no longer in use before detaching, but that's about it I'm afraid. I am sorry for any inconvenience.

The issue seems to be that k8s is not waiting for the unmount to fully finish before issuing a detach command to AWS. So the device name is not released yet.
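
One way to confirm this from the AWS side is to watch the volume's attachment record while the pod is rescheduled (the volume ID is a placeholder):

# shows the instance, device name, and attaching/attached/detaching state
aws ec2 describe-volumes --volume-ids vol-XXXXXXX --query 'Volumes[0].Attachments'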

It also looks like a duplicate of #31891.

@jingxu97

@pajel could you please check whether you have the same issue as #37662. That problem is fixed in release 1.4.7. If you think yours is different, please let me know more details about your issue and share the logs with us. Thanks!

pajel commented Dec 15, 2016

@jingxu97 thanks for your reply. #37662 seems to be different, as their EBS volume is attached but not picked up by k8s, while in our case the EBS volume is stuck in the attaching state.
However, #31891 seems like the exact same issue; even the logs are the same. I'll follow up there, thank you.

@fejta-bot

Issues go stale after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Dec 19, 2017
@whereisaaron

It looks like stuck volumes are just an intrinsic, unavoidable risk of using EBS to mount/unmount on a running instance, but in addition to rotating device names, there is further mitigation in place for k8s 1.9:

In v1.9 SIG AWS has improved stability of EBS support across the board. If a Volume is “stuck” in the attaching state to a node for too long a unschedulable taint will be applied to the node, so a Kubernetes admin can take manual steps to correct the error. Users are encouraged to ensure they are monitoring for the taint, and should consider automatically terminating instances in this state.
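
A quick way to keep an eye on that taint (I believe the key applied by the AWS cloud provider is NodeWithImpairedVolumes, but check the 1.9 release notes for your build):

# list every node along with any taints currently applied to it
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints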

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jan 18, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
