
ebs CSI volume detach failure #431

Closed

zhoudayongdennis opened this issue Dec 24, 2019 · 11 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@zhoudayongdennis

/kind bug

What happened?

  1. Defined stateful pods in a Kubernetes cluster successfully. After rebooting the nodes one by one, some pods were stuck in ContainerCreating status, e.g.
    default app-ebs-1 0/1 ContainerCreating 0 21h
  2. Describing pod app-ebs-1 showed the following events:
    Events:
    Type Reason Age From Message

Warning FailedMount 35m (x122 over 21h) kubelet, ip-10-0-3-28.us-east-2.compute.internal Unable to attach or mount volumes: unmounted volumes=[ebspvc], unattached volumes=[default-token-4fdt8 ebspvc]: timed out waiting for the condition
Warning FailedMount 3m31s (x448 over 21h) kubelet, ip-10-0-3-28.us-east-2.compute.internal Unable to attach or mount volumes: unmounted volumes=[ebspvc], unattached volumes=[ebspvc default-token-4fdt8]: timed out waiting for the condition
Warning FailedAttachVolume 75s (x646 over 21h) attachdetach-controller AttachVolume.Attach failed for volume "pvc-fad1c767-22cf-11ea-9a1d-0661b881b6f6" : volume attachment is being deleted
  3. The attacher log contained the following message:
{"log":"I1224 05:49:56.456376 1 connection.go:184] GRPC error: rpc error: code = Internal desc = Could not detach volume "vol-0678c4ebfb20d577b" from node "i-00c71bf216a632245": could not detach volume "vol-0678c4ebfb20d577b" from node "i-00c71bf216a632245": IncorrectState: Volume 'vol-0678c4ebfb20d577b'is in the 'available' state.\n","stream":"stderr","time":"2019-12-24T05:49:56.456465393Z"}

What you expected to happen?
If the volume is already in the 'available' state, why does the driver need to perform a detach operation at all? And even if it does detach, it should return success for an available volume rather than failing as it does now, right?

How to reproduce it (as minimally and precisely as possible)?
a. Apply a StatefulSet with 3 replicas that uses the aws-ebs-csi StorageClass.
b. Reboot the node.

Anything else we need to know?:

Environment

  • Kubernetes version (use kubectl version):
    v1.16.4
  • Driver version:
    v0.4.0
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 24, 2019
@leakingtapan
Contributor

leakingtapan commented Dec 29, 2019

This should be fixed by #375

Have you tested the driver with the latest image tag?

@zhoudayongdennis
Author

zhoudayongdennis commented Jan 3, 2020

I just compared your change with the private change I made to work around the failure. I just want to confirm: the ErrNotFound return value will NOT be treated as a failure case, right?

Here is the change you made in the DetachDisk function of cloud.go:

@@ -401,6 +401,11 @@ func (c *cloud) DetachDisk(ctx context.Context, volumeID, nodeID string) error {

	_, err = c.ec2.DetachVolumeWithContext(ctx, request)
	if err != nil {
		if isAWSErrorIncorrectState(err) ||
			isAWSErrorInvalidAttachmentNotFound(err) ||
			isAWSErrorVolumeNotFound(err) {
			return ErrNotFound
		}
		return fmt.Errorf("could not detach volume %q from node %q: %v", volumeID, nodeID, err)
	}

Here is the private change I made in the same function:
	_, err = c.ec2.DetachVolumeWithContext(ctx, request)
	if err != nil {
		if !device.IsAlreadyAssigned {
			klog.Warningf("DetachDisk called on non-attached volume, ignore error: %s", volumeID)
			return nil
		}

		return fmt.Errorf("could not detach volume %q from node %q: %v", volumeID, nodeID, err)
	}
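
For reference, the isAWSError* helpers used in that diff are presumably thin wrappers that compare the AWS SDK error code against the EC2 API codes. A minimal sketch, assuming aws-sdk-go's awserr package, with the error-code strings inferred from the attacher log rather than copied from the repo:

package cloud

import (
	"github.com/aws/aws-sdk-go/aws/awserr"
)

// isAWSError reports whether err is an awserr.Error carrying the given EC2 API error code.
func isAWSError(err error, code string) bool {
	if awsErr, ok := err.(awserr.Error); ok {
		return awsErr.Code() == code
	}
	return false
}

func isAWSErrorIncorrectState(err error) bool {
	// "IncorrectState" is the code visible in the attacher log above.
	return isAWSError(err, "IncorrectState")
}

func isAWSErrorInvalidAttachmentNotFound(err error) bool {
	// Assumed EC2 code for "attachment does not exist".
	return isAWSError(err, "InvalidAttachment.NotFound")
}

func isAWSErrorVolumeNotFound(err error) bool {
	// Assumed EC2 code for "volume does not exist".
	return isAWSError(err, "InvalidVolume.NotFound")
}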

@zhoudayongdennis
Author

zhoudayongdennis commented Jan 3, 2020

My project checks out the 0.4.0 branch instead of the master branch.

What's the difference between 0.4.0 and master? Should I use the master branch for the image build?

@zhoudayongdennis
Author

Is there any schedule for a new release?

@leakingtapan
Contributor

What's the difference between 0.4.0 and master?

Here is a list of changes: v0.4.0...master

Should I use the master branch for the image build?

Are you using it for testing purposes or in production? If it's for production, I would recommend waiting for the v0.5.0 release.

@zhoudayongdennis
Author

OK, I will wait for 0.5.0. Do you have a schedule for it?

@leakingtapan
Contributor

leakingtapan commented Jan 3, 2020

I just want to confirm: the ErrNotFound return value will NOT be treated as a failure case, right?

Yep. With the change, the driver will return success when detaching a NotFound volume. Could you test the container image with the latest tag and see if it fixes your issue?
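
In other words, once DetachDisk returns the ErrNotFound sentinel, the controller's unpublish path can treat it as "already detached". A rough sketch of that handling, assuming the standard CSI Go bindings and a hypothetical controllerService type wrapping the cloud client (not the driver's exact code):

package driver

import (
	"context"
	"errors"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// ErrNotFound mirrors the sentinel error returned by DetachDisk in the diff above.
var ErrNotFound = errors.New("resource was not found")

// cloudClient is a hypothetical narrow interface over the driver's cloud package.
type cloudClient interface {
	DetachDisk(ctx context.Context, volumeID, nodeID string) error
}

type controllerService struct {
	cloud cloudClient
}

func (d *controllerService) ControllerUnpublishVolume(ctx context.Context, req *csi.ControllerUnpublishVolumeRequest) (*csi.ControllerUnpublishVolumeResponse, error) {
	volumeID := req.GetVolumeId()
	nodeID := req.GetNodeId()

	if err := d.cloud.DetachDisk(ctx, volumeID, nodeID); err != nil {
		if errors.Is(err, ErrNotFound) {
			// The volume is already detached (or gone), so the unpublish is a
			// no-op: report success instead of an Internal error.
			return &csi.ControllerUnpublishVolumeResponse{}, nil
		}
		return nil, status.Errorf(codes.Internal, "could not detach volume %q from node %q: %v", volumeID, nodeID, err)
	}

	return &csi.ControllerUnpublishVolumeResponse{}, nil
}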

@SimonDreher

I am also interested in whether there is a planned date for the 0.5.0 release.

This is blocking us from migrating to Kubernetes v1.15, since there we need v0.4.0, and with this bug every deployment with persistent volumes breaks (until fixed manually) whenever its node dies.

If the 0.5.0 release will still take some time, would it be possible to cherry-pick the fix for this and publish a 0.4.1 release?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 5, 2020
@leakingtapan
Contributor

/close

as v0.5.0 is released

@k8s-ci-robot
Contributor

@leakingtapan: Closing this issue.

In response to this:

/close

as v0.5.0 is released

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
