Is there any upgrade guide to 1.2 #391
Comments
I'd like to know the answer to this question as well. I've tried upgrading twice and have had to roll back to 1.1 both times because pods have trouble both mounting and releasing the existing EFS PVCs. Rolling back fixes the problems immediately. I don't know what I'm missing. I have added the service account and, as far as I can tell, set up the correct IAM permissions assigned to it. I have many other deployments in my cluster using IRSA permissions, so I know that works for other deployments.
It should just be a matter of running helm upgrade; we are indeed lacking docs on this, though. @johnjeffers Regarding the issue with mounts hanging: if you are able to, could you test the master branch? Details: the only major change in 1.2 that would affect all mounts is the bump of the efs-utils dependency, see https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/CHANGELOG-1.x.md. I actually ended up rolling back this efs-utils dependency change in the master branch while debugging some CI flakiness with symptoms similar to what you are reporting. If you can confirm for me that the issue is NOT present in the master branch, I will release a 1.2.1 version of the driver that has the changes from 1.2 MINUS the efs-utils dependency change.
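For readers looking for the concrete upgrade steps: a minimal sketch of the Helm-based upgrade mentioned above, assuming the repo URL and release/namespace names used by the chart's own docs (your release name and namespace may differ):

```shell
# Add/refresh the chart repository (URL as published for the driver's Helm chart)
helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver/
helm repo update

# Upgrade (or install) the release; release name and namespace are assumptions
helm upgrade --install aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
  --namespace kube-system
```

Pinning `--version` to a specific chart version makes rollbacks (as described in this thread) reproducible.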
@wongma7 Do you want me to try this with the 1.1.2 version of the Helm chart that I'm pinned to right now, or with the latest Helm chart version? Because my current Helm chart version won't create the controller deployment, only the daemonset.
@johnjeffers the chart you are pinned to right now, yes. The issue must be in the daemonset, and I'm trying to control for the efs-utils version, keeping all else, including the chart, equal. If even master doesn't work, there is one other tag to try; master has efs-utils v1.30.1. I really appreciate you testing this out; I have not had the chance to reproduce this issue.
Here's what I'm seeing: I deploy the master image. Then, I attempt to delete a pod that mounts an EFS PVC. The pod gets stuck in Terminating. After 10 minutes or so, I force delete the pod that's stuck in Terminating. Subsequent deletes of the pods appear to behave normally. It's only the first delete, after the daemonset is updated, where I see the deleted pod get stuck in Terminating.
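For reference, a force delete of a pod stuck in Terminating like the one described above can be done as follows (pod and namespace names here are placeholders, not from this thread; force deletion skips graceful cleanup, so use it as a last resort):

```shell
# First inspect why the pod is stuck (look at events and finalizers)
kubectl describe pod my-app-pod -n my-namespace

# Force delete the stuck pod
kubectl delete pod my-app-pod -n my-namespace --grace-period=0 --force
```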
OK, thank you. I'm preparing a v1.2.1 release with efs-utils downgraded to v1.28.1 (#429), because if master doesn't work, then it means efs-utils v1.30.1 doesn't fix the issue either.
I assume you mean the replacement pods that get spawned by the deployment rollout. This aligns with my basic understanding of what is happening: efs-utils takes care of maintaining the state of mounts, so it seems like, for whatever reason, volumes originally mounted by the old daemonset pods are sometimes not resumed cleanly by the new ones.
Yes. For example, I have a Grafana deployment that uses an EFS PVC. After I deployed the CSI driver with the new version, the replacement Grafana pods got stuck.
Hi @wongma7, can we add the Kubernetes manifest docs as well? Thanks very much.
They are here: https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/master/deploy/kubernetes
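For the manifest-based path, the repo's deploy directory is a kustomize base, so it can be applied directly with kubectl; a sketch, assuming the `overlays/stable` layout used by the project (the `?ref=` value is an example and should be pinned to the release you want):

```shell
# Apply the driver's raw Kubernetes manifests via kustomize
kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master"
```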
@wongma7 I have some more info about the problems I'm seeing with the new version (this is using …).
It says it's …
Here are the latest events:
Rolling back to v1.1.1 fixes the problem immediately. As soon as the daemonset pods are replaced, Grafana comes back up in seconds.
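For anyone hitting similar symptoms, the pod events referred to above can be pulled with kubectl; a sketch with placeholder pod and namespace names (not the reporter's actual names):

```shell
# Events for a single pod, shown in the Events section of the output
kubectl describe pod grafana-xyz -n monitoring

# Recent events across the namespace, oldest first
kubectl get events -n monitoring --sort-by=.metadata.creationTimestamp
```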
@wongma7 what's the status on this? Did 1.2.1 get rolled out with the downgraded efs-utils? |
@johnjeffers yes, sorry, I forgot to leave an update here! Helm chart 1.2.4 contains driver 1.2.1. We (@kbasv) also managed to narrow down the issue, and it should be fixed in the latest version of efs-utils, 1.31.1. But we won't be releasing that for a while; of course, we'll regression test it.
@wongma7 - following up on the efs-utils version, I noticed that the version used in the new release 1.3.0 is efs-utils 1.30.2-1. Does this mean that the efs-utils issue was fixed prior to 1.31.1?
@esalberg the new release …
It looks like 1.3.1 reintroduced the bad behavior. When I rolled out 1.3.1, pods that have EFS volumes attached started failing. Rolling back to 1.2.1 restored things.
I couldn't reproduce this; upgrading from driver 1.2.1 to 1.3.1 worked for me (i.e. my pod could continue to read and write before/after the upgrade). I was using the dynamic provisioning example: https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/master/examples/kubernetes/dynamic_provisioning. I did helm upgrade --install, but you can also do an in-place upgrade of just one specific node plugin for the purposes of debugging. Please capture logs from the 1.3.1 efs-plugin while volumes appear to be stuck. For reference, here is what my 1.3.1 efs-plugin node pod logs show after the upgrade: it successfully "resumes" the mount/tls tunnel.
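A sketch of the log capture requested above; the label selector, pod name, and container name are assumptions based on the chart's typical naming, so adjust them to what `kubectl get pods` actually shows in your cluster:

```shell
# Find the efs-plugin daemonset pod running on the affected node
kubectl get pods -n kube-system -l app=efs-csi-node -o wide

# Capture recent logs from the node plugin container while the volume is stuck
kubectl logs -n kube-system efs-csi-node-abcde -c efs-plugin --since=10m
```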
You can also try this script on the efs-plugin node pod: https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/master/troubleshooting
@wongma7 That was a false alarm. I had some unrelated symptoms that looked very similar to the previous problem, and I jumped to the wrong conclusion. Thank you for the quick reply, and my apologies!
Hello all,
Are there any docs regarding upgrading to 1.2? Or will it just magically upgrade to 1.2 as long as we set up the service account?