[Bug] Issue with efa device plugin running as root #6222
Comments
Wanted to post an update that I cloned main, removed that one line, rebuilt, recreated my cluster, and it works correctly as it did before! So I am fairly certain this is a bug.
Let me know if you'd like me to open a PR to fix this one detail - would be happy to!
@vsoch, sure, please go ahead. We are happy to accept contributions.
@vsoch Can you try using this
I can't offer testing that soon; I won't be running experiments again for a bit (they are expensive), but I could maybe offer next month. For the time being I'm just restoring the original context.
heyo! Got a chance to try your suggestion - no go.
Works fine when I remove that block and restore to the suggested one (removing runAsNonRoot).
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Ping - I opened a PR to fix this! #6302
This helm chart resolved the issue https://github.com/aws-samples/efa-device-plugin-helm
Hi - you still haven't fixed this. I just installed a fresh eksctl and created a cluster, and my pods are erroring:
and the issue is:
If the |
Double checking that you mean |
Yes just a typo - fixed!
@cPu1 and @Himangini this is still an issue and it's almost 7 months later - I've tested your suggestions and I've now opened two PRs #6302 and #6743 that fix this. The helm chart is not a solution because we are using the plugin yaml that is provided here. What is your plan to fix this in eksctl and what else can I do to help?
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Please don't close the issue stalebot - I think a resolution would either be to fix the config here or remove the efaEnabled flag (which will not work without root).
Thank you! Really happy to see this go through.
Hi! I opened the issue here aws-samples/aws-efa-eks#8 so they can be tracked in sync. I just updated my version of eksctl and it pulled in the new changes, and we started seeing the issue I'll report here. We are creating an EKS cluster with eksctl, specifically like this:
And when I request a job asking for efa for my pods, e.g. (this is our operator CRD that has worked before):
the pods are stuck in pending. Further inspection reveals:
And then I realized I could look at the logs of the pod that is supposed to provide the efa (which is where I found the container name / config that is provided in the manifest folder of this repo) and I saw:
I traced that to this change 943de83 that must have come with the updated eksctl. And unless there is a plan to update the container, I want to suggest you remove this added boolean. This is likely the version I used that was working before the update (and mirrors the one I found in your example repo) https://github.com/weaveworks/eksctl/blob/7ad54ae5d60d730e6d2ca8741d866f5415bab518/pkg/addons/assets/efa-device-plugin.yaml. Thanks!
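For readers following along, here is a minimal sketch of the kind of change being suggested. It assumes the "added boolean" is `runAsNonRoot: true` in the plugin container's `securityContext` (the field named elsewhere in this thread). The container name and the other fields are hypothetical placeholders and are not copied from the eksctl manifest.

```yaml
# Hypothetical excerpt of a DaemonSet pod template; names and values are illustrative only.
spec:
  template:
    spec:
      containers:
        - name: efa-device-plugin        # placeholder container name (assumption)
          securityContext:
            allowPrivilegeEscalation: false   # illustrative field, not taken from the real manifest
            # runAsNonRoot: true         # assumed to be the added line; with it set, Kubernetes
            #                            # refuses to start a container whose image runs as root
```

With the line removed (or commented out as above), a container whose image runs as root is allowed to start, which would match the behavior reported before the update.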