Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing RBAC permissions #101

Merged
merged 1 commit into from
Oct 26, 2023
Merged

Conversation

jcanocan
Copy link
Contributor

When FAR is used with fence_azure_arm fence agent. It requires permissions for watching and listing the namespace objects for repairing unhealthy nodes. If the operator does not hold these permissions, the following error is shown and NHC stays in Remediating status :

2023-10-25T09:36:47.33411147Z	INFO	controllers.FenceAgentsRemediation	Manual workload deletion	{"Fence Agent": "fence_azure_arm", "Node Name": "jcano-cluster-9fnfq-worker-germanywestcentral1-m5nwt"}
W1025 09:36:47.336315       1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:openshift-operators:fence-agents-remediation-controller-manager" cannot list resource "namespaces" in API group "" at the cluster scope
E1025 09:36:47.336350       1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.Namespace: failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:openshift-operators:fence-agents-remediation-controller-manager" cannot list resource "namespaces" in API group "" at the cluster scope

It adds verbs list and watch verbs for namespace resources.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 25, 2023

Hi @jcanocan. Thanks for your PR.

I'm waiting for a medik8s member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@razo7
Copy link
Member

razo7 commented Oct 25, 2023

Sounds reasonable that these RBAC permissions should be added (as SNR needed them as well), but I believe the get variable should be added as well.
See old discussion for a similar idea on the notion of using get;list;watch rather than just list;watch.

Thank you for catching that and for participating.

  1. Can you use get as well or try to verify the error for different combinations?
  2. Please also run make bundle in order to re-generate the CSV with those changes.

@jcanocan
Copy link
Contributor Author

Sounds reasonable that these RBAC permissions should be added (as SNR needed them as well), but I believe the get variable should be added as well. See old discussion for a similar idea on the notion of using get;list;watch rather than just list;watch.

Thank you for catching that and for participating.

Glad to contribute with the project. It is very interesting and promising :)

1. Can you use `get` as well or try to verify the error for [different combinations](https://github.com/medik8s/fence-agents-remediation/pull/48#discussion_r1191054622)?

Sure! Here are the results:

  1. With just get:
2023-10-25T15:01:18.293589335Z	INFO	controllers.FenceAgentsRemediation	Manual workload deletion	{"Fence Agent": "fence_azure_arm", "Node Name": "jcano-cluster-9fnfq-worker-germanywestcentral3-nmbx8"}
W1025 15:01:18.295546       1 reflector.go:535] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:208: failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:openshift-operators:fence-agents-remediation-controller-manager" cannot list resource "namespaces" in API group "" at the cluster scope
E1025 15:01:18.295576       1 reflector.go:147] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:208: Failed to watch *v1.Namespace: failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:openshift-operators:fence-agents-remediation-controller-manager" cannot list resource "namespaces" in API group "" at the cluster scope
  1. get and watch:
2023-10-25T15:16:05.300798484Z	INFO	controllers.FenceAgentsRemediation	Manual workload deletion	{"Fence Agent": "fence_azure_arm", "Node Name": "jcano-cluster-9fnfq-worker-germanywestcentral3-nmbx8"}
W1025 15:16:05.302447       1 reflector.go:535] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:208: failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:openshift-operators:fence-agents-remediation-controller-manager" cannot list resource "namespaces" in API group "" at the cluster scope
E1025 15:16:05.302484       1 reflector.go:147] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:208: Failed to watch *v1.Namespace: failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:openshift-operators:fence-agents-remediation-controller-manager" cannot list resource "namespaces" in API group "" at the cluster scope

It's a bit strange that it complains about watch besides the fact that it has been added...
3. get and list:

E1025 15:21:57.247278       1 reflector.go:147] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:208: Failed to watch *v1.Namespace: unknown (get namespaces)

However, just using get and list FAR is able to reconcile and finish the job.
4. list gets same result as 3).

In conclusion, it looks like the only verb really required is list. Nevertheless, it throws an error but end the execution successfully. So, should we also add the watch verb then? WDYT?

2. Please also run `make bundle` in order to re-generate the CSV with those changes.

I will when we have a decision about if we want to add just get and list or get, list and watch. Thanks for letting me know.

@razo7
Copy link
Member

razo7 commented Oct 26, 2023

Sure! Here are the results:

That was fast :)

In conclusion, it looks like the only verb really required is list. Nevertheless, it throws an error but end the execution successfully.

For option 0 (list;watch), option 3 (get;list), and option 4(list) the reconciliation was executed successfully but with a warning. How is it for option 5 (get;list;watch)? I believe it would finish without a warning

@jcanocan
Copy link
Contributor Author

jcanocan commented Oct 26, 2023

That was fast :)

Thanks :)

In conclusion, it looks like the only verb really required is list. Nevertheless, it throws an error but end the execution successfully.

For option 0 (list;watch), option 3 (get;list), and option 4(list) the reconciliation was executed successfully but with a warning. How is it for option 5 (get;list;watch)? I believe it would finish without a warning

Yes, it finishes without any warning.

@razo7
Copy link
Member

razo7 commented Oct 26, 2023

Then, let's continue with option 5

@jcanocan
Copy link
Contributor Author

jcanocan commented Oct 26, 2023

v2:

  • added get verb
  • ran make bundle to re-generate CSV

The verbs `list` and `watch` for namespace objects are required
for recovering unhealthy nodes. Otherwise, the remediation tasks do not
finish successfully.

Signed-off-by: Javier Cano Cano <[email protected]>
@razo7
Copy link
Member

razo7 commented Oct 26, 2023

/ok-to-test

@razo7
Copy link
Member

razo7 commented Oct 26, 2023

Looks good, thanks again for submitting the PR
/approve
/lgtm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 26, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcanocan, razo7

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot merged commit 07eb90b into medik8s:main Oct 26, 2023
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants