
Add RBAC permissions for namespaces. #93

Merged 1 commit on Feb 21, 2023

Conversation

@epenchev (Contributor) commented Feb 17, 2023

This fixes errors like the ones reported in the logs:

E0203 12:05:27.009380 2699 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:self-node-remediation:self-node-remediation-controller-manager" cannot list resource "namespaces" in API group "" at the cluster scope 
E0203 13:00:27.442104 2699 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: unknown (get namespaces) 

Relevant Jira ticket - ECOPROJECT-1183

@openshift-ci bot (Contributor) commented Feb 17, 2023

Hi @epenchev. Thanks for your PR.

I'm waiting for a medik8s member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mshitrit (Member)

Hi @epenchev, thanks for the PR, and nice catch!

However, I suggest that:

  • before adding the RBAC permission, let's find out where we are monitoring the namespaces - it could be that accessing them isn't needed, in which case a better solution would be to remove that access.
  • in case we decide those permissions are indeed required, I'd prefer adding them with a kubebuilder annotation (here is an example; see also the sketch below)
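
For context, a kubebuilder RBAC annotation is just a Go comment marker that controller-gen picks up when running make manifests and turns into a ClusterRole rule. A minimal sketch of what such a marker for namespaces could look like (the exact verbs and the reconciler stub here are an assumption for illustration, not the final change):

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
)

// Illustrative stub standing in for the operator's real reconciler type.
type SelfNodeRemediationReconciler struct{}

// controller-gen parses the marker below; running `make manifests` turns it
// into a ClusterRole rule granting read access to namespaces. The verbs
// listed are an assumption for illustration.
//+kubebuilder:rbac:groups="",resources=namespaces,verbs=get;list;watch

func (r *SelfNodeRemediationReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... reconciliation logic would go here ...
	return ctrl.Result{}, nil
}
```

The advantage over editing the role YAML directly is that the generated manifests stay in sync with the markers, so the permission can't silently drift from the code that needs it.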

@epenchev (Contributor, Author)

However I suggest that:

* before adding the RBAC permission let's find out where are we monitoring the namespace - it could be that accessing the namespace isn't needed in which case a better solution would be to remove it.

Hi, thanks for the review.
It all starts with the listing of the namespaces here.
This is the point where controller-runtime uses the k8s client-go library to list the namespaces.
Controller-runtime also utilizes client-go's caching capabilities, which have their own monitor for watching resources here.
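
To make the failure mode concrete, here is a minimal sketch of the call pattern behind those errors (assumed illustration code, not the operator's own). controller-runtime's default manager client is backed by an informer cache: the first List for a given type starts a reflector, which issues a LIST followed by a long-running WATCH, so the service account needs both verbs:

```go
package example

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listNamespaces is assumed illustration code. With a cache-backed client,
// the first List for a type starts a reflector that LISTs and then WATCHes
// the resource; a missing RBAC verb shows up as "namespaces is forbidden"
// or "Failed to watch *v1.Namespace", as in the logs above.
func listNamespaces(ctx context.Context, c client.Client) error {
	nsList := &corev1.NamespaceList{}
	if err := c.List(ctx, nsList); err != nil {
		return fmt.Errorf("listing namespaces: %w", err)
	}
	for _, ns := range nsList.Items {
		fmt.Println(ns.Name)
	}
	return nil
}
```

Removing only the watch verb lets the initial List succeed but makes the reflector's subsequent watch fail, which matches the repeating errors in the log below.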

And this is what happens when you remove only the watch permission:

2023-02-20T08:15:48.094Z INFO peerhealth.server peer health server started
2023-02-20T09:15:28.388Z INFO controllers.SelfNodeRemediation fencing not completed yet, continuing remediation {"selfnoderemediation": "self-node-remediation/k8s-node6"}
2023-02-20T09:15:28.500Z INFO controllers.SelfNodeRemediation SNR already deleted {"selfnoderemediation": "self-node-remediation/k8s-node6"}
2023-02-20T09:15:29.501Z INFO controllers.SelfNodeRemediation SNR already deleted {"selfnoderemediation": "self-node-remediation/k8s-node6"}
2023-02-20T09:16:00.252Z INFO controllers.SelfNodeRemediation fencing not completed yet, continuing remediation {"selfnoderemediation": "self-node-remediation/k8s-node6"}
2023-02-20T09:16:00.268Z INFO controllers.SelfNodeRemediation fencing not completed yet, continuing remediation {"selfnoderemediation": "self-node-remediation/k8s-node6"}
2023-02-20T09:16:00.316Z INFO controllers.SelfNodeRemediation NoExecute taint added {"selfnoderemediation": "self-node-remediation/k8s-node6", "new taints": [{"key":"medik8s.io/remediation","value":"self-node-remediation","effect":"NoExecute","timeAdded":"2023-02-20T09:16:00Z"}]}
2023-02-20T09:16:00.316Z INFO controllers.SelfNodeRemediation Marking node as unschedulable {"selfnoderemediation": "self-node-remediation/k8s-node6", "node name": "k8s-node6"}
2023-02-20T09:16:00.359Z INFO controllers.SelfNodeRemediation fencing not completed yet, continuing remediation {"selfnoderemediation": "self-node-remediation/k8s-node6"}
2023-02-20T09:16:00.359Z INFO controllers.SelfNodeRemediation waiting for unschedulable taint to appear {"selfnoderemediation": "self-node-remediation/k8s-node6", "node name": "k8s-node6"}
2023-02-20T09:16:01.269Z INFO controllers.SelfNodeRemediation fencing not completed yet, continuing remediation {"selfnoderemediation": "self-node-remediation/k8s-node6"}
2023-02-20T09:16:01.269Z INFO controllers.SelfNodeRemediation updating snr with node backup and updating time to assume node has been rebooted {"selfnoderemediation": "self-node-remediation/k8s-node6", "node name": "k8s-node6"}
2023-02-20T09:16:01.291Z INFO controllers.SelfNodeRemediation fencing not completed yet, continuing remediation {"selfnoderemediation": "self-node-remediation/k8s-node6"}
2023-02-20T09:16:01.297Z INFO controllers.SelfNodeRemediation fencing not completed yet, continuing remediation {"selfnoderemediation": "self-node-remediation/k8s-node6"}
2023-02-20T09:19:07.000Z INFO controllers.SelfNodeRemediation fencing not completed yet, continuing remediation {"selfnoderemediation": "self-node-remediation/k8s-node6"}
2023-02-20T09:19:07.000Z INFO controllers.SelfNodeRemediation TimeAssumedRebooted is old. The unhealthy node assumed to been rebooted {"selfnoderemediation": "self-node-remediation/k8s-node6", "node name": "k8s-node6"}
E0220 09:19:07.003878 2640 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: unknown (get namespaces)
2023-02-20T09:19:07.378Z INFO controllers.SelfNodeRemediation fencing completed, cleaning up {"selfnoderemediation": "self-node-remediation/k8s-node6"}
2023-02-20T09:19:08.390Z INFO controllers.SelfNodeRemediation fencing completed, cleaning up {"selfnoderemediation": "self-node-remediation/k8s-node6"}
2023-02-20T09:19:08.404Z INFO controllers.SelfNodeRemediation NoExecute taint removed {"selfnoderemediation": "self-node-remediation/k8s-node6", "new taints": null}
2023-02-20T09:19:08.430Z INFO controllers.SelfNodeRemediation finalizer removed {"selfnoderemediation": "self-node-remediation/k8s-node6"}
2023-02-20T09:19:08.430Z INFO controllers.SelfNodeRemediation SNR already deleted {"selfnoderemediation": "self-node-remediation/k8s-node6"}
E0220 09:19:08.449680 2640 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: unknown (get namespaces)
E0220 09:19:10.584331 2640 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: unknown (get namespaces)
E0220 09:19:14.889163 2640 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: unknown (get namespaces)
E0220 09:19:22.963467 2640 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: unknown (get namespaces)
E0220 09:19:43.873974 2640 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: unknown (get namespaces)
E0220 09:20:34.708322 2640 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: unknown (get namespaces)
E0220 09:21:19.440353 2640 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Namespace: unknown (get namespaces) 
* in case we decide those permissions are indeed required I prefer adding it with a kubebuilder annotation ([here](https://github.com/medik8s/self-node-remediation/blob/main/controllers/selfnoderemediation_controller.go#L129) is an example)

Yes, that's a good point, I'll correct the PR.

@epenchev force-pushed the rbac-namespaces-fix branch 2 times, most recently from 52cd6c4 to 95ce5fd on February 20, 2023 at 12:45
@mshitrit (Member)

Thanks!
Please also run make bundle to regenerate the CSV with those changes.

@slintes (Member) commented Feb 21, 2023

Hi @epenchev, we are trying to understand why we didn't run into this issue in our tests.

It seems that, at least when installed with Operator Lifecycle Manager (OLM) on OpenShift, an additional ClusterRole with rules for reading namespaces (and some other resources) is created automatically. We are still figuring out the details of why this is, and who is doing it.

Just curious, on what kind of cluster are you running, and how do you install NHC / SNR?

@epenchev (Contributor, Author) commented Feb 21, 2023

It seems that, at least when installed with Operator Lifecycle Manager (OLM) on OpenShift, an additional ClusterRole with rules for reading namespaces (and some other resources) is created automatically. We are still figuring out the details of why this is, and who is doing it.

Just curious, on what kind of cluster are you running, and how do you install NHC / SNR?

Hi, thanks for looking into that issue.
I don't run the SNR operator on an OpenShift cluster, and OLM is not available to me.
I'm running the SNR operator on a K3s cluster, using make install, which lets kustomize build the resulting YAML configuration.

@epenchev (Contributor, Author)

Thanks! Please also run make bundle to regenerate the CSV with those changes.

Thanks, I've corrected the PR.

@slintes (Member) commented Feb 21, 2023

It seems that, at least when installed with Operator Lifecycle Manager (OLM) on OpenShift, an additional ClusterRole with rules for reading namespaces (and some other resources) is created automatically. We are still figuring out the details of why this is, and who is doing it.
Just curious, on what kind of cluster are you running, and how do you install NHC / SNR?

Hi, thanks for looking into that issue. I don't run the SNR operator on an OpenShift cluster, and OLM is not available to me. I'm running the SNR operator on a K3s cluster, using make install, which lets kustomize build the resulting YAML configuration.

Ok, thanks for confirming you're not using OLM / not on OpenShift, that at least explains the difference 🙂

Just FYI:

  • We recommend using OLM even on Kubernetes. It's easy to install, just follow the instructions after clicking "Install" at https://operatorhub.io/operator/node-healthcheck-operator (we need to update NHC to the latest version, but SNR is up to date).
  • When you don't use OLM, you will probably run into issues with the validating webhooks of NHC and SNR, because of missing certificates.
  • Alternatively, you can use cert-manager for the certificates; I'm not sure though whether the code works correctly then, as it might look at the wrong cert location, at least with the default cert-manager config (see the sketch below).
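
On that last point, here is a hypothetical sketch of pointing the manager at a cert-manager-mounted certificate. Both the mount path and the use of the CertDir option are assumptions (CertDir existed on the manager options in controller-runtime versions of that era; newer versions configure this on the webhook server options instead):

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// controller-runtime's webhook server loads its serving certificate from
	// a fixed directory (/tmp/k8s-webhook-server/serving-certs by default).
	// If cert-manager mounts the certificate secret elsewhere, the manager
	// must be pointed at that path; "/etc/webhook/certs" is illustrative and
	// has to match the Deployment's volumeMount for the cert-manager secret.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		CertDir: "/etc/webhook/certs",
	})
	if err != nil {
		panic(err)
	}

	// ... set up controllers and webhooks, then start the manager ...
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```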

@slintes (Member) commented Feb 21, 2023

/ok-to-test
/lgtm
/approve

@openshift-ci bot (Contributor) commented Feb 21, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: epenchev, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@slintes (Member) commented Feb 21, 2023

overriding flaky test lanes, the 4.13 e2e lane passed...

/override ci/prow/4.11-openshift-e2e
/override ci/prow/4.12-openshift-e2e

@openshift-ci bot (Contributor) commented Feb 21, 2023

@slintes: Overrode contexts on behalf of slintes: ci/prow/4.11-openshift-e2e, ci/prow/4.12-openshift-e2e

In response to this:

overriding flaky test lanes, the 4.13 e2e lane passed...

/override ci/prow/4.11-openshift-e2e
/override ci/prow/4.12-openshift-e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
