Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Non-Employee RBAC Model #1335

Closed
32 tasks done
Collinbrown95 opened this issue Sep 5, 2022 · 3 comments
Closed
32 tasks done

Refactor Non-Employee RBAC Model #1335

Collinbrown95 opened this issue Sep 5, 2022 · 3 comments
Assignees
Labels
kind/feature New feature or request

Comments

@Collinbrown95
Copy link
Contributor

Collinbrown95 commented Sep 5, 2022

Overview

Given new requirements around "deemed" employees who have access to some (but not all) employee-only features, we need a more fine-grained model of RBAC than the current set of labels provided by the aaw-profiles-state-controller.

Proposed Solution

Instead of classifying namespaces as containing non-employees or being employee only, what if namespaces received a series of labels corresponding to capabilities. In this way, exception lists could be specified for each capability, and namespaces could receive labels based on the intersection of its members' capability lists (i.e. the namespace receives the least permissive combination of individual members' capabilities).

Observations

  1. The new labels provide a superset of the current users who have SAS access, so rolling out the new labels to prod should not lead to an inconsistent state.

Caveats

  1. Access control is managed manually through a configmap, rather than being managed through AAD
  2. Current implementation requires updating the SAS Notebook Exceptions in 2 places: (1) non-employee-exceptions-config.jsonnet and (2) deny-external-users constraint.yaml. This can be fixed in a future refactor to reuse the same configmap, but due to time constraints I did not have time to figure out how to do this.
  3. An admin needs to be involved if a user is removed from the abovementioned configmap + gatekeeper constraint as an inconsistent state can be reached by adding an exception case, then removing that exception case while the user is still in the namespace. For example, if [email protected] is added as an exception, then a notebook is created, and then [email protected] is removed, [email protected] is still able to access the notebook that was created when he had access. Therefore, an admin would also need to remove him from the namespace and/or delete the SAS notebook to bring the system back into a consistent state.

Remaining TODO Items

  • @Souheil-Yazji , @bryanpaget , and @Collinbrown95 to independently do end-to-end testing on dev cluster once refactors from code review have been implemented, then report back and consolidate findings.

  • Deploy change into aaw-prod once sufficient testing has been done and the feature is properly documented.

Required Changes

Bug Fix PRs

Deployment PRs

Quality Assurance TODO Items

  • @Souheil-Yazji , @bryanpaget , and @Collinbrown95 do a static code review to go over how all of the logic works and so that multiple people have a common understanding of how each component works. Done on September 14 2022

Refactors after code review

Implementation Details

This implementation proposes two kinds of feature capabilities: (1) Pod/Notebook features and (2) Namespace features.

For example, the SAS notebook feature is a Pod/notebook feature because it requires deploying a specific kind of notebook into a namespace, whereas cloud-main-connectivity is a namespace feature because routing pod traffic through the egress gateway is determined by rules that are applied at the level of the namespace.

The semantics for labels behind each feature work as follows.

Pod/Notebook Feature

The profile state controller applies a label of the form has-X-feature if any pod/notebook in the namespace has that feature (e.g. a pod with the sas image). Additionally, the profile state controller applies a label of the form exists-non-X-user if any subject in any rolebinding in the namespace is not an employee or not in the list of exceptions for that capability. The profiles state controller will apply the following labels to the profile and namespace:

  • has-X-feature: true if any pod in the namespace has that feature
  • exists-non-X-user: true if any subject in any role binding is not an employee or is not in the exception list for that capability.

If a non-employee without an exception is added to a rolebinding in a namespace where the label has-X-feature: true is present, a gatekeeper policy blocks this request.

If a pod with X-feature is added to a namespace where the label exists-non-X-user is present, a gatekeeper policy blocks this request.

Namespace Feature

A namespace feature only requires the logic surrounding the exists-non-X-user label described above. In the case of cloud-main connectivity, the cloud-main and network controllers in aaw-kubeflow-profiles-controller should automatically reconcile the network policies / virtual services involved if a namespace does not have the cloud main connectivity capability. No gatekeeper policy should be required.

Test Cases

Unit Tests

We should unit test the following cases for profile-state-controller - profiles-state-controller is already set up to handle unit testing, so we just need to add the following test cases (then there is no uncertainty as to whether the RBAC logic for labels is applied correctly).

  1. If rolebinding contains only statcan domains --> profiles-state-controller should produce the labels state.aaw.statcan.gc.ca/exists-non-sas-notebook-user: false and state.aaw.statcan.gc.ca/exists-non-cloud-main-user: false
  2. If rolebinding contains statcan domains AND an external user with only the SAS exception --> profiles-state-controller should produce the labels state.aaw.statcan.gc.ca/exists-non-sas-notebook-user: false and state.aaw.statcan.gc.ca/exists-non-cloud-main-user: true
  3. If rolebinding contains statcan domains AND an external user with only the cloud main exception --> profiles-state-controller should produce the labels state.aaw.statcan.gc.ca/exists-non-sas-notebook-user: true and state.aaw.statcan.gc.ca/exists-non-cloud-main-user: false
  4. If rolebinding contains statcan domains AND an external user with only the cloud main exception AND a user with only the SAS exception --> profiles-state-controller should produce the labels state.aaw.statcan.gc.ca/exists-non-sas-notebook-user: true and state.aaw.statcan.gc.ca/exists-non-cloud-main-user: true
  5. If rolebinding contains external users who all have both the cloud main exception and the SAS exception --> profiles-state-controller should produce the labels state.aaw.statcan.gc.ca/exists-non-sas-notebook-user: false and state.aaw.statcan.gc.ca/exists-non-cloud-main-user: false
  6. If rolebinding contains only external users who have no exceptions --> profiles-state-controller should produce the labels state.aaw.statcan.gc.ca/exists-non-sas-notebook-user: true and state.aaw.statcan.gc.ca/exists-non-cloud-main-user: true
  7. If a rolebinding contains StatCan domains AND a single external user with no exceptions --> profiles-state-controller should produce the labels state.aaw.statcan.gc.ca/exists-non-sas-notebook-user: true and state.aaw.statcan.gc.ca/exists-non-cloud-main-user: true

End-to-end tests

Verify that all of the original logic for profile-state controller works. I.e.

  • if namespace has non-exception external employee --> deny create sas notebook
  • if sas pod exists already --> deny add non-exception external employee
  • if only statcan employees / valid exception cases --> allow create sas notebook
  • if sas notebook exists and add valid exception case / statcan employee to namespace --> allow add valid employee / exception case

Also need to test that the interaction with cloud main connectivity works correctly. I.e.

  • if non-exception user is added to the namespace, requests to e.g. gitlab.k8s should not get routed through egress gateway (i.e. will be blocked at AAW hub firewall level)
  • if namespace contains only users with cloud-main connectivity capability, then requests to e.g. gitlab.k8s should get routed through egress gateway (i.e. not blocked).
  • if a namespace has a user without the cloud main capability, then that user is removed, the remaining users should be unblocked from sending cloud main requests.
@Collinbrown95
Copy link
Contributor Author

Collinbrown95 commented Sep 14, 2022

Collin End to End Testing

Testing Steps

  1. Start with aaw-fc namespace (no contributors)
  2. add [email protected], [email protected], [email protected], [email protected] (the 4 initial exception cases) to aaw-fc namespace
  3. verify that I can still create a SAS notebook
  4. add another external user without the SAS exception to aaw-fc namespace ([email protected])
  5. verify that SAS button is disabled
  6. remove [email protected]
  7. create SAS notebook in aaw-fc namespace
  8. verify that we can't add [email protected] once the SAS notebook exists in the namespace
  9. Add [email protected] to gatekeeper constraint https://github.com/StatCan/aaw-gatekeeper-constraints/blob/aaw-dev-cc-00/deny-external-users/constraint.yaml and sync with argocd
  10. Add [email protected] to non-employee-exceptions configmap https://github.com/StatCan/aaw-kubeflow-profiles/commit/b39c1a0ba1d413968004aee1b34eedfc4d2301a5.
  11. Sync configmap with ArgoCD and restart the profile-state-controller deployment.
  12. Verify that we can now add [email protected] to the namespace that currently has the sas notebook.
  13. Delete [email protected] from configmap and sync argocd and restart profile-state-controller deployment https://github.com/StatCan/aaw-kubeflow-profiles/commit/a6dc45775d8ea3a7c616d7f287dca3b555ff17c6
  14. Delete [email protected] from gatekeeper constraint and sync with argocd https://github.com/StatCan/aaw-gatekeeper-constraints/commit/bf089dfa143a6589991e59dccfb05f3c23c5cd84
  15. IMPORTANT At this point, the namespace is in an invalid state; the admin who removed the user must also remove [email protected] from the namespace or delete the SAS notebook. For this test, I delete [email protected] from the namespace.
  16. verify that we cannot add back [email protected] while the SAS Notebook is present
  17. delete the SAS notebook
  18. verify that we can once again add back [email protected]
  19. remove [email protected] and create a SAS notebook again
  20. verify that we can add @statcan.gc.ca domain and @cloud.statcan.ca domain emails to the namespace while a SAS pod is present.

Deny SAS pod created through kubectl

If a user circumvents the UI and tries to e.g. create a pod resource by running kubectl apply on a manifest, the request to create the pod should be blocked by gatekeeper.

A non-exception user was added to the namespace aaw-fc, so we expect that SAS pods should not be allowed to be created in this namespace.

Test case

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-create-sas-pod
  namespace: aaw-fc
  labels:
    app: sas
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sas
  template:
    metadata:
      labels:
        app: sas
    spec:
      containers:
        - name: sas
          image: k8scc01covidacr.azurecr.io/sas:latest
  • The Deployment and Replicaset will be created, but when the Replicaset tries to create the Pod containing the SAS image, gatekeeper should deny this request and the pod spec should be blocked by validating admission control and never get posted to etcd.

  • After running kubectl apply on the deployment spec above, the replicaset failed to create the pod with the following error:

Conditions:
  Type             Status  Reason
  ----             ------  ------
  ReplicaFailure   True    FailedCreate
Events:
  Type     Reason        Age               From                   Message
  ----     ------        ----              ----                   -------
  Warning  FailedCreate  3s (x2 over 16s)  replicaset-controller  Error creating: admission webhook "validation.gatekeeper.sh" denied the request: [denyemployeeonlyfeatures] Pod has state.aaw.statcan.gc.ca/exists-non-sas-notebook-user=true and container uses a SAS image k8scc01covidacr.azurecr.io/sas:latest

Try and add rolebinding manually with kubectl

If a user circumvents the web UI and tries to create an invalid rolebinding manually with kubectl apply, this should be blocked by the appropriate gatekeeper policy.

Test Case

The namespace aaw-fc has employees and non-employe SAS exceptions, and a SAS notebook currently scheduled.

I try to add the following rolebinding manually:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: invalid-rb
  namespace: aaw-fc
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeflow-edit
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: [email protected]

The following error is returned:

Error from server ([denyexternalusers] Profile aaw-fc has state.aaw.statcan.gc.ca/has-sas-notebook-feature=true): error when creating "test-rb.yaml": admission webhook "validation.gatekeeper.sh" denied the request: [denyexternalusers] Profile aaw-fc has state.aaw.statcan.gc.ca/has-sas-notebook-feature=true

@Collinbrown95
Copy link
Contributor Author

Collinbrown95 commented Sep 18, 2022

Deployment to Production Environment

Post Deployment Items

  • Revert argocd watching my fork of gatekeeper-policies repo
  • Revert argocd watching the aaw-dev-cc-00 branch of gatekeeper constraints
  • Document future refactors on this feature
  • Update documentation for this feature in aaw-developer-docs

@YannCoderre
Copy link
Contributor

@Collinbrown95 , I hope as is well, Are we still going to be hitting the deadline for the end of day?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants