Controller should work in namespaced or role-only mode #1295

philwinder · 2020-08-06T10:03:05Z

/kind feature

In multi-tenant clusters, I don't want users to have cluster-level access to objects. The crude approach is to replace all ClusterRoles (and bindings) with Roles. This works fine for the 0.9.0 UI, because it falls back to requesting experiments/etc. from the current namespace.

However, many parts of the controller codebase assume that there is only a single instance of the controller installed and also need cluster-scope, because that was how it was designed. For example, Katib installs a MutatingWebhookConfiguration that points to the namespace where Katib is installed. But the name of the MWC is hardcoded, so a subsequent install of Katib overwrites the MWC.

I'm sure there's more too. I suspect this requires fundamental changes in the design/operation of the controller.

This represents a security risk for my clients, because all users can query the controller API and observe the status/results/etc. of the jobs from other namespaces, even ones they don't have access to.

For future people that find this issue, if this isn't implemented, then the only workaround is to have a single centralised controller, and many UI's in each namespace. And accept the security risk.

issue-label-bot · 2020-08-06T10:03:13Z

Issue-Label Bot is automatically applying the labels:

Label	Probability
area/front-end	0.59
area/katib	0.80

Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.

andreyvelich · 2020-09-10T21:38:58Z

Thank you for pointing this @philwinder.

I believe it was designed by that because Katib is installed as part of Kubeflow. In Kubeflow you should submit Experiments only in your user profile namespace. Also see this: #162.

On the other hand, Katib can be also stand-alone deployment where not giving cluster-role access is possible.
That requires some component changes.

Regarding to the webhook, how we should handle multi-user usage with more than one controller in namespace-role access?

/cc @gaocegege @johnugeorge

philwinder · 2020-10-21T11:44:29Z

Hi @andreyvelich. Just stumbled across my own issue again. :-)

Yes, you're right, and that is a problem with Kubeflow too. It expects a single kubeflow instance and the ability to create/control namespaces. For many of my clients that is not acceptable. Namespaces are locked down.

But yes, a simple solution would be to provide the ability to create a webhook with the namespace name in the name of the webhook, or the ability to customize the name of the webhook. That way, katib can install multiple webhooks without clashing/overwriting.

Another alternative is to add an option to disable the mutating webhook entirely! After a lot of hacking around, I've found that if you disable some of the cluster scoped roles, it accidentally disables the webhook controller. So it doesn't even install mutating webhook. Depending on what you disable, parts of katib work. Other's don't.

Any thoughts?

Phil

philwinder · 2020-10-22T10:41:39Z

Just a quick update on this to help future me/people. In the end I have written a script/pod that proactively scrapes for the hardcoded mutating and validating configs, deletes them, then creates new ones with non-clashing equivalents.

andreyvelich · 2020-10-22T12:54:00Z

@philwinder If we disable mutating webhook, metrics collector doesn't work.
We can add webhook YAMLs to the manifests once user deploys Katib.
In that case, controller doesn't need to deploy webhooks.

Do you want to attend one of the upcoming AutoML meetings to discuss questions of using Katib without cluster-role access ?

/cc @gaocegege @johnugeorge

nlarusstone · 2020-11-02T21:35:19Z

I have a similar, but slightly different ask -- I want Katib to have true multi-user support. For me, that means that I can have different configurations for Katib in each namespace (I'm ok with having a centralized controller though).

This would allow me to customize, for example, the Katib configmap in each namespace to use a different service account for each namespace. This would also entail running experiments in separate namespaces (which is useful for having resource quotas and proper separation of concerns).

I was told to post here following this discussion on slack: https://kubeflow.slack.com/archives/C018PMV53NW/p1604351060029800?thread_ts=1604341870.028800&cid=C018PMV53NW

andreyvelich · 2020-11-02T21:40:33Z

/priority p2

philwinder · 2020-11-03T10:32:36Z

Hi @andreyvelich ,
Thanks for your comments.

If we disable mutating webhook, metrics collector doesn't work.

Yes, I understand. I was proposing that we stop hardcoding the name of the webooks. In my case, I am deleting the created webhooks and recreating them with team-friendly names. That works.

We can add webhook YAMLs to the manifests once user deploys Katib. In that case, controller doesn't need to deploy webhooks.

Yes, that would work too. Although that would stop the UI-generated job from working, since that expects the webhook to mutate the request. But yes, adding a flag to disable them, then adding a flag to the UI to "add metrics to this job..." would work. I prefer this because then katib is not having to make global changes.

attend one of the upcoming AutoML meetings

I'm happy to attend if you want me to, but I have no more questions, thank you.

Dear @nlarusstone,
Yes, I agree. There's a bigger story here. For example, removing the mandated need for cluster-scope access for the controller. That and the UI shouldn't NEED cluster-scope access to do it's job. But this is a wider problem with Kubeflow. Nearly all KF components have this problem. Kubeflow was never intended to run multiple instances on the same cluster. A major oversight IMO.

Thanks all,
Phil

nlarusstone · 2020-11-03T14:00:20Z

@philwinder I agree that Kubeflow isn't intended to run multiple instances on the same cluster, but there is definitely an intent to allow multi-user isolation (see docs here: https://www.kubeflow.org/docs/components/multi-tenancy/). While Katib supports that on the UI, I would like to see more in depth support of it on the cluster level by allowing us to schedule resources in separate namespaces. I'm fine with having a single Kubeflow instance that coordinates the running of resources in separate namespaces (and therefore wouldn't need cluster-scoped access).

andreyvelich · 2020-11-03T14:08:29Z

Yes, that would work too. Although that would stop the UI-generated job from working, since that expects the webhook to mutate the request. But yes, adding a flag to disable them, then adding a flag to the UI to "add metrics to this job..." would work. I prefer this because then katib is not having to make global changes.

@philwinder What do you mean by: "stop the UI-generated job from working"?
I believe, we create webhook when Katib controller is created.

Yes, I agree. There's a bigger story here. For example, removing the mandated need for cluster-scope access for the controller. That and the UI shouldn't NEED cluster-scope access to do it's job. But this is a wider problem with Kubeflow. Nearly all KF components have this problem. Kubeflow was never intended to run multiple instances on the same cluster. A major oversight IMO.

If our community wants to see role-level access in Katib, we can make it for the standalone installation.
When you use Katib outside of Kubeflow.

philwinder · 2020-11-17T14:47:46Z

Hi @andreyvelich,
Sorry, I mean that when you specify an experiment, it has to be mutated in order to work. I.e. you need to inject the metrics collector. If it doesn't go through that mutation then you won't see results.

we create webhook when Katib controller is created

Not quite, the webhook is created by the controller itself, in the go code. It's not created by the user during the install of the manifests, for instance. If it was created during installation via a manifest, then we could modify the name of the webhooks. As it stands, we can't alter any of the behaviour of the webhook creation.

Thanks!

andreyvelich · 2020-11-17T15:10:07Z

Not quite, the webhook is created by the controller itself, in the go code. It's not created by the user during the install of the manifests, for instance. If it was created during installation via a manifest, then we could modify the name of the webhooks. As it stands, we can't alter any of the behaviour of the webhook creation.

Yes, that is correct. Probably we should add the manifests for the webhooks, like mentioned here: kubeflow/manifests#1379.

The question is, how we make sure that user's webhooks have the unique names.

stale · 2021-06-11T01:59:25Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale · 2021-07-08T08:36:27Z

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

k8s-ci-robot added the kind/feature label Aug 6, 2020

issue-label-bot bot added area/front-end area/katib labels Aug 6, 2020

andreyvelich removed the area/front-end label Nov 2, 2020

k8s-ci-robot added the priority/p2 label Nov 2, 2020

andreyvelich mentioned this issue Nov 13, 2020

Does Katib need Cluster level scope? #162

Closed

andreyvelich mentioned this issue Nov 25, 2020

Namespaced scope to monitor experiments. #693

Closed

stale bot added the lifecycle/stale label Jun 11, 2021

stale bot closed this as completed Jul 8, 2021

andreyvelich mentioned this issue Nov 9, 2021

Narrow and explicit RBAC #1639

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Controller should work in namespaced or role-only mode #1295

Controller should work in namespaced or role-only mode #1295

philwinder commented Aug 6, 2020

issue-label-bot bot commented Aug 6, 2020

andreyvelich commented Sep 10, 2020

philwinder commented Oct 21, 2020

philwinder commented Oct 22, 2020

andreyvelich commented Oct 22, 2020

nlarusstone commented Nov 2, 2020

andreyvelich commented Nov 2, 2020

philwinder commented Nov 3, 2020 •

edited

Loading

nlarusstone commented Nov 3, 2020

andreyvelich commented Nov 3, 2020 •

edited

Loading

philwinder commented Nov 17, 2020

andreyvelich commented Nov 17, 2020

stale bot commented Jun 11, 2021

stale bot commented Jul 8, 2021

Controller should work in namespaced or role-only mode #1295

Controller should work in namespaced or role-only mode #1295

Comments

philwinder commented Aug 6, 2020

issue-label-bot bot commented Aug 6, 2020

andreyvelich commented Sep 10, 2020

philwinder commented Oct 21, 2020

philwinder commented Oct 22, 2020

andreyvelich commented Oct 22, 2020

nlarusstone commented Nov 2, 2020

andreyvelich commented Nov 2, 2020

philwinder commented Nov 3, 2020 • edited Loading

nlarusstone commented Nov 3, 2020

andreyvelich commented Nov 3, 2020 • edited Loading

philwinder commented Nov 17, 2020

andreyvelich commented Nov 17, 2020

stale bot commented Jun 11, 2021

stale bot commented Jul 8, 2021

philwinder commented Nov 3, 2020 •

edited

Loading

andreyvelich commented Nov 3, 2020 •

edited

Loading