Controller should work in namespaced or role-only mode #1295
Thank you for pointing this out @philwinder. I believe it was designed that way because Katib is installed as part of Kubeflow, where you should submit Experiments only in your user profile namespace. Also see this: #162. On the other hand, Katib can also be a standalone deployment, where not granting cluster-role access is possible. Regarding the webhook, how should we handle multi-user usage with more than one controller under namespace-role access?
Hi @andreyvelich. Just stumbled across my own issue again. :-) Yes, you're right, and that is a problem with Kubeflow too. It expects a single Kubeflow instance and the ability to create/control namespaces. For many of my clients that is not acceptable: namespaces are locked down.

But yes, a simple solution would be to include the namespace in the name of the webhook, or to allow the name of the webhook to be customised. That way, Katib can install multiple webhooks without clashing/overwriting. Another alternative is to add an option to disable the mutating webhook entirely!

After a lot of hacking around, I've found that if you disable some of the cluster-scoped roles, it accidentally disables the webhook controller, so it doesn't even install the mutating webhook. Depending on what you disable, parts of Katib work; others don't. Any thoughts?
Phil
Just a quick update on this to help future me/people. In the end I wrote a script/pod that proactively scrapes for the hardcoded mutating and validating webhook configurations, deletes them, and creates new ones with non-clashing equivalents.
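A minimal Go sketch of that workaround, assuming the hardcoded name is "katib-mutating-webhook-config" (the actual name depends on the Katib version; check the cluster first) and that the pod runs under a service account allowed to manage webhook configurations:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Runs in-cluster as a pod; KATIB_NAMESPACE identifies this install.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()
	ns := os.Getenv("KATIB_NAMESPACE")

	// Hypothetical hardcoded name; adjust to what your Katib version creates.
	const hardcoded = "katib-mutating-webhook-config"

	mwc, err := client.AdmissionregistrationV1().
		MutatingWebhookConfigurations().
		Get(ctx, hardcoded, metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}

	// Clone under a namespace-qualified name so parallel installs don't clash.
	clone := mwc.DeepCopy()
	clone.ObjectMeta = metav1.ObjectMeta{
		Name: fmt.Sprintf("%s-%s", hardcoded, ns),
	}
	if _, err := client.AdmissionregistrationV1().
		MutatingWebhookConfigurations().
		Create(ctx, clone, metav1.CreateOptions{}); err != nil {
		log.Fatal(err)
	}

	// Only remove the clashing original once the replacement exists.
	if err := client.AdmissionregistrationV1().
		MutatingWebhookConfigurations().
		Delete(ctx, hardcoded, metav1.DeleteOptions{}); err != nil {
		log.Fatal(err)
	}
	// The validating webhook configuration would be handled the same way.
}
```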
@philwinder If we disable the mutating webhook, the metrics collector doesn't work. Would you like to attend one of the upcoming AutoML meetings to discuss using Katib without cluster-role access?
I have a similar, but slightly different ask: I want Katib to have true multi-user support. For me, that means having a different Katib configuration in each namespace (I'm OK with having a centralized controller, though). This would allow me to customize, for example, the Katib ConfigMap in each namespace to use a different service account per namespace. It would also entail running experiments in separate namespaces, which is useful for resource quotas and a proper separation of concerns. I was told to post here following this discussion on Slack: https://kubeflow.slack.com/archives/C018PMV53NW/p1604351060029800?thread_ts=1604341870.028800&cid=C018PMV53NW
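A rough sketch of what that could look like: a helper that stamps a central katib-config ConfigMap into each user namespace with a per-team override. The "suggestion" key and the "serviceAccountName" field mirror the upstream katib-config layout, but treat the names and values here as assumptions:

```go
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// copyKatibConfig copies the central katib-config into a user namespace,
// applying a per-namespace override for the suggestion settings.
func copyKatibConfig(ctx context.Context, c kubernetes.Interface,
	katibNS, userNS, suggestionJSON string) error {

	src, err := c.CoreV1().ConfigMaps(katibNS).Get(ctx, "katib-config", metav1.GetOptions{})
	if err != nil {
		return err
	}

	dst := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "katib-config", Namespace: userNS},
		Data:       map[string]string{},
	}
	for k, v := range src.Data {
		dst.Data[k] = v
	}
	// Per-namespace override, e.g. a suggestion block pointing at a
	// team-specific service account.
	dst.Data["suggestion"] = suggestionJSON

	_, err = c.CoreV1().ConfigMaps(userNS).Create(ctx, dst, metav1.CreateOptions{})
	return err
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Hypothetical override: same suggestion image, different service account.
	override := `{"random":{"image":"docker.io/kubeflowkatib/suggestion-hyperopt",` +
		`"serviceAccountName":"team-a-katib"}}`
	if err := copyKatibConfig(context.Background(), client,
		"kubeflow", "team-a", override); err != nil {
		log.Fatal(err)
	}
}
```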
/priority p2
Hi @andreyvelich,
Yes, I understand. I was proposing that we stop hardcoding the names of the webhooks. In my case, I am deleting the created webhooks and recreating them with team-friendly names. That works.
Yes, that would work too, although it would stop the UI-generated job from working, since that expects the webhook to mutate the request. But yes, adding a flag to disable the webhooks, and then adding a flag to the UI to "add metrics to this job...", would work. I prefer this because then Katib is not having to make global changes.
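A sketch of what such a flag could look like in a controller-runtime style main; the flag name and the setup helper are hypothetical, not Katib's actual code:

```go
package main

import (
	"flag"
	"log"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Hypothetical flag: let cluster admins opt out of the admission
	// webhooks entirely when cluster-scoped access is not available.
	enableWebhooks := flag.Bool("enable-webhooks", true,
		"Register the mutating/validating webhooks (requires cluster-role access).")
	flag.Parse()

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		log.Fatal(err)
	}

	if *enableWebhooks {
		// In the real controller, this is where the webhook server and the
		// (currently hardcoded) webhook configurations would be set up.
		// setupWebhooks(mgr) // hypothetical helper
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		log.Fatal(err)
	}
}
```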
I'm happy to attend if you want me to, but I have no more questions, thank you.
Dear @nlarusstone,
Thanks all,
@philwinder I agree that Kubeflow isn't intended to run multiple instances on the same cluster, but there is definitely an intent to allow
@philwinder What do you mean by: "stop the UI-generated job from working"?
If our community wants to see role-level access in Katib, we can add it for the standalone installation.
Hi @andreyvelich,
Not quite: the webhook is created by the controller itself, in the Go code. It is not created by the user during the installation of the manifests, for instance. If it were created during installation via a manifest, then we could modify the names of the webhooks. As it stands, we can't alter any of the behaviour of the webhook creation. Thanks!
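To illustrate the shape of the problem, here is a simplified sketch, not the actual Katib source: the controller builds the configuration object in Go at startup, so the name is fixed at compile time unless it is wired to a flag or derived from the controller's own namespace:

```go
package webhook

import (
	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newMutatingWebhookConfig sketches what the controller does in code. The
// real Katib source differs, but the clash is the same: Name is a
// compile-time constant, so a second install in another namespace
// overwrites the first install's configuration.
func newMutatingWebhookConfig(katibNamespace string) *admissionregistrationv1.MutatingWebhookConfiguration {
	failurePolicy := admissionregistrationv1.Ignore
	return &admissionregistrationv1.MutatingWebhookConfiguration{
		ObjectMeta: metav1.ObjectMeta{
			// Hardcoded: this is the line that would need to become
			// configurable, e.g. "katib-mutating-webhook-config-" + katibNamespace.
			Name: "katib-mutating-webhook-config",
		},
		Webhooks: []admissionregistrationv1.MutatingWebhook{{
			Name:          "mutating.experiment.katib.kubeflow.org",
			FailurePolicy: &failurePolicy,
			ClientConfig: admissionregistrationv1.WebhookClientConfig{
				Service: &admissionregistrationv1.ServiceReference{
					// Points back at the webhook service in Katib's namespace.
					Name:      "katib-controller",
					Namespace: katibNamespace,
				},
			},
		}},
	}
}
```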
Yes, that is correct. Probably we should add manifests for the webhooks, as mentioned here: kubeflow/manifests#1379. The question is how we make sure that users' webhooks have unique names.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
/kind feature
In multi-tenant clusters, I don't want users to have cluster-level access to objects. The crude approach is to replace all ClusterRoles (and their bindings) with Roles. This works fine for the 0.9.0 UI, because it falls back to requesting experiments and other resources from the current namespace.
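A sketch of that crude approach, with illustrative rules and names rather than the contents of Katib's real ClusterRole:

```go
package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// namespacedKatibRole mirrors the kind of access Katib's ClusterRole grants,
// but scoped to a single namespace. The rule list is illustrative; the real
// ClusterRole in the Katib manifests is the source of truth.
func namespacedKatibRole(ns string) (*rbacv1.Role, *rbacv1.RoleBinding) {
	role := &rbacv1.Role{
		ObjectMeta: metav1.ObjectMeta{Name: "katib", Namespace: ns},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{"kubeflow.org"},
			Resources: []string{"experiments", "trials", "suggestions"},
			Verbs:     []string{"get", "list", "watch", "create", "update", "delete"},
		}},
	}
	binding := &rbacv1.RoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: "katib", Namespace: ns},
		Subjects: []rbacv1.Subject{{
			Kind:      rbacv1.ServiceAccountKind,
			Name:      "katib-ui", // hypothetical service account name
			Namespace: ns,
		}},
		RoleRef: rbacv1.RoleRef{
			APIGroup: rbacv1.GroupName,
			Kind:     "Role",
			Name:     "katib",
		},
	}
	return role, binding
}

func main() {
	// In practice these objects would be created via client-go or rendered
	// to YAML; printing is enough for the sketch.
	role, binding := namespacedKatibRole("team-a")
	fmt.Println(role.Name, binding.Name)
}
```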
However, many parts of the controller codebase assume that there is only a single instance of the controller installed and that it has cluster scope, because that is how it was designed. For example, Katib installs a MutatingWebhookConfiguration that points to the namespace where Katib is installed, but the name of the MWC is hardcoded, so a subsequent install of Katib overwrites it.
I'm sure there is more, too. I suspect this requires fundamental changes to the design and operation of the controller.
This represents a security risk for my clients, because all users can query the controller API and observe the status/results/etc. of the jobs from other namespaces, even ones they don't have access to.
For future people that find this issue: if this isn't implemented, the only workaround is to have a single centralised controller and a UI in each namespace, and to accept the security risk.