Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Multi User] Support separate artifact repository for each namespace #4649

Open
Bobgy opened this issue Oct 21, 2020 · 71 comments · May be fixed by #7725
Open

[Multi User] Support separate artifact repository for each namespace #4649

Bobgy opened this issue Oct 21, 2020 · 71 comments · May be fixed by #7725

Comments

@Bobgy
Copy link
Contributor

Bobgy commented Oct 21, 2020

Part of #1223

Support configuring separate artifact repository for each namespace, so that artifact repository can have namespace separation too.

This is currently not supported in KFP multi user support as explained in #1223 (comment).

Use your thumbs up to show how important this issue is for you.

We'll most likely need upstream enhancement from argo in argoproj/argo-workflows#3307.

@Bobgy Bobgy added cuj/multi-user upstream_issue help wanted The community is welcome to contribute. labels Oct 21, 2020
@Ark-kun
Copy link
Contributor

Ark-kun commented Oct 22, 2020

Last year I've added the ArtifactRepositoryRef feature to Argo to support this goal.
The idea is to make it possible to specify the artifact repository information by referencing a ConfigMap.
See argoproj/argo-workflows#1350 and argoproj/argo-workflows#1821
With both features in place it might even be unnecessary to make any SDK-side changes.
Each namespace can have a different artifact repository ConfigMap with the same name. Then Argo will choose one based on the namespace.
The backend can add the artifactRepositoryRef field when submitting the run.

I'm not sure argoproj/argo-workflows#3307 is relevant since it's about explicit input artifacts, a feature not used by KFP.

@Bobgy
Copy link
Contributor Author

Bobgy commented Nov 12, 2020

If anyone would like to help this issue, confirming artifactRepositoryRef usage and whether it supports all the features we want can be a step forward

@solarist
Copy link

In case of S3, would it make sense to perform the auth for minio using dex? Because otherwise someone has to generate access/secret keys for each namespace -> https://github.com/minio/minio/blob/master/docs/sts/dex.md

@marcjimz
Copy link

marcjimz commented Dec 15, 2020

@Bobgy we can pick this up as this will be a requirement for us to go multi-tenant. I think this makes sense as the administrator of the platform would be provisioning the namespaces and with that can provision the scoped configMap.

I suspect we want to edit the workflow spec more upstream to here:

result, err := p.clientSet.ArgoprojV1alpha1().Workflows(namespace).Create(workflow.Get())

Any suggestions on the best place to support the change to the workflow spec?

edit: Suppose writing methods to add this new object to the workflow spec could be added here:

@Bobgy
Copy link
Contributor Author

Bobgy commented Dec 16, 2020

Hi @marcjimz, thank you for offering help.

You can look at

func (r *ResourceManager) CreateRun(apiRun *api.Run) (*model.RunDetail, error) {

and
func (r *ResourceManager) CreateJob(apiJob *api.Job) (*model.Job, error) {

The two methods are responsible for preparing argo workflow yaml before submitting to K8s api, we might be able to inject the artifact repository ref there.

Read documentation for argo in https://argoproj.github.io/argo/configure-artifact-repository/,

@arllanos
Copy link

Hi @Bobgy,
I have been testing the artifactRepositoryRef feature and also tested locally initial pipeline code change as to inject the artifactRepositoryRef as suggested above.

Before creating a PR I would like to bring up the following points to get your feedback:

  1. This feature seems to be working since version 2.9. So, we need to upgrade argo from 2.3.0 to 2.9.3 or newer. We have been discussing with @marcjimz to use 2.10.0. Do you have any comments or issues regarding 2.10.0?
  2. Regarding the configMap mentioned by @Ark-kun, based on the documentation if we name the config map artifact-repositories it will be the default config map for artifact repositories in the namespace.
apiVersion: v1
kind: ConfigMap
metadata:
    # if you want to use this config map by default - name it "artifact-repositories" 
  name: artifact-repositories
  annotations:
    # if you want to use a specific key, put that's key into this annotation 
    workflows.argoproj.io/default-artifact-repository: mlpipeline-repository
data:
  mlpipeline-repository: |
    s3:
      bucket: my-bucket
      keyPrefix: artifacts
      endpoint: minio-service.kubeflow:9000
      insecure: true
      accessKeySecret:
        name: mlpipeline-minio-artifact
        key: accesskey
      secretKeySecret:
        name: mlpipeline-minio-artifact
        key: secretkey

We can either rely on the default config map name (artifact-repositories) or we can have separate config map like mlpipeline-minio-artifact. I would be using the default artifact-repositories config map but you may have reasons to use a separate config map instead. What is you feedback here?

@Bobgy
Copy link
Contributor Author

Bobgy commented Jan 20, 2021

Awesome progress,

  1. We can go to argo 2.12 directly if we need to upgrade anyway. Argo maintainer said it will be LTS.
  2. Whatever the simpler is better, if there is a config map picked by default, that sounds the best. More options can be added as follow ups.

@arllanos
Copy link

arllanos commented Jan 22, 2021

@Bobgy, feel free to take a look into #5017.
Note this new functionality requires the following:

  • new bucket for each profile
  • artifact-repositories configmap for each profile.

We could have them created as part of profile creation and tackle this in a separate PR.

@amitripshtos
Copy link

I'm really glad that this feature is pushed! I wonder if the standalone version will be able to use this feature. Any thoughts about that?

@arllanos
Copy link

arllanos commented Jan 27, 2021

I'm really glad that this feature is pushed! I wonder if the standalone version will be able to use this feature. Any thoughts about that?

Good point @amitripshtos regarding standalone version. I'll to test this particular scenario. Depending on the result, extra changes might be needed.

@arllanos
Copy link

arllanos commented Jan 27, 2021

@Bobgy / @Ark-kun,
as mentioned above, when submitting workflows with the ArtifactRepositoryRef field, two things have to exist for each namespace: (1) the referenced config map artifact-repositories and (2) the bucket that corresponds to the namespace.

  1. To create the config map for each namespace we can take advantage of the pipeline profile controller. The config map can be created as part of the multiuser namespace initialization.
  2. To create the bucket for each namespace, since this is not a k8s object I'm not sure using the controller will be the right approach. Right now I see the default bucket is created in the backend apiserver. So we have different alternatives. Do you have any recommendation?

Looking forward to getting your advise about these two points.

@Bobgy
Copy link
Contributor Author

Bobgy commented Feb 7, 2021

Content of the config map should be user defined, so it's not suitable for the profile controller.

@Bobgy
Copy link
Contributor Author

Bobgy commented Feb 7, 2021

My gut feeling is that, we should make this ref configurable with API request. It's similar to a service account.

Maybe we can define a user configurable namespace default in some way.

Anyway, I'd suggest thinking about the MVP of this feature, so we do not expand responsibility of a single pr too much. There can always be follow ups to add the additional features

@arllanos
Copy link

arllanos commented Apr 2, 2021

@Bobgy, sorry I was disconnected for a while. I would like to resume with issue and to share some points.

The use of ArtifactRepositoryRef is making it possible for runs to put their output artifacts to the bucket configured in the user namespace ConfigMap.

However, for CRUD operations related to pipelines, KFP will not use the ArtifactRepositoryRef feature. All CRUD functionality implemented in the object store does not account for a bucket as parameter. These use the bucketName that was set during minio client initialization.

return storage.NewMinioObjectStore(&storage.MinioClient{Client: minioClient}, bucketName, pipelinePath, disableMultipart)

To fully achieve namespace separation we can adjust the object store to account for a bucketName parameter. This will require to adjust resource manager, persistence agent, etc, so we pass the bucketName as parameter accordingly. Do you think there is another better way?

We can also set by convention that user namespace and bucket name be used interchangeably, that is, if user namespace = xyz; bucketName = xyz.

Regarding your last comment on Feb 7, do you mind to elaborate more on that idea?

My gut feeling is that, we should make this ref configurable with API request. It's similar to a service account.

@amitripshtos
Copy link

amitripshtos commented Apr 3, 2021 via email

@Bobgy
Copy link
Contributor Author

Bobgy commented Apr 3, 2021

Agreed, for namespace default artifact repository, we can ask users to config in the argo way.

@maganaluis
Copy link
Contributor

What about public vs namespaced pipelines, we introduced this feature when adding the functionality to the backend, the front-end changes for that are still to be added.

So we would need to take this into account when dealing with a multi-user kubeflow installation. When we retrieve objects we'll have to somehow know if this is a public or namespaced pipeline, so it's not so trivial. In my opinion all artifacts should be stored in their respective namespaced bucket, and the metadata can be used just to determine if they can be accessed or not.

@Bobgy
Copy link
Contributor Author

Bobgy commented Apr 7, 2021

@maganaluis not sure if I fully understood the concern, the current model is that pipeline (templates) may be public or namespaced, but when you run a pipeline, it's always namespaced. Therefore, I don't think there are complexity regarding global/namespaced objects. Did I miss anything?

@juliusvonkohout
Copy link
Member

@Bobgy @jacobmalmberg
it seems that mlpipeline-ui-artifact is a proxy to fetch artifacts from minio with the default environment variable ARTIFACTS_SERVICE_PROXY_ENABLED set for ml-pipeline-ui. mlpipeline-ui-artifact uses some minio configurations that we can override via environment variables as done in #5864. E.g. fetch artifacts from the user namespace minio instance by changing

MINIO_NAMESPACE = 'kubeflow',

This could fix artifact visualization together with #5864.

And regarding the password issue "it does not work even when applying #5864. I think the problem is that in the user namespaces I use different minio credentials / buckets than I do in the kubeflow namespace." That is strange. If mlpipeline-ui-artifact acts as a proxy for fetching artifacts the minio in the kubeflow namespace should be irrelevant. Or is somethin else fetching from the kubeflow namespace minio instance?

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Jan 20, 2022

@jacobmalmberg why are you using minio in gateway mode? It should be a normal minio instance instead of relaying to the kubeflow namespace minio

      - args:
        - gateway
        - s3
        - https://your.minio.here:9000

@juliusvonkohout
Copy link
Member

I have got working visualizations with a per namespace minio instance. I will try changing the passwords soon

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Jan 20, 2022

Also the changed passwords are working and my minio uses istio. @Bobgy @zijianjoy do you appreciate a pull requests that creates local per namespace minios with randomly created passwords? I would alter sync.py to achieve that. So far everything i have tested works, except for run metrics. Maybe a slight alteration is needed such that ml-pipeline fetches the metrics from the correct minio instance. I also need to keep the kubeflow namespace minio instance because there seem to be the pipeline definitions on it.

@jacobmalmberg
Copy link
Contributor

Good work @juliusvonkohout. To enable namespace specific minio instances, apart from #5864, did you only add the MINIO_NAMESPACE env to sync.py or did you change the ARTIFACTS_SERVICE_PROXY_ENABLED env too?

(We use off cluster minio instances and therefore use minio in gateway mode).

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Jan 20, 2022

Good work @juliusvonkohout. To enable namespace specific minio instances, apart from #5864, did you only add the MINIO_NAMESPACE env to sync.py or did you change the ARTIFACTS_SERVICE_PROXY_ENABLED env too?

(We use off cluster minio instances and therefore use minio in gateway mode).

I will create a pull request. my only problem is that the mlpipeline-metrics are not working

Everything can be fetched and seen

GET /pipeline/artifacts/get?source=minio&peek=256&bucket=mlpipeline&key=artifacts%2Fsklearn-pipeline-l7pl8%2F2022%2F01%2F20%2Fsklearn-pipeline-l7pl8-935337371%2Fmlpipeline-metrics.tgz
Getting storage artifact at: minio: mlpipeline/artifacts/sklearn-pipeline-l7pl8/2022/01/20/sklearn-pipeline-l7pl8-935337371/mlpipeline-metrics.tgz

Even in the ui
image

but the run output tab where usually the metrics reside is empty. Which is strange since visualizations work. So the metrics are somehow special. Maybe i just have to redeploy everything.

@juliusvonkohout
Copy link
Member

image

image


Getting storage artifact at: minio: mlpipeline/artifacts/basic-pipeline-7q8l5/2022/01/20/basic-pipeline-7q8l5-3930047157/mlpipeline-ui-metadata.tgz
GET /pipeline/artifacts/get?source=minio&peek=256&bucket=mlpipeline&key=artifacts%2Fbasic-pipeline-7q8l5%2F2022%2F01%2F20%2Fbasic-pipeline-7q8l5-3930047157%2Fmlpipeline-metrics.tgz
Getting storage artifact at: minio: mlpipeline/artifacts/basic-pipeline-7q8l5/2022/01/20/basic-pipeline-7q8l5-3930047157/mlpipeline-metrics.tgz
GET /pipeline/artifacts/get?source=minio&peek=256&bucket=mlpipeline&key=artifacts%2Fbasic-pipeline-7q8l5%2F2022%2F01%2F20%2Fbasic-pipeline-7q8l5-3930047157%2Fmlpipeline-ui-metadata.tgz```

@juliusvonkohout
Copy link
Member

@jacobmalmberg I am currently preparing everything for the pull request.
Tomorrow i will create deterministic passwords for the per namespace minios and create the pull request. I propose hash(namespace concatenate kubeflow-pipelines-profile-controller secret). What do you think? I did not yet think much about chosen plain text attacks and other stuff. XOR instead of concatenation might be dangerous, because it would leak at least the first bit with a simple brute force.

@ca-scribner
Copy link
Contributor

My org has some interest in this too and glad to see the progress! Was there a technical reason for having a separate minio instance for each namespace? I worry about scalability - some kubeflow deployments have hundreds of users and that will be a lot of individual minios.

I do like the idea of being able to use minio in gateway mode to proxy to another backend store, that would be really helpful.

@juliusvonkohout
Copy link
Member

My org has some interest in this too and glad to see the progress! Was there a technical reason for having a separate minio instance for each namespace? I worry about scalability - some kubeflow deployments have hundreds of users and that will be a lot of individual minios.

I do like the idea of being able to use minio in gateway mode to proxy to another backend store, that would be really helpful.

What does org mean? Is it an unnecessary abbreviation for organization?

"Was there a technical reason for having a separate minio instance for each namespace?"
Yes, there are multiple reasons.

  1. You would need a sophisticated MinIO user and password management.
  2. A single large MinIO instance is exactly what? Not horizontally scalable. You would need a distributed MinIO instance.
  3. The users are not completely isolated, the attack surface is larger.

"some kubeflow deployments have hundreds of users and that will be a lot of individual minios." What you describe here is exactly horizontal scaling. Even better each MinIO instance is completely isolated from each other and the users can resize their own MinIO PVC if needed. 100 Users and 20 GB per user means 2 Terabyte of storage for 100 users (80 Dollar per Month).
An excerpt from https://cloud.google.com/compute/disks-image-pricing#disk shows how negligible the storage prices are.

"For example, based off US pricing, a 200 GB standard persistent disk volume would cost $8.00 for the whole month. If you only provisioned a 200 GB volume for half a month, it would cost $4.00. Likewise, a 200 GB SSD persistent disk volume would cost $34.00 for the whole month. If you only provisioned a 200 GB volume for half a month, it would cost $17.00.
Provisioned disk space includes all used and unused space. If you provision a 200 GB disk, you are billed for that entire disk space, regardless of how you use it, until you relinquish it."

"I do like the idea of being able to use minio in gateway mode to proxy to another backend store, that would be really helpful." This is a trivial configuration change as done before by @jacobmalmberg. If you have your own S3 storage separate from Kubeflow, feel free to use it. You need to have your own user and password and storage management, which is outside of the Kubeflow scope.

@jacobmalmberg
Copy link
Contributor

jacobmalmberg commented Jan 26, 2022

@jacobmalmberg I am currently preparing everything for the pull request. Tomorrow i will create deterministic passwords for the per namespace minios and create the pull request. I propose hash(namespace concatenate kubeflow-pipelines-profile-controller secret). What do you think? I did not yet think much about chosen plain text attacks and other stuff. XOR instead of concatenation might be dangerous, because it would leak at least the first bit with a simple brute force.

Sounds good, although it doesn't really matter to me. In your PR, have you fixed the metrics output?

My org has some interest in this too and glad to see the progress! Was there a technical reason for having a separate minio instance for each namespace? I worry about scalability - some kubeflow deployments have hundreds of users and that will be a lot of individual minios.

I couldn't get it to work with just one instance. When creating the minio instance, user & password has to be specified IIRC. I also tried using a K8 ExternalName service (https://kubernetes.io/docs/concepts/services-networking/service/#externalname), but couldn't get it to work. Perhaps it doesn't work because of how our Minio backend is implemented off-cluster.

I do like the idea of being able to use minio in gateway mode to proxy to another backend store, that would be really helpful.

This is sort of what I proposed in #6658. Note that #6658 only replaces the minio instance in the Kubeflow namespace and uses the pre-existing mlpipeline-ui-artifact proxies in the user namespaces to connect to it.

@juliusvonkohout
Copy link
Member

#7219 feel free to help debug the metrics display issue.

@ca-scribner
Copy link
Contributor

There are good arguments for either solution, but I think most use cases for larger organizations would benefit more from a single central minio with user buckets than from individual minios in each namespace. Yes that does require making user accounts on the fly, but that seems achievable based on the discussions above. The cost and maintenance benefits of a single minio store outweigh the downsides in my opinion. Especially when considering serious use often wants high availability, backups, and other things that a each user isn't going to manage for themselves. Separate minio instances also makes collaboration more challenging (consider the case where people want to share).

Horizontal scaling can be achieved in a central minio store - minio is designed for scaling. A separate minio store per user is not needed to achieve horizontal scaling.

Re cost and resource waste, storage costs are not the issue. CPU/RAM is the driver. The resource costs of a few dozen or hundred minio instances are not trivial.

I'm not saying there aren't users that would benefit from separate minio instances, but there are downsides too and to me there are more users that benefit from bucket isolation in a single minio instance

@juliusvonkohout
Copy link
Member

@ca-scribner feel free to modify https://github.com/kubeflow/kubeflow/blob/c4ca7a9321eeec6bbb5ff2907a96ed82cfae9681/components/profile-controller/controllers/profile_controller.go#L150 according to https://docs.min.io/docs/minio-multi-user-quickstart-guide.html. To be compatible with any S3 storage you need to use the pure S3 APIs, not MinIO specific ones.
This controller must delete the buckets and users, because pipelines-profile-controller will not get deletion notifications https://github.com/kubeflow/kubeflow/blob/c4ca7a9321eeec6bbb5ff2907a96ed82cfae9681/components/profile-controller/controllers/profile_controller.go#L299

Then you could use pipeline-profile-controller for the creation of users, buckets and policies, since the namespace owner annotation contains the username. But if you have to alter the profile-controller anyway you should just migrate everything into this controller and delete the pipeline-profile-controller entirely. The benefit is getting rid of metacontroller with cluster-admin permissions.

ml-pipeline would still use the root user to get access to everything.

But YOU would have to do it. So far no one volunteered.

@ca-scribner
Copy link
Contributor

That's a very helpful post, thanks!

@jacobmalmberg
Copy link
Contributor

jacobmalmberg commented Jan 27, 2022

@ca-scribner We have sort of the same use case. Although I never tried it, you could probably apply #6658 and then just change the bucket in every namespace with these configmaps:

---
apiVersion: v1
kind: ConfigMap
metadata:
    # if you want to use this config map by default - name it "artifact-repositories"
  name: artifact-repositories
  namespace: {{ $namespace | quote }}
  annotations:
    # v3.0 and after - if you want to use a specific key, put that's key into this annotation
    workflows.argoproj.io/default-artifact-repository: default-v1
data:
  default-v1: |
    archiveLogs: true
    s3:
      endpoint: "minio-service.{{ $namespace }}:9000"
      bucket: {{ $minio_bucket | quote }}
      keyFormat: "artifacts/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}"
      insecure: true
      accessKeySecret:
        name: mlpipeline-minio-artifact
        key: accesskey
      secretKeySecret:
        name: mlpipeline-minio-artifact
        key: secretkey

---

apiVersion: v1
data:
  defaultPipelineRoot: minio://{{ $minio_bucket }}/artifacts/v2
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: ml-pipeline
    app.kubernetes.io/name: kubeflow-pipelines
    application-crd-id: kubeflow-pipelines
  name: kfp-launcher
  namespace: {{ $namespace | quote }}

It is possible you'll run in to the metrics issue though.

@stale
Copy link

stale bot commented Apr 28, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Apr 28, 2022
@zijianjoy
Copy link
Collaborator

/lifecycle frozen

@thesuperzapper
Copy link
Member

For those watching, deployKF comes pre-configured with namespace isolation for the buckets (both on S3, and the embedded minio).

You can try deployKF here: https://github.com/deployKF/deployKF

You can also read a bit about deployKF in this comment: kubeflow/manifests#2451 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment