
feat(backend): isolate artifacts per namespace/profile/user using only one bucket #7725

Open · wants to merge 28 commits into master

Conversation

@juliusvonkohout (Member) commented May 13, 2022

Fixes #4649

Due to newly discovered functionality in Argo, we will not use multiple buckets, but rather multiple folders with the IAM policy below. Furthermore, AWS, for example, has a limit of 100 buckets per account, so we should definitely stick with one bucket. This is a follow-up to #7406.

By the way, I am open to changing the folder structure etc. as you prefer.

One bucket, multiple folders: namespace isolation

  1. Change the workflow-controller configmap by adding workflow.namespace to the key format: keyFormat: "private-artifacts/{{workflow.namespace}}/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}" (see the ConfigMap sketch below).
    We no longer need the per-namespace artifactrepositories configmaps, but users can still use them to override the default artifact storage defined above.

  2. We still need to create a MinIO user in sync.py and save the individual credentials to the per-namespace secret. However, we now need only one policy, although that policy must still be attached to every user, so only some small changes in sync.py are required. We no longer need to create additional buckets.

"Resource": [
    "arn:aws:s3:::%s/artifacts/*"  % shared_bucket_name, # old shared artifacts for backwards compatibility
    "arn:aws:s3:::%s/private-artifacts/${aws:username}/*"  % shared_bucket_name, # private artifacts
    "arn:aws:s3:::%s/private/${aws:username}/*"  % shared_bucket_name, # private storage
    "arn:aws:s3:::%s/shared/*"  % shared_bucket_name # shared storage for collaboration
]

We exclude "arn:aws:s3:::%s/pipelines/*" % shared_bucket_name (shared pipelines), because the apiserver, not the user, should manage pipeline definitions on MinIO. Hopefully @zijianjoy will remove them from MinIO entirely soon.
So changing the environment variable

pipelinePath = "MINIO_PIPELINE_PATH"

and splitting the storage into shared and private pipelines becomes unnecessary.
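For illustration, here is a minimal sketch of the configmap change from step 1, assuming the stock KFP Argo setup; the configmap name, MinIO endpoint, bucket, and secret names below are the common defaults and may differ in your distribution:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap   # default name in the KFP manifests
  namespace: kubeflow
data:
  artifactRepository: |
    archiveLogs: true
    s3:
      endpoint: "minio-service.kubeflow:9000"   # assumed in-cluster MinIO endpoint
      bucket: "mlpipeline"                      # the single shared bucket
      insecure: true
      accessKeySecret:
        name: mlpipeline-minio-artifact
        key: accesskey
      secretKeySecret:
        name: mlpipeline-minio-artifact
        key: secretkey
      # {{workflow.namespace}} in the key prefix is what isolates artifacts per profile:
      keyFormat: "private-artifacts/{{workflow.namespace}}/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}"
```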

  1. We can revert to the upstream images for apiserver, persistenceagent etc. Only the ml-pipeline-ui changes from #7406 (WIP namespace isolation for artifacts and pipeline definitions) must be kept for managing namespaced pipeline definitions in the UI, but this can be done in another PR. The pipeline-ui from PR #7406 does not need to be changed, since it uses the apiserver properly according to Gabor's changes in frontend/src/pages/PipelineList.tsx. The proposal from Arrikto (#7447, "feat(frontend, sdk): towards namespaced pipelines", part of #4197) could also be used.

  2. I removed the deprecated visualization server (ml-pipeline-visualizationserver; see https://www.kubeflow.org/docs/components/pipelines/sdk/python-based-visualizations/). Please use the supported output viewers instead: https://www.kubeflow.org/docs/components/pipelines/sdk/output-viewer/

  3. I added a network policy that complements https://github.com/kubeflow/manifests/tree/master/contrib/networkpolicies (see the sketch below this list).

  4. I added a fix for #7699 ("[bug] pod is forbidden: failed quota: kf-resource-quota: must specify cpu,memory").

  5. I added a PodDefault so that, on JupyterLab creation, one can enable easy authentication to KFP according to https://www.kubeflow.org/docs/components/pipelines/sdk/connect-api/#multi-user-mode (see the sketch below this list).

  6. I added a Kubernetes NetworkAttachmentDefinition for istio-cni (see the sketch below this list). This is necessary to run Kubeflow completely rootless (as I am already doing with istio-cni and without the NET_ADMIN and NET_RAW capabilities) on OpenShift: https://istio.io/latest/docs/setup/platform-setup/openshift/#additional-requirements-for-the-application-namespace. Vanilla Kubernetes might not need this, but it does no harm.

  7. There is also a security issue in the UI server: one can read other people's artifacts just by removing =namespace=xxx from the end of the artifact URL, although an attacker still has to guess the right file names.

  8. The lifecycle policy that I have added works wonderfully with the MinIO storage backend and proper cache settings (#8177, "feat(frontend): caching may now be disabled when starting pipeline runs") to delete old cache artifacts after 30 days.
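For points 3, 5 and 6 above, the manifests could look roughly as follows. The PodDefault follows the linked Kubeflow documentation and the NetworkAttachmentDefinition follows the linked Istio documentation; the NetworkPolicy is only an illustration of the general shape, and the rules actually added by this PR may differ.

A PodDefault that mounts a projected ServiceAccount token for KFP authentication (point 5):

```yaml
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: access-ml-pipeline
  namespace: <profile-namespace>   # stamped per profile by the controller
spec:
  desc: Allow access to Kubeflow Pipelines
  selector:
    matchLabels:
      access-ml-pipeline: "true"
  volumes:
    - name: volume-kf-pipeline-token
      projected:
        sources:
          - serviceAccountToken:
              path: token
              expirationSeconds: 7200
              audience: pipelines.kubeflow.org
  volumeMounts:
    - mountPath: /var/run/secrets/kubeflow/pipelines
      name: volume-kf-pipeline-token
      readOnly: true
  env:
    - name: KF_PIPELINES_SA_TOKEN_PATH
      value: /var/run/secrets/kubeflow/pipelines/token
```

The NetworkAttachmentDefinition for istio-cni (point 6), created in each application namespace:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: istio-cni
```

And an illustrative NetworkPolicy in the spirit of point 3, allowing in-namespace traffic plus traffic from the kubeflow and istio-system namespaces:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-intra-namespace-and-kubeflow   # illustrative name
spec:
  podSelector: {}        # apply to all pods in the profile namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}   # same-namespace traffic
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kubeflow   # label auto-added since k8s 1.21
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: istio-system
```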


@google-oss-prow google-oss-prow bot added size/L and removed size/XS labels May 13, 2022
@juliusvonkohout juliusvonkohout changed the title WIP Seperate artifacts per namespace/profile/user using only one bucket Seperate artifacts per namespace/profile/user using only one bucket May 14, 2022
@juliusvonkohout (Member, Author)

@chensun @zijianjoy @jacobmalmberg @ca-scribner I finally have something that is worth merging for namespace isolation. It is also quite easy to test on your own instances, because I only changed four configuration files.

@juliusvonkohout (Member, Author)

/assign @zijianjoy

@juliusvonkohout juliusvonkohout changed the title Seperate artifacts per namespace/profile/user using only one bucket feat(backend): seperate artifacts per namespace/profile/user using only one bucket May 14, 2022
@juliusvonkohout (Member, Author)

@chensun can you provide some feedback? I will happily implement any changes if needed.

@juliusvonkohout (Member, Author)

/retest

@juliusvonkohout (Member, Author)

We also have to think about switching the prefixes to just /artifacts, /namespace/artifacts and /shared to make it easier for users and graphical S3 browsers, but that would mean we have to prevent usernames that collide with other top-level folders, e.g. "pipelines".

@lehrig commented Aug 25, 2023

@zijianjoy, @chensun this is an important feature, especially in enterprise contexts. Can we push this a bit?
@juliusvonkohout: anything left from your side on this issue?

@juliusvonkohout (Member, Author)

@lehrig feel free to join our Security WG meetings and push this in the KFP meetings as well. We have a lot to discuss.

@github-actions github-actions bot added the Stale label May 2, 2024
@juliusvonkohout (Member, Author)

@lehrig I am thinking of providing this PR as an overlay in the manifests repository.

@AndersBennedsgaard
Very nice changes @juliusvonkohout. Here are a few of my personal thoughts on the proposed changes:

The PR includes many valuable changes; however, it introduces too many changes in a single PR. Due to the breaking changes and some alterations that require further discussion, several small fixes have remained unresolved for over two years, even though they are not blocked for any specific reason.
Suggestions:

  1. Segregate Changes:

    • Consider splitting the PR into multiple, smaller PRs for more efficient review and integration. Some of the changes that could be introduced in separate PRs (and perhaps should be discussed in a new issue first) include:
      • Removal of the visualization server.
      • Management of PodDefaults in the profile controller.
      • Removal of the =namespace=xxx from the artifact URL.
      • Creation of NetworkPolicies.
  2. Dependency Management:

    • There seems to be no need for installing the boto3 and kubernetes packages. The minio Python package can handle PUTting the lifecycle (LFC) configuration, which is the only job of the boto3 package in your changes (see the minio-py sketch after this list).
    • Resources managed by Metacontroller for the parent (namespace) are POSTed to the /sync URL, so looking up existing Secrets on the cluster using the kubernetes package is unnecessary. This would also eliminate the need for additional Kubernetes RBAC.
  3. Server Implementation:

    • We might want to avoid the Python http.server package, as it is not recommended for production due to its limited security features. We have, for example, successfully implemented the Pipelines profile controller using FastAPI (see the sketch after this list).
  4. Controller Type:

    • We might want to consider converting the profile controller from a CompositeController to a DecoratorController (see the sketch after this list). According to the Metacontroller documentation, a CompositeController expects to have full control of the parent resource. The Pipelines profile controller does not conform to this expectation, since the Kubeflow profile controller already controls the namespace. Converting it into a DecoratorController, which has a slightly different API, would be more appropriate.
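To make these points concrete, here are some sketches. They are illustrations under assumed names and defaults, not this PR's actual code.

Setting the bucket lifecycle with the minio package instead of boto3 (endpoint, credentials, bucket, and prefix are assumptions):

```python
from minio import Minio
from minio.commonconfig import ENABLED, Filter
from minio.lifecycleconfig import Expiration, LifecycleConfig, Rule

client = Minio(
    "minio-service.kubeflow:9000",  # assumed in-cluster endpoint
    access_key="minio",
    secret_key="minio123",
    secure=False,
)

# Expire cached artifacts under the private prefix after 30 days.
config = LifecycleConfig([
    Rule(
        ENABLED,
        rule_filter=Filter(prefix="private-artifacts/"),
        rule_id="expire-old-artifacts",
        expiration=Expiration(days=30),
    ),
])
client.set_bucket_lifecycle("mlpipeline", config)
```

A minimal FastAPI /sync endpoint for Metacontroller (a DecoratorController sends the watched object under "object"; a CompositeController sends it under "parent"):

```python
from typing import Any

from fastapi import FastAPI

app = FastAPI()

@app.post("/sync")
async def sync(body: dict[str, Any]) -> dict[str, Any]:
    namespace = body["object"]["metadata"]["name"]
    # Build the desired per-namespace attachments (Secrets, ConfigMaps,
    # PodDefaults, ...) for this profile namespace here.
    attachments: list[dict[str, Any]] = []
    return {"attachments": attachments}
```

And a rough DecoratorController shape that decorates existing namespaces instead of owning them (the label, names, and URL are illustrative):

```yaml
apiVersion: metacontroller.k8s.io/v1alpha1
kind: DecoratorController
metadata:
  name: kubeflow-pipelines-profile-controller   # illustrative name
spec:
  resources:
    - apiVersion: v1
      resource: namespaces
      labelSelector:
        matchLabels:
          pipelines.kubeflow.org/enabled: "true"   # assumed opt-in label
  attachments:
    - apiVersion: v1
      resource: secrets
      updateStrategy:
        method: InPlace
    - apiVersion: v1
      resource: configmaps
      updateStrategy:
        method: InPlace
  hooks:
    sync:
      webhook:
        url: http://kfp-profile-controller.kubeflow/sync   # assumed service
```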

I might be able to contribute the profile controller we have created based on this PR, if you are interested. Please let me know if you would prefer it as a PR to https://github.com/juliusvonkohout/pipelines. I will need to discuss this with my organization first, though.

@juliusvonkohout (Member, Author) commented Jun 10, 2024

@AndersBennedsgaard are you in #kubeflow-platform on the new Slack? We are also considering Apache Ozone as a MinIO replacement. I think

  • Removal of the visualization server

as done in this PR is the first step. Please create a separate PR for that, link it to this PR and the corresponding issue #4649, and tag @rimolive as well.

@juliusvonkohout (Member, Author)

@AndersBennedsgaard I just added you to https://github.com/juliusvonkohout/pipelines as a collaborator.

@AndersBennedsgaard
Yes, I am part of the new Slack channels. We will most likely not adopt Apache Ozone in our project since we already use MinIO in other projects, so I will probably not be able to contribute much in that regard. But I might take a look at some of the other stuff if I get approval on my end.

@AndersBennedsgaard
@juliusvonkohout I've created #10905 to discuss which components can be removed, and how. I would like to get some feedback on it before I create a PR. I've also created #10904 and #10903 to discuss some of my other points from above.

@juliusvonkohout (Member, Author)

/reopen
/lifecycle frozen
for the time being

@google-oss-prow google-oss-prow bot reopened this Sep 4, 2024

@juliusvonkohout: Reopened this PR.


@juliusvonkohout: The lifecycle/frozen label cannot be applied to Pull Requests.


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from zijianjoy. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@juliusvonkohout
Copy link
Member Author

/ok-to-test

Development

Successfully merging this pull request may close these issues.

[Multi User] Support separate artifact repository for each namespace