
feat(frontend&backend): Add UI support for object store customization and prefixes #10787

Merged: 8 commits merged into kubeflow:master on Jun 25, 2024

Conversation

@HumairAK (Collaborator) commented May 3, 2024

Description of your changes:
Frontend follow-up to #10625.
Resolves the object storage portion of #9689.

Changes to backend:

  • For each artifact, add an artifact property in MLMD that keeps track of how to locate it (i.e. which provider it lives in, where the credentials are, etc.). This is the same information that is stored as a context property, introduced in Recursion support issue #1065, but because each artifact is not tied to a specific context (imported or cached artifacts, for example), we propagate this information to the artifact properties (when the artifacts are newly created within a context).

  • We assume import artifacts will leverage the environment for their credentials (e.g. via IRSA); this means both the user executor pod and the KFP UI will need these credentials in their environment (this is true for any situation where we use fromEnv: true). A future change could also add support for matching import artifacts with prefixes defined in the kfp-launcher config. A minimal fromEnv sketch follows this list.
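
For reference, a minimal kfp-launcher provider entry that sources credentials from the environment could look like the sketch below (same format as the configmap later in this description; the endpoint and region values are placeholders, not taken from this PR):

  s3:
    default:
      endpoint: s3.amazonaws.com
      region: us-east-1
      credentials:
        # no secretRef needed: the executor pod and the KFP UI are expected
        # to carry the credentials in their environment (e.g. via IRSA)
        fromEnv: true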

Additionally, this PR adds the UI support for the changes introduced in #10625. Users will now be able to add to their kfp-launcher configmaps the locations of their S3 buckets based on the provided pipeline root, either via the default pipeline root or one specified during pipeline execution. When using fromEnv: true, the onus is on the user, as well as the admin (who manages KFP), to ensure that the user pod and the KFP UI have these credentials available. The following updates to the UI went in:

Changes to client code:

  • Fetch the artifact property store_session_info from MLMD (now accessible via LinkedArtifact) and include it as part of the artifact retrieval requests sent to the server (see the illustrative sketch after this list).
  • In some cases namespaces were not being added to the request for artifact retrieval; this has been amended.
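
For illustration only (the exact stored schema is not spelled out here), store_session_info can be thought of as carrying the same provider details as the matching kfp-launcher entry, for example:

  # illustrative sketch; field names mirror the kfp-launcher config,
  # not a guaranteed representation of the stored property
  provider: s3
  endpoint: http://your-other-minio-endpoint.com
  region: minio
  disableSSL: true
  credentials:
    fromEnv: false
    secretRef:
      secretName: ds-pipeline-s3-sample
      accessKeyKey: accesskey
      secretKeyKey: secretkey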

Changes to the frontend server:

  • If store session info is detected along with a namespace, and we detect that fromEnv=false, then we require the session info to include the location of the credentials (a Kubernetes secret) and utilize those (see the example secret after this list). If a namespace is not provided, we reject the request.
  • I also removed some outdated legacy code related to how we used to retrieve AWS credentials from the environment.
  • Added tests, and fixed some broken tests (in RunLists).
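
For context, the credentials referenced by a secretRef live in a plain Kubernetes secret in the user's profile namespace, with data keys matching accessKeyKey/secretKeyKey. A sketch matching the ds-pipeline-s3-sample reference used in the configmap below (values are placeholders):

  apiVersion: v1
  kind: Secret
  metadata:
    name: ds-pipeline-s3-sample
    namespace: <user-profile-namespace>
  stringData:
    accesskey: <access key id>
    secretkey: <secret access key>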

Notes:

  • The credential secrets are required to be in the user profile namespace, though this seems to have already been the case before [1], [2], [3]

  • Because we don't do namespace verification server side, there continues to be a security concern where users can dupe the namespace client side and access object stores from another profile namespace. Because we are storing the provider's credentials location in MLMD and querying it from the client side, the client can technically dupe this and attempt to read object stores if their secrets are stored elsewhere on the cluster. However, once we resolve [backend] Security exploit in mlpipeline-UI #9889 this should no longer be an issue, because any attempt to access resources outside the user's profile namespace should be rejected.

  • With some additional changes, we can allow users to set another default object store and resolve [feature] Allow manifests to set external Object Store to deploy Kubeflow Pipelines #10651. The additional change will require KFP to read a default kfp-launcher config from the kubeflow namespace (or wherever KFP is deployed).

Once accepted, I will follow up with documentation on this.

Example kfp-launcher configmap:
apiVersion: v1
data:
  defaultPipelineRoot: s3://mlpipeline
  providers: |-
    s3:
      default:
        endpoint: http://your-other-minio-endpoint.com
        disableSSL: true
        region: minio
        credentials:
          fromEnv: false
          secretRef:
            secretName: ds-pipeline-s3-sample
            accessKeyKey: accesskey
            secretKeyKey: secretkey
      overrides:
        # use pipeline root: s3://your-bucket/some/s3/path/a/c
        - bucketName: your-bucket
          keyPrefix: some/s3/path/a/c
          endpoint: s3.amazonaws.com
          region: us-east-2
          credentials:
            fromEnv: false
            secretRef:
              secretName: aws-s3-creds
              accessKeyKey: AWS_ACCESS_KEY_ID
              secretKeyKey: AWS_SECRET_ACCESS_KEY
        # use pipeline root: s3://test-scheme/minioAsS3/test/route
        - endpoint: http://your-other-minio-endpoint.com
          region: minio
          disableSSL: true
          bucketName: test-scheme
          keyPrefix: minioAsS3/test/route
          credentials:
            fromEnv: false
            secretRef:
              secretName: minio-creds
              accessKeyKey: accesskey
              secretKeyKey: secretkey
kind: ConfigMap
metadata:
  name: kfp-launcher
To verify:
  • Run a pipeline and select a pipeline root that matches one from the kfp-launcher config (ensure this config is in the user's namespace, as well as the required credentials).
  • Confirm artifacts are stored in the matching bucket and path.
  • Verify that the artifacts are previewed correctly in the KFP UI.
    • You want to verify artifacts in artifact info, metrics visualizations, and run compares (try different visualizations).

Testing Instructions

To test this PR you will need to:

  1. Build the images
  • Build the KFP front end image
  • Build the driver image
  • Build the launcher image

You can also use these images here:

  • quay.io/hukhan/kfp-frontend:issue-10787
  • quay.io/hukhan/kfp-launcher:issue-10787
  • quay.io/hukhan/kfp-driver:issue-10787
  2. Configure the KFP deployment to use these images

Frontend:
Substitute quay.io/hukhan/kfp-frontend:issue-10787 in the frontend deployment.
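
A minimal sketch of that substitution, assuming the default kustomize manifests (where both the deployment and the container are named ml-pipeline-ui):

  spec:
    template:
      spec:
        containers:
          - name: ml-pipeline-ui
            image: quay.io/hukhan/kfp-frontend:issue-10787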

Driver/launcher:
In the KFP API Server deployment, add the environment variables:
V2_LAUNCHER_IMAGE=quay.io/hukhan/kfp-launcher:issue-10787
V2_DRIVER_IMAGE=quay.io/hukhan/kfp-driver:issue-10787
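
A sketch of where these land, assuming the default manifests (API server deployment ml-pipeline, container ml-pipeline-api-server):

  spec:
    template:
      spec:
        containers:
          - name: ml-pipeline-api-server
            env:
              - name: V2_LAUNCHER_IMAGE
                value: quay.io/hukhan/kfp-launcher:issue-10787
              - name: V2_DRIVER_IMAGE
                value: quay.io/hukhan/kfp-driver:issue-10787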

  3. Deploy KFP. I recommend using a full KFP deployment so you can test with a multi-user setup.

  4. Create a kfp-launcher config in the user namespace that lists your S3/GCS/MinIO credentials; for the formatting, see the configmap example in this PR description.

  • test with gcs token creds
  • test with aws s3 creds provided via secret
  • test with aws irsa (fromEnv in kfp launcher config)
  • test with minio
  5. Any secret you mention in kfp-launcher also needs to be added to the user's profile namespace.

  6. Upload a pipeline that outputs artifacts.

Some pipelines to test with:

  7. Run a pipeline and select a pipeline root that matches one from the kfp-launcher config (once again, the kfp-launcher config and the secret it uses should be in the user's profile namespace).

  8. Test the artifact visuals in the UI; key areas to test:

  • Run details Artifact Info tab
  • Run details visualization tab
  • Compare Runs, test multiple visualizations
  • Confirm S3 artifacts are stored in the expected locations within the respective object stores

Checklist:

@HumairAK (Collaborator, Author) commented May 3, 2024

I'll take a look at the failing tests, as well as follow up with a demo and some images folks can test this PR with (currently blocked on building the UI image due to this issue).

@HumairAK (Collaborator, Author) commented May 9, 2024

/test kubeflow-pipeline-frontend-test

1 similar comment
@HumairAK (Collaborator, Author) commented May 9, 2024

/test kubeflow-pipeline-frontend-test

@HumairAK (Collaborator, Author) commented May 9, 2024

/test kubeflow-pipeline-frontend-test

Flaky tests seem to be failing, I think; retrying a couple of times to confirm (I'm unable to reproduce this in my environment).

Edit: yes, that seemed to be the case; they are passing now.

@juliusvonkohout (Member):

@HumairAK feel free to ping me on Slack or join the security meeting if you are interested in fixing the exploit mentioned above.

@juliusvonkohout (Member):

And I am also investigating Apache Ozone as a MinIO replacement.

@HumairAK (Collaborator, Author):

@juliusvonkohout Yes, this is something we 100% need to fix.

I'm currently limited on bandwidth, but if this is not picked up by someone else in a couple of weeks, I can take it on.

@HumairAK (Collaborator, Author):

Rebased. I've updated the description with detailed testing instructions and provided pre-built images for this PR in order to help speed up review.

Comment on lines +44 to +46
* If providerInfo is not provided or, if credentials are sourced fromEnv,
* then, if using aws s3 (via provider chain or instance profile), create a
* minio client backed by aws s3 client.
@zijianjoy (Collaborator):

MinIO is the default deployment approach. Can we make sure this continues to be the case?

@HumairAK (Collaborator, Author) replied Jun 6, 2024:

@zijianjoy Yes, the above description describes the AWS case. If AWS is not specified, we default to MinIO when using the default deployment manifests (since that is what is configured in the KFP UI environment); see the description below this paragraph.

@zijianjoy (Collaborator):

/lgtm

Approved from the frontend side. CC @chensun for the backend change in this PR.

@HumairAK (Collaborator, Author):

cc @chensun

@chensun chensun changed the title feat(frontend): Add UI support for object store customization and prefixes feat(frontend&backend): Add UI support for object store customization and prefixes Jun 18, 2024
@chensun (Member) left a comment:

/lgtm
/approve

Thanks!

@chensun (Member) commented Jun 18, 2024

/retest-required

Note that for runlist, the mock error response only returns one valid
run list with error set, the other is undefined, so to support multiple
runIds the mock error response will need to be adjusted.

Signed-off-by: Humair Khan <[email protected]>
..and clean up imports.

Signed-off-by: Humair Khan <[email protected]>
Signed-off-by: Humair Khan <[email protected]>
@HumairAK (Collaborator, Author) commented Jun 20, 2024

Rebased to pick up the new GitHub CI tests.

@HumairAK (Collaborator, Author):

/test kubeflow-pipeline-e2e-test

google-oss-prow bot commented:

@HumairAK: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test kfp-kubernetes-test-python310
  • /test kfp-kubernetes-test-python311
  • /test kfp-kubernetes-test-python312
  • /test kfp-kubernetes-test-python38
  • /test kfp-kubernetes-test-python39
  • /test kubeflow-pipeline-frontend-test
  • /test kubeflow-pipeline-mkp-snapshot-test
  • /test kubeflow-pipeline-mkp-test
  • /test kubeflow-pipelines-backend-visualization
  • /test kubeflow-pipelines-component-yaml
  • /test kubeflow-pipelines-components-google-cloud-python38
  • /test kubeflow-pipelines-integration-v2
  • /test kubeflow-pipelines-manifests
  • /test kubeflow-pipelines-sdk-docformatter
  • /test kubeflow-pipelines-sdk-execution-tests
  • /test kubeflow-pipelines-sdk-isort
  • /test kubeflow-pipelines-sdk-python310
  • /test kubeflow-pipelines-sdk-python311
  • /test kubeflow-pipelines-sdk-python312
  • /test kubeflow-pipelines-sdk-python38
  • /test kubeflow-pipelines-sdk-python39
  • /test kubeflow-pipelines-sdk-yapf
  • /test test-kfp-runtime-code-python310
  • /test test-kfp-runtime-code-python311
  • /test test-kfp-runtime-code-python312
  • /test test-kfp-runtime-code-python38
  • /test test-kfp-runtime-code-python39
  • /test test-run-all-gcpc-modules
  • /test test-upgrade-kfp-sdk

The following commands are available to trigger optional jobs:

  • /test kfp-kubernetes-execution-tests
  • /test kubeflow-pipeline-upgrade-test
  • /test kubeflow-pipeline-upgrade-test-v2
  • /test kubeflow-pipelines-samples-v2

Use /test all to run the following jobs that were automatically triggered:

  • kubeflow-pipeline-frontend-test
  • kubeflow-pipeline-upgrade-test
  • kubeflow-pipeline-upgrade-test-v2
  • kubeflow-pipelines-samples-v2

In response to this:

/test kubeflow-pipeline-e2e-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

google-oss-prow bot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chensun


google-oss-prow bot commented:

@HumairAK: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Required | Rerun command
kubeflow-pipeline-e2e-test | 8a59da1 | false | /test kubeflow-pipeline-e2e-test
kubeflow-pipeline-upgrade-test | e0ff01d | false | /test kubeflow-pipeline-upgrade-test
kubeflow-pipelines-samples-v2 | e0ff01d | false | /test kubeflow-pipelines-samples-v2


@HumairAK (Collaborator, Author):

@zijianjoy / @chensun All e2e tests are also passing, and no new code changes have been introduced since the rebase. Can we get a refresh on the "lgtm" and get this merged?

@zijianjoy (Collaborator):

/lgtm

google-oss-prow bot added the lgtm label Jun 25, 2024
google-oss-prow bot merged commit 6723d3d into kubeflow:master Jun 25, 2024
9 of 13 checks passed
@juliusvonkohout (Member) commented Jun 25, 2024

@HumairAK where exactly did you fix #9889?

I only see "Because we don't do namespace verification server side, there continues to be a security concern where users can dupe the namespace client side and access object stores from another profile namespace. Because we are storing the provider's creds location in mlmd and querying it from client side, client can technically dupe this, and attempt to read object stores if their secrets are stored elsewhere on the cluster. However, once we resolve #9889 this should no longer be an issue, because any attempt to access resources outside the user's profile namespace should be rejected.", which says the opposite.

@HumairAK (Collaborator, Author):

@juliusvonkohout This PR did not address #9889; in this post I merely made note of it. As it is a separate issue, it is out of scope for this PR (which was already very long). I agree we need to prioritize it, but we should do so as a separate effort.
