[Multi User] Support separate metadata for each namespace #4790
I didn't find an existing issue tracking this story. If there's one, please let me know.
I remember there's a very related one in the TFX repo: tensorflow/tfx#2618. I am assuming this one is about supporting multi-tenancy in the Kubernetes-native way (namespaces), while that one is more about built-in multi-tenancy support in MLMD itself.
@numerology Yeah, if MLMD can add support for multi-tenancy, that would be great. The Pipelines project can then make the corresponding changes.
Yeah, that's true. If MLMD doesn't plan to support it, we can still fall back to the workaround of aggregating metadata at the namespace level.
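To make that workaround concrete, here is a minimal sketch using the MLMD Python client. The `metadata-grpc-service.kubeflow:8080` endpoint is the usual KFP deployment but may differ in your installation, and the `KfpRun` context type name is an assumption that varies by KFP/TFX version:

```python
import ml_metadata as mlmd
from ml_metadata.proto import metadata_store_pb2

# gRPC connection to the shared MLMD server deployed by KFP
# (endpoint name assumed; adjust to your installation).
config = metadata_store_pb2.MetadataStoreClientConfig(
    host="metadata-grpc-service.kubeflow", port=8080)
store = mlmd.MetadataStore(config)

def artifacts_for_namespace(run_ids):
    """Aggregate artifacts for the runs belonging to one namespace.

    `run_ids` would come from the KFP API server, which already knows
    which runs live in which namespace.
    """
    artifacts = []
    for run_id in run_ids:
        # "KfpRun" is a hypothetical context type name.
        context = store.get_context_by_type_and_name("KfpRun", run_id)
        if context is not None:
            # One MLMD round trip per run: this is the performance
            # concern for namespaces with many runs.
            artifacts.extend(store.get_artifacts_by_context(context.id))
    return artifacts
```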
Let me mention some points I think should be considered along with this issue, related to the artifacts list page. I have not checked the executions page yet.
I'd appreciate your thoughts and comments. CC @maganaluis
Server-side filtering is available in ml-metadata 1.2.0!
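For illustration, a hedged sketch of what that server-side filtering looks like from the Python client, assuming a version where `ListOptions` exposes `filter_query`; the context type and name values are placeholders:

```python
import ml_metadata as mlmd
from ml_metadata.proto import metadata_store_pb2

config = metadata_store_pb2.MetadataStoreClientConfig(
    host="metadata-grpc-service.kubeflow", port=8080)
store = mlmd.MetadataStore(config)

# A single server-side filtered query replaces the per-run client loop
# sketched above; the filter string follows MLMD's filter-query grammar.
artifacts = store.get_artifacts(
    list_options=mlmd.ListOptions(
        filter_query="contexts_a.type = 'KfpRun' AND "
                     "contexts_a.name = 'some-run-id'"))
```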
For anyone interested, for your namespace separation requirements, do you want the metadata DB to be
With 1, we can build access control using Istio Authorization.
@Bobgy has any progress been made or a decision reached on this issue?
Maybe we should use a proxy, as is done for katib-mysql and katib-db-manager.
@Bobgy @zijianjoy Istio should support gRPC filtering now: istio/istio#25193 (comment). @ca-scribner would you be interested in implementing this Envoy filter? I am still busy with the MinIO stuff.
Following up on this item: I am leaning towards creating one MLMD instance per namespace, because we should consider the data lifecycle of MLMD information. When a namespace is deleted, we need an approach to easily clean up the data related to that namespace. This is not easy with a single MLMD instance today, because the delete operation is unsupported by design: google/ml-metadata#38. Thus, starting with separation from the beginning is my current preference.

That said, I am aware that one MLMD instance per namespace probably means resource-usage overhead for a cluster with many namespaces, so we should consider using something like the Horizontal Pod Autoscaler: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/. The problem we are facing is similar to the artifact-UI scalability problem as well: #9555
@zijianjoy amazing that this is being tackled. We need this anyway as a CNCF graduation requirement: the CNCF will do a security assessment, and this is a clear security violation.

I think the per-namespace artifact visualization server should be removed anyway, since it is deprecated, and the artifact proxy is obsolete as well, as explained here: #9555 (comment). That means you can already have zero-overhead namespaces today if you drop the old garbage. I know of Kubeflow installations with several hundred namespaces, so this is a real problem customers are facing. I can create a PR to make that the default and fix the security issue I found a few years ago, using code from @thesuperzapper: #8406 (comment)

In the long term I would propose switching to MLflow, since that seems to be the industry standard, but if that is not possible due to Google policies we should consider something with a minimal footprint, maybe Knative serverless per namespace.
One MLMD instance per namespace is bad from a governance perspective. What if I want to track all assets produced by the company, like a catalog? That would require querying multiple MLMD API servers. There should instead be a way to prevent unwanted access through Istio, creating a solution that does not depend on the MLMD developers to implement.
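For comparison, a sketch of what a company-wide catalog would have to do under the per-namespace model: fan out over every namespace's MLMD endpoint and merge the results. The per-namespace service naming here is purely hypothetical:

```python
import ml_metadata as mlmd
from ml_metadata.proto import metadata_store_pb2

# Hypothetical: assumes one MLMD gRPC server per profile namespace,
# reachable at metadata-grpc-service.<namespace>:8080.
NAMESPACES = ["team-a", "team-b", "team-c"]

def catalog_all_artifacts():
    """Merge artifacts from every per-namespace MLMD instance.

    With a single shared instance this would be one query; with
    per-namespace instances the catalog must query N servers.
    """
    catalog = {}
    for ns in NAMESPACES:
        config = metadata_store_pb2.MetadataStoreClientConfig(
            host=f"metadata-grpc-service.{ns}", port=8080)
        store = mlmd.MetadataStore(config)
        catalog[ns] = store.get_artifacts()
    return catalog
```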
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
/lifecycle frozen |
Part of #1223; since we closed it, we need a separate issue to track this feature.
Supporting separate metadata for each namespace helps us see only the related artifacts/executions.
Currently, MLMD doesn't have a user/namespace concept to isolate metadata per user. A workaround we can move forward with is to aggregate artifacts/executions by the existing experiments and runs in the user's namespace. This will result in multiple MLMD queries, and I am not sure about the performance, especially at large scale.
Thumbs up if this is something you need.
/kind feature