Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(backend): authorize readArtifacts and ReportMetrics endpoints #7819

Merged
merged 7 commits into from
Aug 4, 2022

Conversation

difince
Copy link
Member

@difince difince commented Jun 1, 2022

Secure communication between the API server and the persistent agent.
Currently, partially secured.
New Verbs (reportMetrics and readArtifact) are added to ClusterRole with name: aggregate-to-kubeflow-pipelines-edit

Fixes: #7818

Signed-off-by: Diana Atanasova [email protected]

Description of your changes:

Checklist:

@difince
Copy link
Member Author

difince commented Jun 1, 2022

/assign @juliusvonkohout

@difince
Copy link
Member Author

difince commented Jun 1, 2022

/assign @difince
/cc @juliusvonkohout

@difince
Copy link
Member Author

difince commented Jun 1, 2022

/unassign @juliusvonkohout

@difince
Copy link
Member Author

difince commented Jun 1, 2022

/cc @annajung

@google-oss-prow google-oss-prow bot requested a review from annajung June 1, 2022 14:24
@juliusvonkohout
Copy link
Member

juliusvonkohout commented Jun 1, 2022

Amazing work. I only have a few questions

  1. Did you check whether the persistence agent is authorized and can read the artifacts and report the metrics?
  2. Could you provide images such that I can test it on my cluster in an easy manner?
  3. The API all is reportRunMetrics the verb is ReportMetrics. Maybe we need more consistency here.
  4. Maybe the new verbs should be added to the viewer role too in addition to the edit role.

@difince
Copy link
Member Author

difince commented Jun 2, 2022

Hi @juliusvonkohout Thank you for your fast feedback!
I am not quite sure if I tested in the best way, so I will really appreciate it if you could help me validate.
In minutes I will create an image for you.

About the verb inconsistency - IMO, using the short verb name ReportMetrics instead of reportRunMetrics is ok, as the verb is used within a specific resource - in our case runs. But if you still think it would be better to be more precise I will change the name.

About 4. you make me think twice .. as ReadArticat is GET request - it makes sense to be part of the viewer role

@difince
Copy link
Member Author

difince commented Jun 2, 2022

@juliusvonkohout

  • image: docker pull atanasovad/api-server:1.1.10

  • use the image: kubectl edit deployment.apps/ml-pipeline -o yaml -n kubeflow

  • Manually apply the roles kubectl apply -f https://raw.githubusercontent.com/difince/pipelines/issue_7818/manifests/kustomize/base/installs/multi-user/view-edit-cluster-roles.yaml --namespace kubeflow

@difince
Copy link
Member Author

difince commented Jun 2, 2022

btw, I have noticed that ReportWorkflow and ReportScheduledWorkflow are also not authorized. Shouldn't they be?

@difince difince marked this pull request as draft June 3, 2022 11:34
@difince
Copy link
Member Author

difince commented Jun 3, 2022

I changed the PR to Draft because I had realized that in my case the persistent agent does Not call ReadArtifacts or ReportMetrics endpoints at all and that gave me the wrong feeling that all works.
The endpoints are not called because this check never passes.
It looks like the persistent agent won't be able to authorize
I will continue to investigate

In addition, something else I have noticed and guess is wrong behavior is that: on delete Run the artifacts stay in Minio? I will further investigate

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Jun 3, 2022

I changed the PR to Draft because I had realized that the persistent agent does Not call ReadArtifacts or ReportMetrics endpoints at all and that gave me the wrong feeling that all works. The endpoints are not called because this check never passes.

* `metricsArtifactName` is a constant which value is "`mlpipeline-metrics`". this constant is compared with the real `artifactName` that comes from the workflow and is always different.

This check should only pass if your pipeline contains metrics. You need to test with something that contains metrics.

Here is something to test with and i can also provide you a YAML file if needed

def produce_markdown() -> NamedTuple('Outputs', [('mlpipeline_ui_metadata', 'UI_metadata'), ('mlpipeline_metrics', 'Metrics')]):
    #https://www.kubeflow.org/docs/components/pipelines/sdk/output-viewer/
    metadata = {"outputs": [
        {"storage": "inline", "source": "*this should be bold*", "type": "markdown"}
    ]}
    # https://www.kubeflow.org/docs/components/pipelines/sdk/pipelines-metrics/
    metrics = {'metrics': [
        {
        'name': 'train-accuracy', # underscores are not allowed... https://github.com/kubeflow/pipelines/issues/3809
        'numberValue':  0.9,
        },
        {
        'name': 'test-accuracy', # underscores are not allowed...
        'numberValue':  0.7,
        }
    ]}
    #return [json.dumps(markdown)]
    from collections import namedtuple; import json
    return namedtuple('output', ['mlpipeline_ui_metadata', 'mlpipeline_metrics'])(json.dumps(metadata), json.dumps(metrics))

An execution of this should produce metrics AND visualizations in the UI. If not there will be most likely errors in the logs of persistenceagent.

I forced the persistent agent to call ReadArtifacts and -> it looks it does not provide anything for authorization.

Yes exactly, that is why i was so curious how it worked without permission changes to persistenceagent.
Maybe https://www.kubeflow.org/docs/components/pipelines/sdk/connect-api/#multi-user-mode helps.
It was implemented here #5286 and a short overview is here #5138 (comment) . That overview details two approaches. Either we use the kubeflow-userid header or the serviceaccounttoken. Due to https://github.com/kubeflow/manifests/blob/5e9fedb195261cec2bea39349a85c1f7214cb6d0/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml#L52 we should start withe the simple approach, just faking the kubeflow-userid header. E.g. curl -H "kubeflow-userid: [email protected]" . This is how the UI handles it too. But how do we get the userid? We know the namespace e.g. test-user.
Then we just read the "owner" annotation from that namespace which is exactly the right kubeflow-userid header. There is already a Role and ClusterRole where we would need to just allow get namespace (not list, not watch, just get) in
https://github.com/kubeflow/pipelines/blob/master/manifests/kustomize/base/installs/multi-user/persistence-agent/cluster-role.yaml and https://github.com/kubeflow/pipelines/blob/master/manifests/kustomize/base/pipeline/ml-pipeline-persistenceagent-role.yaml

The token approach could be problematic, because we do not have the token from the users namespace, but please prove me wrong https://github.com/kubeflow/pipelines/blob/master/backend/src/apiserver/auth/.

In addition, something else I have noticed and guess is wrong behavior is that: on delete Run the artifacts stay in Minio? I will further investigate

No need to, this is known behavior. And before you do anything there i need to get my not yet polished fixes to the cache server merged. Otherwise deleting artifacts from minio just breaks pipeline execution. My version checks whether the cache database is at least partially consistent with minio and if not it deletes the database entry and prevents faulty caching.
Here is a preview of the cache-server fixes https://github.com/juliusvonkohout/pipelines/tree/patch-21. In the other namespace isolation pull request i also include an optional lifecycle policy for minio that deletes artifacts after some days.

@juliusvonkohout
Copy link
Member

You are right regarding "ReportWorkflow and ReportScheduledWorkflow should also be secured .."

@difince
Copy link
Member Author

difince commented Jun 3, 2022

Thank you @juliusvonkohout for your feedback! Your guidance is so valuable!
Will continue working on this next week!

@difince
Copy link
Member Author

difince commented Jun 16, 2022

/retest

@juliusvonkohout
Copy link
Member

Amazing, now just the same for the two reportworkflow calls and the API is secured.

I would like to present

#7819
#7938
#7725

in the next KFP meeting then and

discuss approaches for
#4790

Copy link
Member

@chensun chensun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Thanks!

@juliusvonkohout
Copy link
Member

@difince i guess there were some changes in master and you have to rebase.

New Verbs (reportMetrics and readArtifact) are added to ClusterRole with name: aggregate-to-kubeflow-pipelines-edit

Signed-off-by: Diana Atanasova <[email protected]>
Persistent Agent authorize itself based ot the namespace and the current user

Fixes: kubeflow#7818
Cover MULTIUSER=false usecase/Standalone pipeline installation.
In this case the namespace doesn't have `user` annotation and
there is no need to provide `kubeflow-userid` Header when making
a request against kfp-api-server

Signed-off-by: Diana Atanasova <[email protected]>
@difince
Copy link
Member Author

difince commented Jul 26, 2022

@chensun I have rebased .. and fixed conflicts in the license file

@juliusvonkohout
Copy link
Member

@chensun needs to add it /lgtm again, /approve seems to be still there. I do not have the rights to do so.

@@ -0,0 +1 @@
MULTIUSER=true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a global config for this? Feels quite redundant to repeat this for all backend pods, especially when the file is already under a path with "multi-user".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a global config for this? Feels quite redundant to repeat this for all backend pods, especially when the file is already under a path with "multi-user".

We just copied the style that your team used so far in https://github.com/kubeflow/manifests/blob/746c523e9e994154df48d4441d01a30e4f9c8de4/apps/pipeline/upstream/base/installs/multi-user/api-service/params.env#L1 . We could consider adding it here https://github.com/kubeflow/pipelines/blob/master/manifests/kustomize/base/installs/multi-user/params.yaml if you want.

Copy link
Member Author

@difince difince Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chensun, @juliusvonkohout Currently with my addition, MULTIUSER is used by two services - app_server and persistent_agent. We could try to export them in a common params.env file but I would bring up a question about other env variables which are even more frequently used like - KUBEFLOW_USERID_HEADER, KUBEFLOW_USERID_PREFIX, NAMESPACE. Should we also export them?
IMO this is out of the scope of this PR. I was following the current way of work. If we want to do such an enhancement I would open a dedicated issue about it and I would be glad to work on it, but I will let this PR get merged as it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree it's out of scope for this PR.

@chensun
Copy link
Member

chensun commented Aug 4, 2022

/lgtm
/approve

Thanks @difince !

@google-oss-prow google-oss-prow bot added the lgtm label Aug 4, 2022
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chensun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 1caa8e0 into kubeflow:master Aug 4, 2022
jlyaoyuli pushed a commit to jlyaoyuli/pipelines that referenced this pull request Jan 5, 2023
…ubeflow#7819)

* Authorize readArtifacts and ReportMetrics endpoints

New Verbs (reportMetrics and readArtifact) are added to ClusterRole with name: aggregate-to-kubeflow-pipelines-edit

Signed-off-by: Diana Atanasova <[email protected]>

* Add authorization when Persistent Agent communicate with the api-server

Persistent Agent authorize itself based ot the namespace and the current user

Fixes: kubeflow#7818

* Update persistence_agent.csv license file

Signed-off-by: Diana Atanasova <[email protected]>

* Fix lexical error in persistent agent cluster role

Signed-off-by: Diana Atanasova <[email protected]>

* Fix integration tests/Fix MULTIUSER= false usecase

Cover MULTIUSER=false usecase/Standalone pipeline installation.
In this case the namespace doesn't have `user` annotation and
there is no need to provide `kubeflow-userid` Header when making
a request against kfp-api-server

Signed-off-by: Diana Atanasova <[email protected]>

* rebase: fix conflixt in license file

Signed-off-by: Diana Atanasova <[email protected]>

* rebase add new line in the end of licensing file

Signed-off-by: Diana Atanasova <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature] Add authorization to both ReadArtifacts and ReportMetrics endpoints
3 participants