feat(frontend): caching may now be disabled when starting pipeline runs #8177
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Hi @TobiasGoerke. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Does this correspond to your ideas, @chensun and @zijianjoy? |
@TobiasGoerke and I agreed to go for a disable-cache switch in the pipeline run UI and a slightly modified version of #7939 (comment). If the maximum cache staleness leads to an empty list from
So there are TWO environment variables for the cache-server: one provides a default cache staleness (used if the pipeline does not set a value) and one provides a maximum (larger values in the pipeline are ignored). An administrator can then set the expiration date on their MinIO/S3/GCS storage backend to the same value as the maximum cache staleness and provide a sensible default staleness for their users' pipelines. We limit the user-set value in the pipeline definition to the administrator's maximum. Users also no longer need to recompile existing pipelines, because they can disable the cache from the UI. I think setting the exact cache duration from the UI is overkill and a disable/enable switch is enough. I think this covers most use cases, is independent of the storage backend (compared to #7938), and is rather easy to implement. This is also very much in line with what Google wants, according to the comments and the KFP community meeting. |
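The default/maximum rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the cache-server's actual code: the helper names `parse_duration` and `effective_staleness` are hypothetical, and the parser handles only the simple `PnDTnHnMnS` form used by the values in this thread.

```python
import re
from datetime import timedelta
from typing import Optional

# Minimal ISO 8601 duration pattern: days, hours, minutes, seconds only.
# Assumption: this limited form is enough for values like P0DT0H3M30S.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Parse a simple ISO 8601 duration such as 'P0DT0H1M31S'."""
    match = _DURATION.match(text)
    if match is None or text in ("P", "PT"):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    return timedelta(**parts)


def effective_staleness(
    pipeline_value: Optional[str], default: str, maximum: str
) -> timedelta:
    """Hypothetical helper: use the default when the pipeline sets nothing,
    then cap the result at the administrator's maximum."""
    requested = parse_duration(pipeline_value or default)
    return min(requested, parse_duration(maximum))


# Example with the values used in this thread.
print(effective_staleness(None, "P0DT0H1M31S", "P0DT0H3M30S"))   # default applies
print(effective_staleness("P1D", "P0DT0H1M31S", "P0DT0H3M30S"))  # capped at maximum
```

Capping with `min` means a pipeline can always ask for a shorter staleness than the maximum, but never a longer one.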
/ok-to-test |
As discussed with @zijianjoy (see google docs comments), disabling the cache for new runs / jobs is now implemented in the backend. The UI hasn't changed and still features a switch for controlling the cache. Frontend and backend changes are implemented for both V1 and V2. Feel free to review and test them. Note: There's a new field TODO: reviewing the |
/retest |
@TobiasGoerke can you rebase to master and fix the merge conflict? Please try my new cache server at mtr.devops.telekom.de/ai/cache-server:latest with the environment variables MAXIMUM_CACHE_STALENESS=P0DT0H3M30S DEFAULT_CACHE_STALENESS=P0DT0H1M31S |
I've tested your changes and they work well for me. The cache is invalidated after a couple of minutes. To reproduce and install this PR locally, I've pushed the ml-pipeline images to Docker Hub.

    images:
    - name: gcr.io/ml-pipeline/cache-server
      newName: mtr.devops.telekom.de/ai/cache-server
      newTag: latest
    - name: gcr.io/ml-pipeline/frontend
      newName: tobiasgoerke/ml-pipeline-frontend
    - name: gcr.io/ml-pipeline/api-server
      newName: tobiasgoerke/ml-pipeline-api-server
    patchesStrategicMerge:
    - |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: cache-server
        namespace: kubeflow
      spec:
        template:
          spec:
            containers:
            - name: server
              env:
              - name: MAXIMUM_CACHE_STALENESS
                value: P0DT0H3M30S
              - name: DEFAULT_CACHE_STALENESS
                value: P0DT0H1M31S

|
/test kubeflow-pipeline-frontend-test |
API changes are to be discussed with @Linchin |
As discussed in today's KFP meeting, I will split the cache-server changes off into a separate, very small PR so that @chensun can review and merge it independently. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…arting pipeline runs. Added a new flag 'disable_cache' to run and job protos in API backend. Backend handles this flag and sets caching annotations accordingly. Frontend enables the user to set the flag by offering a switch button. Squashed previous commits
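As a rough illustration of what "sets caching annotations accordingly" could look like, here is a minimal sketch. It is not this PR's backend code: the function name `apply_disable_cache` is hypothetical, and the annotation key and the `P0D` value are assumptions based on the KFP v1 convention for per-step cache staleness.

```python
import copy
from typing import Any, Dict

# Assumption: KFP v1 annotation key for per-step cache staleness.
MAX_CACHE_STALENESS_KEY = "pipelines.kubeflow.org/max_cache_staleness"


def apply_disable_cache(workflow: Dict[str, Any], disable_cache: bool) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of the workflow in which every
    template is annotated with a zero staleness when disable_cache is set."""
    patched = copy.deepcopy(workflow)  # never mutate the caller's spec
    if not disable_cache:
        return patched
    for template in patched.get("spec", {}).get("templates", []):
        annotations = template.setdefault("metadata", {}).setdefault("annotations", {})
        # Assumption: a zero duration means cached results are never reused.
        annotations[MAX_CACHE_STALENESS_KEY] = "P0D"
    return patched


workflow = {"spec": {"templates": [{"name": "train"}, {"name": "evaluate"}]}}
patched = apply_disable_cache(workflow, disable_cache=True)
print(patched["spec"]["templates"][0]["metadata"]["annotations"])
```

Working on a deep copy keeps the stored pipeline definition untouched, so the flag only affects the run being started.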
As discussed in #8104 and the subsequent community meeting on 17. August, Kubeflow requires a way to manage its artifacts and cache.
We've agreed on proceeding and implementing suitable mechanisms step by step.
This first feature adds a switch that lets users disable or enable caching when starting new pipeline runs.
Implementation notes: