Upgrades in 1.1 should follow kustomize off the shelf workflow #304
Comments
As noted in kubeflow/kubeflow#4873, kustomize commonLabels should only be used for immutable labels, because commonLabels are substituted into selectors and selectors are immutable. Right now our applications include the version in the version and instance labels, which are used in selectors and set via commonLabels. We need to fix this so that the labels are immutable across version updates. It looks like
So if we have appropriate, immutable labels for each application, then we should be able to use
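A minimal sketch of what this looks like in a kustomization.yaml, assuming hypothetical application and label values (nothing here is copied from the actual Kubeflow manifests):

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml

# Safe for commonLabels: values that never change between Kubeflow releases,
# so the selectors kustomize derives from them remain valid on upgrade.
commonLabels:
  app.kubernetes.io/name: jupyter-web-app
  app.kubernetes.io/component: jupyter

# Mutable values such as app.kubernetes.io/version, or an instance name that
# embeds the version, should be set on the resources directly (e.g. via a
# patch or the Application resource), not via commonLabels, because
# commonLabels is also injected into immutable selector fields.
```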
* Fix kubeflow#1131
* kustomize commonLabels get substituted into selector fields. Selector fields are immutable, so if commonLabels change (e.g. between versions) then we can't reapply/update the existing resources, which breaks upgrades (kubeflow/kfctl#304).
* For the most part the problematic commonLabels were on our Application resources. The following labels were being set: "app.kubernetes.io/version", "app.kubernetes.io/instance", "app.kubernetes.io/managed-by", "app.kubernetes.io/part-of".
* version was definitely changing between versions; instance was also changing between versions to include the version number.
* managed-by and part-of could also change (e.g. we may not be using kfctl).
* We could still set these labels if we wanted to; we just shouldn't set them as commonLabels and/or include them in the selector, as they will inhibit upgrades with kubectl apply.
* I created a test, validate_resources_test.go, to ensure none of these labels are included in commonLabels.
* I created a simple go binary, tools/fix_common_labels.go, to update all the resources.
* generat_tests.py - deleted the code that removes unmatched tests.
  * We no longer generate tests that way, and the delete code was going to delete valid tests like our new validation test.
* Got rid of the clean rule in the Makefile for the same reason.
@jlewi Hey Jeremy - will this feature be included in Kubeflow 1.1?
* This is GCP-specific code that allows CloudEndpoints to be created using the CloudEndpoint controller. A Cloud Endpoint is a KRM-style resource, so we can just have `kfctl apply -f {path}` invoke the appropriate logic.
* For GCP this addresses GoogleCloudPlatform/kubeflow-distribution#36; specifically, when deploying private GKE the CloudEndpoints controller won't be able to contact the servicemanagement API. This provides a workaround by running it locally.
* This pattern seems extensible; i.e. other platforms could link in code to handle CRs specific to their platforms. This could basically be an alternative to plugins.
* I added a context flag to control the kubecontext that apply applies to. Unfortunately, it doesn't look like there is an easy way to use that in the context of applying a KFDef. It looks like the current logic assumes the cluster will be added to the KFDef metadata and then looks up that cluster in .kubeconfig.
  * Modifying that logic to support the context flag seemed riskier than simply adding a comment to the flag.
* Added some warnings that KFUpgrade is deprecated since, per kubeflow#304, we want to follow the off-the-shelf workflow.
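A rough sketch of how this might be invoked from the command line; the file name is a placeholder and the exact spelling of the context flag is an assumption (the PR itself is authoritative):

```bash
# Illustrative only: apply a KRM-style CloudEndpoint resource with kfctl so the
# linked-in controller logic runs locally, which helps when a private GKE
# cluster cannot reach the servicemanagement API from inside the cluster.
# The file name and context flag spelling below are assumptions, not from the PR.
kfctl apply -f cloud-endpoint.yaml --context my-private-gke-context
```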
Per Yuan, I deleted:
* Process and tools for upgrades from Release N-1 to N, i.e. 1.0.x to 1.1 ([kubeflow#304](kubeflow/kfctl#304))

Per James, I added:
* Manage recurring Runs via new “Jobs” page (exact name on UI is TBD)
* Update ROADMAP.md: Updated Kubeflow 1.1 and added Kubeflow 1.2 and Kubeflow 1.3 roadmap items.
* Update ROADMAP.md: Improved wording of features to simplify understanding.
* Update ROADMAP.md: Added details on KFServing 0.5 enhancements.
* Update ROADMAP.md: Updated the notebooks section in Kubeflow 1.3 with these modifications:
  * Notebooks
    * Important backend updates to Notebooks (i.e. to improve interop with Tensorboard)
    * New and expanded Jupyter Notebook stack along with easy-to-customize common base images
    * Addition of R-Studio and Code-Server (VS-Code) support
* Update ROADMAP.md: Reorganized Working Group updates into the 1st section; added that customizing the Jupyter base image is a stretch feature.
* Update ROADMAP.md: Per Yuan, deleted "Process and tools for upgrades from Release N-1 to N, i.e. 1.0.x to 1.1" ([#304](kubeflow/kfctl#304)). Per James, added "Manage recurring Runs via new “Jobs” page (exact name on UI is TBD)".
* Update ROADMAP.md: Added Multi-Model Serving (https://github.com/yuzliu/kfserving/blob/master/docs/MULTIMODELSERVING_GUIDE.md) to the KFServing 0.5 roadmap items.
Filing this issue to track simplifying the upgrade process in Kubeflow 1.1.
Here are the current instructions for how Kubeflow upgrades are done:
https://www.kubeflow.org/docs/upgrading/upgrade/
This differs from the standard off-the-shelf workflow for kustomize applications:
https://github.com/kubernetes-sigs/kustomize/blob/master/docs/workflows.md#off-the-shelf-configuration
In particular, we introduce a KFUpgrade resource which defines pointers to the old and new KFDef.
https://www.kubeflow.org/docs/upgrading/upgrade/#upgrade-instructions
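For context, the KFUpgrade resource described in those instructions is roughly of the following shape, with pointers to the current and new KFDef; the apiVersion and field names here are illustrative and the linked docs are authoritative:

```yaml
apiVersion: kfupgrade.apps.kubeflow.org/v1alpha1   # illustrative; check the docs
kind: KfUpgrade
metadata:
  name: kf-upgrade-v1.0.1
spec:
  currentKfDef:      # pointer to the deployed (old) KFDef
    name: kubeflow
    version: v1.0
  newKfDef:          # pointer to the target (new) KFDef
    name: kubeflow
    version: v1.0.1
```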
kfctl then does a lot of magic in order to try to reapply any user-defined kustomizations on top of the new configs.
With the new kustomize patterns (http://bit.ly/kf_kustomize_v3) we should be able to simplify this and, I think, eliminate the need for kfctl. Instead, users should be able to just follow the standard off-the-shelf kustomize workflow (see the sketch below).
This is because the new pattern with stacks has kfctl generate a new kustomize package that uses the Kubeflow-defined packages in .cache as the base, so a user can regenerate .cache without losing any of their kustomizations.
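Concretely, the off-the-shelf flow could look roughly like the sketch below; the paths are placeholders and the exact kfctl invocation for refreshing .cache depends on the config being used:

```bash
# Illustrative sketch of the off-the-shelf upgrade flow (paths are placeholders).
#
# 1. Refresh the Kubeflow-defined base packages in .cache for the new release,
#    e.g. by re-running `kfctl build` against the updated KFDef config; the
#    user's own kustomizations live in their overlay, so they are preserved.
#
# 2. Build and apply the overlay with stock tooling:
kustomize build ./kustomize | kubectl apply -f -
```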
There are a couple of issues that we run into when applying the updated manifests; for example, resources that are removed in the new version are not pruned when `apply` is called. Rather than rely on kfctl logic to solve these problems, we should follow a shift-left pattern: our expectation should be that we rely on existing tools (e.g. kubectl, kpt, etc.) to apply the manifests and handle these problems.
kpt, for example, supports pruning.
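As a hedged sketch of how that could look with kpt's `live` workflow (the directory name is a placeholder; flags and behavior vary by kpt version):

```bash
# Illustrative: kpt's live workflow records applied resources in an inventory
# object, so resources dropped from the new manifests can be pruned on the
# next apply. Exact flags/behavior depend on the kpt version.
kpt live init ./manifests    # one-time: create the inventory template
kpt live apply ./manifests   # apply the new manifests and prune removed ones
```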
/cc @richardsliu @yanniszark @kunmingg