[Epic] Kubeflow 1.4 upgrade #1203

Closed
3 of 7 tasks
chuckbelisle opened this issue Jun 22, 2022 · 6 comments
Labels: area/engineering, component/kubeflow, kind/design, kind/epic, kind/feature, triage/wont-fix

Comments

@chuckbelisle
Contributor

chuckbelisle commented Jun 22, 2022

This was closed and replaced by #1337

Closes #681 #1094 (see new 1.6 epic)

All tasks to do with the code will be added

Code Changes

Kubeflow - central dashboard

  • Base the work on the commit referenced in Upgrade 1.3: Kubeflow #833 (comment)
  • I18n is going to be an issue: upstream changed direction and went with Angular i18n instead of the approach in our PR that they had merged. We need to make a decision as to which way we want to go. Correction: wrong component. The central-dashboard is written in Pug, and that work was not pushed upstream. The Angular i18n is actually for jupyter-web-app.
  • The ConfigMap will, as usual, need to be double-checked to make sure it uses our items.
  • Do we want tensorboard?

Jupyter-web-apps

  • Image but about the overwriting of the logos
  • Is there any feature we need to add to our backend? The commit for 1.3 was StatCan/jupyter-apis@863a996; something similar will need to be done.

Volume web apps?

Since last time we seem to have ignored the volume-web-app in favor of our own customized volume table in jupyter-web-app, we do not need to do any customization unless we decide to reverse that decision.

Pipelines

Manifest

Overview

This manifests folder exactly matches the upstream Kubeflow Manifests repository in its naming and folder hierarchy.

  • Kubeflow Manifests
  • Need to copy the manifests from the 1.4 branch of Kubeflow (added tensorboard)

Note: We are pushing all of the work into the aaw-dev-cc-00 branch for aaw-kubeflow-manifests.

Post Deploy Tasks

  • Pipelines: Check the Cluster Roles are sufficient for Pipelines + Argo Workflow (Archive, Delete, Run, Experiments)
  • Profiles: Check the new Access Management KFAM works without our KFAM adjustments
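The Cluster Role check above could be partly automated. The following is a minimal sketch, assuming the role's `rules` have already been loaded into Python dicts in the standard Kubernetes RBAC shape (`apiGroups`, `resources`, `verbs`). The example rules and the required permission list are hypothetical placeholders, not the actual AAW roles, and the helper ignores `resourceNames` and aggregated roles.

```python
from typing import Iterable, List, Tuple

Permission = Tuple[str, str, str]  # (apiGroup, resource, verb)


def role_allows(rules: Iterable[dict], api_group: str, resource: str, verb: str) -> bool:
    """Return True if any rule grants `verb` on `resource` in `api_group`.

    A "*" entry in a rule is treated as a wildcard, as in Kubernetes RBAC.
    """
    for rule in rules:
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        verbs = rule.get("verbs", [])
        if (
            (api_group in groups or "*" in groups)
            and (resource in resources or "*" in resources)
            and (verb in verbs or "*" in verbs)
        ):
            return True
    return False


def missing_permissions(rules: Iterable[dict], required: List[Permission]) -> List[Permission]:
    """List the required permissions that no rule in the role grants."""
    rules = list(rules)
    return [perm for perm in required if not role_allows(rules, *perm)]


# Hypothetical role contents and requirements, for illustration only.
EXAMPLE_RULES = [
    {
        "apiGroups": ["pipelines.kubeflow.org"],
        "resources": ["runs", "experiments"],
        "verbs": ["get", "list", "create", "delete"],
    },
    {"apiGroups": ["argoproj.io"], "resources": ["workflows"], "verbs": ["get", "list"]},
]

REQUIRED = [
    ("pipelines.kubeflow.org", "runs", "create"),
    ("pipelines.kubeflow.org", "runs", "archive"),
    ("pipelines.kubeflow.org", "runs", "delete"),
    ("pipelines.kubeflow.org", "experiments", "archive"),
    ("argoproj.io", "workflows", "delete"),
]

if __name__ == "__main__":
    for group, resource, verb in missing_permissions(EXAMPLE_RULES, REQUIRED):
        print(f"missing: {verb} on {resource}.{group}")
```

With the placeholder data, the script reports the two `archive` permissions and `delete` on workflows as missing. A real check would read the rules from the deployed ClusterRoles and should still be confirmed by exercising Archive, Delete, Run, and Experiments in the UI.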

Common

| Component | Local Manifests Path | Upstream | Issue | AAW Sign-off | CNS Sign-off | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| kubeflow-namespace | common/kubeflow-namespace | v1.4.1 | #198 | | | No structural changes. |
| kubeflow-roles | common/kubeflow-roles | v1.4.1 | #199 | | | No structural changes. |
| oidc-authservice | common/oidc-authservice | v1.4.1 | #200 | | | No structural changes. |
| kubeflow-knative | common/knative | v1.4.1 | #201 | | | No structural changes. |

I think anything that is not a direct folder equivalent is in the knative folder.

Apps

| Component | Local Manifests Path | Upstream | Issue | AAW Sign-off | CNS Sign-off | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| admission-webhook | apps/admission-webhook | v1.4.1 | #202 | | | Was not in the list |
| central-dashboard | apps/centraldashboard | v1.4.1 | #203 | | | No structural changes. |
| jupyter-web-apps | apps/jupyter-web-app | v1.4.1 | #204 | | | Named jupyter upstream? Or equivalent to jupyter + volume + tensorboard |
| katib | apps/katib | v1.4.1 | #210 | | | No structural changes. |
| kfp-tekton | | v1.4.1 | | | | New |
| kfserving | apps/kfserving | v1.4.1 | #211 | | | No structural changes. |
| kubebench | | v1.4.1 | | | | New |
| mpi-job | apps/mpi-job | v1.4.1 | #212 | | | Same, but is moved in 1.5.1 like the other *-job apps. |
| mxnet-job | apps/mxnet-job | v1.4.1 | | | | Changed upstream, see apps/training-operator |
| notebook-controller | apps/notebook-controller | v1.4.1 | #216 | | | Our custom controller? |
| pipeline | apps/pipeline | v1.4.1 | #221 | | | No structural changes. |
| profiles | apps/profiles | v1.4.1 | #213 | | | No structural changes. |
| pytorch-job | apps/pytorch-job | v1.4.1 | | | | Changed upstream, see apps/training-operator |
| tensorboard | | v1.4.1 | | | | New or different upstream, see jupyter-web-apps |
| tf-training | apps/tf-training | v1.4.1 | | | | Deleted or different upstream |
| training-operations | | v1.4.1 | | | | New! Changed upstream, see apps/training-operator |
| volume-web-apps | | v1.4.1 | | | | New or different upstream, see jupyter-web-apps |

Contrib

| Component | Local Manifests Path | Upstream | Issue | AAW Sign-off | CNS Sign-off |
| --- | --- | --- | --- | --- | --- |
| spark-operator | apps/spark-operator | v1.4.1 | | | |
| seldon | contrib/seldon | v1.4.1 | | | |

The following are in upstream 1.4.1 and were also in 1.3.1, but we don't have them. Maybe we don't use them (to be confirmed):

  • application
  • basic-auth
  • dex-auth
  • experimental
  • feast
  • flink
  • gatekeeper
  • modeldb/base
  • spark - Same as spark-operator???
  • spartakus
  • tektoncd
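A list like the one above can be regenerated rather than maintained by hand. The sketch below assumes two local checkouts exist, one of our manifests folder and one of the upstream Kubeflow manifests at the target tag. The function names and the paths in the usage line are hypothetical. It compares only top-level directory names, so a renamed component (for example spark versus spark-operator) still needs a manual look.

```python
from __future__ import annotations

from pathlib import Path


def component_dirs(root: Path) -> set[str]:
    """Names of the immediate sub-directories of `root`, skipping hidden ones."""
    return {p.name for p in root.iterdir() if p.is_dir() and not p.name.startswith(".")}


def compare_components(local_root: str | Path, upstream_root: str | Path) -> dict[str, list[str]]:
    """Compare component folders by name between a local and an upstream tree."""
    local = component_dirs(Path(local_root))
    upstream = component_dirs(Path(upstream_root))
    return {
        "upstream_only": sorted(upstream - local),  # candidates we do not carry
        "local_only": sorted(local - upstream),     # renamed, removed, or custom
        "shared": sorted(local & upstream),         # direct folder equivalents
    }
```

Usage, with placeholder paths: `compare_components("aaw-kubeflow-manifests/contrib", "manifests/contrib")["upstream_only"]` would return the upstream-only folder names for the contrib tree.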

Containers

We provide our own Kubeflow Notebooks that are updated continuously:

  • k8scc01covidacr.azurecr.io/rstudio:<sha>
  • k8scc01covidacr.azurecr.io/jupyterlab-cpu:<sha>
  • k8scc01covidacr.azurecr.io/jupyterlab-pytorch:<sha>
  • k8scc01covidacr.azurecr.io/jupyterlab-tensorflow:<sha>
  • k8scc01covidacr.azurecr.io/remote-desktop:<sha>

The following are the Kubeflow components we override in order to add features such as i18n and improved performance:

| Container | Kubeflow Component | Comparison | AAW Sign-off | CNS Sign-off |
| --- | --- | --- | --- | --- |
| oidc-authservice | oidc-authservice | [compare-oidc-authservice] | | |
| centraldashboard | centraldashboard | [compare-centraldashboard] | | |
| jupyter-apis | jupyter-web-app | [compare-jupyter-apis] | | |
| kubeflow-pipelines | ml-pipeline/frontend | [compare-kubeflow-pipelines] | | |

Previous Epic

EPIC Kubeflow Upgrade Planning v1.3.1

Final Stretch: Core Upgrade Checkpoint for 1.4

In the interest of time, we will focus on upgrading the core components of Kubeflow.

We will finish upgrading these components to 1.4 first:

  • admission-webhook
  • notebook-controller
  • profiles
  • volume-web-apps

At the same time we'll need to do some heavy lifting for Jupyter Web Apps:

Note: send PRs to kf-1.4-upgrade

Finally, once those tickets are complete, we can ask @sylus to review and apply manifests to dev cluster.

Then... upgrade to 1.6

@chuckbelisle added the kind/feature, component/kubeflow, area/engineering, kind/design, and kind/epic labels Jun 22, 2022
@sylus
Member

sylus commented Jun 29, 2022

If you can, please follow the exact method used for the Kubeflow upgrade to 1.3.x so it doesn't get out of hand like last time.

The more methodical approach worked really well ^_^

StatCan/aaw-kubeflow-manifests#110

@wg102
Contributor

wg102 commented Jul 4, 2022

A useful link for comparing component versions in 1.4 is: https://www.kubeflow.org/docs/releases/kubeflow-1.4/

@sylus
Member

sylus commented Jul 7, 2022

Will @rohank07 be helping with this, since he knows how we rendered the manifests to check the delta, etc.? ^_^

@rohank07
Contributor

rohank07 commented Jul 7, 2022

I won't be actively working on the KF 1.4 upgrade. But if you want to view the output of the rendered manifests, the command task stack:aaw:preview in taskfile.yaml should do the trick and help with debugging.
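Once the rendered output has been captured for two versions, the delta can be checked with a plain text diff. This is a minimal sketch, assuming the rendered manifests are already available as strings (how the preview output gets saved is not covered here). The function name and the sample YAML are hypothetical.

```python
import difflib
from typing import List


def manifest_delta(before: str, after: str,
                   before_name: str = "before.yaml",
                   after_name: str = "after.yaml") -> List[str]:
    """Return a unified diff between two rendered manifest texts.

    The list is empty when the two renderings are identical.
    """
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=before_name,
            tofile=after_name,
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical sample content, for illustration only.
    old = "kind: Deployment\nimage: example/app:1.3\n"
    new = "kind: Deployment\nimage: example/app:1.4\n"
    print("\n".join(manifest_delta(old, new)))
```

A line-based diff is sensitive to resource ordering, so it works best when both renderings come from the same tool and settings.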

@sylus
Member

sylus commented Jul 7, 2022

Ideally someone that worked on it before with me would be active on it, bit confused about that.

Anyways, this is a 2-week task. If it looks like it might take longer, I'd bring in @rohank07, who worked on it previously, to speed things up :)

@wg102
Contributor

wg102 commented Sep 12, 2022

Closing since we are going directly to 1.6. See issue #1337, which replaces this one.

@wg102 wg102 closed this as completed Sep 12, 2022

5 participants